Quick Primer on XHTML Markup

February 1, 2007

Writing valid XHTML markup is not really as difficult as it sounds. If you learn a few new rules, you'll be marking up code in valid XHTML in no time. The best way to familiarize yourself with valid markup is to view source on site pages that have passed one of the validation services, such as W3C's Markup Validation Service. Here is a quick rundown of the basics to help you get started.

Differences Between HTML and XHTML Markup

Semantic markup

XHTML markup is semantic. This means lists, tables, paragraphs and headers are coded according to what they are. For instance, headers are all marked as <h1>, <h2>, <h3>, and so on. <table> is used only on content that is truly tabular material. <p> is used only to denote true paragraphs. Mark up elements according to what they are, not how you want them to look—that will be taken care of in the CSS file.
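
As a rough illustration (the text content is invented for this example), the first snippet below fakes a heading with presentational tags, while the second marks up the same content according to what it is:

```html
<!-- Presentational: looks like a heading, but carries no meaning -->
<p><font size="5"><b>Holiday Gift Guide</b></font></p>

<!-- Semantic: a real heading followed by a real paragraph -->
<h2>Holiday Gift Guide</h2>
<p>Our picks for this season are listed below.</p>
```

Both versions can be made to look identical with CSS, but only the second tells browsers, search engines and voice readers that the first line is a heading.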

DocType

Browsers use the DOCTYPE declaration to decide how to render a page, and may not render pages correctly without it. Using the correct DOCTYPE tells the browser to render your pages in standards-compliant mode. Without one, or with an incorrect one, browsers assume your code is non-compliant ("old-fashioned") and render your pages in quirks mode. For those beginning XHTML markup, "transitional" may be the most forgiving. A full list of DOCTYPEs can be found on W3C's site.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
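
To show where the DOCTYPE sits, here is a minimal sketch of a complete XHTML 1.0 Transitional page. The title and body text are placeholders; the DOCTYPE goes at the top of the file, and the xmlns attribute on the html element is part of what makes the page XHTML rather than HTML:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <!-- Declares the character encoding; note the closing slash -->
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>Page Title</title>
</head>
<body>
  <h1>Page Title</h1>
  <p>Page content goes here.</p>
</body>
</html>
```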

Style tags

On standards-compliant pages, presentation is separated from content. Style tags such as those for fonts, colors, sizes, borders and spacing are not included on the page. These are defined in the cascading style sheet (CSS).
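
A small sketch of that separation, assuming a hypothetical class name "intro" and made-up style values. The rules are shown in a style element for brevity; they could just as well live in an external CSS file:

```html
<head>
  <title>Welcome</title>
  <!-- Presentation lives here (or in a linked .css file), not in the body -->
  <style type="text/css">
    p.intro { font-family: Georgia, serif; color: #333333; margin-bottom: 1em; }
  </style>
</head>
<body>
  <!-- Replaces something like <font face="Georgia" color="#333333">...</font> -->
  <p class="intro">Welcome to our site.</p>
</body>
```

The markup says only that this is an introductory paragraph. How it looks can be changed later without touching the page itself.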

Closing tags

HTML allowed you to omit some closing tags. XHTML requires that all tags be closed, even empty ones. For instance, when you use a <p> tag at the beginning of a paragraph, you must use a </p> at the end.

Some of the most familiar tags that have a closing pair are:

<p> ... </p>
<i> ... </i>
<b> ... </b>
<strong> ... </strong>
<em> ... </em>
<a href=""> ... </a>
<h1> ... </h1>
<ul> ... </ul>
<li> ... </li>
<blockquote> ... </blockquote>

Tags which don’t have a closing pair must still be closed, like this:

<img src="http://www.sitename.com/images/box.gif" alt="Gift Box" align="left" />
and <br />
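
The same pattern applies to the other empty elements. A few common ones are sketched below; the file name and field name are placeholders. The space before the slash is a compatibility convention recommended in the XHTML 1.0 guidelines for older browsers:

```html
<br />
<hr />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="stylesheet" type="text/css" href="styles.css" />
<input type="text" name="email" />
```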

Lower case

XHTML requires that all tag names and attribute names be in lower case. (Attribute values, such as the alt text below, may still contain capitals.)

Wrong: <IMG SRC="http://www.sitename.com/images/box.gif" ALT="Gift Box" ALIGN="left" />
Right: <img src="http://www.sitename.com/images/box.gif" alt="Gift Box" align="left" />

Quotes

XHTML requires quotation marks around every attribute value; a validator will report an error if you leave them off.

Wrong: <img src=http://www.sitename.com/images/box.gif alt=Gift Box align=left />
Right: <img src="http://www.sitename.com/images/box.gif" alt="Gift Box" align="left" />

Values

XHTML will not allow you to omit values. An attribute that HTML let you write as a bare word must repeat its own name as the value.

Wrong: <hr noshade />
Right: <hr noshade="noshade" />
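
The attributes this rule catches most often are the on/off ones used in forms. A short sketch, with field names and option values invented for the example:

```html
<input type="checkbox" name="subscribe" checked="checked" />

<select name="country">
  <option value="us" selected="selected">United States</option>
  <option value="ca">Canada</option>
</select>

<input type="text" name="code" disabled="disabled" />
```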

Headers

On standards-compliant XHTML pages, headers are especially important. Best practice is to reserve H1 headers for page titles. The next level head should be an H2 (don’t skip to H3). When marking up headers, envision or even sketch out the headers as an outline. Mark headers according to their place in the outline, not according to the style they will have; that can be taken care of in the CSS file.
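
As one possible outline, here is how the headers of this very page might be marked up. The indentation is only there to make the outline visible; it has no effect on the markup:

```html
<h1>Quick Primer on XHTML Markup</h1>
  <h2>Differences Between HTML and XHTML Markup</h2>
    <h3>Semantic markup</h3>
    <h3>DocType</h3>
    <h3>Closing tags</h3>
  <h2>Comments</h2>
```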

Paragraphs

All paragraphs are wrapped in <p> </p> tags. Blockquotes can be used only if the content is truly a quote. Do not use empty <p> tags to create space. Use a <br />, or better yet, add space by denoting it in the CSS file.
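
A brief sketch of the difference, with placeholder text and an arbitrary margin value:

```html
<!-- Avoid: an empty paragraph used as a spacer -->
<p>First paragraph.</p>
<p>&nbsp;</p>
<p>Second paragraph.</p>

<!-- Prefer: let the style sheet set the space between paragraphs.
     This rule belongs in the head or in the external CSS file. -->
<style type="text/css">
  p { margin-bottom: 1.5em; }
</style>
<p>First paragraph.</p>
<p>Second paragraph.</p>
```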

Strong vs. Bold

The <strong> and <b> tags are not the same. Strong denotes emphasis: it adds meaning. Bold merely makes the text look heavier: it adds no meaning to the text, only style. The difference is easiest to envision with voice readers (audio devices used by the sight-impaired to “view” web pages). A voice reader ignores style-only markup, which adds nothing to the meaning of the text, so it will pass over a <b> tag. But text surrounded by <strong></strong> may be voiced with greater emphasis, indicating that the text has greater importance.

Emphasis vs. Italics

As with <strong> and <b>, <em> and <i> are not the same. <i> is merely styling—you can use it to make text look better, for instance. On the other hand <em> confers meaning; it indicates that the text within has more importance and should be emphasized.
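
A small example of the two kinds of tag side by side (the sentences are invented for illustration):

```html
<!-- Meaning: the warning matters, and "must" is stressed -->
<p><strong>Warning:</strong> you <em>must</em> close every tag.</p>

<!-- Style only: heavier or slanted text with no added importance -->
<p>The <b>Gift Box</b> ships aboard the <i>Endeavour</i>.</p>
```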

Lists

There are three types of lists: unordered lists (<ul>), ordered lists (<ol>) and definition lists (<dl>). All content that qualifies as a list (for instance, menu items) should be marked up that way. Unordered lists are bulleted, and items within them are denoted with a <li>...</li>. Ordered lists are numbered; items within them also are denoted with a <li>...</li>. Definition lists are made up of definition terms <dt>...</dt> and definition descriptions <dd>...</dd>.

<ul>
<li>This is an unordered list.</li>
<li>It is bulleted.</li>
</ul>

<ol>
<li> This is an ordered list.</li>
<li> It has numbers.</li>
</ol>

<dl>
<dt>This is a term.</dt>
<dd>It has a definition.</dd>
<dt>Here is another term.</dt>
<dd>It also has a definition.</dd>
</dl>

The lists above render roughly like this:

  • This is an unordered list.
  • It is bulleted.

  1. This is an ordered list.
  2. It has numbers.

This is a term.
    It has a definition.
Here is another term.
    It also has a definition.
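
Since menu items count as a list, a navigation menu might be sketched like this. The id "nav" and the link targets are hypothetical; the bullets can be removed and the items laid out horizontally in the CSS file:

```html
<ul id="nav">
  <li><a href="/">Home</a></li>
  <li><a href="/products/">Products</a></li>
  <li><a href="/contact/">Contact</a></li>
</ul>
```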


Comments

Hi Kathy,

Thanks for the primer. I know its intended audience is those moving from HTML and delving into XHTML for the first time, but I just wanted to point out a couple of things that may raise eyebrows depending on who sees this.

  1. formatting attributes are only required to be separated when using the strict doctype. If you're using a transitional doctype, you still have bgcolor, text, and link attributes (they just tell you you shouldn't use them...)
  2. XHTML requires that all tags and attribute names be lowercase. You do this in your examples, but I thought it was worth pointing out
  3. The XHTML spec doesn't actually say that H1 can only be used for page titles, but it's considered by many to be a best practice. Unless you're using ISO HTML instead of W3C.
  4. A couple of times you mention the <bold> element which really should be the <b> element

One thing that I try to remind people of is that HTML was always supposed to be semantic, it just was almost impossible to create any reasonable websites that way... The push towards XHTML (and now towards XHTML2 and/or HTML5 and CSS3) is about recognizing and enforcing a bit more of that both from the browser rendering, but also from educating the developers. So thank you for making this quick primer available and getting the word out.

-Erik Peterson

Erik--Thanks for pointing these out. As for <bold>... Yeah, I've been writing a lot of strange code lately. I think I need to stop playing with MediaWiki. :) Thanks again for your help!

And I actually just read a good article on Accessibility that points to the heading levels issue.

Before reading the article, I would have said that I was pretty good with the WCAG1.0, but I did a little research and found that they've relaxed the language quite a bit in WCAG 2.0 around skipping headings. And in particular:

"Skipping levels in the sequence of headings may create the impression that the structure of the document has not been properly thought through or that specific headings have been chosen for their visual rendering rather than their meaning."

(would have used a blockquote tag, but it's not one of the allowed tags in the comments...)

Just a little more fuel for the fire ;-)

-Erik

Great primer, very useful.

Just a little nitpick, you wrote: "Use a < br>, or better yet..." but wouldn't it have to be <br />?
I can never remember...

This post has been slightly modified and a few corrections made. Thanks to some eagle-eyed friends, it hopefully will be more useful to others.