I’ve started working on a few presentations to take on the road to developer conferences, and one of the topics is HTML5.

Many folks, like those who speak frequently at technical events, look at new technologies from a somewhat academic perspective. This is an important part of the technical evolutionary process. Folks who push the edges of new technologies for their own sake lay the groundwork for how we will apply those technologies in the future.

But I tend to like to focus more on the practical business applications of new technology.

We all know that the World Wide Web is broken.

Once upon a time there was the idea of the World Wide Web as one huge semantic network of information.

se·man·tic

adjective 1. of, pertaining to, or arising from the different meanings of words or other symbols

What we have instead is a glut of content.

From my perspective, one of the most significant enhancements coming in HTML5 is the collection of additions to semantic markup.

A huge problem with HTML prior to version 5 is that markup lacked any real informational context.

Tags like DIV, SPAN, and TABLE are used to organize the presentation of the content on our page and, of course, their associated id attributes can be used to execute code against those elements. We also have a collection of attributes that we can use to describe the aesthetics of those tags, like style, class, and width.

But none of that markup helps us identify what is actually contained inside those tags.

The HTML5 <article> tag, as well as the other “semantic” tags in HTML5, goes a long way toward solving this problem.

Here is what the HTML5 spec says about the <article> tag.

The article element represents a component of a page that consists of a self-contained composition in a document, page, application, or site and that is intended to be independently distributable or reusable, e.g. in syndication. This could be a forum post, a magazine or newspaper article, a blog entry, a user-submitted comment, an interactive widget or gadget, or any other independent item of content.

W3C Specification

Take special note of the phrase “self-contained composition”.

Let’s look at how an article might have been presented in HTML 4.


<div>
   
   <h1>The 2011 Super Open Source Event</h1>
   <div>

      <h2>Keynotes</h2>

      <p>Important people talking about important stuff.</p>

   </div>
   <div>

      <h2>Sessions</h2>

      <p>Breakout sessions for learning technical details.</p>

   </div>
   <div>

      <h2>Roundtables</h2>

      <p>Open discussions about the new technology.</p>

   </div>
   <strong>Link to get all the videos after the event.</strong>

</div>

Though this content is well organized, there is no semantic context to the content.

So? You may ask why that matters.

It might not matter if you only want to read content on sites that you already know about and visit, but that falls far short of the way the web was meant to work.

In fact, even in today’s web, this use-case accounts for a VERY small percentage of the way people actually use the web. The popularity of search engines is clear evidence that people begin most of their interaction with the web with a discovery process. The content above provides very little guidance to discovery mechanisms like search engine indexers, which primarily have to rely on word indexing and complex inference algorithms to determine the nature of the content on a page. Even then, the search engine can’t really help you consume the content it finds. It can only send you to that “page” so you can decide for yourself whether it fits your needs.

We have created entire additional technologies like RSS and Atom to help solve this problem. The problem with those technologies is that they are additions to the actual content. We need to create the content AND we need to create the “feed” that describes that content.

While this works, it is a complete secondary effort, and it only works when the feed data is explicitly created for the given content. The rest of the web is still just a bunch of words.

Let’s say, for example, that I was building a web site for tennis fans. If I wanted to index and aggregate articles from various tennis web sites and those web sites used markup like that above, it would be nearly impossible for an application to figure out what constituted a story on that site.
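To make that concrete, here is a minimal Python sketch of what an aggregator actually gets to work with. It is my own illustration, not part of any real crawler: the names DIV_PAGE, TagCounter, and count_tags are made up for this post, and the page fragment is an abbreviated copy of the div-based example above. It simply counts the tags it sees, using the standard library's html.parser.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical fragment, abbreviated from the div-based example above.
DIV_PAGE = """
<div>
   <h1>The 2011 Super Open Source Event</h1>
   <div><h2>Keynotes</h2><p>Important people talking about important stuff.</p></div>
   <div><h2>Sessions</h2><p>Breakout sessions for learning technical details.</p></div>
</div>
"""


class TagCounter(HTMLParser):
    """Counts start tags, which is all the structure this markup exposes."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1


def count_tags(html):
    """Return a Counter of start-tag names found in the given HTML string."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.tags


tags = count_tags(DIV_PAGE)
# Three anonymous <div> containers, and nothing that says "this is a story".
print(tags["div"], tags["article"])
```

All the application learns is that there are some nested containers. Which one wraps the story, and which ones are just layout, is left to guesswork.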

But, using HTML5’s semantic markup, the content itself contains logical context.

Let’s look at the content above in HTML5 markup.


<article>
   
   <h1>The 2011 Super Open Source Event</h1>
   <section>

      <h2>Keynotes</h2>

      <p>Important people talking about important stuff.</p>

   </section>
   <section>

      <h2>Sessions</h2>

      <p>Breakout sessions for learning technical details.</p>

   </section>
   <section>

      <h2>Roundtables</h2>

      <p>Open discussions about the new technology.</p>

   </section>
   <strong>Link to get all the videos after the event.</strong>

</article>

You can also see another addition to HTML5’s new semantic markup, the <section> tag, which can be used to segment an article into logical subunits. Here is how the spec describes it.

The section element represents a generic section of a document or application. A section, in this context, is a thematic grouping of content, typically with a heading.

Examples of sections would be chapters, the various tabbed pages in a tabbed dialog box, or the numbered sections of a thesis. A Web site’s home page could be split into sections for an introduction, news items, and contact information.
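Here is a rough Python sketch of how an aggregator might take advantage of the <article> and <section> tags. Treat it as an illustration under assumptions rather than a finished tool: ArticleOutliner and outline_articles are names I made up, the page fragment is an abbreviated copy of the HTML5 example above, and the sketch assumes the title lives in an <h1> directly inside the article and that each section is introduced by an <h2>. It only uses html.parser from the Python standard library.

```python
from html.parser import HTMLParser

# Hypothetical fragment, abbreviated from the HTML5 example above.
PAGE = """
<article>
   <h1>The 2011 Super Open Source Event</h1>
   <section>
      <h2>Keynotes</h2>
      <p>Important people talking about important stuff.</p>
   </section>
   <section>
      <h2>Sessions</h2>
      <p>Breakout sessions for learning technical details.</p>
   </section>
</article>
"""


class ArticleOutliner(HTMLParser):
    """Collects the <h1> title and the <section> headings of each <article>.

    Nested <article> elements are not handled in this sketch.
    """

    def __init__(self):
        super().__init__()
        self.articles = []     # one dict per <article> found
        self.current = None    # article being built, None outside <article>
        self.in_section = 0    # <section> nesting depth inside the article
        self.heading = None    # "h1" or "h2" while capturing heading text
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "article" and self.current is None:
            self.current = {"title": None, "sections": []}
        elif self.current is None:
            return  # ignore everything outside an <article>
        elif tag == "section":
            self.in_section += 1
        elif tag == "h1" and self.in_section == 0:
            self.heading, self.text = "h1", []
        elif tag == "h2" and self.in_section > 0:
            self.heading, self.text = "h2", []

    def handle_data(self, data):
        if self.heading:
            self.text.append(data)

    def handle_endtag(self, tag):
        if self.current is None:
            return
        if tag == self.heading:
            text = "".join(self.text).strip()
            if tag == "h1":
                self.current["title"] = text
            else:
                self.current["sections"].append(text)
            self.heading = None
        elif tag == "section" and self.in_section > 0:
            self.in_section -= 1
        elif tag == "article":
            self.articles.append(self.current)
            self.current = None
            self.in_section = 0


def outline_articles(html):
    """Return a list of {"title": ..., "sections": [...]} dicts."""
    parser = ArticleOutliner()
    parser.feed(html)
    parser.close()
    return parser.articles


for article in outline_articles(PAGE):
    print(article["title"], article["sections"])
```

Run the same function against the div-based version of the page and it comes back with an empty list, because nothing in that markup announces where a story begins or ends.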

I’m working on some code samples. In the meantime, share your thoughts.