Should HTML be considered as a data format?

In a short, but thought provoking post, Bertrand Le Roy asks whether HTML has evolved into a purely data carrying format, which is what, after all, it was meant to be in the first place.

Unless I missed the point totally, this is, in fact, the general direction of the XHTML strict specifications. With both appearance (via css) and behaviour (go jQuery!) being decoupled from the data, the contents of the html file represent only content, which is, in my opinion, Web Dev Nirvana. We cannot really have semantic html (as I understand it, this simply means that elements only ever describe what their content is, not what it does or looks like) until we have this separation.

In an aside, Mr. Le Roy says “and if we can ignore the huge majority of existing contents that is less than ideally written”. Thinking about it, it’s not as up-in-the-air-idealistic as it sounds. It’s true that no browser or device intended for browsing the net can afford to ignore 99% of the content out there, but we already have the tools to ignore this: Doctypes. Internet Explorer and Firefox already enforce different rendering rules based on the doctype for a page; if we had an XHTML SERIOUSLYSTRICT doctype, a reader could easily offer the basic or limited functionality to that (or even none at all) and give the really strict documents the real deal.

To cap it off; I don’t think we’re there yet, but we’ll get there eventually, especially once the beefier CSS3 selectors get more widespread support. Once we have those, we can even do away with the class attribute and specify appearance in a purely declarative way, turning our html into pure content.

We’ve already (mostly) moved on from the dark age of blink and marquee tags, but to be honest, we don’t need to remember the bad old days of nested tables and abused markup; we are still in the bad old days of nested tables etc. It is always good to see people like Mr. Le Roy keep the torch going forward.