Experiences Migrating From XHTML 1 to HTML5
Posted by Brian Kelly on 10 November 2010
IWMW 2010 Web Site as a Testbed
In the past we have tried to make use of the IWMW Web site as a test bed for various emerging new HTML technologies. On the IWMW 2010 Web site this year we evaluated the OpenLike service which “provides a user interface to easily give your users a simple way to choose which services they provide their like/dislike data” as well as evaluating use of RDFa.
We also have an interest in approaches to migration from use of one set of HTML technologies to another. The IWMW 2010 Web site has therefore provided an opportunity to evaluate deployment of HTML5 and to identify possible problem areas with backwards compatibility.
Migration of Main Set of Pages
We migrated the top-level pages of the Web site from the XHTML 1 Strict Doctype to HTML5. Validation of the home page, programme, list of speakers, plenary talks and workshop sessions shows that it was possible to maintain the HTML validity of these pages.
A small number of changes had to be made in order to ensure that pages which were valid using an XHTML Doctype were also valid using HTML5. In particular we had to change the <form> element for the site search and replace all occurrences of <acronym> with <abbr>. We also changed occurrences of <a name="foo"> to <a id="foo"> since the name attribute is now obsolete.
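The mechanical parts of these changes lend themselves to scripting. The following is a minimal Python sketch, offered as an illustration rather than a record of how the IWMW pages were edited. The function name migrate_markup and the regular expressions are assumptions made for this example. It deliberately leaves the <form> element alone, since that change needs to be made by hand, and a regex-based approach like this is only suitable for simple, well-formed pages.

```python
import re

# Hypothetical helper patterns; none of these names come from the original post.
XHTML_DOCTYPE = re.compile(r"<!DOCTYPE\s+html\s+PUBLIC[^>]*>", re.IGNORECASE)
ACRONYM_TAG = re.compile(r"<(/?)acronym\b", re.IGNORECASE)
# Matches a whitespace-preceded name= attribute inside an <a ...> start tag.
ANCHOR_NAME = re.compile(r"(<a\b[^>]*?\s)name=", re.IGNORECASE)


def migrate_markup(html: str) -> str:
    """Apply three mechanical XHTML 1 to HTML5 changes to a page.

    Caveat: an <a> element that already has an id attribute would end up
    with two id attributes, so the output should still be validated.
    """
    # Replace the first XHTML Doctype declaration with the HTML5 Doctype.
    html = XHTML_DOCTYPE.sub("<!DOCTYPE html>", html, count=1)
    # Rename opening and closing <acronym> tags to <abbr>.
    html = ACRONYM_TAG.sub(r"<\1abbr", html)
    # Turn <a name="foo"> into <a id="foo">.
    html = ANCHOR_NAME.sub(r"\1id=", html)
    return html


if __name__ == "__main__":
    sample = '<p><acronym title="World Wide Web Consortium">W3C</acronym> <a name="foo"></a></p>'
    print(migrate_markup(sample))
```

The output of such a script would still need to be run through the validator, since it only covers the three substitutions listed in its docstring.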
The W3C’s HTML validator also spotted some problems with links which hadn’t been spotted previously when we ran a link-checking tool. In particular we spotted a couple of occurrences of the form <a href="http://www.foo.bar "> with a space being included rather than a trailing slash. This produced the error message:
Line 175, Column 51: Bad value http://www.foo.bar for attribute href on element a: DOUBLE_WHITESPACE in PATH.
Syntax of IRI reference:
Any URL. For example: /hello, #canvas, or http://example.org/. Characters should be represented in NFC and spaces should be escaped as %20.
This seems to be an example of an instance in which HTML5 is more restrictive than XHTML 1 or HTML 4.
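Links of this kind can also be found before a page is submitted to the validator. The sketch below uses the HTMLParser class from the Python standard library to list href values that begin or end with whitespace. The class and function names are assumptions made for this example, and the check is much narrower than the IRI syntax rules the validator applies.

```python
from html.parser import HTMLParser


class HrefWhitespaceChecker(HTMLParser):
    """Collects href values on <a> elements that have leading or trailing whitespace."""

    def __init__(self):
        super().__init__()
        self.problems = []  # list of (line number, href value) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for attributes given without a value, e.g. <a href>.
            if name == "href" and value is not None and value != value.strip():
                line, _column = self.getpos()
                self.problems.append((line, value))


def find_padded_hrefs(html):
    """Return (line, href) pairs for links whose href is padded with whitespace."""
    checker = HrefWhitespaceChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


if __name__ == "__main__":
    page = '<p><a href="http://www.foo.bar ">Foo</a> <a href="http://example.org/">OK</a></p>'
    for line, href in find_padded_hrefs(page):
        print(f"line {line}: suspicious href {href!r}")
```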
Although many pages could be easily converted to HTML5, a number of pages had HTML validity problems which had already been encountered with the XHTML 1 Transitional Doctype and which persisted using HTML5. These were pages which included embedded HTML fragments provided by third party Web services such as Vimeo and Slideshare. The Key Resources page illustrates the problem, for which the following error is given:
An object element must have a data attribute or a type attribute.
related to the embedding of a Slideshare widget.
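Since this error comes from markup supplied by a third party, it can be useful to know which pages are affected before attempting a migration. The following Python sketch, again based on the standard library HTMLParser, reports <object> start tags that have neither a data nor a type attribute. The names used are assumptions made for this example, and the check covers only this one validator rule.

```python
from html.parser import HTMLParser


class ObjectAttributeChecker(HTMLParser):
    """Records the line numbers of <object> tags lacking both data and type."""

    def __init__(self):
        super().__init__()
        self.missing = []  # line numbers of offending <object> start tags

    def handle_starttag(self, tag, attrs):
        if tag != "object":
            return
        attribute_names = {name for name, _value in attrs}
        # The validator accepts an <object> if either attribute is present.
        if not attribute_names & {"data", "type"}:
            line, _column = self.getpos()
            self.missing.append(line)


def find_incomplete_objects(html):
    """Return line numbers of <object> elements with no data or type attribute."""
    checker = ObjectAttributeChecker()
    checker.feed(html)
    checker.close()
    return checker.missing


if __name__ == "__main__":
    # Hypothetical widget markup, not copied from any real service.
    widget = '<object width="425" height="355"><param name="movie" value="player.swf"></object>'
    print(find_incomplete_objects(widget))
```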
Pages With Embedded RDFa
The Web pages for each of the individual plenary talks and workshop sessions contained embedded RDFa metadata about the speakers/workshop facilitators and abstracts of the sessions themselves. As described in a post on Experiments With RDFa, and as shown in output from Google’s Rich Snippets Testing tool, RDFa can be used to provide structured information such as, in this case, people, organisational and event information for an IWMW 2010 plenary talk.
However since many of the pages about plenary talks and workshop sessions contain embedded third party widgets including, for the plenary talks, widgets for videos of the talks and for the accompanying slides, these pages mostly fail to validate since the widget code provided by the services often fails to validate.
A page on “Parallel Session A5: Usability and User Experience on a Shoestring” does, however, validate using the XHTML1+RDFa Doctype, since this page does not include any embedded objects from such third party services. However, attempting to validate this page using the HTML5 Doctype produces 38 error messages.
Our experiences in looking to migrate a Web site from use of XHTML 1 to HTML5 show that in many cases such a move can be achieved relatively easily. However, pages which contain RDFa metadata may cause validation problems which might require changes in the underlying data storage.
The W3C released a working draft of a document on “HTML+RDFa 1.1: Support for RDFa in HTML4 and HTML5” in June 2010. However it is not yet clear if the W3C’s HTML validator has been updated to support the proposals contained in the draft document. It is also unclear how embedding RDFa in HTML5 resources relates to the “HTML Microdata” working draft proposal which was also released in June 2010 (with an editor’s draft version dated 20 October 2010 also available on the W3C Web site).
I’d welcome comments from those who are working in this area. In particular, will the user interface benefits provided by HTML5 mean that HTML5 should be regarded as a key deployment environment for new services, or is there a need to wait for consensus to emerge on ways in which metadata can be best embedded in such resources in order to avoid maintenance problems downstream?