UK Web Focus

Innovation and best practices for the Web

When W3C Web Pages Break

Posted by Brian Kelly (UK Web Focus) on 26 June 2008

I was looking at a page on the W3C Web site recently to update my knowledge of the SVG specification and SVG tools. I noticed a link at the bottom of the Scalable Vector Graphics (SVG) page to an RSS feed for the page and, as a fan of RSS syndication, thought it might be worth adding this feed to my RSS viewer. However, when I clicked on the link, rather than seeing the RSS feed and having the option to add it to my preferred RSS reader, I was shown an error message:

W3C RSS Feed which isn't being displayed

Now validating this RSS feed with the RSS validator on the W3C Web site informs me of an error with the feed:

Sorry

This feed does not validate.

    • line 227, column 87: Undefined named entity: reg (5 occurrences)
      ... ability as well as the Internet Explorer&reg; Plugin and the Windows&reg; ...

It seems that either W3C’s workflow process has failed to remove the registered trademark character entity from the term “Internet Explorer®”, or the RSS schema has failed to include a declaration for this character entity.
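
The validator's complaint can be reproduced with any conforming XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `<description>` fragment and the `is_well_formed` helper are assumptions made up for illustration, not the actual feed or the validator's code.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modelled on the validator excerpt; not the real feed.
FRAGMENT = "<description>the Internet Explorer&reg; Plugin</description>"


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


ok, message = is_well_formed(FRAGMENT)
print(ok, message)
```

With no declaration for `reg` in scope, the parser reports an undefined entity and refuses the whole fragment, which is the behaviour the validator describes.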

No big deal, you may think – and, as the page is displayed in the Firefox browser, this is surely another failure of Internet Explorer to follow Web standards.

But if you view the page in Opera you get an XML parser error message:

W3C RSS feed error displayed in Opera browser

And here, I think, both Internet Explorer and Opera are behaving as the XML specification requires: a user agent that meets a well-formedness error must not carry on processing the document as if nothing were wrong.

And this hard-line approach has been promoted by the W3C as a vision of the future of the Web. It has been argued that mandating rigorous compliance with specs would help to maximise interoperability.

This may be true – but at what cost? As someone who studied engineering at University I am aware of the benefits of a fail-safe approach to design, so that if one small component fails it doesn’t mean that the building will collapse. But in this case one small component (the trademark character entity) which hasn’t been properly defined has caused the page to fail completely to render in two browsers.

Don’t we need Web resources to be designed so they’ll fail gracefully and will be tolerant if humans make mistakes or, as it seems is the case here, there are failures in the workflow?

5 Responses to “When W3C Web Pages Break”

  1. http://www.b-list.org/weblog/2008/jun/18/html/ and all the links in it.

  2. Brian,

    The RSS feed is not on the W3C Web site. Look at the address: svg.org.
    You are targeting the wrong people in this case. We have broken things sometimes, but this one is unlikely to be fixed by us.

  3. Hi Karl
    Thanks for the reply – and for pointing out my error.
    This adds, though, to the implications of what happens when pages break in such non-elegant ways. As I followed a link from the W3C page to an RSS feed of the content of the W3C’s page, I had expected this to be hosted by the W3C – and the lack of any branding on the error page gave me no clues that I’d left the W3C Web site.
    And, of course, the W3C has a relationship with an external company which is providing a service, such as the provision of an RSS feed (the relationship may be informal, and need not be subject to signed contractual agreements).
    In this scenario, surely it becomes even more important to have data standards which fail gracefully – as you point out, you are in no position to fix content managed by others but which you are dependent on.

  4. Chris Lilley, W3C said

    Hi Brian,

    The news stories (both as content and as syndication) are supplied by a community site, svg.org. That site, in turn, gets its news from any community member who wants to post a story. So, like many sites nowadays, it’s an amalgam of HTML snippets from multiple authors, who submitted their content via a web form.

    ‘Classic’ HTML allowed, in theory, a limited set of predefined entities, which grew over time from HTML 2. In practice, it allowed them whether they were declared in the external DTD subset or not. That approach gave less than optimal results for MathML, with its larger set of entities, or for the ISO entity set, or whatever. So the HTML implementation precedent set a bad example there. (Recall that these entities date from a time before Unicode was even popular, let alone universally deployed.)

    XML has only five predefined entities (lt, gt, amp, apos and quot), and those only because the characters are syntactically significant and need to be escapable when used for non-syntactic purposes. This was entirely the right decision (grandfathering in the largest HTML set, and the MathML sets, and the ISO sets and whatever other sets are common in various fields, would have been another, unwieldy, solution). So Opera and IE are completely correct to raise a parsing error, and Firefox incorrect to silently guess what may have been meant.
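
As a rough illustration of the predefined set, the sketch below checks that a standard XML parser resolves `lt`, `gt`, `amp`, `apos` and `quot` without any DTD. The `resolve` helper and the one-element `<t>` document are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# The entities XML 1.0 predefines; no DTD is needed to use them.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}


def resolve(name):
    """Parse a one-element document containing only &name; and return its text."""
    return ET.fromstring(f"<t>&{name};</t>").text


for name, char in PREDEFINED.items():
    print(name, resolve(name) == char)
```

Any other name, such as `reg`, makes `resolve` raise `ET.ParseError` unless the document declares it.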

    (This sort of contamination from HTML into XML is the reason that many RSS feeds escape the whole HTML portion, so it’s seen as an opaque string rather than markup.)
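
The escaping approach can be sketched as follows, assuming a hypothetical user-submitted snippet and a bare `<description>` element rather than a complete feed. `xml.sax.saxutils.escape` is part of Python's standard library.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical user-submitted HTML snippet containing an HTML-only entity.
html_snippet = "<p>Internet Explorer&reg; Plugin</p>"

# escape() turns &, < and > into &amp;, &lt; and &gt;, so the XML parser
# sees plain character data rather than markup or entity references.
item_xml = f"<description>{escape(html_snippet)}</description>"

description = ET.fromstring(item_xml)
print(description.text)  # the original HTML snippet, recovered as a string
```

The feed stays well-formed whatever the snippet contains, and it is left to the feed reader's HTML parser to interpret `&reg;`.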

    For this type of user-contributed content, the form processor could usefully convert such entities to the corresponding Unicode character (and also check well-formedness, and perhaps even add or prompt for alt attributes). Of course this applies not only to the W3C site but in fact to any forum, blog, or similar community site.
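
A form processor along these lines might look like the sketch below. It is an assumption about how such a step could work, not a description of the svg.org software, and the function names and the wrapping `<div>` are invented for illustration. It maps HTML named entities to Unicode characters using the standard `html.entities` table, leaves XML's own entities alone, and then checks well-formedness.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Entities XML predefines; these must stay as references to keep markup intact.
XML_PREDEFINED = {"lt", "gt", "amp", "apos", "quot"}

ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_entities_to_unicode(text):
    """Replace HTML named entities (except XML's own) with Unicode characters."""
    def substitute(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave the reference untouched
        return chr(name2codepoint[name])
    return ENTITY_RE.sub(substitute, text)


def clean_submission(snippet):
    """Convert entities, then reject the snippet if it is not well-formed XML."""
    converted = html_entities_to_unicode(snippet)
    ET.fromstring(f"<div>{converted}</div>")  # raises ParseError if malformed
    return converted


print(clean_submission("<p>Windows&reg; &amp; Internet Explorer&reg;</p>"))
```

Unknown entity names are deliberately left in place so that the well-formedness check rejects them, rather than having the processor guess what was meant.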

  5. Chris Lilley, W3C said

    I should have said that for this specific example, the path to fixing it is for an svg.org admin to edit the news story. I just did that, but the story does in fact use the ® character and not an entity. So it is being (unwisely) converted to an entity by whatever processes the form. Odd….
