Validators Don’t Always Work
Posted by Brian Kelly on 7 February 2007
A standard of much interest to us at UKOLN is RSS. We came across RSS in its very early days: I gave a workshop session on Automated News Feeds at the national Institutional Web Management Workshop back in June 2001, and Andy Powell, a former colleague, included RSS in the JISC Information Environment technical architecture.
I recently discovered that the UKOLN RSS feed did not validate, according to the Feed Validation Service hosted at the W3C. The error appeared to be with the <taxo> module, but a colleague was convinced that the feed was fine and the problem was with the RSS validator. I was sceptical (surely an open source validation service, hosted at W3C, can’t have a bug in such a fundamental area?) and raised this issue on the web-support JISCMail list. Sebastian Rahtz pointed out errors in the examples given in the RSS specification, which made me wonder whether the specification itself was flawed. When I found out that our news feed was created by the RSS::XML module, I wondered if the error could possibly be in this module.
I raised this issue on the W3C’s QA list, asking whether the problem was with (a) our RSS feed; (b) the RSS specification; (c) the application used to generate the feed; or (d) the RSS validator. I received a prompt response from Olivier Thereaux (first thing the following morning) which confirmed that our feed was fine and that there were errors in the RSS specification (in particular in an example included in the spec), but that the fundamental error was due to a bug in the validator. This was reported to Sam Ruby, the developer of the validator, who implemented a patch a few hours later and released it on the main Feed Validator site.
I was very impressed with the speed with which this problem was addressed and a solution deployed. Many thanks to Olivier and Sam for this.
I was, though, also very shocked that a validator for such a widely deployed standard (RSS 1.0) had such bugs (I bet a colleague a pint, later raised to a gallon, that the validator was fine – luckily he didn’t take me up on this!). I had assumed that:
- The development process would have spotted this bug (through use of test cases, code walk-throughs, schema validation, etc.)
- The development community would have spotted bugs in an open source application, through the ‘many eyes make all bugs shallow’ principle.
- The W3C QA processes would have detected this problem prior to the installation of the service on the W3C Web site.
A colleague pointed out that software developers (which I am not) tend not to have so much faith in validators, and many important and widely deployed applications have bugs.
I am not the only person to have concerns over the lack of resources allocated to this important area: Bjoern Hoehrmann left the W3C QA in July 2006, sending a message to the public-qa-dev list giving his reasons for leaving the group.
Where, then, does this leave me? How can I advise others of the importance of validation and of systematic QA processes if such processes don’t seem to be in place at the W3C? Should I stop writing and giving talks on this (I suspect people’s eyes do glaze over when they hear me harping on about this issue)?
But on the other hand, if digital library development programmes are being funded on the assumption that the data and formats are ‘clean’, aren’t services going to break if this isn’t the case?
And perhaps I’m being over-dramatic over this one incident: the problem may have been an obscure one, and at least the bug produced a false negative (the validator reported that a valid RSS file was invalid) rather than a false positive (reporting that an invalid file was valid). And, as I said, the bug was fixed very speedily. So maybe I should continue to promote the importance of compliance with standards, but the wider development community should help to validate the validators. And for formats owned by (or, as in the case of RSS 1.0, closely affiliated with) W3C, the W3C QA Interest Group has demonstrated that concerns don’t disappear down a black hole.
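For readers who do write code, here is a rough sketch of what "validating the validator" could look like: run it against a small corpus of documents whose validity is already known, and record where its verdict disagrees. Everything in it is an assumption made for illustration. `toy_validator` is a hypothetical stand-in with a deliberately planted bug, not the Feed Validator's real logic; it only checks XML well-formedness, and the sample documents and the `example.org` namespace are placeholders rather than real RSS 1.0.

```python
import xml.etree.ElementTree as ET


def toy_validator(document: str) -> bool:
    """Hypothetical stand-in for a validator (NOT the real Feed Validator).

    Accepts any well-formed XML document, except for a deliberately
    planted bug: it rejects anything that uses a 'taxo:' prefix.
    """
    if "taxo:" in document:  # planted bug, for illustration only
        return False
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


def audit(validator, corpus):
    """Compare the validator's verdicts with the known answers."""
    report = {"false_negatives": [], "false_positives": []}
    for name, document, expected_valid in corpus:
        verdict = validator(document)
        if expected_valid and not verdict:
            # A valid document was reported as invalid.
            report["false_negatives"].append(name)
        elif verdict and not expected_valid:
            # An invalid document was reported as valid.
            report["false_positives"].append(name)
    return report


# Placeholder corpus: (name, document, is it really valid?)
CORPUS = [
    ("plain", "<feed><item>news</item></feed>", True),
    (
        "with-taxo",
        '<feed xmlns:taxo="http://example.org/taxo"><taxo:topics/></feed>',
        True,
    ),
    ("unclosed", "<feed><item>news</feed>", False),
]

report = audit(toy_validator, CORPUS)
print(report)
```

With the planted bug, the `with-taxo` document ends up listed under `false_negatives`, which is the same kind of failure I ran into: a valid file reported as invalid. A real corpus would need real feeds and a real validator, but the idea of keeping known-good and known-bad test cases alongside the tool is the same.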