“How can metrics be developed that fulfill requirements such as validity, reliability, and suitability?”
The Call for Papers was unambiguous about the importance of metrics:
The goal of this symposium is to bring researchers and practitioners together to scope the extent and magnitude of existing …. metrics, and to develop a roadmap for future research and development in the field.
although there was an acknowledgement of the challenges in developing appropriate metrics:
Using numerical metrics potentially allows a more continuous scale for [measurements] and, to the extent that the metrics are reliable, could be used for comparisons. However, it is unclear how metrics can be developed that fulfill requirements such as validity, reliability, and suitability.
I’m pleased to say that I’ve had a paper accepted for the online symposium, which will take place on 5 December 2011. But what is the subject of the symposium? It might be metrics for research papers, a topic I have recently written about, covering issues such as download statistics for papers which are distributed across multiple services and metrics for answering the question “what makes a good repository?”. Or perhaps the paper concerns metrics associated with use of Social Web services, another area I have addressed in several posts over the past year.
Both areas are very complex, and people are questioning the validity of the approaches currently being taken to develop metrics for making comparisons. These are clearly areas worthy of research into how metrics can be developed, and of critical appraisal of the approaches being proposed. But neither area was addressed in the paper or in the symposium.
Online Symposium on Website Accessibility Metrics
As the call for papers points out, “conformance to the Web Content Accessibility Guidelines (WCAG) is based on 4 ordinal levels of conformance (none, A, AA, and AAA) but these levels are too far apart to allow granular comparison and progress monitoring; if a website satisfied many success criteria in addition to all Level A success criteria, the website would only conform to level A of WCAG 2.0 but the additional effort would not be visible.” It seems that, rather than four simple conformance levels, WAI are looking for more sophisticated algorithms which can differentiate between a Web page containing hundreds of images, none of which has the alt attribute needed to support access via assistive technologies, and a Web page which also contains hundreds of images, only one of which lacks a meaningful alt attribute. Currently both pages will fail WCAG conformance, since this requires all images to have alt attributes.
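The contrast between these two pages can be made concrete. The Python below is a minimal sketch for illustration only, not WAI's proposed algorithm: it assumes a page is available as an HTML string, and the names `ImgAltCounter` and `alt_text_scores` are hypothetical. It reports both a binary pass/fail result and the proportion of images carrying an alt attribute. It only checks that the attribute is present; whether the text is meaningful cannot be judged by a check of this kind.

```python
from html.parser import HTMLParser
from typing import Tuple


class ImgAltCounter(HTMLParser):
    """Count <img> elements and how many of them carry an alt attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.total = 0
        self.with_alt = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.total += 1
            # attrs is a list of (name, value) pairs; only presence is checked,
            # so alt="" (acceptable for decorative images) also counts.
            if any(name == "alt" for name, _ in attrs):
                self.with_alt += 1


def alt_text_scores(html: str) -> Tuple[bool, float]:
    """Return (binary pass/fail, proportion of images with an alt attribute)."""
    counter = ImgAltCounter()
    counter.feed(html)
    counter.close()
    if counter.total == 0:
        return True, 1.0  # no images, so nothing to fail
    passes = counter.with_alt == counter.total
    return passes, counter.with_alt / counter.total


# Hypothetical pages mirroring the two cases described above.
page_none = "<img src='a.png'>" * 200
page_one_missing = "<img src='a.png' alt='Logo'>" * 199 + "<img src='b.png'>"
print(alt_text_scores(page_none))         # (False, 0.0)
print(alt_text_scores(page_one_missing))  # (False, 0.995)
```

Both pages fail the binary test, but the proportional score separates them. The same sketch also shows the limitation: a page in which every image has alt="image" would score a perfect 1.0.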
It seems that the goal is a Klout score for Web accessibility, but with the difference that the underlying algorithms will be made public. But just as with Klout, I feel there is a need to question the assumptions underpinning the belief that accessibility can be determined by conformance with a set of rules, developed as part of the WAI’s model based on conformance with guidelines for content (WCAG), authoring tools (ATAG) and browsers and other user agents (UAAG). It is therefore worth making some comparisons between metrics-based tools such as Klout for measuring online influence and the range of Web accessibility measurement tools, of which the now defunct Bobby tool was an early example.
| | Metrics for Online Reputation (Twitter) | Metrics for Online Web Accessibility | Impact of Scholarly Research |
|---|---|---|---|
| Example tools | Klout, PeerIndex, … | A-Checker, Bobby (defunct) and others listed in the Complete list of accessibility evaluation tools (last updated in 2006, with several broken links) | Publish or Perish, Microsoft Academic Search, Google Scholar Citations, … |
| Purpose | Measurement of online influence | Measurement of accessibility of Web resources | Measurement of productivity and impact of published scientific works |
| Underlying model | Undocumented algorithms based on analysis of Twitter communities, posts, retweets, etc. | Conformance with the WAI model, which is based on three sets of guidelines: for content, authoring tools and user agents. Conformance, however, focuses only on the WCAG guidelines. | h-index, g-index, … |
| Legal status | No legal status. | Conformance required in several countries. | No legal status, but may be used to determine research funding. |
| Limitations | The system can be easily ‘gamed’. Tools such as Klout promote use of themselves as a way of increasing scores. The tools fail to take into account differences across communities (e.g. they use the same approaches for comparing the reputation of celebrities, brands and public sector organisations). | The system can be easily ‘gamed’. The WCAG 1.0 guidelines promoted use of technologies developed within the host consortium, even when such technologies were little used. The tools fail to take into account the different ways in which the Web can be used (e.g. to provide access to information, to support teaching and learning, to provide access to cultural resources, for games, …). | May be skewed by numbers of authors, self-citations, context of citations, … |
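The h-index and g-index listed in the table are simple enough to sketch. The Python below is an illustration under stated assumptions, not the implementation used by Publish or Perish or Google Scholar Citations: it takes a plain list of per-paper citation counts (the values in the example are hypothetical), and the g-index variant shown caps g at the number of papers. Because every citation counts equally, self-citations and heavily co-authored papers feed straight into the result, which is the skew noted in the table.

```python
from typing import Sequence


def h_index(citations: Sequence[int]) -> int:
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # counts are descending, so no later paper can qualify
    return h


def g_index(citations: Sequence[int]) -> int:
    """Largest g such that the top g papers together have at least g**2 citations.

    This variant caps g at the number of papers.
    """
    ranked = sorted(citations, reverse=True)
    g, running_total = 0, 0
    for rank, cites in enumerate(ranked, start=1):
        running_total += cites
        if running_total >= rank * rank:
            g = rank
    return g


counts = [10, 8, 5, 4, 3]  # hypothetical per-paper citation counts
print(h_index(counts), g_index(counts))  # 4 5
```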
Using Metrics In Context
However, I do feel that there is value in metrics, whether for helping to identify the quality of research publications, online reputation or the accessibility of online resources. The difficulty arises when the metric is regarded as the truth and becomes a goal in itself. So whilst I feel it is worthwhile publishing details of Klout, PeerIndex and Tweetstat statistics for a selection of institutional Twitter accounts, in order to help understand patterns of usage and, I should add, to understand the limitations of such metrics-based tools, I also feel that institutions would be foolhardy to regard such statistics as concrete evidence of value. Rather, such statistics can be useful when used in conjunction with other evidence.
The danger with Web accessibility metrics is that they have been used as a goal in their own right. In addition, sadly, the previous government mandated conformance with these metrics across Government Web sites. And back in 2004 WAI set out their views in “Why Standards Harmonization is Essential to Web Accessibility”, which seems to be leading to WCAG conformance being mandated across EU countries. If a proposal on “Why Online Reputation Standards Harmonisation is Essential” were published, especially by the body responsible for the online reputation standard being proposed as the only one which should be used, there would be uproar, with, I would hope, the research community seeking to explore the limitations of the proposed standard.
Fortunately the organisers of the WAI symposium do seem to be aware of criticisms of the view that their approach to Web accessibility is the only legitimate one. The Call for Papers invited contributions which “may include approaches for measuring ‘accessibility in terms of conformance’ (metrics that reflect violations of conformance of web content with accessibility guidelines such as WCAG or derivatives such as Section 508) and ‘accessibility in use’ (metrics that reflect the impact that accessibility issues have on real users, regardless of guidelines)” (my emphasis).
The fundamental objection which my co-authors and I make in our series of papers on this subject is that accessibility is not an innate characteristic of a digital object, but relates to the difficulty a user has in engaging with an object to fulfil a desired purpose. The view that all Web resources must be universally accessible to everyone, which underlies the pressure on organisations to conform with WCAG guidelines, is flawed.
So if I’m critical of metrics based on conformance with guidelines, what do I feel is needed? Our paper argues for the use of metrics based on guidelines for the processes surrounding the development of online resources. In the UK the BS 8878 guidelines provide the relevant Code of Practice. As Jonathon Hassell pointed out in a recent post, “For World Usability Day: The state of accessibility”, on the HassellInclusion blog:
[BS8878's] goals were to share best practice in the first Standard about the process of accessibility rather than it’s technical aspects. It’s succeeded in helping harmonise the separate worlds of inclusive design, personalisation and WCAG approaches to accessibility.
Jonathon went on to add:
Uptake is always difficult to measure, and it’s still early days for organisations to go public and say they have changed the way they work to follow BS8878. However, some organisations already have including: Royal Mail, beta.gov.uk and Southampton University. And many others are working on it. BS8878 is one of the best-selling standards BSI have ever created – so it’s met their goals. I’ve trained many organisations globally and my BS8878 presentations on slideshare have been viewed by over 6000 people from over 25 countries.
There is a need to encourage greater take-up of BS 8878, and I hope our paper will help in describing ways in which such take-up can be measured.
But what of the development of new ways of measuring WCAG conformance? As described in a paper on “Involving Users in the Development of a Web Accessibility Tool”, the EU-funded European Internet Accessibility Observatory Project developed, at a cost of over 2M Euros, a robot for measuring conformance with WCAG guidelines across a range of government Web sites in the EU. As described on its Web site, eGovernment Monitor has released the eAccessibility Checker, which builds on the EU-funded project and can be found at http://accessibility.egovmon.no/. However, looking at the results of a survey carried out last month across a number of Norwegian Web sites, it seems that there are a number of problems which are experienced by over 80% of the Web sites! If such tools report a low level of conformance, can’t we use this as evidence of the failure of the WAI model rather than, as has been the case in the past, of organisations’ unwillingness to enhance the accessibility of their services?
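One way to read survey figures like these is to compute, for each automated check, the share of surveyed sites that fail it: a check failed almost everywhere may say as much about the check as about the sites. The Python below is a minimal sketch assuming a hypothetical data layout (a mapping from site name to the set of check IDs that site failed), not the eAccessibility Checker's actual output format; the function names, site names and check IDs are all invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Set


def failure_prevalence(results: Dict[str, Set[str]]) -> Dict[str, float]:
    """Map each check ID to the fraction of surveyed sites that failed it.

    `results` maps a site name to the set of check IDs that site failed.
    """
    if not results:
        return {}
    counts = Counter(check for failed in results.values() for check in failed)
    return {check: n / len(results) for check, n in counts.items()}


def widespread_failures(results: Dict[str, Set[str]], threshold: float = 0.8) -> List[str]:
    """Check IDs failed by more than `threshold` (e.g. over 80%) of the sites."""
    prevalence = failure_prevalence(results)
    return sorted(check for check, share in prevalence.items() if share > threshold)


# Toy survey data, invented for illustration only.
survey = {
    "site1": {"c1", "c2"},
    "site2": {"c1", "c2"},
    "site3": {"c1", "c2", "c3"},
    "site4": {"c1", "c2"},
    "site5": {"c1"},
}
print(widespread_failures(survey))  # ['c1']
```

Here check c2 is failed by exactly 80% of the toy sites, so it is not reported; whether the boundary should be "over" or "at least" 80% is a choice the analyst has to make explicit.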