There has been some discussion on the JISC-Repositories JISCMail list (under the confusing subject line of “PLoS business models, global village”) on the issue of file formats for depositing scholarly papers. Some people (including myself) feel that open formats such as XHTML should be the preferred format; others feel that the effort required in creating XHTML can be a barrier to populating digital repositories, and that use of PDF can provide a simple low-effort solution, especially if authors are expected to take responsibility for uploading their papers to an institutional repository.
An issue I raised was the accessibility of resources in digital repositories. There are well-established guidelines developed by WAI which can help to ensure that HTML content is accessible to people with disabilities. Others and I have argued that the guidelines and the WAI model are flawed, but many of the guidelines are helpful and institutions should seek to implement them (indeed there are legal requirements to ensure that services do not discriminate against people with disabilities).
WCAG 1 has the following requirements:
3.2 Create documents that validate to published formal grammars. [Priority 2]
11.1 Use W3C technologies when they are available and appropriate for a task and use the latest versions when supported. [Priority 2]
11.4 If, after best efforts, you cannot create an accessible page, provide a link to an alternative page that uses W3C technologies, is accessible, has equivalent information (or functionality), and is updated as often as the inaccessible (original) page. [Priority 1]
I would argue that these checkpoints are pretty unfriendly towards PDFs. WCAG 2.0 (which is in draft form) is, however, neutral regarding file formats, a development I welcome (although the guidelines still have their limitations). The draft guidelines do still require that content is accessible, and alongside that requirement there are legal and ethical obligations to address such issues.
Proprietary formats such as PDF can be made accessible. However, I am uncertain how alternative text for images and document structure will be added to PDF documents in a distributed workflow environment, where individual authors rather than a central team prepare and upload the files.
Rather than dwelling on this (technical) issue, I would like to focus on the policy issues, which should be independent of particular file formats. UK legislation requires organisations to take reasonable measures to ensure that people with disabilities are not discriminated against unfairly. One could argue that it would be unreasonable to expect hundreds if not thousands of legacy resources to have accessibility metadata and document structures applied to them, if this could be demonstrated to be an expensive exercise of only very limited potential benefit. However, if we seek to explore what may be regarded as ‘unreasonable’, we then need to define the ‘reasonable’ actions which institutions providing institutional repositories would be expected to take.
One approach would be for the institution to ensure that it provides appropriate training and staff development for authors who are expected to upload documents to repositories. Linked to this may be tools which can flag problem areas to the authors, as documents are being prepared for uploading. There may then be auditing tools which can alert institutions to potential problems.
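To give a flavour of what such a flagging tool might look like, here is a minimal sketch in Python for papers deposited as (X)HTML. It is an illustration only, not a description of any existing repository software: the `AccessibilityAudit` class and `audit` function are hypothetical names, and the two checks (images without an `alt` attribute, and documents with no headings) are simply assumptions about what an institution might choose to flag. Passing these checks would not, of course, mean that a paper is accessible.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class AccessibilityAudit(HTMLParser):
    """Hypothetical checker that collects a few simple accessibility warnings."""

    def __init__(self):
        super().__init__()
        self.warnings = []
        self.heading_count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this,
        # and also routes self-closing XHTML tags such as <img ... /> here.
        attributes = dict(attrs)
        if tag == "img" and "alt" not in attributes:
            src = attributes.get("src", "(no src)")
            self.warnings.append(f"image {src} has no alt attribute")
        if tag in HEADING_TAGS:
            self.heading_count += 1


def audit(markup):
    """Return a list of warnings for the given (X)HTML string."""
    parser = AccessibilityAudit()
    parser.feed(markup)
    parser.close()
    warnings = list(parser.warnings)
    if parser.heading_count == 0:
        warnings.append("document has no headings, so it exposes no structure")
    return warnings


if __name__ == "__main__":
    paper = (
        "<html><body><h1>Results</h1>"
        '<img src="fig1.png" />'
        '<img src="fig2.png" alt="Plot of citations per year" />'
        "</body></html>"
    )
    for warning in audit(paper):
        print(warning)
```

A check of this kind could run when an author uploads a paper, with the warnings shown to the author, or be run in bulk across a repository as a crude audit. An equivalent check for PDF would need a PDF library to inspect the document's tags, which is not attempted here.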
Related to policies to support the authors are policies which address specific problems which users with disabilities may have. Clearly many scientific papers (containing formulae, for example) may be difficult for traditional assistive technologies to process. Perhaps this is where there is a need for just-in-time accessibility (as opposed to the traditional just-in-case approach) or blended accessibility (real world alternatives to digital accessibility barriers).