UK Web Focus (Brian Kelly)

Innovation and best practices for the Web

When Technology (Eventually) Enhances Accessibility

Posted by Brian Kelly on 10 Mar 2011

“You’re Damned If You Do and Damned If You Don’t!”

Should you make use of a technology if you can’t guarantee that it will be accessible to people with disabilities?  Should you, for example, provide access to videos if you can’t provide captions for the videos?

If you have stated that your institution’s Web site will conform fully with WCAG (1.0 or 2.0) guidelines then you won’t be able to host such videos as the WCAG 2.0 guidelines state:

Guideline 1.2 Time-based Media: Provide alternatives for time-based media.

1.2.2 Captions (Prerecorded): Captions are provided for all prerecorded audio content in synchronized media, except when the media is a media alternative for text and is clearly labeled as such. (Level A)

Of course failing to provide videos may in itself act as a barrier to people with disabilities: as Lorenzo Dow put it, “You’re damned if you do and damned if you don’t!”

At the recent JISC CETIS Accessibility SIG meeting which I mentioned recently, Shadi Abou-Zahra commented that he felt that some of the criticisms I had made of the difficulties of implementing WCAG guidelines were inappropriate, as WAI do not address the policy issues regarding implementation of the guidelines – they simply point out that a failure to implement the guidelines can result in problems for people with various disabilities. I have to admit that I wish WAI had been much more vocal in making this point, since many public sector organisations (including the UK Government) have stated (or, indeed, mandated) conformance with WCAG guidelines without giving any caveats.

But let’s acknowledge that although there may have been communications problems in the past we are now in a position to exploit WCAG and other guidelines in a pragmatic and achievable way, with the BS 8878 Code of Practice now providing the policy framework to guide us.

The Challenge of Providing Access to Videos

What can be done if you wish to host videos but feel it is not feasible to provide captions? This may be because ownership of the videos is devolved – perhaps large numbers of students have taken videos of their graduation ceremony and these are being hosted (or linked to) from the institution. Or perhaps, as has been the case at a number of events for developers, researchers and practitioners, video interviews were made with participants and speakers in order to give potential attendees an authentic perspective on what to expect at the event, and the costs of just-in-case captioning can’t be justified.

The BS 8878 Code of Practice recognises that accessibility is not always easy – or indeed possible – to implement. The important thing to do, therefore, is to document policies and processes. But in addition there is a need to understand that technological developments may help to address accessibility issues, so that resources which are not accessible today could be made accessible tomorrow – but only if those resources are available.

An example of this is the iTitle Twitter captioning service which enabled a Twitter stream to be synchronised with a video stream on popular video-streaming services such as YouTube or Vimeo.

YouTube provides another example of how technological developments may enhance the accessibility of video clips.  Back in November 2009 YouTube announced that they had added a feature that generates video captions:

We’ve combined Google’s automatic speech recognition (ASR) technology with the YouTube caption system to offer automatic captions, or auto-caps for short. Auto-caps use the same voice recognition algorithms in Google Voice to automatically generate captions for video.

Initially this feature only worked for English and was “enabled for a small number of channels that usually feature talks and interviews: UC Berkeley, Stanford, MIT, Yale, UCLA, Duke, UCTV, Columbia, PBS, National Geographic”. However in March 2010 a CNET News article announced “YouTube brings auto-captioning to everyone”:

Video providers are now able to apply for machine transcription on their own videos. And for videos that have not yet been transcribed, a user can request it themselves. YouTube then puts it in a transcription queue, which can take anywhere from an hour to a day–a time Google is trying to make as fast as possible.

An article in The Register does point out some limitations in the automated transcriptions: “Automatic captions for a 14-year-old’s video diary: nigh incomprehensible” but then goes on to add “US President Obama’s weekly address to the nation: works pretty nice”.

But what are my experiences? Do I sound like a 14-year-old or President Obama? Generating the automated captions was trivial and, as can be seen in the image below, the system could recognise that I was speaking English. But what was transcribed as “acceptable snow” was actually me saying “it’s a cancerous cell”!

We therefore can’t say that YouTube’s automatic captions have solved the problem. But it strikes me that the quality of the captioning is likely to improve as algorithms improve, additional processing power is provided and, perhaps most importantly, the system begins to recognise regional accents and individual speaking patterns.

It should also be noted that, as described on the YouTube Web site, the automated captioning service creates a captions.sbv file containing the captions and the time stamps. As this is a text file it can be edited in a simple text editor, so that if, for example, much of the captioning is correct but the odd word has been transcribed incorrectly, the automated conversion can still be used for the bulk of the work.
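To illustrate how simple such editing can be, here is a minimal sketch (the filenames, sample text and correction table are illustrative, not taken from a real captions file) of applying corrections to the text of an .sbv file, which alternates a “start,end” timestamp line with caption text:

```python
def fix_captions(sbv_text, corrections):
    """Apply word-level corrections to the text of a YouTube .sbv captions file.

    An .sbv file alternates a timestamp line with caption text, e.g.:
        0:00:07.000,0:00:11.000
        acceptable snow
    Timestamp lines contain no letters, so plain string replacement is safe here.
    """
    for wrong, right in corrections.items():
        sbv_text = sbv_text.replace(wrong, right)
    return sbv_text

# Illustrative example based on the mis-transcription described above.
sample = "0:00:07.000,0:00:11.000\nacceptable snow\n"
print(fix_captions(sample, {"acceptable snow": "a cancerous cell"}))
```

The corrected file can then be uploaded back to YouTube in place of the automatically generated captions.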

Should we not, therefore, be providing YouTube with a wider range of videos containing our various regional accents in order to enhance the automated analyses?  And will a failure to upload our videos result in a failure to enhance accessibility for tomorrow’s audiences?

And if we have lecturers who speak with a clear and distinct English accent (unlike my Scouse accent, with traces of the years spent in Yorkshire, Newcastle and the East Midlands) and videos of their talks are successfully captioned, wouldn’t it be unreasonable to fail to provide this service? Let’s remember that UK legislation expects organisations to take reasonable measures – isn’t uploading videos in order to enhance access a reasonable thing for organisations to be doing now?

5 Responses to “When Technology (Eventually) Enhances Accessibility”

  1. Steve Bentley said

    What we perhaps need is a smarter embeddable FLV player that can switch between different timed-text tracks, for example between iTitle tweets and transcript-style captions. I’m not sure that a stream of tweets commenting upon what has been said is a sufficient replacement for a transcript.

  2. Just a plug for Synote

    Synote nearly two years ahead of Google
    Synote has been using IBM speech recognition technologies for automatic captioning of videos

    Accessible Prime Minister’s New Year Podcast on Synote

    The world’s first seminar using Twitter for creating synchronised notes

    Martin

  3. pjb1972 said

    Regardless of accent, automatic captioning shouldn’t be regarded as any more than 90% accurate – I’d say that at the moment it is too problematic to be relied upon without editorial input, but with the rate of progress things could be very different in a year or two. Whether NLP will ever comprehend a Geordie accent is debatable ;-)

    Looking at things from a wider perspective, if there is non-captioned video with no transcripts available due to lack of resource, surely it’s a reasonable adjustment to offer an on-demand transcript if one is requested – making the availability of that service clear on the web site in question, of course.

    • I like the idea of on-demand solutions. For some time I have felt that the “just-in-case” approach to accessibility is not a scalable solution and instead we should identify those areas in which providing rich accessibility may be too costly to justify and a “just-in-time” approach may be an appropriate response.

  4. Yoast said

    The free machine transcriptions on YouTube are not perfect, but they can save a whole lot of time if you have to transcribe.
    (If you upload a video you can go to “captions” and then “request machine transcription”; this will result in an SBV file, and there are several free websites that can convert SBV to SRT format, for example http://www.dcmp.org/ciy/converting-youtube-to-srt.html.)

    Advantages: decent transcriptions with the time-codes correct, so they can be edited in a free SRT subtitle editor. One enormous advantage is that most packages (e.g. Dragon NaturallySpeaking) require the software to be “trained” with samples of the speaker’s voice, and cannot cope well if multiple voices are involved. The YouTube transcriptions will accept multiple voices and may have higher error rates, but they do transcribe them.

    Disadvantages: for tracks longer than 15 minutes you need to either split the file or get an account that allows longer clips. The transcriptions from well-trained specialised software can have lower error rates. Privacy: mark the video private when uploading, and delete it after transcribing.
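    The SBV-to-SRT conversion mentioned above is simple enough to script yourself; a minimal sketch (assuming the common conventions of the two formats – blank-line-separated SBV cues, and numbered SRT cues with comma decimal separators):

```python
def sbv_to_srt(sbv_text):
    """Convert YouTube .sbv captions to the .srt subtitle format.

    .sbv cues: a "start,end" timestamp line followed by caption text,
               with cues separated by blank lines, e.g. 0:00:07.000,0:00:11.000
    .srt cues: a cue number, then "00:00:07,000 --> 00:00:11,000", then the text.
    """
    def ts(t):
        # Pad hours to two digits and swap the decimal point for a comma.
        h, m, s = t.split(":")
        return f"{int(h):02d}:{m}:{s.replace('.', ',')}"

    cues = []
    blocks = [b for b in sbv_text.strip().split("\n\n") if b.strip()]
    for i, block in enumerate(blocks, start=1):
        lines = block.splitlines()
        start, end = lines[0].split(",")
        cues.append(f"{i}\n{ts(start)} --> {ts(end)}\n" + "\n".join(lines[1:]))
    return "\n\n".join(cues) + "\n"

print(sbv_to_srt("0:00:07.000,0:00:11.000\nHello world\n"))
```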

Leave a comment