UK Web Focus

Reflections on the Web and Web 2.0

Archive for the ‘Repositories’ Category

Scribd Seems to be Successful in Enhancing Access to Papers

Posted by Brian Kelly (UK Web Focus) on 10 January 2011

I first wrote about the Scribd document repository service back in March 2007 in a post entitled “Scribd – Doing For Documents What Slideshare Does For Presentations“. Since then I have uploaded a number of papers to the service.  But almost three years on, how has the service developed?

My original post summarised some of the benefits of the service but highlighted a number of concerns:

Has Scribd raised the bar in users’ expectations for digital repositories? In some respects, I feel it has. However there are concerns which need to be recognised:

  • Poor quality resources which are hosted: there is no guarantee of the quality of the resources which are hosted on Scribd. And there are copyrighted publications (including those from O’Reilly) which have already been uploaded.
  • Sustainability of the service: As with all of these types of services, there is the question as to whether such services are sustainable. Techcrunch reported on 6 March 2007 that the service “is coming out of private beta this morning with a fresh Angel investment of $300K on top of their original Y Combinator nest egg of $12,000.“ This may keep the service running for a short time, but will it be around in the medium to long term? And what will happen if copyright holders, such as O’Reilly, take the service to court for misuse of their copyrighted resources (as Viacom have recently done to YouTube)?
  • Lack of an interoperable resource discovery architecture: The approach taken by Scribd is not interoperable with the approach being taken by the JISC development community, which is looking to support the development of distributed interoperable digital repository services which make use of OAI-PMH.

Three years later the service is still available. And looking at the statistics for access to documents I uploaded to the service, it also seems very popular: during 2010 there were no fewer than 11,729 views of the 15 papers I uploaded to the service, an average of 32 per day. As you can see from the graph below there were two significant peaks in the year, when there were over 800 views in a day. If I remove these outliers by viewing the statistics for the last six months of the year I find 4,215 views in the six-month period, giving an average of 24 per day.

In comparison looking at the usage statistics for my 26 papers hosted in the University of Bath Opus repository I find that there have been 2,505 views during 2010.

Hmm, the repository has almost twice as many papers, and resources in the repository are linked to from the UKOLN Web site and from posts on this blog. The repository also benefits from being part of a larger repository ecology, with access available from services such as OpenDOAR and MIMAS’s Institutional Repository Search. And yet the Scribd service seems to get significantly more visits.
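A quick back-of-the-envelope comparison, using only the figures quoted above, makes the gap clearer. This is a rough sketch: it assumes that a 'view' is counted in a comparable way by the two services, which may well not be the case.

```python
# Figures quoted in the post for 2010.
scribd_views, scribd_papers = 11_729, 15
opus_views, opus_papers = 2_505, 26

# Average daily views on Scribd, and average views per paper on each service.
scribd_per_day = scribd_views / 365
scribd_per_paper = scribd_views / scribd_papers
opus_per_paper = opus_views / opus_papers

print(f"Scribd: {scribd_per_day:.0f} views per day, {scribd_per_paper:.0f} views per paper")
print(f"Opus:   {opus_per_paper:.0f} views per paper")
print(f"Views per paper, Scribd relative to Opus: {scribd_per_paper / opus_per_paper:.1f}x")
```

On these figures each paper on Scribd received roughly eight times as many views during the year as each paper in the repository, although the two peaks mentioned above will have inflated the Scribd total.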

Looking at a specific instance, my most recent paper, “Moving From Personal to Organisational Use of the Social Web“, was presented at the Online Information 2010 conference at the end of November. This paper was uploaded to the University of Bath repository and was mentioned in a blog post on “Availability of Paper on “Moving From Personal to Organisational Use of the Social Web”” which linked to the copy in the repository. The paper was also uploaded to Scribd – and this was also mentioned in the blog post (and was, indeed, embedded in the post). The usage statistics to date (10 January 2011) are 53 views in the University of Bath repository and 447 views on Scribd.

Scribd also provides an easy-to-use interface for viewing usage statistics for individual papers. As can be seen from the image, there was a peak (of 181 views) on the day the blog post was published with a smaller peak (102 views) three days previously. The total number of views from embedded reads (i.e. people who read the blog post and may – or may not – have actually read the embedded paper) is 349. This leaves 160 views of the paper within the Scribd environment – over three times as many views as received for the copy in the institutional repository.

Whilst I can’t help but think that the usage statistics are flawed, I don’t have any evidence of this. I would appreciate suggestions why the views seem so large. But I also suspect that there will be views from people who were searching for information provided in the papers – and if only 10% of the views came from satisfied users that would be on par with those viewing the larger number of papers in the institutional repository (which is also likely, of course, to be inflated by readers using  Google to view papers which aren’t of interest).

Now Scribd does seem to host, how shall I put it, a wide variety of types of documents, not all of which are of relevance to researchers. But the service does have a variety of features which can help to enhance access to documents such as links to Social Web services such as Twitter and Facebook for promoting documents of interest to one’s professional network and the ability for documents to be embedded in other Web sites.

So if one wishes to maximise the impact of one’s ideas will the institutional repository or a commercial service such as Scribd provide the best solution? Or perhaps one should use both approaches? And if you feel that researchers will prefer to use a more research-friendly environment than is provided by Scribd, remember that researchers, like everyone else, use Google, which will also find resources of dubious scholarly relevance for searches.

Posted in Repositories, Web2.0 | Tagged: | 4 Comments »

Is It Too Late To Exploit RSS In Repositories?

Posted by Brian Kelly (UK Web Focus) on 22 December 2010

A few years ago we had discussions about ways in which information about UKOLN peer-reviewed papers could be more effectively presented. We asked “Could we provide a timeline view? Or how about a Wordle display which illustrates the variety of subject areas researchers at UKOLN are engaged in?” The answer was yes we could, but it wouldn’t be sensible to carry out development work ourselves. Rather we should ensure that our publications were made available in Opus, the University of Bath’s institutional repository.  And since repositories are based on open standards we would be able to reuse the metadata about our publications in various ways.

We now have a UKOLN entry in Opus and there’s also an RSS feed for the items. And similarly we can see entries for individuals, such as myself, and have an RSS feed for individual authors.

Unfortunately the RSS feed is limited to the last ten deposited items rather than returning the 223 items for UKOLN or the 45 items belonging to me. The RSS feed is failing to live up to its expectations and isn’t much use :-(

The Leicester Research Archive (LRA), in contrast, does seem to provide a comprehensive set of data available as RSS. So, for example, if I go to the Department of Computer Science’s page in the repository there is, at the bottom right of the page (though, sadly, not available as an auto-discoverable link) an RSS feed – and this includes all 50 items.

Sadly when I tried to process this feed in Wordle, Dipity and Yahoo! Pipes I had no joy, with the feed being rejected by all three applications. I did wonder if the feed might be invalid, but the W3C RSS validator and the RSS Advisory Board’s RSS Validator only gave warnings. These warnings might indicate the problem, as the RSS feed did contain XML elements which might not be expected in an RSS feed.
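For anyone wishing to carry out similar checks on a repository feed, the following is a minimal sketch using Python's standard library. It counts the items in a feed (which would show whether a feed is being truncated to ten entries) and lists any namespaced extension elements, which might be the sort of thing a strict consumer does not expect. The sample feed, its URLs and the function name are made up for illustration; I do not know what actually caused the three applications to reject the Leicester feed.

```python
import xml.etree.ElementTree as ET

# A made-up two-item feed standing in for a repository's RSS output.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example repository feed</title>
    <link>http://repository.example.org/</link>
    <description>Hypothetical feed for illustration</description>
    <item>
      <title>First paper</title>
      <link>http://repository.example.org/1/</link>
      <dc:creator>A. Author</dc:creator>
    </item>
    <item>
      <title>Second paper</title>
      <link>http://repository.example.org/2/</link>
    </item>
  </channel>
</rss>"""


def summarise_feed(xml_text):
    """Return (item titles, set of namespaced element tags) for an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    titles = [item.findtext("title") for item in root.iter("item")]
    # ElementTree reports namespaced tags as "{namespace-uri}local-name";
    # the core RSS 2.0 elements have no namespace.
    extensions = {el.tag for el in root.iter() if el.tag.startswith("{")}
    return titles, extensions


titles, extensions = summarise_feed(SAMPLE_FEED)
print(len(titles), "items:", titles)
print("Namespaced (extension) elements:", sorted(extensions))
```

To inspect a live feed the XML text would need to be fetched first (for example with `urllib.request`), and a malformed feed would raise a parse error rather than return a summary.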

But whilst my experiment to demonstrate how widely available applications which process RSS feeds could possibly be used to enrich the outputs from an institutional repository  has been unsuccessful to date, I still feel that we should be encouraging developers of institutional repository software to allow full RSS feeds to be processed by popular services which consume RSS.

I have heard arguments that providing full RSS feeds might cause performance problems – but is that necessarily the case? I’ve also heard it suggested that we should be using ‘proper’ repository standards, meaning OAI-PMH – but as Nick Sheppard has recently pointed out on the UKCORR blog:

I have for some time been a little nonplussed by our collective, continued obsession with the woefully under-used OAI-PMH. Other than OAIster (an international service), the only services I’m currently aware of in the UK are the former Intute demo now maintained by Mimas.

In his post Nick goes on to ask “Perhaps OAI-PMH has had it’s day“.  It’s unfortunate, I feel, that RSS does not seem to have been given the opportunity to see how it can be used to provide value-added services to institutional repositories.  Is it too late?

Posted in Repositories, rss | 8 Comments »

Availability of Paper on “Moving From Personal to Organisational Use of the Social Web”

Posted by Brian Kelly (UK Web Focus) on 29 November 2010

I will present a paper on “Moving From Personal to Organisational Use of the Social Web” at the Online Information 2010 conference tomorrow as well as, as described previously, via a pre-recorded video at the Scholarly Communication Landscape: Opportunities and Challenges symposium.

The eight page paper will be included in the conference proceedings and can also be purchased for a sum of £135! However my paper is available (for free!) from the University of Bath Opus Repository. In addition, in order to both enhance access routes to the paper (and the ideas it contains) and to explore the potential of a Web 2.0 repository service, the document has also been uploaded to the Scribd service.

From the University of Bath repository users can access various formats of the paper and a static and persistent URI is provided for the resource.   But what does Scribd provide?

Some answers to this question can be seen from the screen shot shown below. Two facilities which I’d like to mention are the ability to:

  • Let others know about papers being read in Scribd using the Readcast option which will send a notification to services such as Twitter and Facebook.
  • Embed the content in third party Web pages.

In addition the Scribd URI seems likely to be persistent: http://www.scribd.com/doc/43280157/Moving-From-Personal-to-Organisational-Use-of-the-Social-Web

I had not expected the WordPress.com service to allow Scribd documents to be embedded but, as can be seen below, this is possible.

There are problems with Scribd, however. Its list of categories for uploaded resources is somewhat idiosyncratic (e.g. Comics, Letters to our leaders, Brochures/Catalogs). There is also a lot of content from UKOLN, my host organisation, which has been uploaded without our approval. But in terms of the functionality and ways in which the content can be reused in other environments it has some appeal. If only these benefits could be integrated with the more managed environment for content and metadata provided by institutional repositories. But should that be provided by institutional repositories embedding Web 2.0 style functionality or, alternatively, by Web 2.0 repository services adding on additional management capabilities?

Posted in Repositories, Web2.0 | Tagged: | 4 Comments »

EPub Format For Papers in Repositories

Posted by Brian Kelly (UK Web Focus) on 4 August 2010

EPub as a Format for Use in Institutional Repositories?

In a post entitled “File Formats For Papers In Your Institutional Repository” I suggested that depositing a HTML version of a paper might have various advantages over the PDF format which is the norm. But in light of the growing importance of mobile devices wouldn’t it seem appropriate to make such papers available in the EPub format?

EPub is described in Wikipedia as “a free and open e-book standard by the International Digital Publishing Forum (IDPF)“. The article goes on to add that “EPUB is designed for reflowable content, meaning that the text display can be optimized for the particular display device used by the reader of the EPUB-formatted book. The format is meant to function as a single format that publishers and conversion houses can use in-house, as well as for distribution and sale.”

In terms of the open standards used EPub consists of three specifications:

  • Open Publication Structure (OPS) 2.0, contains the formatting of its content.
  • Open Packaging Format (OPF) 2.0, describes the structure of the .epub file in XML.
  • OEBPS Container Format (OCF) 1.0, collects all files as a ZIP archive.

The article states that “EPUB internally uses XHTML or DTBook (an XML standard provided by the DAISY Consortium) to represent the text and structure of the content document and a subset of CSS to provide layout and formatting. XML is used to create the document manifest, table of contents, and EPUB metadata. Finally, the files are bundled in a zip file as a packaging format.”
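To make the packaging a little more concrete, here is a minimal sketch in Python of how the container layer fits together. It builds an in-memory ZIP laid out like an EPub file and reads it back. The `mimetype` entry and the `META-INF/container.xml` file are part of the container format; the names `OEBPS/content.opf` and `OEBPS/paper.xhtml` and the placeholder contents are my own assumptions for illustration, so the result is a skeleton rather than a valid book.

```python
import io
import zipfile

# container.xml tells a reading system where to find the OPF package file.
CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def build_epub_skeleton():
    """Build an in-memory ZIP laid out like an EPub container (not a complete book)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as epub:
        # The 'mimetype' entry comes first and is stored uncompressed.
        epub.writestr("mimetype", "application/epub+zip",
                      compress_type=zipfile.ZIP_STORED)
        epub.writestr("META-INF/container.xml", CONTAINER_XML,
                      compress_type=zipfile.ZIP_DEFLATED)
        # Placeholders only: a real OPF file lists metadata, manifest and spine,
        # and a real content document is full XHTML.
        epub.writestr("OEBPS/content.opf", "<package/>",
                      compress_type=zipfile.ZIP_DEFLATED)
        epub.writestr("OEBPS/paper.xhtml", "<html/>",
                      compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


data = build_epub_skeleton()
with zipfile.ZipFile(io.BytesIO(data)) as epub:
    names = epub.namelist()
    mimetype = epub.read("mimetype").decode("ascii")

print(names)
print(mimetype)
```

The same reading code can be pointed at any .epub file to see which XHTML, CSS and image files it bundles.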

Using the EPub Format

This sounds interesting, so I converted the HTML version of my recent paper on “Empowering users and their institutions: A risks and opportunities framework for exploiting the potential of the social web” into EPub format and added it to my library of ebooks on my iPod Touch using the Stanza application.

The accompanying images show how the paper is displayed. The first image illustrates the page turning style of navigation provided using EPub and the second image illustrates an embedded image.

The paper is also available from Opus, the University of Bath’s institutional repository service. I should mention that the URL for the EPub file is http://opus.bath.ac.uk/17484/5/i4.epub. I discovered that entering the URL into a browser on my iPod Touch allowed me to view the document in the Stanza application. On a normal PC users will probably not have a viewer set up to render this format, which may cause some confusion.

As might be expected for a format which uses XHTML the conversion from the XHTML original was a simple operation. I should add that I also experimented with converting a PDF version of the paper to EPub but this resulted in various problems due, I think, to the way in which the two columns used in the paper were linearised.

Revisiting the Issue of Formats for Use in Repositories

This initial experiment seemed to show that creating an EPub version of a paper in a repository can be done quite easily. However the ease of doing this may have been due to the availability of a HTML version of the paper; doing this on a large scale may be time-consuming if HTML versions of papers are not available.

Let’s revisit the question of what formats for papers we should be seeking to deposit in institutional repositories.

From a preservation perspective the advice from archivists tends to be that you should preserve the original master copy. In many cases this is likely to be MS Word, although other popular formats will probably include Open Office and LaTeX.

From an interoperability perspective an open standard is preferable. I would suggest that rather than making use of a specific DTD designed for scholarly publishing we should use a well-established and popular existing open format – HTML (in whatever version).

If we wish to maximise the take-up of our repositories whilst minimising the effort in processing the files it seems to me that we should explore ways of creating derivative versions from the master source. So rather than uploading a PDF shouldn’t we be uploading the master file and creating a PDF automatically from this resource? And rather than creating an EPub file, as I have done, shouldn’t the repository software create the EPub file from a HTML version of the file? And whilst I acknowledge that authors may not wish to make their original document (in, say, MS Word or Open Office format) available to others and would regard the interoperability aspects of PDF as a feature rather than a flaw, there should be nothing to stop the master file being stored in the repository but not openly accessible.

Is anyone thinking along these lines?



Posted in Repositories | 22 Comments »

Automated Accessibility Analysis of PDFs in Repositories

Posted by Brian Kelly (UK Web Focus) on 30 July 2010

Back in December 2006 I wrote a post on Accessibility and Institutional Repositories in which I suggested that it might be “unreasonable to expect hundreds if not thousands of legacy [PDF] resources to have accessibility metadata and document structures applied to them, if this could be demonstrated to be an expensive exercise of only very limited potential benefit“. I went on to suggest that there is a need to “explore what may be regarded as ‘unreasonable’ we then need to define ‘reasonable’ actions which institutions providing institutional repositories would be expected to take“.

A discussion on the costs and complexities of implementing various best practices for depositing resources in repositories continued, as I described in a post on Institutional Repositories and the Costs Of Doing It Right in September 2008, with Les Carr suggesting that “If accessibility is currently out of reach for journal articles, then it is another potential hindrance for OA“. Les was arguing that the costs of providing accessible resources in institutional repositories are too great and can act as a barrier to maximising open access to institutional research activities.

I agreed with this view, but also felt there was a need to gain evidence on possible accessibility barriers. Such evidence should help to inform practice, user education and policies. These ideas were developed in a paper published last year on “From Web Accessibility to Web Adaptability” (available in PDF and HTML formats) in which I suggested that institutions should “run automated audits on the content of [PDF resources in] the repositories. Such audits can produce valuable metadata with respect to resources and resource components and, for example, evaluate the level of use of best practices, such as the provision of structured headings, tagged images, tagged languages, conformance with the PDF standard, etc. Such evidence could be valuable in identifying problems which may need to be addressed in training or in fixing broken workflow processes.”

I discussed these ideas with my colleagues Emma Tonkin and Andy Hewson who are working on the JISC-funded FixRep project which “aims to examine existing techniques and implementations for automated formal metadata extraction, within the framework of existing toolsets and services provided by the JISC Information Environment and elsewhere“. Since this project is analysing the metadata for repository items including “title, author and resource creation date, temporal and geographical metadata, file format, extension and compatibility information, image captions and so forth” it occurred to me that this work could also include automated analyses of the accessibility aspects of PDF resources in repositories.

Emma and Andy have developed such software, which they have used to analyse records in the University of Bath Opus repository. Their initial findings were published in a paper on “Supporting PDF accessibility evaluation: Early results from the FixRep project“. This paper was accepted by the “2nd Qualitative and Quantitative Methods in Libraries International Conference (QQML2010)” which was held in Greece on 25-28 May 2010. Due to the volcanic ash Emma and Andy were unable to attend the conference. Emma did, however, produce a Slidecast of the presentation, which was used at the conference in her absence. A Slidecast also has the advantage that it can be embedded in this blog.

The prototype software they developed was used to analyse PDF resources by extracting information about the document in a number of ways, including header and formatting analysis, information from the body of the document and information from the originating filesystem. The initial pilot analysed PDFs held in the University of Bath repository and was successful in analysing 80% of the PDFs, with 20% being unable to be analysed, either due to a lack of metadata available for extraction or because the file format was not supported by the analysis tools.

In my discussions with Emma and Andy we discussed how knowledge of the tools used to create the PDF would be useful in understanding the origins of possible accessibility limitations, with such knowledge being used to inform both user education and the workflow processes used to create PDFs which are deposited in repositories. However rather than the diversity of PDF tools which were expected to be found, there appeared to be only two main tools used. It appears that this reflects the software used to create the PDF cover page (which I have written about recently) rather than the tools used to create the main PDF resource. If you are unfamiliar with such cover pages one is illustrated – the page aims to provide key information about the paper and also provides institutional branding, as can be seen.
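To illustrate the kind of tally involved, here is a very rough sketch in Python. It is emphatically not the FixRep software: it simply searches the raw bytes of a PDF for a `/Producer` entry (the key in the PDF document information dictionary which records the application that produced the PDF) and counts the values across a collection. It assumes the entry is stored as a simple, unescaped literal string, so it will miss hex-encoded strings, encrypted files and metadata held in compressed object streams. The sample byte strings and tool names are invented stand-ins for real files.

```python
import re
from collections import Counter

# Matches simple literal-string entries such as: /Producer (Some Tool 1.0)
PRODUCER_RE = re.compile(rb"/Producer\s*\(([^)]*)\)")


def find_producer(pdf_bytes):
    """Return the first /Producer literal string found in the raw bytes, or None."""
    match = PRODUCER_RE.search(pdf_bytes)
    return match.group(1).decode("latin-1") if match else None


def tally_producers(documents):
    """Count producer strings across an iterable of raw PDF byte strings."""
    return Counter(find_producer(doc) or "unknown" for doc in documents)


# Invented fragments standing in for real repository files.
samples = [
    b"%PDF-1.4 ... << /Producer (Cover Page Tool 1.0) >> ...",
    b"%PDF-1.4 ... << /Producer (Cover Page Tool 1.0) >> ...",
    b"%PDF-1.5 ... << /Creator (Word) /Producer (Another Tool) >> ...",
    b"%PDF-1.5 ... no information dictionary ...",
]
counts = tally_producers(samples)
print(counts.most_common())
```

If a cover page is added by re-processing every deposited file with the same tool, a tally like this would be expected to be dominated by that one tool, whatever software the authors originally used.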

As Emma concluded in the presentation “We may be ‘shooting ourselves in the foot’ with additions like after-the-fact cover sheets. This may remove original metadata that could have been utilised for machine learning.”

Absolutely! As well as acting as a barrier to Search Engine Optimisation (which is discussed in the paper)  the current approaches taken to the production of such cover pages act as a barrier to research, such as the analysis of the accessibility of such resources.

It does strike me that this is nothing new. When the Web first came to the attention of University marketing departments there was a tendency to put large logos on the home page, images of the vice-chancellor and even splash screens to provide even more marketing, despite Web professionals pointing out the dangers associated with such approaches.

So whilst I understand that there may be a need for cover pages, can they be produced in a more sophisticated fashion so that they are friendly to those who are developing new and better ways of accessing resources in institutional repositories? Please!

Posted in Accessibility, Repositories | 7 Comments »

File Formats For Papers In Your Institutional Repository

Posted by Brian Kelly (UK Web Focus) on 7 July 2010

File Formats I Have Used to Deposit Items in the Bath Institutional Repository

What file formats should you use to deposit papers in your institutional repository?  Although I recently suggested that RSS could have a role to play in allowing the contents of a repository to be syndicated in other environments  that post didn’t address the question of the preferred file format(s) for mainstream resources such as peer-reviewed papers.

For my papers in the University of Bath Opus repository I initially deposited the original MS Word file and the PDF version which is normally submitted to the journal or conference: the MS Word file is the original source material which is needed for preservation purposes and the PDF file is the open standard version which should be more resilient to software changes than the MS Word format.

What I hadn’t done, though, was to deposit a HTML version of my papers, despite the fact that I normally create such files. I think I suspected that uploading HTML files into a repository might be somewhat complicated so when I uploaded my papers I omitted the HTML versions of the papers.

Problems With PDFs

However when I recently viewed the repository copy of the PDF version of my paper on “Library 2.0: Balancing the Risks and Benefits to Maximise the Dividends” I discovered that such papers have a cover page appended as shown.

Having recently been a co-facilitator on a series of workshops on “Maximising the Effectiveness of Your Online Resources” I am well aware of best practices to help ensure that valuable resources can be easily discovered by search engines. And although papers in the repository do have a ‘cool URI’, prefixing the content of all papers in the repository with the same words (“University of Bath Open Online Publications Store” followed by “http://opus.bath.ac.uk/” and “This version is made available in accordance with publisher policies. Please cite only the published version using the citation below.”) goes against best practices for Search Engine Optimisation.

The cover page isn’t the only concern I have with use of PDFs in institutional repositories. Despite PDF being an ISO standard, not all PDF creation programs will necessarily create PDFs which conform with the standard, with papers containing mathematical formulae or scientific notation being particularly prone to failing to embed the fonts needed to provide a resource suitable for long-term preservation. I also suspect that, although it is possible to create accessible PDFs, many PDF files stored in repositories will fail to conform with PDF accessibility guidelines.

Providing HTML Versions of Papers

In light of these reservations I have decided to provide a HTML version of my recent papers in the University of Bath institutional repository. So my paper on “From Web Accessibility to Web Adaptability” (for which the publisher’s embargo has recently expired) is available in HTML as well as PDF formats.

As I suspected, however, depositing the HTML version of the paper was slightly tricky. I uploaded the paper using the Upload for URL option and this initial attempt resulted in the page’s navigational elements and search interface being embedded in the page. And since the upload mechanism only uploads files which are ‘beneath’ the paper in the underlying directory structure the page’s style sheet was not included. In short, the page looked a mess.

Since the HTML files I have created contain the contents of the paper separately from the page’s navigational elements it was not too difficult to create a very simple HTML file which I included (with the citation details appended at the end of the paper) in the resource which is available in the repository. As can be seen the contents are available even if the page is not visually appealing.

There are, of course, resource implications in creating HTML versions of papers. However it will be interesting to see if providing content which is more easily found in Google provides benefits in enhancing access to papers which are provided in HTML format  - and since resource discovery is one of the main aims of a repository it might be argued that resources should be provided to ensure that HTML versions of papers are made accessible.

But What About Richer XML Formats?

The purist might argue that whilst HTML is an open and Web-native format it may not be rich enough for use with peer-reviewed papers. I have some sympathy with such views. Anthony Leonard has described how we should go about “Fixing academic literature with HTML5 and the semantic web“. I would agree that there’s a need to explore how HTML5 can be used in the context of institutional repositories.

But mightn’t there be another XML format we should consider? How about an open format which is widely supported and deployed and which, for many authors, will not require any changes to their authoring environment? The format is OOXML – an ECMA standard which has also been standardised as an International Standard (ISO/IEC 29500). However not all open standards are equally open and as this standard is based on Microsoft’s format for their office applications, as Wikipedia describes “the ISO standardization of Office Open XML was controversial and embittered“.

In light of this discussion, what format(s) would you recommend for use with institutional repositories?

Posted in Repositories | 12 Comments »

Getting Into The Top Ten For Your Institutional Repository

Posted by Brian Kelly (UK Web Focus) on 10 June 2010

Statistics on Downloads for the University of Bath Institutional Repository

The University of Bath is currently testing the IR Stats package in Opus, the University’s institutional repository. Using the Web interface to the package I ran a search for the top ten downloads over the past year. The results are shown below – and, as you can see, a paper on “Library 2.0: balancing the risks and benefits to maximise the dividends” by myself, Paul Bevan, Richard Akerman, Jo Alcock and Josie Fraser is in second place! You’ll have to scroll on beneath the image to discover the secrets of how to ensure that your research paper gets into the top ten for your institutional repository :-)

Top ten downloads from Opus repository in past year

Seeking An Explanation

On 11 August 2009 I wrote a blog post in which I described how my Paper on “Library 2.0: Balancing the Risks and Benefits to Maximise the Dividends” [had been] Published in Program.

Now looking at the blog statistics for visits to the post I discover that there have been a total of 735 views (with 162 on the day of publication).

Since the blog post linked directly to the details of the paper provided in the institutional repository I believe that many of the visits to the blog post resulted in downloads of the paper in the repository – and so it was a direct result of having a blog and writing a timely post about the paper which resulted in the paper being the second most downloaded paper last year.

Do I have any further evidence to back up this assertion? It would have been interesting to see if a tweet about the post had generated traffic to the article but, having looked at the archive of my tweets in BackUpMyTweets, it seems I didn’t use Twitter on the day the post was published. It also seems that a bit.ly URL for the post hadn’t been minted previously, so unfortunately there are no bit.ly statistics to examine.

However looking at the download statistics over the past year for my other items in the repository this particular item stands out for its popularity – and so I will assert that the timely blog post linking to the repository item generated over thirty times the normal annual traffic to one of my papers.

Looking at the search engine statistics for all of my items over the period I discover that 80% of the traffic is not delivered by a search engine (the red quadrant in the pie chart).

The display of referring traffic to my items confirms that search engines aren’t significant in providing traffic (20%) and that the repository search itself only delivers 10% of the traffic. Rather it is external Web sites (i.e. my blog, I believe) which deliver 39% of the traffic, with 31% of the traffic having no referrer information (I have found this is often traffic from Twitter clients but in this case it may be traffic coming from RSS readers used to view the post).

Discussion

Of course the large number of downloads is no indication of the quality of the paper.  And it might be that the paper was downloaded by an automated agent (perhaps someone was retrieving papers on Library 2.0 and the harvester repeatedly downloaded this paper).  Or, alternatively, maybe the statistics package is producing incorrect results.

But, unless I come across alternative evidence, I will regard the popularity of this item as an indication that blog posts can have a significant impact on the traffic to items in an institutional repository.  Note that I am not saying that blogs are the only significant factor – my UKOLN colleague Alex Ball and Andy Ramsden, head of the e-learning team (both of whom work on the same corridor as me) also figure in the top ten downloads. In their case I think embedding links to their Opus items in external Web sites helps to drive traffic.

However, especially for those working in areas in which there are significant numbers of blog readers, having a blog and using it effectively may provide the researcher with an advantage in raising awareness of their research.

Would you agree?

Posted in Blog, Repositories | 16 Comments »

Video of Dorothea Salo’s Seminar at UKOLN

Posted by Brian Kelly (UK Web Focus) on 22 April 2010

I recently mentioned that Dorothea Salo (better known in some circles as The Repository Rat – which is also her Twitter ID) was visiting UKOLN to give a seminar entitled “Grab a bucket – it’s raining data!“. Dorothea gave a fascinating talk on the importance of the management of scientific data, but tempered with a description of the complexities of this work and the challenges to be faced by whoever (librarians?) should take responsibility for such work.

Dorothea Salo's seminar

Staff at UKOLN and visitors from elsewhere at the University of Bath and beyond very much enjoyed Dorothea’s talk and the subsequent discussions.  For those who weren’t there we have, with Dorothea’s kind permission, recorded a video of her talk which is available on the Vimeo service (in two parts: part 1 and part 2).

Posted in Repositories | Leave a Comment »

UKOLN Seminar: “Grab a Bucket – It’s Raining Data!”

Posted by Brian Kelly (UK Web Focus) on 15 April 2010

Across the international repository community Dorothea Salo established a reputation for her Caveat Lector blog which ran from 2002–2009.  On her current  The Book of Trogool blog Dorothea now describes herself as “an academic librarian exploring the practices, processes, and praxis of e-research“.

As mentioned in a recent post on her blog entitled “Hello from Scotland!” Dorothea, who works at the University of Wisconsin, is currently in the UK. At the start of the week Dorothea spoke at the UKSG conference in Edinburgh where she gave a plenary talk on “Who Owns Our Data?“.

On Monday morning (19 April 2010) Dorothea will be speaking at a UKOLN seminar which will be held at the University of Bath.  The title of the seminar is “Grab a bucket – it’s raining data!” and the abstract is given below:

From a distance, the coming-together of libraries and research data looks like a match made in heaven. Libraries need the attention and support of scientists, and libraries offer digital services and portals that should accommodate the preservation and dissemination needs of data.

When we look a little closer, however, we find a lot of impedance mismatches between what data need and what libraries have on offer. This talk will explore those mismatches and suggest ways forward.

The seminar will take place from 09.30-12.00 in the Library seminar room 3E 3.8 on the University of Bath campus.  If you would like to attend please sign up on the Eventbrite booking form.

Posted in Events, Repositories | 1 Comment »

Talk at Edspace Event, University of Southampton

Posted by Brian Kelly (UK Web Focus) on 3 November 2009

I have been invited by the JISC-funded Edspace project, based at the University of Southampton, to give a talk at an event on “Traditional educational repositories v. Web 2.0 resource sharing” to be held on Wednesday 4 November 2009. I have been asked to speak on “the future for educational resources and services on the Web” – a rather grandiose topic, I think! I’ve entitled the talk “The Future for Educational Resource Repositories and Services in a Web 2.0 World” as it’s the Web 2.0 aspect I feel is important (and reflects my area of expertise – I don’t claim to have anything particularly significant to say on the repository side of things).

I’ll be saying that many of the technical aspects of Web 2.0 are now mainstream – and indeed the Edspace’s Edshare service provides RSS feeds, tag clouds, embed functionality and ‘cool URIs’.

But the term Web 2.0 also covers the network as the platform and a culture of openness. The issue of openness of educational resources is being addressed in, for example, the JISC OER programme and although I personally seek to ensure that my content (such as blog posts, slides and papers) is available under a Creative Commons licence I know that there are added complexities in the area of educational resources – so I’ll not focus on the openness issue.

Instead I’ll raise the question of the network as the platform in the context of the futures for educational resource repositories.  I’ll suggest that as experts predict further cuts in the public sector, including higher education, wouldn’t it be appropriate for our repository services to be hosted in the cloud?  And the concerns which tend to be raised (sustainability, reliability, legal issues, etc.) are implementation details which do need to be addressed – but these aren’t the important policy issues.

The slides I’ll be using are available on Slideshare (in the Cloud(!) although a master copy is also held locally) and are embedded below.

Posted in Events, Repositories | Leave a Comment »

Depositing My Paper Into the University of Bath Institutional Repository

Posted by Brian Kelly (UK Web Focus) on 21 July 2009

I recently mentioned that my paper on “From Web accessibility to Web adaptability” had been published in a special issue of the Disability and Rehabilitation: Assistive Technology journal. Shortly after receiving the notification that the paper had been published I deposited the author’s version of the paper in Opus, the University of Bath Institutional Repository. As I had attended a short training course on use of Opus (which uses the ePrints repository software) a few hours before uploading the paper to the repository I decided to time how long it took to complete the process.

I discovered it took me 16 minutes to do this. As someone pointed out in response to my tweet about this, this seemed too long.  I subsequently discovered that I had mistakenly chosen the New Item option – as a DOI for the paper was available I should have selected the Import Items option (not an intuitive name, I feel). In addition I also copied the list of 46 references and tried to apply some simple formatting (line breaks between items) to the list and also to the abstract. This was a mistake, as any line breaks appear to be ignored.

In order to understand what I should have done, I went through the deposit process a second time and this time recorded my actions, with an accompanying commentary as a screencast which is available on YouTube and embedded below.

The video lasts for 10 minutes and the deposit process took 7 minutes (although this includes the time taken in giving the commentary and showing what I did the first time).

It does occur to me that it might be useful to make greater use of screencasting not only as a training aid for institutional repository staff to demonstrate the correct processes for depositing items but also to allow authors themselves to show and describe the approaches they take. I’m sure that some of the mistakes I made are due to limitations of the user interface and I won’t be alone in making such mistakes. Indeed having shown this video to the University of Bath’s institutional repository manager she commented:

I’ve also noticed, from your video a few issues that should be fixed, so it was helpful to see.

Why aren’t we making more screencasts available of user interactions with the services we develop, I wonder? And why aren’t we sharing them?


Note: Just to clarify, this post was intended to encourage users to describe (openly) their experiences in using services such as repositories, and to share these experiences. The video clip is not intended as a training resource on how to deposit an item in a repository! [24 July 2009]

Posted in Repositories | 13 Comments »

The Launch of OPuS

Posted by Brian Kelly (UK Web Focus) on 4 February 2009

The University of Bath’s OPuS service, the online archive for University of Bath research publications, was launched yesterday (3rd February 2009) by Professor Jane Millar, the University’s Pro-Vice Chancellor (Research).

OPuS (which, incidentally, stands for ‘Online Publications Store’) currently holds over 12,000 references including journal articles, books and book sections, conference items, patents, reports and working papers, and research degree theses. Some of these items, including the theses, are available in full-text. The aim of the service is to help strengthen the promotion and preservation of research outputs.

I recorded (with permission) Professor Jane Millar’s official launch of the service and this clip (which is also available on YouTube) is embedded below:

I should also add that the introduction to the launch was given by University Librarian, Howard Nicholson (YouTube video clip available) and Kara Jones, the university’s Research Publications Librarian, concluded the event by providing some facts and figures about the service and the role that she can play in supporting departmental use of the service (YouTube video clip available).

Many thanks to Kara Jones for organising this launch event and ensuring that a large number of the University’s research publications were uploaded to the service prior to the launch. Readers with particular interests in repositories may wish to add Kara’s My:self Archive blog to their RSS reader.

Posted in Repositories | Leave a Comment »

Institutional Repositories and the Costs Of Doing It Right

Posted by Brian Kelly (UK Web Focus) on 29 September 2008

There’s an interesting discussion taking place on the JISC-Repositories JISCMail list, following a post from Jenny Delasalle who asked:

Do any of you know how long it takes you to process a single item, before it is available as a live record in your repository? Please can you share that information with the list? 

Jenny provided details of her experiences:

Here at Warwick it takes at least 2 hours to process a single item. We are adding to our repository at a rate of about 15 items per week. I’m desperate to try to speed this up as we are receiving items faster than we can process them.
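
To put Jenny's figures in context, here is a back-of-the-envelope sketch in Python. The function name is my own, and the calculation assumes that the 2 hours is staff time per item and that every item takes that long; the post to the list does not say either.

```python
def weekly_processing_hours(hours_per_item, items_per_week):
    """Staff hours needed each week to keep pace with deposits."""
    return hours_per_item * items_per_week


# Figures quoted above: at least 2 hours per item, about 15 items per week.
hours = weekly_processing_hours(2, 15)
print(f"At least {hours} staff hours per week")
```

On those assumptions the processing alone accounts for at least 30 staff hours a week, before any backlog is tackled.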

My colleague Pete Cliff somewhat tentatively suggested “why not put the items in the repository with minimal metadata”.

Pete and others seemed to feel that such compromises may be needed “in the current climate where quantity seems to have more impact than quality“. But this is where I would disagree.  This argument seems to be simply a cry for more resources in an area of interest to those making such a plea. But people will always be asking for more resources for their areas of interest – and, as there will always be limited resources, others will argue that their areas are more worthy of being allocated more resources.  And it strikes me as being somewhat disingenuous to have developed an approach which is known to be resource-intensive and then to make a plea for additional resources in order for the particular approach to be effective. A more honest approach would have been to develop a solution which was better suited for the available resources.

This was an argument I made last week in my talk on “Web Accessibility 3.0: Learning From The Past, Planning For The Future“ (and note a 30 minute video of the talk is available). I pointed out that evidence suggests that Web accessibility policies based on conformance with WCAG AA have clearly failed, except in a small number of cases. And rather than calling for additional resources to be allocated to changing this we need to acknowledge that this won’t happen, and to explore alternative approaches.

And it is interesting to note the apparent lack of interest on the JISC-Repositories list in discussing the accessibility of resources in the repositories rather than the metadata requirements for aiding resource discovery. Indeed when this topic was discussed a couple of years ago Les Carr, with an openness which I appreciated, argued that:

If accessibility is currently out of reach for journal articles, then it is another potential hindrance for OA. I think that if you go for OA first (get the literature online, change researchers’ working practices and expectations so that maximum dissemination is the normal state of affairs) THEN people will find they have a good reason to start to adapt their information dissemination behaviours towards better accessibility.

Here Les is arguing that the costs of providing accessible resources in Institutional Repositories are too great, and can act as a barrier to maximising open access to institutional research activities. I would very much agree with Les that we need to argue priorities – as opposed to simply asking that someone (our institutions, the government – it’s never clear who) should give us more money to do the many good things we would like to do in our institutions.

In the case of Institutional Repositories we then have competing pressures for resources for metadata creation and management and for enhancing the accessibility of the resources. In this context it should be noted that the WCAG 2.0 guidelines have reached the status of Candidate Recommendation, and that the WAI Web site states quite clearly “We encourage you to start using WCAG 2.0 now”. And note that, unlike the WCAG 1.0 guidelines, WCAG 2.0 is format neutral. So you can provide resources on your Web site in a variety of formats, but such resources need to conform with the guidelines if it is your institutional policy to do so.

So shouldn’t institutions who have made a public commitment to comply with WCAG guidelines ensure that this applies to content in their institutional repositories, even if this will require a redeployment of effort from other activities, such as metadata creation?

Or, alternatively, you may feel that complying with a set of rules, such as WCAG, without doing the cost-benefit analysis or exploring other approaches to achieving the intended goals is misguided. In which case perhaps Pete’s suggestion that you might wish to consider “put[ting] the items in the repository with minimal metadata” might actually be a sensible approach rather than an unfortunate compromise? And in response to Philip Hunter’s comment that “achieving interoperability through dumbing-down the metadata has a strange attractiveness in a world not overly crazy for quality” perhaps we should be arguing that “achieving interoperability and accessibility through labour-intensive manual efforts is a perverse solution in a public sector environment in which we should be demonstrating that we can provide cost effective solutions“?

Posted in Accessibility, Repositories | 3 Comments »

GCSEs Revisited

Posted by Brian Kelly (UK Web Focus) on 21 February 2008

It is always pleasing when a blog post achieves its aim, and even more so when this happens so quickly. So it was good to read AJ Cann’s post in which he describes how he spent 3 minutes using the Google Custom Search Engine (GCSE) to provide an alternative to his institutional search engine. As he titled his post “It was all Brian Kelly’s fault“!

Revisiting my original post it would seem that there are a number of ways in which GCSE is being used:

In this latter case, AJ is clearly unhappy with the local search engine service (ht://Dig): “I can’t stand the inadequate institutional search tools I’ve been forced to use for a decade” – and decided it was worth spending “less than 30 seconds” to set up an alternative! And this approach reflects AJ’s interests in Personal Learning Environments (PLEs). He now has a Personal Search Engine.

Now if setting up GCSE across a range of Web sites is so easy and can be done by individuals without the need for institutional commitment, in what other ways could the software be used?

As we’ve recently discussed institutional repositories and various people have aired their concerns on the approaches being taken, it seems to me that the GCSE could have a role to play in providing an alternative way of searching repositories.

And this approach has already been taken on the OpenDOAR Search Repository Contents service and the Search ROAR Content With Google service.

This approach fits in nicely with Rachel Heery’s comment that “I don’t really see that there is conflict between encouraging more content going into institutional repositories and ambitions to provide more Web 2.0 type services on top of aggregated IR content. Surely these things go together?“. We have the managed content in the repository and are providing users with a choice in the selection of a search interface.

It’s good to see that happening. But can’t we do even more? We could, for example, use the two ways of searching for gaining evidence of the preferences users may have for searching. And perhaps rather than exposing new users of repositories to the rich functionality of the repository’s search interface, shouldn’t we acknowledge that many users will prefer the simplicity of a Google search, and provide the GCSE interface as a better focussed alternative to the global Google search tool, with the option of pointing the users in the direction of the richer service if they find that this search interface is not good enough?

This approach would have the added advantage of not requiring the expenses associated with in-house software development. Indeed could it not be argued that public-sector organisations should have a responsibility to make use of relevant freely-available services, at least in prototyping or providing a service for making comparisons even if it isn’t envisaged that the service will be used in a final production role?

Of course the danger may be that the users decide that they are happy with Google. And we wouldn’t want that to happen, would we?

Posted in Repositories | 5 Comments »

Distributed Discussions On Repositories

Posted by Brian Kelly (UK Web Focus) on 19 February 2008

The Repositories Debate

Andy Powell recently wrote a post on the eFoundations blog about his opening plenary talk at the VALA 2008 conference.

His post generated interesting discussions and debate amongst those involved in repository activities in the UK and the wider community. Paul Miller was in agreement with Andy’s comments in his post on the Panlibus blog entitled “Andy Powell is Spot On” with Paul feeling that “Our current approach, fundamentally, is totally, completely, utterly wrong, isn’t it?”.

Over on his blog my colleague Paul Walk has given his thoughts on Andy’s post expressing agreement in several areas but disagreeing with Andy’s view that “we need to focus on building and/or using global scholarly social networks based on global repository services“. Paul (W) responds by asking “Why can’t we “focus on building and/or using global scholarly social networks” (which I support) based on institutional repository services? We don’t have a problem with institutional web sites do we? Or institutional library OPACs?”. My former colleague Rachel Heery has responded in a similar vein to Paul in a response to Andy’s post: “I don’t really see that there is conflict between encouraging more content going into institutional repositories and ambitions to provide more Web 2.0 type services on top of aggregated IR content. Surely these things go together?“.

Meanwhile over on his Overdue Ideas blog Owen Stephens gives his thoughts from the perspective of a practitioner involved in setting up the Spir@l institutional repository at Imperial College with a wittily-titled post “R.I.Positories“. Owen concludes “we need is a system that helps us administer the workflow around the delivery of digital objects in a corporate environment, but that is invisible to those not involved in the administration – and that’s what I want out of a ‘repository’ – so, for me, the Repository is dead, long live the repository“.

And a few minutes ago I noticed a pop-up alert informing me of a blog post entitled “RESTful Repositories?“. An intriguing title, I thought, so I viewed the post and came across Stu Weibel’s contribution which suggested that “One way to think about repositories is as the bookshelves of the digital library“. Stu went on to point out that “We don’t ask scholars, having just published an article or book, to ‘go to the library to find the most appropriate place for it… and don’t come back until you do!’“   This sounds reasonable to me – there’s a need for the physical library and the infrastructure that is associated with it, but the researchers don’t need to know how it works. This might be an approach to be taken with institutional repositories – so let’s not scare them off with the ins and outs of the metadata schemas.

Engaging With A Distributed Debate

There’s clearly an interesting debate taking place around the approaches which should be taken to maximising access to the UK’s research papers. But if you have an interest in institutional repositories how do you find out where the debate is taking place and how do you participate?

I have had discussions with colleagues who feel that such debates should be centralised and should use a ubiquitous communications channel – namely email. From this perspective the debate about institutional repositories within the UK higher education community should take place on the JISC-Repositories JISCMail list. However I feel that this will result in the debate being marginalised to those with a particularly strong interest in repositories, will tend to focus on the nitty-gritty details which email tends to encourage and, in the case of JISCMail, the debate will be trapped within the JISCMail Web site, not only because the JISCMail archives are not exposed to search engines such as Google, but also because of the ‘uncool’ URIs for messages in the archive.

And, of course, email discussions fragment, in any case, and I suspect the Australian participants at the VALA 2008 conference will be having their own discussions about repositories on their own mailing lists.

An alternative view is that the debate will take place via scholarly articles published in peer-reviewed journals. This may be the case in many areas of research, but many in the digital library community would be frustrated by the lengthy timescales that process would entail.

Like it or not, the debate is taking place using a variety of communications tools, including the blogosphere.

So, if you wish to engage with such discussions, how do you find out what is happening? In my case my RSS reader (Feedreader) will automatically inform me of new posts for the blogs I’ve subscribed to. This includes the eFoundations blog, although in the case of Andy’s post I was alerted to its publication a couple of hours after it had been published via a tweet on Twitter.

The distributed nature of such debates has benefits, such as allowing the discussions to be brought to the attention of different communities. When doing this, there is an expectation that bloggers will link to the original post. And if blogs allow trackbacks, it will be possible to follow links from an original post to blogs which have commented on it.
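
For readers unfamiliar with how trackbacks work, here is a minimal sketch in Python of the ping which one blog sends to another, as I understand the TrackBack specification: a form-encoded HTTP POST carrying the URL of the commenting post plus an optional title, excerpt and blog name. The function name and the example addresses are made up for illustration, and the sketch only builds the request rather than sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_trackback_ping(trackback_url, post_url, title=None, excerpt=None, blog_name=None):
    """Build (but do not send) a TrackBack ping as a form-encoded POST request."""
    fields = {"url": post_url}  # the only field the specification requires
    if title:
        fields["title"] = title
    if excerpt:
        fields["excerpt"] = excerpt
    if blog_name:
        fields["blog_name"] = blog_name
    body = urlencode(fields).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"}
    return Request(trackback_url, data=body, headers=headers, method="POST")


# Hypothetical addresses: the trackback endpoint of the original post
# and the URL of the post which comments on it.
ping = build_trackback_ping(
    "https://original.example.org/trackback/123",
    "https://commenting.example.org/2008/02/my-response/",
    title="My response",
    blog_name="An example blog",
)
```

If the receiving blog accepts the ping, it can then list the commenting post alongside its own comments, which is what makes the distributed discussion navigable from the original post.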

Returning to Andy’s original post, Paul Walk noticed that the eFoundations blog hadn’t included a trackback to Paul’s post. This is probably a technical glitch – but this incident made me think about the importance of trackbacks in the integration of distributed discussions. Owen Stephens’s R.I.Positories post included a link to a post on The importance of being open on the eFoundations blog dating back to October 2006. But comments to such old posts are disabled – I assume to minimise the effort in deleting spam comments. But this is breaking the linkages to related discussions. How, then, should we balance the benefits of allowing such trackbacks versus the maintenance costs of managing misuse?  Or do you disagree with blogs being used for this type of discussion and debate?

Posted in Blog, Repositories | 7 Comments »

CRIG Teleconference Chats On ‘Repositories And Other Services’

Posted by Brian Kelly (UK Web Focus) on 6 December 2007

I recently took part in one of a series of teleconference chats organised by the JISC-funded CRIG (Common Repository Interfaces Working Group) project.

The project organised a day of tele-conferences on 8th November 2007. The aim of the day was to facilitate a “discussion between members on how repositories might be improved (bluesky thinking)“. A recording of the discussions is available from the DigRep wiki. In addition, the project team created a series of mindmaps which helped to visualise the topics covered in the seven areas covered during the day.

I took part in the final discussion of the day which looked at other services which may interface with repositories, with a particular focus on the role of externally-hosted Web 2.0 services. The mindmap for this session is shown below.

Mindmap of discussions
(Click for larger display).

The discussions revolved around in-house development vs. use of Web 2.0 services, which is a recurring topic of discussion. I did, however, find that the visualisation of the discussions provided me with the opportunity to revisit these issues from a different perspective. I’ll have to have another look at mindmapping tools, I think.  And reading Mike Ellis’s post on Good web apps: Back of postage stamp… it would seem that MindMeister should be the first tool for me to look at.

Posted in Repositories | Leave a Comment »

Scribd – Doing For Documents What Slideshare Does For Presentations

Posted by Brian Kelly (UK Web Focus) on 29 March 2007

As I’ve recently described, a couple of months ago I uploaded PDFs of a few of my papers to Slideshare, and wondered whether there was a business opportunity for Slideshare in extending its remit from providing a repository of slideshows to include documents in general.

Well last week I came across Scribd – a Web 2.0 service which provides this functionality, describing itself as “YouTube for documents”. I registered for the service (although, strangely, you don’t need to be registered to upload documents) and uploaded several of my papers. And I have to admit that I’m very impressed with the service. I could upload my papers in several formats (including MS Word, PDF, MS PowerPoint and MS Excel) and, when I uploaded an MS Word document, alternative formats were created, including PDF, HTML, plain text and even an MP3 file which provided a computer-generated sound file for the paper! As well as the accessibility benefits which this may provide, being able to download various formats means that the service cannot be accused of ‘fake sharing’ – a term coined on the lessig blog and discussed on the O’Reilly Radar and eFoundations blogs.

Scribd Interface

The interface seemed very usable; as well as allowing the paper to be viewed in a variety of formats Scribd, as seems to be the norm for this type of service, allows resources to be bookmarked (‘favourited’ seems to be the word used to describe this), provides usage statistics and, as with Slideshare, allows the resource to be embedded in Web pages.

Has Scribd raised the bar in users’ expectations for digital repositories? In some respects, I feel it has. However there are concerns which need to be recognised:

  • Poor quality resources which are hosted: there is no guarantee of the quality of the resources which are hosted on Scribd. And there are copyrighted publications (including those from O’Reilly) which have already been uploaded.
  • Sustainability of the service: As with all services of this type, there is the question as to whether such services are sustainable. Techcrunch reported on 6 March 2007 that the service “is coming out of private beta this morning with a fresh Angel investment of $300K on top of their original Y Combinator nest egg of $12,000.“ This may keep the service running for a short time, but will it be around in the medium to long term? And what will happen if copyright holders, such as O’Reilly, take the service to court for their misuse of their copyrighted resources (as Viacom have recently done to YouTube)?
  • Lack of an interoperable resource discovery architecture: The approach taken by Scribd is not interoperable with the approach being taken by the JISC development community, which is looking to support the development of distributed interoperable digital repository services which make use of OAI-PMH.
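
To give a flavour of what interoperability via OAI-PMH means in practice, here is a minimal sketch in Python of the kind of harvesting request a service provider sends to a repository. The verb and parameter names (ListRecords, metadataPrefix, set) come from the OAI-PMH protocol, but the function name and the base URL are hypothetical, and the sketch only constructs the request URL rather than fetching anything.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Construct an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional: restrict the harvest to one set defined by the repository.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint, for illustration only.
url = build_list_records_url("https://repository.example.ac.uk/oai")
print(url)
```

A harvester can issue the same request against any compliant repository and receive metadata records in a common format, which is the kind of discovery route a service such as Scribd did not offer.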

So perhaps Scribd might be felt to have no relevance to those involved in digital repository development work. I, however, feel that it would be a mistake to dismiss Scribd. We can’t guarantee that the service will have a role to play in the long term, but the approaches it has taken are worth exploring. Indeed (as I commented some time ago in a posting about the accessibility of PDF resources in digital repositories) I feel that we should be exploring ways of improving the accessibility of repository services, and it is interesting that this commercial service, rather than one developed with the academic community, is taking a leading role in providing MP3 versions of papers in the repository.

And rather than just trying out Scribd to see what features might be worth implementing in our own repository services, is there an argument for making a deal with Scribd to host our scholarly resources in a managed fashion?

Posted in Repositories, Web2.0 | 2 Comments »

Slideshare Repository and PDFs

Posted by Brian Kelly (UK Web Focus) on 28 March 2007

I recently discovered that the Slideshare service (a repository service for slides in PowerPoint or Open Office formats) also allows PDF files to be uploaded. This makes sense as PDFs can be used as a presentation format for slide shows. I then wondered whether Slideshare could be used as a repository for papers in PDF format. So I uploaded a PDF version of a paper on Contextual Web Accessibility – Maximizing the Benefit of Accessibility Guidelines (a paper presented at the W4A workshop in Edinburgh in May 2006). As can be seen, the PDF file has been successfully uploaded to the service (with over 200 views since the document was uploaded).

Slideshare service with an uploaded PDF file

Why am I doing this? If you access the resource you will discover that the text is too small to read unless you zoom in, and if you do this, you will have only a small screen area to read the paper. The file may be inaccessible (a Flash interface to a PDF file), an issue discussed recently, and the PDF file is not easily printed, downloaded or reused (as Andy Powell commented a while ago, Slideshare is an example of ‘fake sharing’).

However such reservations are based on Slideshare in its current form. If the company felt there was a business case for hosting papers in PDF format, it would surely not be too difficult to provide a more appropriate user interface, and perhaps also providing access to printing and downloading services.

And even if Slideshare felt this was an inappropriate use of their service (and they could, of course, ban papers in PDF format from being hosted by the service) there are still a number of interesting issues which evaluating the service in this way can help address:

  • ease of uploading
  • rapid prototyping
  • architecture (URIs, APIs, …)
  • additional functionality
  • the pros and cons of allowing only quality publications to be uploaded

But since I first drafted this post, there have been further developments in this area – which I’ll address shortly.

Posted in Repositories, Web2.0 | 8 Comments »

Slideshare – It’s Working For Me

Posted by Brian Kelly (UK Web Focus) on 14 February 2007

One of the first posts to this blog, back in November 2006, described my initial experiments with the Slideshare repository for presentations.

Slideshare Repository

I described how I had uploaded several of my presentations, suggesting that this would provide greater exposure to the slides (and hence the ideas) than if they were only available on UKOLN’s Web site.

A few days ago I received an email alert which informed me that a number of the presentations had been added as a Favourite by a Slideshare user.

From his profile I discover that srains has a blog, Rolling Rains, which explores ‘the adoption of Universal Design (Design-for-All; Human-Centered Design) by the tourism industry’.

From the other slideshows he has added to his list of favourites, I have found presentations which are of interest to me (including one on Two Trainers Trade Twenty Technology Training Tips and one on standards used on Oxfam Australia’s Web site).

Revisiting my uploaded slides I discover that the most popular of my presentations is Web 2.0: What Is It, How Can I Use It, How Can I Deploy It? with 666 views in two months, with 6 users including it in their list of favourite slideshows (jensjeppe, cezinha.com, noticiasmias2002, gerarddummer, erywin and MCL).

I can then follow their list of other favourites and the slides which they may have uploaded. And guess what: people who are interested in my slides on Web 2.0 are also interested in other slides on the same subject. So this ‘social network’ provides a form of resource discovery for me :-)

Three months after my initial posting about Slideshare, what can I conclude?

  • It allows my slides (and therefore my ideas) to be accessed by people who would probably not find the resources otherwise.
  • It provides a way of measuring the impact/quality of the slides by observing the number of users who have added them to their list of favourites.
  • It helps me (and others) to find related resources.

Is there a downside? I need to remember that:

  • I don’t know how sustainable the service is – it could, for example, go out of business or change its licensing conditions (perhaps charging for access to the slides)
  • It is an example of ‘fake sharing’ – I can view the resources but not (easily) reuse the materials. In my case, however, I provide access to the original source files by including the URL of the master copy on the title slide and in the metadata.

I feel that these experiences provide some useful indications of features which could be adopted by the digital library development community: the importance of ease of use and a lightweight approach to IPR issues for content providers; the advantages of getting content out ‘where the users are’; and the benefits of social networks for resource discovery.

Posted in Repositories, Web2.0 | 14 Comments »

Accessibility and Institutional Repositories

Posted by Brian Kelly (UK Web Focus) on 12 December 2006

There has been some discussion on the JISC-Repositories JISCMail list (under the confusing subject line of “PLoS business models, global village”) on the issue of file formats for depositing scholarly papers. Some people (including myself) feel that open formats such as XHTML should be the preferred format; others feel that the effort required in creating XHTML can be a barrier to populating digital repositories, and that use of PDF can provide a simple low-effort solution, especially if authors are expected to take responsibility for uploading their papers to an institutional repository.

An issue I raised was the accessibility of resources in digital repositories. There are well-established guidelines developed by WAI which can help to ensure that HTML content is accessible to people with disabilities. Others and I have argued that the guidelines and the WAI model are flawed, but many of the guidelines are helpful and institutions should seek to implement them (indeed there are legal requirements to ensure that services do not discriminate against people with disabilities).

WCAG 1 has the following requirements:

  • 3.2 Create documents that validate to published formal grammars. [Priority 2]
  • 11.1 Use W3C technologies when they are available and appropriate for a task and use the latest versions when supported. [Priority 2]
  • 11.4 If, after best efforts, you cannot create an accessible page, provide a link to an alternative page that uses W3C technologies, is accessible, has equivalent information (or functionality), and is updated as often as the inaccessible (original) page. [Priority 1]

This seems to be pretty unfriendly towards PDFs, I would argue. WCAG 2.0 (which is in draft form) is, however, neutral regarding file formats – a development I welcome (although the guidelines still have their limitations). However the guidelines still require that content is accessible; and as well as the requirement in the guidelines, there are also legal and ethical requirements to address such issues.

Proprietary formats such as PDF can be made accessible. However, I am uncertain as to how adding alternative text for images and providing structure to PDF documents will happen in a distributed workflow environment.

Rather than dwelling on this (technical) issue, I would like to focus on the policy issues, which should be independent of particular file formats. UK legislation requires organisations to take reasonable measures to ensure that people with disabilities are not discriminated against unfairly. One could argue that it would be unreasonable to expect hundreds if not thousands of legacy resources to have accessibility metadata and document structures applied to them, if this could be demonstrated to be an expensive exercise of only very limited potential benefit. However, if we seek to explore what may be regarded as ‘unreasonable’ we then need to define the ‘reasonable’ actions which institutions providing institutional repositories would be expected to take.

One approach would be for the institution to ensure that it provides appropriate training and staff development for authors who are expected to upload documents to repositories. Linked to this may be tools which can flag problem areas to the authors, as documents are being prepared for uploading. There may then be auditing tools which can alert institutions to potential problems.
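To give a feel for the kind of lightweight check such a flagging tool might carry out, here is a minimal sketch in Python. It is purely illustrative and not based on any existing repository software: it assumes the document being deposited is HTML or XHTML, and the names MissingAltChecker and find_images_without_alt are my own invention. It looks for just one problem, images with no alt attribute at all, which is only a small part of what a real accessibility audit would need to cover.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Records where <img> tags with no alt attribute start (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.problems = []  # list of (line, column) tuples

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lower-cased names.
        # An empty alt="" is accepted here, since it is a legitimate way
        # of marking a purely decorative image.
        if tag == "img" and "alt" not in dict(attrs):
            self.problems.append(self.getpos())


def find_images_without_alt(markup):
    """Return the positions of <img> tags in markup that lack an alt attribute."""
    checker = MissingAltChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems
```

An author-facing tool could run a check like this as a paper is being prepared for upload and simply report the line numbers returned, leaving the author to decide what alternative text is appropriate.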

Related to policies to support the authors are policies which address specific problems which users with disabilities may have. Clearly many scientific papers (containing formulae, for example) may be difficult for traditional assistive technologies to process. Perhaps this is where there is a need for just-in-time accessibility (as opposed to the traditional just-in-case approach) or blended accessibility (real world alternatives to digital accessibility barriers).

Posted in Accessibility, Repositories | 9 Comments »