UK Web Focus

Reflections on the Web and Web 2.0

Archive for the ‘Repositories’ Category

The Launch of OPuS

Posted by Brian Kelly (UK Web Focus) on 4 February 2009

The University of Bath’s OPuS service, the online archive for University of Bath research publications, was launched yesterday (3rd February 2009) by Professor Jane Millar, the University’s Pro-Vice Chancellor (Research).

OPuS (which, incidentally, stands for ‘Online Publications Store’) currently holds over 12,000 references including journal articles, books and book sections, conference items, patents, reports and working papers, and research degree theses. Some of these items, including the theses, are available in full text. The aim of the service is to help strengthen the promotion and preservation of research outputs.

I recorded (with permission) Professor Jane Millar’s official launch of the service and this clip (which is also available on YouTube) is embedded below:

I should also add that the introduction to the launch was given by University Librarian, Howard Nicholson (YouTube video clip available) and Kara Jones, the university’s Research Publications Librarian, concluded the event by providing some facts and figures about the service and the role that she can play in supporting departmental use of the service (YouTube video clip available).

Many thanks to Kara Jones for organising this launch event and ensuring that a large number of the University’s research publications were uploaded to the service prior to the launch. Readers with particular interests in repositories may wish to add Kara’s My:self Archive blog to their RSS reader.

Posted in Repositories | Leave a Comment »

Institutional Repositories and the Costs Of Doing It Right

Posted by Brian Kelly (UK Web Focus) on 29 September 2008

There’s an interesting discussion taking place on the JISC-Repositories JISCMail list, following a post from Jenny Delasalle who asked:

Do any of you know how long it takes you to process a single item, before it is available as a live record in your repository? Please can you share that information with the list? 

Jenny provided details of her experiences:

Here at Warwick it takes at least 2 hours to process a single item. We are adding to our repository at a rate of about 15 items per week. I’m desperate to try to speed this up as we are receiving items faster than we can process them.
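To put those figures in context, the arithmetic can be sketched in a few lines of Python. Only the 2 hours per item and the 15 items per week come from Jenny's message; the function names and the arrival rate of 20 items per week are hypothetical, chosen purely for illustration.

```python
def weekly_processing_hours(items_per_week: float, hours_per_item: float) -> float:
    """Staff hours needed each week to process the given number of items."""
    return items_per_week * hours_per_item


def backlog_growth(arrivals_per_week: float, processed_per_week: float, weeks: int) -> float:
    """Items added to the backlog over a number of weeks (zero if keeping up)."""
    weekly_gap = max(0.0, float(arrivals_per_week) - float(processed_per_week))
    return weekly_gap * weeks


# Figures quoted above: at least 2 hours per item, about 15 items per week.
print(weekly_processing_hours(items_per_week=15, hours_per_item=2))

# Hypothetical arrival rate of 20 items per week (an assumption, not from the list).
print(backlog_growth(arrivals_per_week=20, processed_per_week=15, weeks=10))
```

On the quoted figures, processing alone accounts for at least 30 staff hours each week, which is most of a full-time post.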

My colleague Pete Cliff somewhat tentatively suggested “why not put the items in the repository with minimal metadata”.

Pete and others seemed to feel that such compromises may be needed “in the current climate where quantity seems to have more impact than quality“. But this is where I would disagree.  This argument seems to be simply a cry for more resources in an area of interest to those making such a plea. But people will always be asking for more resources for their areas of interest – and, as there will always be limited resources, others will argue that their areas are more worthy of being allocated more resources.  And it strikes me as being somewhat disingenuous to have developed an approach which is known to be resource-intensive and then to make a plea for additional resources in order for the particular approach to be effective. A more honest approach would have been to develop a solution which was better suited for the available resources.

This was an argument I made last week in my talk on “Web Accessibility 3.0: Learning From The Past, Planning For The Future”. In my talk (a 30 minute video of which is available) I pointed out that evidence suggests that Web accessibility policies based on conformance with WCAG AA have clearly failed, except in a small number of cases. And rather than calling for additional resources to be allocated to changing this we need to acknowledge that this won’t happen, and to explore alternative approaches.

And it is interesting to note the apparent lack of interest on the JISC-Repositories list in discussing the accessibility of resources in the repositories, as opposed to the metadata requirements for aiding resource discovery. Indeed when this topic was discussed a couple of years ago Les Carr, with an openness which I appreciated, argued that:

If accessibility is currently out of reach for journal articles, then it is another potential hindrance for OA. I think that if you go for OA first (get the literature online, change researchers’ working practices and expectations so that maximum dissemination is the normal state of affairs) THEN people will find they have a good reason to start to adapt their information dissemination behaviours towards better accessibility.

Here Les is arguing that the costs of providing accessible resources in Institutional Repositories are too great, and can act as a barrier to maximising open access to institutional research activities. I would very much agree with Les that we need to argue priorities – as opposed to simply asking that someone (our institutions, the government – it’s never clear who) should give us more money to do the many good things we would like to do in our institutions.

In the case of Institutional Repositories we then have competing pressures for resources for metadata creation and management and for enhancing the accessibility of the resources. In this context it should be noted that the WCAG 2.0 guidelines have reached the status of Candidate Recommendation, and that the WAI Web site states quite clearly “We encourage you to start using WCAG 2.0 now”. And note that, unlike the WCAG 1.0 guidelines, WCAG 2.0 is format neutral. So you can provide resources on your Web site in a variety of formats, but such resources need to conform with the guidelines if it is your institutional policy to do so.

So shouldn’t institutions which have made a public commitment to comply with WCAG guidelines ensure that this applies to content in their institutional repositories, even if this will require a redeployment of effort from other activities, such as metadata creation?

Or, alternatively, you may feel that complying with a set of rules, such as WCAG, without doing the cost-benefit analysis or exploring other approaches to achieving the intended goals is misguided. In which case perhaps Pete’s suggestion that you might wish to consider “put[ting] the items in the repository with minimal metadata” might actually be a sensible approach rather than an unfortunate compromise? And in response to Philip Hunter’s comment that “achieving interoperability through dumbing-down the metadata has a strange attractiveness in a world not overly crazy for quality” perhaps we should be arguing that “achieving interoperability and accessibility through labour-intensive manual efforts is a perverse solution in a public sector environment in which we should be demonstrating that we can provide cost effective solutions”?

Posted in Accessibility, Repositories | 1 Comment »

GCSEs Revisited

Posted by Brian Kelly (UK Web Focus) on 21 February 2008

It is always pleasing when a blog post achieves its aim, and even more so when this happens so quickly. So it was good to read AJ Cann’s post in which he describes how he spent 3 minutes using the Google Custom Search Engine (GCSE) to provide an alternative to his institutional search engine. As he titled his post, “It was all Brian Kelly’s fault”!

Revisiting my original post it would seem that there are a number of ways in which GCSE is being used:

In this latter case, AJ is clearly unhappy with the local search engine service (ht://Dig): “I can’t stand the inadequate institutional search tools I’ve been forced to use for a decade” – and decided it was worth spending “less than 30 seconds” to set up an alternative! And this approach reflects AJ’s interests in Personal Learning Environments (PLEs). He now has a Personal Search Engine.

Now if setting up GCSE across a range of Web sites is so easy and can be done by individuals without the need for institutional commitment, in what other ways could the software be used?

As we’ve recently discussed institutional repositories and various people have aired their concerns on the approaches being taken, it seems to me that the GCSE could have a role to play in providing an alternative way of searching repositories.

And this approach has already been taken on the OpenDOAR Search Repository Contents service and the Search ROAR Content With Google service.

This approach fits in nicely with Rachel Heery’s comment that “I don’t really see that there is conflict between encouraging more content going into institutional repositories and ambitions to provide more Web 2.0 type services on top of aggregated IR content. Surely these things go together?“. We have the managed content in the repository and are providing users with a choice in the selection of a search interface.

It’s good to see that happening. But can’t we do even more? We could, for example, use the two ways of searching to gain evidence of the preferences users may have for searching. And perhaps rather than exposing new users of repositories to the rich functionality of the repository’s search interface, shouldn’t we acknowledge that many users will prefer the simplicity of a Google search, and provide the GCSE interface as a better focussed alternative to the global Google search tool, with the option of pointing the users in the direction of the richer service if they find that this search interface is not good enough?

This approach would have the added advantage of not requiring the expenses associated with in-house software development. Indeed could it not be argued that public-sector organisations should have a responsibility to make use of relevant freely-available services, at least in prototyping or providing a service for making comparisons, even if it isn’t envisaged that the service will be used in a final production role?

Of course the danger may be that the users decide that they are happy with Google. And we wouldn’t want that to happen, would we?

Posted in Repositories | 5 Comments »

Distributed Discussions On Repositories

Posted by Brian Kelly (UK Web Focus) on 19 February 2008

The Repositories Debate

Andy Powell recently wrote a post on the eFoundations blog about his opening plenary talk at the VALA 2008 conference.

His post generated interesting discussions and debate amongst those involved in repository activities in the UK and the wider community. Paul Miller was in agreement with Andy’s comments in his post on the Panlibus blog entitled “Andy Powell is Spot On” with Paul feeling that “Our current approach, fundamentally, is totally, completely, utterly wrong, isn’t it?”.

Over on his blog my colleague Paul Walk has given his thoughts on Andy’s post expressing agreement in several areas but disagreeing with Andy’s view that “we need to focus on building and/or using global scholarly social networks based on global repository services“. Paul (W) responds by asking “Why can’t we “focus on building and/or using global scholarly social networks” (which I support) based on institutional repository services? We don’t have a problem with institutional web sites do we? Or institutional library OPACs?”. My former colleague Rachel Heery has responded in a similar vein to Paul in a response to Andy’s post: “I don’t really see that there is conflict between encouraging more content going into institutional repositories and ambitions to provide more Web 2.0 type services on top of aggregated IR content. Surely these things go together?“.

Meanwhile over on his Overdue Ideas blog Owen Stephens gives his thoughts from the perspective of a practitioner involved in setting up the Spir@l institutional repository at Imperial College with a wittily-titled post “R.I.Positories“. Owen concludes “we need is a system that helps us administer the workflow around the delivery of digital objects in a corporate environment, but that is invisible to those not involved in the administration – and that’s what I want out of a ‘repository’ – so, for me, the Repository is dead, long live the repository“.

And a few minutes ago I noticed a pop-up alert informing me of a blog post entitled “RESTful Repositories?“. An intriguing title, I thought, so I viewed the post and came across Stu Weibel’s contribution which suggested that “One way to think about repositories is as the bookshelves of the digital library“. Stu went on to point out that “We don’t ask scholars, having just published an article or book, to ‘go to the library to find the most appropriate place for it… and don’t come back until you do!’“   This sounds reasonable to me – there’s a need for the physical library and the infrastructure that is associated with it, but the researchers don’t need to know how it works. This might be an approach to be taken with institutional repositories – so let’s not scare them off with the ins and outs of the metadata schemas.

Engaging With A Distributed Debate

There’s clearly an interesting debate taking place around the approaches which should be taken to maximising access to the UK’s research papers. But if you have an interest in institutional repositories how do you find out where the debate is taking place and how do you participate?

I have had discussions with colleagues who feel that such debates should be centralised and should use a ubiquitous communications channel – namely email. From this perspective the debate about institutional repositories within the UK higher education community should take place on the JISC-Repositories JISCMail list. However I feel that this will result in the debate being marginalised to those with a particularly strong interest in repositories, will tend to focus on the nitty-gritty details which email tends to encourage and, in the case of JISCMail, the debate will be trapped within the JISCMail Web site, not only because the JISCMail archives are not exposed to search engines such as Google, but also because of the ‘uncool’ URIs for messages in the archive.

And, of course, email discussions fragment, in any case, and I suspect the Australian participants at the VALA 2008 conference will be having their own discussions about repositories on their own mailing lists.

An alternative view is that the debate will take place via scholarly articles published in peer-reviewed journals. This may be the case in many areas of research, but many in the digital library community would be frustrated by the lengthy timescales that process would entail.

Like it or not, the debate is taking place using a variety of communications tools, including the blogosphere.

So, if you wish to engage with such discussions, how do you find out what is happening? In my case my RSS reader (Feedreader) will automatically inform me of new posts for the blogs I’ve subscribed to. This includes the eFoundations blog, although in the case of Andy’s post I was alerted to its publication a couple of hours after it had been published via a tweet on Twitter.

The distributed nature of such debates has benefits, such as allowing the discussions to be brought to the attention of different communities. When doing this, there is an expectation that bloggers will link to the original post. And if blogs allow trackbacks, it will be possible to follow links from an original post to blogs which have commented on it.
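For readers unfamiliar with the mechanics, a trackback is simply an HTTP POST sent from the commenting blog to a ping URL published by the original post, carrying form-encoded `url`, `title`, `excerpt` and `blog_name` fields. The following is a minimal sketch of building (but not sending) such a ping with the Python standard library; the function name and all of the example URLs and text are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_trackback_ping(ping_url: str, post_url: str, title: str,
                         excerpt: str, blog_name: str) -> Request:
    """Build (but do not send) a TrackBack ping as an HTTP POST request."""
    body = urlencode({
        "url": post_url,        # permalink of the post doing the commenting
        "title": title,
        "excerpt": excerpt,
        "blog_name": blog_name,
    }).encode("utf-8")
    return Request(
        ping_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"},
        method="POST",
    )


# Hypothetical addresses, used only to show the shape of the request.
ping = build_trackback_ping(
    ping_url="https://original-blog.example.org/trackback/123",
    post_url="https://commenting-blog.example.com/my-response",
    title="A response to the repositories debate",
    excerpt="Some thoughts on the original post...",
    blog_name="An Example Blog",
)
print(ping.get_method(), ping.full_url)
print(ping.data.decode("utf-8"))
```

Because the receiving blog accepts an unauthenticated POST from anywhere, the same mechanism that links discussions together is also an easy target for spam.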

Returning to Andy’s original post, Paul Walk noticed that the eFoundations blog hadn’t included a trackback to Paul’s post. This is probably a technical glitch – but this incident made me think about the importance of trackbacks in the integration of distributed discussions. Owen Stephens’s R.I.P.ositories post included a link to a post on The importance of being open on the eFoundations blog dating back to October 2006. But comments on such old posts are disabled – I assume to minimise the effort in deleting spam comments. But this is breaking the linkages to related discussions. How, then, should we balance the benefits of allowing such trackbacks against the maintenance costs of managing misuse? Or do you disagree with blogs being used for this type of discussion and debate?

Posted in Blog, Repositories | 7 Comments »

CRIG Teleconference Chats On ‘Repositories And Other Services’

Posted by Brian Kelly (UK Web Focus) on 6 December 2007

I recently took part in one of a series of teleconference chats organised by the JISC-funded CRIG (Common Repository Interfaces Working Group) project.

The project organised a day of tele-conferences on 8th November 2007. The aim of the day was to facilitate a “discussion between members on how repositories might be improved (bluesky thinking)“. A recording of the discussions is available from the DigRep wiki. In addition, the project team created a series of mindmaps which helped to visualise the topics covered in the seven areas covered during the day.

I took part in the final discussion of the day, which looked at other services which may interface with repositories, with a particular focus on the role of externally-hosted Web 2.0 services. The mindmap for this session is shown below.

Mindmap of discussions

The discussions revolved around in-house development vs. the use of Web 2.0 services, which is a recurring topic of discussion. I did, however, find that the visualisation of the discussions provided me with the opportunity to revisit these issues from a different perspective. I’ll have to have another look at mindmapping tools, I think. And reading Mike Ellis’s post on Good web apps: Back of postage stamp… it would seem that MindMeister should be the first tool for me to look at.

Posted in Repositories | Leave a Comment »

Scribd – Doing For Documents What Slideshare Does For Presentations

Posted by Brian Kelly (UK Web Focus) on 29 March 2007

As I’ve recently described, a couple of months ago I uploaded PDFs of a few of my papers to Slideshare, and wondered whether there was a business opportunity for Slideshare in extending its remit from providing a repository of slideshows to include documents in general.

Well last week I came across Scribd – a Web 2.0 service which provides this functionality, describing itself as “YouTube for documents”. I registered for the service (although, strangely, you don’t need to be registered to upload documents) and uploaded several of my papers. And I have to admit that I’m very impressed with the service. I could upload my papers in several formats (including MS Word, PDF, MS PowerPoint and MS Excel) and, when I uploaded an MS Word document, alternative formats were created, including PDF, HTML, plain text and even an MP3 file which provided a computer-generated sound file for the paper! As well as the accessibility benefits which this may provide, being able to download various formats means that the service cannot be accused of ‘fake sharing’ – a term coined on the lessig blog and discussed on the O’Reilly Radar and eFoundations blogs.

Scribd Interface

The interface seemed very usable; as well as allowing the paper to be viewed in a variety of formats Scribd, as seems to be the norm for these types of services, allows resources to be bookmarked (‘favourited’ seems to be the word used to describe this), usage statistics are provided and, as with Slideshare, the resource can be embedded in Web pages.

Has Scribd raised the bar in users’ expectations for digital repositories? In some respects, I feel it has. However there are concerns which need to be recognised:

  • Poor quality resources which are hosted: there is no guarantee of the quality of the resources which are hosted on Scribd. And there are copyrighted publications (including those from O’Reilly) which have already been uploaded.
  • Sustainability of the service: As with all of these types of services, there is the question as to whether such services are sustainable. Techcrunch reported on 6 March 2007 that the service “is coming out of private beta this morning with a fresh Angel investment of $300K on top of their original Y Combinator nest egg of $12,000.” This may keep the service running for a short time, but will it be around in the medium to long term? And what will happen if copyright holders, such as O’Reilly, take the service to court for misuse of their copyrighted resources (as Viacom have recently done to YouTube)?
  • Lack of an interoperable resource discovery architecture: The approach taken by Scribd is not interoperable with the approach being taken by the JISC development community, which is looking to support the development of distributed interoperable digital repository services which make use of OAI-PMH.
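To illustrate what OAI-PMH interoperability means in practice, here is a minimal sketch, using only the Python standard library, of how a harvester might construct a ListRecords request and pull Dublin Core titles out of a response. The repository address, the function names and the sample XML are all hypothetical; only the verb, the `metadataPrefix` and `resumptionToken` arguments and the namespaces come from the protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

DC_NS = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core element namespace


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: no other arguments accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def titles_from_response(xml_text):
    """Return every dc:title found in an OAI-PMH response document."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(DC_NS + "title")]


# A hand-written, heavily abbreviated response, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repository.example.ac.uk/oai"))
print(titles_from_response(SAMPLE_RESPONSE))
```

Any service which answers requests of this shape can be harvested by the same tools; one which does not has to be handled by separate, service-specific means.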

So perhaps Scribd might be felt to have no relevance to those involved in digital repository development work. I, however, feel that it would be a mistake to dismiss Scribd. We can’t guarantee that the service will have a role to play in the long term, but the approaches it has taken are worth exploring. Indeed (as I commented some time ago in a posting about the accessibility of PDF resources in digital repositories) I feel that we should be exploring ways of improving the accessibility of repository services, and it is interesting that this commercial service, rather than one developed within the academic community, is taking a leading role in providing MP3 versions of papers in the repository.

And rather than just trying out Scribd to see what features might be worth implementing in our own repository services, is there an argument for making a deal with Scribd to host our scholarly resources in a managed fashion?

Posted in Repositories, Web2.0 | 1 Comment »

Slideshare Repository and PDFs

Posted by Brian Kelly (UK Web Focus) on 28 March 2007

I recently discovered that the Slideshare service (a repository service for slides in PowerPoint or Open Office formats) also allows PDF files to be uploaded. This makes sense as PDFs can be used as a presentation format for slide shows. I then wondered whether Slideshare could be used as a repository for papers in PDF format. So I uploaded a PDF version of a paper on Contextual Web Accessibility – Maximizing the Benefit of Accessibility Guidelines (a paper presented at the W4A workshop in Edinburgh in May 2006). As can be seen, the PDF file has been successfully uploaded to the service (with over 200 views since the document was uploaded).

Slideshare service with an uploaded PDF file

Why am I doing this? If you access the resource you will discover that the text is too small to read unless you zoom in, and if you do this, you will have only a small screen area in which to read the paper. The file may be inaccessible (a Flash interface to a PDF file), an issue discussed recently, and the PDF file is not easily printed, downloaded or reused (as Andy Powell commented a while ago, Slideshare is an example of ‘fake sharing’).

However such reservations are based on Slideshare in its current form. If the company felt there was a business case for hosting papers in PDF format, it would surely not be too difficult to provide a more appropriate user interface, and perhaps also to provide access to printing and downloading services.

And even if Slideshare felt this was an inappropriate use of their service (and they could, of course, ban papers in PDF format from being hosted by the service) there are still a number of interesting issues which evaluating the service in this way can help address:

  • ease of uploading
  • rapid prototyping
  • architecture (URIs, APIs, …)
  • additional functionality
  • the pros and cons of allowing only quality publications to be uploaded

But since I first drafted this post, there have been further developments in this area – which I’ll address shortly.

Posted in Repositories, Web2.0 | 8 Comments »

Slideshare – It’s Working For Me

Posted by Brian Kelly (UK Web Focus) on 14 February 2007

One of the first posts to this blog, back in November 2006, describes my initial experiments with the Slideshare repository for presentations.

Slideshare Repository

I described how I had uploaded several of my presentations, suggesting that this would provide greater exposure to the slides (and hence the ideas) than if they were only available on UKOLN’s Web site.

A few days ago I received an email alert which informed me that a number of the presentations had been added as a Favourite by a Slideshare user.

From his profile I discover that srains has a blog, Rolling Rains, which explores ‘the adoption of Universal Design (Design-for-All; Human-Centered Design) by the tourism industry’.

From the other slide shows he has added to his list of favourites, I have found presentations which are of interest to me (including one on Two Trainers Trade Twenty Technology Training Tips and one on standards used on Oxfam Australia’s Web site).

Revisiting my uploaded slides I discover that the most popular of my presentations is Web 2.0: What Is It, How Can I Use It, How Can I Deploy It? with 666 views in two months, with 6 users including it in their list of favourite slideshows (jensjeppe, cezinha.com, noticiasmias2002, gerarddummer, erywin and MCL).

I can then follow their list of other favourites and the slides which they may have uploaded. And guess what: people who are interested in my slides on Web 2.0 are also interested in other slides on the same subject. So this ‘social network’ provides a form of resource discovery for me :-)

Three months after my initial posting about Slideshare, what can I conclude?

  • It allows my slides (and therefore my ideas) to be accessed by people who would probably not find the resources otherwise.
  • It provides some form of measuring the impact/quality of the slides by observing the numbers of users who have added it to their list of favourites.
  • It helps me (and others) to find related resources.

Is there a downside? I need to remember that:

  • I don’t know how sustainable the service is – it could, for example, go out of business or change its licensing conditions (perhaps charging for access to the slides)
  • It is an example of ‘fake sharing’ – I can view the resources but not (easily) reuse the materials. In my case, however, I provide access to the original source files by including the URL of the master copy on the title slide and in the metadata.

I feel that these experiences provide some useful indications of features which could be adopted by the digital library development community: the importance of ease of use and lightweight approach to IPR issues for content providers; the advantages of getting content out ‘where the users are’ and the benefits of social networks for resource discovery.

Posted in Repositories, Web2.0 | 14 Comments »

Accessibility and Institutional Repositories

Posted by Brian Kelly (UK Web Focus) on 12 December 2006

There has been some discussion on the JISC-Repositories JISCMail list (under the confusing subject line of “PLoS business models, global village”) on the issue of file formats for depositing scholarly papers. Some people (including myself) feel that open formats such as XHTML should be the preferred format; others feel that the effort required in creating XHTML can be a barrier to populating digital repositories, and that use of PDF can provide a simple low-effort solution, especially if authors are expected to take responsibility for uploading their papers to an institutional repository.

An issue I raised was the accessibility of resources in digital repositories. There are well established guidelines developed by WAI which can help to ensure that HTML content can be accessible to people with disabilities. Others and I have argued that the guidelines and the WAI model are flawed, but many of the guidelines are helpful and institutions should seek to implement them (indeed there are legal requirements to ensure that services do not discriminate against people with disabilities).

WCAG 1 has the following requirements:
3.2 Create documents that validate to published formal grammars. [Priority 2]
11.1 Use W3C technologies when they are available and appropriate for a task and use the latest versions when supported. [Priority 2]
11.4 If, after best efforts, you cannot create an accessible page, provide a link to an alternative page that uses W3C technologies, is accessible, has equivalent information (or functionality), and is updated as often as the inaccessible (original) page. [Priority 1].

This seems to be pretty unfriendly towards PDFs, I would argue. WCAG 2.0 (which is in draft form) is, however, neutral regarding file formats – a development I welcome (although the guidelines still have their limitations). However the guidelines still require that content is accessible; and as well as the requirement in the guidelines, there are also legal and ethical requirements to address such issues.

Proprietary formats such as PDF can be made accessible. However I am uncertain as to how adding alternative text for images and providing structure to PDF documents will happen in a distributed workflow environment.

Rather than dwelling on this (technical) issue, I would like to focus on the policy issues, which should be independent of particular file formats. UK legislation requires organisations to take reasonable measures to ensure that people with disabilities are not discriminated against unfairly. One could argue that it would be unreasonable to expect hundreds if not thousands of legacy resources to have accessibility metadata and document structures applied to them, if this could be demonstrated to be an expensive exercise of only very limited potential benefit. However if we seek to explore what may be regarded as ‘unreasonable’ we then need to define ‘reasonable’ actions which institutions providing institutional repositories would be expected to take.

One approach would be for the institution to ensure that it provides appropriate training and staff development for authors who are expected to upload documents to repositories. Linked to this may be tools which can flag problem areas to the authors, as documents are being prepared for uploading. There may then be auditing tools which can alert institutions to potential problems.
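As an indication of how lightweight such a flagging tool could be, here is a minimal sketch in Python which reports images lacking alternative text in an HTML document prior to upload. It uses only the standard library's `html.parser` module; the class and function names and the sample markup are hypothetical, and a real checker would of course need to cover far more than this one problem.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Collect <img> elements which have no alt attribute at all."""

    def __init__(self):
        super().__init__()
        self.problems = []  # list of (line number, image source) pairs

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # An empty alt="" is deliberately allowed: it marks a decorative image.
        if tag == "img" and "alt" not in attributes:
            line, _column = self.getpos()
            self.problems.append((line, attributes.get("src", "(no src)")))


def find_images_without_alt(html_text):
    """Return (line, src) for every image missing alternative text."""
    checker = MissingAltChecker()
    checker.feed(html_text)
    checker.close()
    return checker.problems


# Hypothetical fragment of a paper prepared for deposit.
sample = (
    "<p>Figure 1 shows the results.</p>\n"
    '<img src="chart.png">\n'
    '<img src="logo.png" alt="University logo">'
)

for line, src in find_images_without_alt(sample):
    print(f"line {line}: image {src} has no alternative text")
```

A check of this kind can only flag that alternative text is absent; whether the text an author then supplies is meaningful remains a matter for training and human judgement.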

Related to policies to support the authors are policies which address specific problems which users with disabilities may have. Clearly many scientific papers (containing formulae, for example) may be difficult for traditional assistive technologies to process. Perhaps this is where there is a need for just-in-time accessibility (as opposed to the traditional just-in-case approach) or blended accessibility (real world alternatives to digital accessibility barriers).

Posted in Accessibility, Repositories | 7 Comments »