The CORE (COnnecting REpositories) Project
Whilst preparing a follow-up post on institutional repositories, I started to explore the data which has been collected by the JISC-funded CORE project. The CORE (COnnecting REpositories) project aims to “facilitate free access to scholarly publications distributed across many systems”. The CORE Web site, which was developed at the Open University, provides access to four applications including:
Repository Analytics - A tool that enables users to monitor the ingestion of metadata and content from repositories and provides a wide range of statistics.
I wanted to use this service to find information about the repositories provided by the 24 Russell Group universities. However, as can be seen from the accompanying screenshot, it was not easy to associate a repository with its host institution.
The first four examples illustrate the difficulties I had in using the information. The first entry, for the Aberdeen University Research Archive, gives a clear indication of the host institution. The second example, Abertay Research Collections, is somewhat more obscure, unless you know that Abertay is the name of a Scottish university. However, the next two examples, Access to Research Resources for Teachers and Advanced Knowledge Technologies EPrints Archive, give no clue as to the host institution.
This meant that browsing the list was not an effective way of finding the repositories for the Russell Group universities. In addition, the search interface was misleading: a search for “Southampton” found eCrystals – Southampton and Electronics & Computer Science EPrints Service – University of Southampton, but not the main repository, which is named e-Prints Soton.
Using CORE to Search for Russell Group University Repositories
Despite the limitations caused by the lack of institutional identifiers, I felt it would be useful to discover what information is held about Russell Group university repositories, based on a search of the CORE system using the obvious name for the host institution. The following table summarises the findings for a survey carried out on 21 February 2013 using the search term given in the second column.
Note that the Repository Analytics page does not appear to provide a formal definition of the data collected. However, hovering over the accompanying icon for each entry suggests that the Metadata Download column gives the number of metadata records, the Metadata Readable column gives the number of links extracted from the metadata and the PDF Download column gives the number of PDFs which were downloaded.
It is difficult to interpret the data given in the table: the entry for the UCL Discovery repository, for example, tells us that there are 0 metadata records, with 245407 links having been extracted from these records and 2 PDFs downloaded!
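The kind of inconsistency seen in the UCL Discovery entry can be expressed as a simple check. The following is a minimal sketch in Python, assuming the reading of the three columns suggested above; the dictionary keys and the function name are hypothetical and are not part of any CORE interface.

```python
def is_implausible(row):
    """Flag a row that reports extracted links or downloaded PDFs
    even though no metadata records were downloaded."""
    no_metadata = row["metadata_download"] == 0
    has_derived_data = row["metadata_readable"] > 0 or row["pdf_download"] > 0
    return no_metadata and has_derived_data


# Figures quoted in this post for UCL Discovery (key names are assumptions).
ucl_discovery = {
    "metadata_download": 0,       # metadata records
    "metadata_readable": 245407,  # links extracted from the metadata
    "pdf_download": 2,            # PDFs downloaded
}

print(is_implausible(ucl_discovery))
```

If the columns mean something different from what the hover text implies, the check would need to change accordingly.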
However, the table does suggest patterns of naming conventions for institutional repositories. In some cases the institutional name is provided at the beginning of the repository name (“University of Birmingham Research Archive, E-prints Repository”, “University of Liverpool Research Archive” and “LSE Research Online”) or at the end (“EPrints at the Centre for Scientific Computing, University of Warwick”, “Electronics & Computer Science EPrints Service - University of Southampton” and “Computer Laboratory Technical Reports - Cambridge University”). There are also a large number of examples which use a partial form of the institution’s name (e.g. “Edinburgh Research Archive”, “Glasgow DSpace Service” and “Manchester eScholar Services”).
But of greater interest are the institutional repositories which have been harvested by CORE but are missing from this search, such as “e-Prints Soton” and the “White Rose E-theses Online” and “White Rose Research Online” repositories which are used by the universities of Leeds, York and Sheffield.
Whilst the ownership of a repository will be apparent to the end user who accesses the service via the main entry point (perhaps from the institution’s Library Web site), in a number of cases such information is not apparent when the repository has been harvested and accessed using other systems such as, in this case, the interface developed by the CORE project.
In light of the findings from this survey of Russell Group universities, I would make the following simple recommendation:
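To illustrate why a search on the institution's name misses these repositories, here is a minimal sketch in Python. It assumes the search behaves like a case-insensitive substring match on the repository name, which is a guess at the behaviour rather than a description of how CORE is implemented; the function name and the short list of names (all taken from this post) are for illustration only.

```python
# Repository names quoted in this post; not the full list held by CORE.
REPOSITORY_NAMES = [
    "eCrystals – Southampton",
    "Electronics & Computer Science EPrints Service – University of Southampton",
    "e-Prints Soton",
    "White Rose E-theses Online",
    "White Rose Research Online",
    "University of Liverpool Research Archive",
]


def search_by_name(term, names):
    """Return the names that contain the term, ignoring case.

    This is a naive substring match, assumed here for illustration.
    """
    needle = term.lower()
    return [name for name in names if needle in name.lower()]


southampton_hits = search_by_name("Southampton", REPOSITORY_NAMES)
leeds_hits = search_by_name("Leeds", REPOSITORY_NAMES)

print(southampton_hits)  # the two Southampton services, but not "e-Prints Soton"
print(leeds_hits)        # empty: neither White Rose name mentions Leeds
```

A name-based search can only succeed if the institution's name, in the form the searcher types, appears in the repository name.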
Institutional repositories should contain the name of the host institution.
In order to illustrate the need for such a recommendation, here is a list of repositories which have been harvested by CORE:
Access to Research Resources for Teachers
Department of Computer Science E-Repository
Enlighten
Modern Languages Publications Archive
Online Publications Store
Open Research Online
Pharmacy Eprints
If you are unfamiliar with these repositories, would you be able to guess who owns them?
Or, to put it another way, meaningful metadata is important for repositories!