UK Web Focus

Innovation and best practices for the Web

Archive for February 21st, 2013

Naming Conventions For Institutional Repositories: Lessons from CORE

Posted by Brian Kelly (UK Web Focus) on 21 February 2013

The CORE (COnnecting REpositories) Project

Whilst preparing a follow-up post on institutional repositories, I started to explore the data which has been collected by the JISC-funded CORE project. The CORE (COnnecting REpositories) project aims to “facilitate free access to scholarly publications distributed across many systems”. The CORE Web site, which was developed at the Open University, provides access to four applications including:

Repository Analytics - A tool that enables users to monitor the ingestion of metadata and content from repositories and provides a wide range of statistics.

I wanted to use this service to find information about the repositories provided by the 24 Russell Group universities. However, as can be seen from the accompanying screenshot, it was not easy to associate a repository with its host institution.

The first four examples illustrate the difficulties I had in using the information. The first entry, for the Aberdeen University Research Archive, gives a clear indication of the host institution. The second example, Abertay Research Collections, is somewhat more obscure, unless you know that Abertay is the name of a Scottish university. However, the next two examples, Access to Research Resources for Teachers and Advanced Knowledge Technologies EPrints Archive, give no clue as to the host institution.

This meant that browsing the list was not an effective way of finding the repositories for the Russell Group universities. In addition the search interface was misleading: a search for “Southampton” enabled me to find eCrystals – Southampton and Electronics & Computer Science EPrints Service – University of Southampton - but not the main repository which has the name e-Prints Soton.
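The Southampton example can be reproduced with a minimal sketch. It assumes the search behaves like a case-insensitive substring match on the repository name; how CORE actually implements its search is not documented in this post, so `search_by_name` is a hypothetical helper, and only the repository names are taken from the examples above.

```python
# Repository names quoted in this post. The matching logic below is an
# assumption for illustration, not CORE's actual search implementation.
REPOSITORY_NAMES = [
    "Aberdeen University Research Archive",
    "eCrystals – Southampton",
    "Electronics & Computer Science EPrints Service - University of Southampton",
    "e-Prints Soton",
]


def search_by_name(names, term):
    """Return the names that contain the search term, ignoring case."""
    needle = term.casefold()
    return [name for name in names if needle in name.casefold()]


matches = search_by_name(REPOSITORY_NAMES, "Southampton")
for name in matches:
    print(name)
```

Under that assumption the search returns the two repositories whose names spell out "Southampton" and misses e-Prints Soton, because the abbreviated form "Soton" never matches the full place name.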

Using CORE to Search for Russell Group University Repositories

Despite the limitations caused by the lack of institutional identifiers I felt it would be useful to discover information held about Russell Group university repositories, based on a search of the CORE system using the obvious name for the host institution. The following table summarises the findings for a survey carried out on 21 February 2013 using the search term given in the second column.

| Ref. No. | Institution (search string) | Repository | Metadata Download | Metadata Readable | PDF Downloads |
|---|---|---|---|---|---|
| 1 | Birmingham | University of Birmingham Research Archive, E-papers Repository | 937 | 928 | 103 |
| | | University of Birmingham Research Archive, E-prints Repository | 828 | 802 | 766 |
| | | University of Birmingham Research Archive, E-theses Repository | 2,559 | 2,513 | 2,133 |
| 2 | Bristol | Bristol Repository of Scholarly Eprints | - | 4 | - |
| 3 | Cambridge | Computer Laboratory Technical Reports - Cambridge University | 3,252 | 520 | 440 |
| | | DSpace @ Cambridge | 216,718 | 192,129 | 2,847 |
| 4 | Cardiff | Online Research @ Cardiff | 31,274 | 1,647 | 1,555 |
| 5 | Durham | Durham e-Theses | 4,483 | 4,411 | 4,051 |
| | | Durham Research Online | 9,062 | 2,922 | 2,856 |
| 6 | Exeter | Exeter Research and Institutional Content archive | 2,547 | 2,334 | 4 |
| 7 | Edinburgh | Edinburgh DataShare | 75 | 75 | - |
| | | Edinburgh Research Archive | 5,769 | 5,395 | 1,583 |
| 8 | Glasgow | Glasgow DSpace Service | - | - | - |
| | | Glasgow Theses Service | 2,682 | 2,683 | 2,356 |
| 9 | Imperial | Spiral – Imperial College Digital Repository | 8,097 | 8,094 | 4 |
| 10 | King’s College London (also used King’s and Kings) | None found | - | - | - |
| 11 | Leeds | leedsmet open search (Incorrect institution) | (-) | (-) | (-) |
| | | Leodis – A photographic archive of Leeds | 57,998 | 57,998 | - |
| 12 | Liverpool | Liverpool John Moores University Research Archive (Incorrect institution) | (-) | (-) | (-) |
| | | University of Liverpool Research Archive | 885 | 810 | 517 |
| 13 | LSE | LSE Research Online | 33,959 | 6,520 | 6,463 |
| | | LSE Theses Online | 454 | 454 | 424 |
| 14 | Manchester | e-space at Manchester Metropolitan University (Incorrect institution) | (-) | (-) | (-) |
| | | Manchester eScholar Services | 119,854 | 119,854 | - |
| 15 | Newcastle | Newcastle University E-Prints | - | - | - |
| 16 | Nottingham | Nottingham ePrints | 1,084 | 1,026 | 990 |
| | | Nottingham eTheses | 1,843 | 1,793 | 1,757 |
| 17 | Oxford | Oxford University Research Archive | 16,215 | 3,745 | 98 |
| 18 | Queen Mary | None found | | | |
| 19 | Queen’s University Belfast | None found | - | - | - |
| 20 | Sheffield | Sheffield Hallam University Research Archive (Incorrect institution) | (-) | (-) | (-) |
| 21 | Southampton | eCrystals – Southampton | 602 | 602 | - |
| | | Electronics & Computer Science EPrints Service - University of Southampton | 15,835 | 8,947 | 7,071 |
| 22 | UCL | UCL Discovery | 0 | 245,407 | 2 |
| 23 | Warwick | EPrints at the Centre for Scientific Computing, University of Warwick | - | - | 360 |
| | | Warwick Research Archives Portal Repository | 49,469 | 7,696 | 7,025 |
| 24 | York | York St John University ArchivalWare Digital Library (Incorrect institution) | 331 | 1 | - |

Note that the Repository Analytics page does not appear to provide a formal definition of the data collected. However, from hovering over the accompanying icon for the entries, it appears that the Metadata Download column gives the number of metadata records, the Metadata Readable column gives the number of links extracted from the metadata, and the PDF Downloads column gives the number of PDFs which were downloaded.

Discussion

It is difficult to interpret the data given in the table: the entry for the UCL Discovery repository, for example, tells us that there are 0 metadata records, with 245,407 links having been extracted from these records and 2 PDFs downloaded!
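One way to surface such oddities is a simple consistency check. The sketch below is not anything CORE provides: it assumes, purely for illustration, that the number of links read from the metadata should not exceed the number of metadata records, and it treats a "-" in the table as zero. The `Row` type and `is_suspicious` function are hypothetical names; the figures are copied from a few rows of the table.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    repository: str
    metadata_download: Optional[int]  # None where the table shows "-"
    metadata_readable: Optional[int]
    pdf_downloads: Optional[int]


# A handful of rows copied from the survey table above.
ROWS = [
    Row("UCL Discovery", 0, 245407, 2),
    Row("Glasgow Theses Service", 2682, 2683, 2356),
    Row("Durham e-Theses", 4483, 4411, 4051),
    Row("Bristol Repository of Scholarly Eprints", None, 4, None),
]


def is_suspicious(row: Row) -> bool:
    """Flag rows with more readable links than downloaded metadata records.

    Assumption: each metadata record yields at most one link. This rule is
    a guess, since the Repository Analytics page gives no formal definition.
    """
    records = row.metadata_download or 0
    links = row.metadata_readable or 0
    return links > records


suspicious = [row.repository for row in ROWS if is_suspicious(row)]
print(suspicious)
```

If a single record can legitimately carry several links, the rule would need loosening, which is another reason a published definition of each column would help.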

However, the table does suggest patterns of naming conventions for institutional repositories, such as the institutional name being provided at the beginning (“University of Birmingham Research Archive, E-prints Repository”, “University of Liverpool Research Archive” and “LSE Research Online”) or end of the repository name (“EPrints at the Centre for Scientific Computing, University of Warwick”, “Electronics & Computer Science EPrints Service - University of Southampton” and “Computer Laboratory Technical Reports - Cambridge University”), together with a large number of examples which use a partial form of the institution’s name (e.g. “Edinburgh Research Archive”, “Glasgow DSpace Service” and “Manchester eScholar Services”).

But of greater interest are the institutional repositories which have been harvested by CORE but are missing from this search, such as “e-Prints Soton” and the “White Rose E-theses Online” and “White Rose Research Online” repositories, which are used by the universities of Leeds, York and Sheffield.

Whilst the ownership of a repository will be apparent to the end user who accesses the service via the main entry point (perhaps from the institution’s Library Web site), in a number of cases such information is not apparent when the repository has been harvested and accessed using other systems such as, in this case, the interface developed by the CORE project.

In light of the findings from this survey of Russell Group universities, I would make the following simple recommendation:

The name of an institutional repository should contain the name of the host institution.

In order to illustrate the need for such a recommendation, here is a list of repositories which have been harvested by CORE:

Access to Research Resources for Teachers - Department of Computer Science E-Repository - Enlighten - Modern Languages Publications Archive - Online Publications Store - Open Research Online - Pharmacy Eprints

If you are unfamiliar with these repositories, would you be able to guess who owns them?

Or, to put it another way, meaningful metadata is important for repositories!
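The recommendation lends itself to a simple automated check. The sketch below is a hypothetical illustration rather than an existing tool: `name_coverage` is an invented helper, and it assumes you already know each repository's host institution, which is exactly the information a harvested list may lack. The repository names come from the survey above.

```python
def name_coverage(repository: str, full_name: str, short_name: str) -> str:
    """Classify how clearly a repository name identifies its host.

    Returns "full" if the institution's full name appears in the repository
    name, "partial" if only the short form appears, and "none" otherwise.
    Matching is a case-insensitive substring test.
    """
    repo = repository.casefold()
    if full_name.casefold() in repo:
        return "full"
    if short_name.casefold() in repo:
        return "partial"
    return "none"


# (repository name, host's full name, host's short name)
EXAMPLES = [
    ("University of Liverpool Research Archive", "University of Liverpool", "Liverpool"),
    ("Edinburgh Research Archive", "University of Edinburgh", "Edinburgh"),
    ("e-Prints Soton", "University of Southampton", "Southampton"),
]

results = {repo: name_coverage(repo, full, short) for repo, full, short in EXAMPLES}
for repo, verdict in results.items():
    print(f"{verdict:8} {repo}")
```

A "partial" result should be read with caution: as the rows marked "Incorrect institution" in the table show, a short form such as "Liverpool" also appears in the name of a different university's repository.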

