UK Web Focus (Brian Kelly)

Innovation and best practices for the Web

Naming Conventions For Institutional Repositories: Lessons from CORE

Posted by Brian Kelly on 21 Feb 2013

The CORE (COnnecting REpositories) Project

Whilst preparing a follow-up post on institutional repositories I started to explore the data which has been collected by the JISC-funded CORE project. The CORE (COnnecting REpositories) project aims to “facilitate free access to scholarly publications distributed across many systems”. The CORE Web site, which was developed at the Open University, provides access to four applications including:

Repository Analytics – A tool that enables to monitor the ingestion of metadata and content from repositories and provides a wide range of statistics.

I wanted to use this service to find information about the repositories provided by the 24 Russell Group universities. However, as can be seen from the accompanying screenshot, it was not easy to associate a repository with its host institution.

The first four examples illustrate the difficulties I had in using the information. The first entry, for the Aberdeen University Research Archive, gives a clear indication of the host institution. The second example, Abertay Research Collections, is somewhat more obscure, unless you know that Abertay is the name of a Scottish university. However the next two examples, Access to Research Resources for Teachers and Advanced Knowledge Technologies EPrints Archive, give no clue as to the host institution.

This meant that browsing the list was not an effective way of finding the repositories for the Russell Group universities. In addition the search interface was misleading: a search for “Southampton” enabled me to find eCrystals – Southampton and Electronics & Computer Science EPrints Service – University of Southampton – but not the main repository which has the name e-Prints Soton.
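The missed result can be reproduced with a small sketch. It assumes the search behaves like a case-insensitive substring match on the repository name, which is a guess about the interface rather than documented CORE behaviour. The function name `search_by_name` is hypothetical, and the list holds only names quoted in this post.

```python
# Repository names quoted in this post (a small sample, not CORE's full list).
REPOSITORY_NAMES = [
    "eCrystals – Southampton",
    "Electronics & Computer Science EPrints Service – University of Southampton",
    "e-Prints Soton",
    "Aberdeen University Research Archive",
    "Access to Research Resources for Teachers",
]


def search_by_name(term, names):
    """Return the names containing `term`, ignoring case (assumed search behaviour)."""
    needle = term.casefold()
    return [name for name in names if needle in name.casefold()]


matches = search_by_name("Southampton", REPOSITORY_NAMES)

# Names that use the abbreviation "Soton" but were not returned by the search.
missed = [
    name for name in REPOSITORY_NAMES
    if "soton" in name.casefold() and name not in matches
]

print(matches)
print(missed)
```

Under that assumption the two departmental services are returned, but the main repository is not, because its name contains only the abbreviation "Soton".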

Using CORE to Search for Russell Group University Repositories

Despite the limitations caused by the lack of institutional identifiers I felt it would be useful to discover information held about Russell Group university repositories, based on a search of the CORE system using the obvious name for the host institution. The following table summarises the findings for a survey carried out on 21 February 2013 using the search term given in the second column.

| Ref. No. | Institution (search string) | Repository | Metadata Download | Metadata Readable | PDF Downloads |
|---|---|---|---|---|---|
| 1 | Birmingham | University of Birmingham Research Archive, E-papers Repository | 937 | 928 | 103 |
| | | University of Birmingham Research Archive, E-prints Repository | 828 | 802 | 766 |
| | | University of Birmingham Research Archive, E-theses Repository | 2,559 | 2,513 | 2,133 |
| 2 | Bristol | Bristol Repository of Scholarly Eprints | – | 4 | – |
| 3 | Cambridge | Computer Laboratory Technical Reports – Cambridge University | 3,252 | 520 | 440 |
| | | DSpace @ Cambridge | 216,718 | 192,129 | 2,847 |
| 4 | Cardiff | Online Research @ Cardiff | 31,274 | 1,647 | 1,555 |
| 5 | Durham | Durham e-Theses | 4,483 | 4,411 | 4,051 |
| | | Durham Research Online | 9,062 | 2,922 | 2,856 |
| 6 | Exeter | Exeter Research and Institutional Content archive | 2,547 | 2,334 | 4 |
| 7 | Edinburgh | Edinburgh DataShare | 75 | 75 | – |
| | | Edinburgh Research Archive | 5,769 | 5,395 | 1,583 |
| 8 | Glasgow | Glasgow DSpace Service | – | – | – |
| | | Glasgow Theses Service | 2,682 | 2,683 | 2,356 |
| 9 | Imperial | Spiral – Imperial College Digital Repository | 8,097 | 8,094 | 4 |
| 10 | King’s College London (also used King’s and Kings) | None found | – | – | – |
| 11 | Leeds | leedsmet open search (Incorrect institution) | (-) | (-) | (-) |
| | | Leodis – A photographic archive of Leeds | 57,998 | 57,998 | – |
| 12 | Liverpool | Liverpool John Moores University Research Archive (Incorrect institution) | (-) | (-) | (-) |
| | | University of Liverpool Research Archive | 885 | 810 | 517 |
| 13 | LSE | LSE Research Online | 33,959 | 6,520 | 6,463 |
| | | LSE Theses Online | 454 | 454 | 424 |
| 14 | Manchester | e-space at Manchester Metropolitan University (Incorrect institution) | (-) | (-) | (-) |
| | | Manchester eScholar Services | 119,854 | 119,854 | – |
| 15 | Newcastle | Newcastle University E-Prints | – | – | – |
| 16 | Nottingham | Nottingham ePrints | 1,084 | 1,026 | 990 |
| | | Nottingham eTheses | 1,843 | 1,793 | 1,757 |
| 17 | Oxford | Oxford University Research Archive | 16,215 | 3,745 | 98 |
| 18 | Queen Mary | None found | | | |
| 19 | Queen’s University Belfast | None found | – | – | – |
| 20 | Sheffield | Sheffield Hallam University Research Archive (Incorrect institution) | (-) | (-) | (-) |
| 21 | Southampton | eCrystals – Southampton | 602 | 602 | – |
| | | Electronics & Computer Science EPrints Service – University of Southampton | 15,835 | 8,947 | 7,071 |
| 22 | UCL | UCL Discovery | 0 | 245,407 | 2 |
| 23 | Warwick | EPrints at the Centre for Scientific Computing, University of Warwick | – | – | 360 |
| | | Warwick Research Archives Portal Repository | 49,469 | 7,696 | 7,025 |
| 24 | York | York St John University ArchivalWare Digital Library (Incorrect institution) | 331 | 1 | – |

Note that the Repository Analytics page does not appear to provide a formal definition of the data collected. However, from hovering over the accompanying icon for the entries, it appears that the Metadata Download column gives the number of metadata records, the Metadata Readable column gives the number of links extracted from the metadata and the PDF Downloads column gives the number of PDFs which were downloaded.

Discussion

It is difficult to interpret the data given in the table: the entry for the UCL Discovery repository, for example, tells us that there are 0 metadata records, with 245,407 links having been extracted from these records and 2 PDFs downloaded!
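A rough way to spot entries like this is sketched below. It assumes the column meanings inferred above (records, extracted links, downloaded PDFs), which CORE does not formally define. The `Row` type and the functions `looks_inconsistent` and `pdf_share` are hypothetical names introduced for illustration, and the figures are copied from four rows of the table.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    repository: str
    metadata_download: Optional[int]  # None where the table shows a dash
    metadata_readable: Optional[int]
    pdf_downloads: Optional[int]


# Four rows copied from the table above.
ROWS = [
    Row("UCL Discovery", 0, 245_407, 2),
    Row("Oxford University Research Archive", 16_215, 3_745, 98),
    Row("Nottingham eTheses", 1_843, 1_793, 1_757),
    Row("Edinburgh DataShare", 75, 75, None),
]


def looks_inconsistent(row):
    """Flag rows that report links or PDFs but zero metadata records."""
    derived = (row.metadata_readable or 0) + (row.pdf_downloads or 0)
    return row.metadata_download == 0 and derived > 0


def pdf_share(row):
    """PDFs downloaded per metadata record, or None when it cannot be computed."""
    if not row.metadata_download or row.pdf_downloads is None:
        return None
    return row.pdf_downloads / row.metadata_download


suspect = [row.repository for row in ROWS if looks_inconsistent(row)]
shares = {row.repository: pdf_share(row) for row in ROWS}

print(suspect)
for name, share in shares.items():
    print(name, "n/a" if share is None else f"{share:.1%}")
```

Only the UCL Discovery row is flagged by this check. The share figure also shows how widely the ratio of PDFs to metadata records varies between repositories, even under this tentative reading of the columns.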

However the table does suggest patterns of naming conventions for institutional repositories, such as the institutional name being provided at the beginning (“University of Birmingham Research Archive, E-prints Repository”, “University of Liverpool Research Archive” and “LSE Research Online”) or end of the repository name (“EPrints at the Centre for Scientific Computing, University of Warwick”, “Electronics & Computer Science EPrints Service – University of Southampton” and “Computer Laboratory Technical Reports – Cambridge University”) together with a large number of examples which use a partial form of the institution’s name (e.g. “Edinburgh Research Archive”, “Glasgow DSpace Service” and “Manchester eScholar Services”).

But of greater interest are the institutional repositories which have been harvested by CORE but are missing from this search such as “e-Prints Soton” and the “White Rose E-theses Online” and “White Rose Research Online” repositories which are used by the universities of Leeds, York and Sheffield.

Whilst the ownership of a repository will be apparent to the end user who accesses the service via the main entry point (perhaps from the institution’s Library Web site), in a number of cases such information is not apparent when the repository has been harvested and accessed using other systems such as, in this case, the interface developed by the CORE project.

In light of the findings from this survey of Russell Group universities, I would make the following simple recommendation:

Institutional repositories should contain the name of the host institution.

In order to illustrate the need for such a recommendation, here is a list of repositories which have been harvested by CORE:

Access to Research Resources for Teachers – Department of Computer Science E-Repository – Enlighten – Modern Languages Publications Archive – Online Publications Store – Open Research Online – Pharmacy Eprints

If you are unfamiliar with these repositories, would you be able to guess who owns them?

Or, to put it another way, meaningful metadata is important for repositories!
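The recommendation could be checked mechanically along the following lines. This is only a sketch: the keyword list is a hypothetical hand-picked set of institution names taken from this post, not an authoritative registry, and `institutions_named` is a name invented for the example.

```python
import re

# Hypothetical keyword list, hand-picked from institution names used in this post.
INSTITUTION_KEYWORDS = [
    "Aberdeen", "Birmingham", "Cambridge", "Liverpool", "LSE", "Southampton", "Warwick",
]

# Two names from the table that do identify their host, plus the seven listed above.
HARVESTED_NAMES = [
    "University of Liverpool Research Archive",
    "LSE Research Online",
    "Access to Research Resources for Teachers",
    "Department of Computer Science E-Repository",
    "Enlighten",
    "Modern Languages Publications Archive",
    "Online Publications Store",
    "Open Research Online",
    "Pharmacy Eprints",
]


def institutions_named(name):
    """Return the keywords that occur as whole words in a repository name."""
    return [
        keyword for keyword in INSTITUTION_KEYWORDS
        if re.search(rf"\b{re.escape(keyword)}\b", name, re.IGNORECASE)
    ]


# Repository names that mention none of the known institutions.
unattributed = [name for name in HARVESTED_NAMES if not institutions_named(name)]

for name in unattributed:
    print(name)
```

A check like this only tells you that a name mentions some institution. As the "Incorrect institution" rows in the table show, it cannot tell the University of Sheffield from Sheffield Hallam University without a fuller list of names.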



10 Responses to “Naming Conventions For Institutional Repositories: Lessons from CORE”

  1. starchim01 said

    Reblogged this on startachim blog.

  2. […] The CORE (COnnecting REpositories) Project Whilst preparing a follow-up post on institutional repositories I started to explore the data which has been collected by the JISC-funded CORE project. Th…  […]

  3. I’ve used the Bielefeld BASE (http://www.base-search.net/about/en/) system a fair bit to get open access material from repositories. The analytics from BASE are a bit different, but I see they have White Rose and include in their listing which the constituent universities are. I wondered how you think BASE compares with CORE? Perhaps CORE and BASE should work together?

  4. Hi Ken
    Thanks for the information about Bielefeld BASE. However my interest is in profiling use of repositories across UK HEIs. Looking at the statistics page, this only gives the overall numbers of open access items globally. Also the search page only allows for searches on title, author or subject.
    Regarding your suggestion that “CORE and BASE should work together”, from the CORE blog it seems that the project is now complete, as a final post was published in July 2012.

    • Petr Knoth said

      This was only the final blog post for the ServiceCORE project (required by JISC). The work on CORE is an ongoing effort as part of a number of projects. We know the project will continue for a number of years (and hopefully indefinitely :)

  5. Ha, maybe it was a blessing that we could never come up with a nice acronym like everyone else!

    Realise you are probably aware but Leeds, York and Sheffield all come under the White Rose consortium of course – White Rose Research Online (WRRO) which doesn’t affect your basic point.

    CORE is still being developed I think – try @petrknoth on twitter.

  6. Petr Knoth said

    The Repository Analytics tool is a prototype which we started developing in the ServiceCORE project. The accuracy of the content statistics is something we should be improving as part of the Open Access Repository Registry to be developed in collaboration with UK RepositoryNet+ (project to start soon). The dashboard is intended to be used by repository managers to look up their repository (the naming is copied and pasted from OpenDOAR) and check that not only metadata, but also content can be harvested from repositories. This is necessary to help repositories ensure they are providing open access to content, not just open access to metadata.

    The statistics should not be interpreted as the number of items in those repositories, but rather as the numbers of items (full-text items) that can be harvested from those repositories using OAI-PMH. Please also do note the dashboard is work in progress, so the stats might not be completely accurate yet.

    There is a huge discrepancy in the way repositories expose metadata about their content through OAI-PMH, which dramatically influences the content harvestability. Taking into account only EPrints repositories from the UK (which are typically quite good at referencing full-texts), the average repository will have harvestable full-texts for about 27.6% of its metadata records (but the median repository only about 13%).

    The aim of Repository Analytics is to help repository managers to identify possible issues. The statistics can also be collected through the CORE API (http://core.kmi.open.ac.uk/api/doc – API stats methods), for those who do not like UIs. I originally created some recommendations on the CORE website to increase the harvestability (http://core.kmi.open.ac.uk/intro/core_recommendations) of content from repositories. In the last month I have been thinking about how they could be simplified even further. I just submitted a paper to OR 2013 about this. Happy to send it to those interested. Will make it publicly available if accepted :)

