Naming Conventions For Institutional Repositories: Lessons from CORE

Posted by Brian Kelly on 21 Feb 2013

The CORE (COnnecting REpositories) Project

Whilst preparing a follow-up post on institutional repositories I started to explore the data which has been collected by the JISC-funded CORE project. The CORE (COnnecting REpositories) project aims to “facilitate free access to scholarly publications distributed across many systems”. The CORE Web site, which was developed at the Open University, provides access to four applications, including:

Repository Analytics – a tool that enables users to monitor the ingestion of metadata and content from repositories and provides a wide range of statistics.

I wanted to use this service to find information about the repositories provided by the 24 Russell Group universities. However, as can be seen from the accompanying screenshot, it was not easy to associate a repository with its host institution.

The first four examples illustrate the difficulties I had in using the information. The first entry, for the Aberdeen University Research Archive, gives a clear indication of the host institution. The second example, Abertay Research Collections, is somewhat more obscure, unless you know that Abertay is the name of a Scottish university. However the next two examples, Access to Research Resources for Teachers and Advanced Knowledge Technologies EPrints Archive, give no clue as to the host institution.

This meant that browsing the list was not an effective way of finding the repositories for the Russell Group universities. In addition, the search interface was misleading: a search for “Southampton” found eCrystals – Southampton and Electronics & Computer Science EPrints Service – University of Southampton, but not the main repository, which has the name e-Prints Soton.

Using CORE to Search for Russell Group University Repositories

Despite the limitations caused by the lack of institutional identifiers, I felt it would be useful to discover what information is held about Russell Group university repositories, based on a search of the CORE system using the obvious name for each host institution. The following table summarises the findings of a survey carried out on 21 February 2013 using the search term given in the second column.

Ref. No.	Institution (search string)	Repository	Metadata Download	Metadata Readable	PDF Downloads
1	Birmingham	University of Birmingham Research Archive, E-papers Repository	937	928	103
		University of Birmingham Research Archive, E-prints Repository	828	802	766
		University of Birmingham Research Archive, E-theses Repository	2,559	2,513	2,133
2	Bristol	Bristol Repository of Scholarly Eprints	–	4	–
3	Cambridge	Computer Laboratory Technical Reports – Cambridge University	3,252	520	440
3	Cambridge	DSpace @ Cambridge	216,718	192,129	2,847
4	Cardiff	Online Research @ Cardiff	31,274	1,647	1,555
5	Durham	Durham e-Theses	4,483	4,411	4,051
5	Durham	Durham Research Online	9,062	2,922	2,856
6	Exeter	Exeter Research and Institutional Content archive	2,547	2,334	4
7	Edinburgh	Edinburgh DataShare	75	75	–
7	Edinburgh	Edinburgh Research Archive	5,769	5,395	1,583
8	Glasgow	Glasgow DSpace Service	–	–	–
8	Glasgow	Glasgow Theses Service	2,682	2,683	2,356
9	Imperial	Spiral – Imperial College Digital Repository	8,097	8,094	4
10	King’s College London (also used King’s and Kings)	None found	–	–	–
11	Leeds	leedsmet open search (Incorrect institution)	(-)	(-)	(-)
11	Leeds	Leodis – A photographic archive of Leeds	57,998	57,998	–
12	Liverpool	Liverpool John Moores University Research Archive (Incorrect institution)	(-)	(-)	(-)
12	Liverpool	University of Liverpool Research Archive	885	810	517
13	LSE	LSE Research Online	33,959	6,520	6,463
13	LSE	LSE Theses Online	454	454	424
14	Manchester	e-space at Manchester Metropolitan University (Incorrect institution)	(-)	(-)	(-)
14	Manchester	Manchester eScholar Services	119,854	119,854	–
15	Newcastle	Newcastle University E-Prints	–	–	–
16	Nottingham	Nottingham ePrints	1,084	1,026	990
16	Nottingham	Nottingham eTheses	1,843	1,793	1,757
17	Oxford	Oxford University Research Archive	16,215	3,745	98
18	Queen Mary	None found	–	–	–
19	Queen’s University Belfast	None found	–	–	–
20	Sheffield	Sheffield Hallam University Research Archive (Incorrect institution)	(-)	(-)	(-)
21	Southampton	eCrystals – Southampton	602	602	–
21	Southampton	Electronics & Computer Science EPrints Service – University of Southampton	15,835	8,947	7,071
22	UCL	UCL Discovery	0	245,407	2
23	Warwick	EPrints at the Centre for Scientific Computing, University of Warwick	–	–	360
23	Warwick	Warwick Research Archives Portal Repository	49,469	7,696	7,025
24	York	York St John University ArchivalWare Digital Library (Incorrect institution)	331	1	–

Note that the Repository Analytics page does not appear to provide a formal definition of the data collected. However, hovering over the icon next to each entry suggests that the Metadata Download column gives the number of metadata records, the Metadata Readable column gives the number of links extracted from the metadata, and the PDF Downloads column gives the number of PDFs which were downloaded.

Discussion

It is difficult to interpret the data given in the table: the entry for the UCL Discovery repository, for example, tells us that there are 0 metadata records, with 245,407 links having been extracted from these records and 2 PDFs downloaded!
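Puzzles like the UCL entry could be flagged automatically with a simple consistency check. The sketch below is hypothetical (the `RepoStats` type and `anomalies` function are not part of CORE) and assumes, following the column definitions guessed at above, that each column should be no larger than the one before it: downloaded records ≥ readable records ≥ PDFs. A “–” in the table is treated as missing (`None`); the sample rows are copied from the table.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class RepoStats:
    name: str
    metadata_download: Optional[int]  # None where the table shows "–"
    metadata_readable: Optional[int]
    pdf_downloads: Optional[int]


def anomalies(row: RepoStats) -> list[str]:
    """Return reasons why a row breaks the assumed download >= readable >= PDF order."""
    issues = []
    md, readable, pdfs = row.metadata_download, row.metadata_readable, row.pdf_downloads
    if md is not None and readable is not None and readable > md:
        issues.append("more readable records than downloaded records")
    if readable is not None and pdfs is not None and pdfs > readable:
        issues.append("more PDFs than readable records")
    return issues


rows = [
    RepoStats("UCL Discovery", 0, 245_407, 2),
    RepoStats("Glasgow Theses Service", 2_682, 2_683, 2_356),
    RepoStats("Durham e-Theses", 4_483, 4_411, 4_051),
]
for row in rows:
    for issue in anomalies(row):
        print(f"{row.name}: {issue}")
```

Run on these three rows, the check flags UCL Discovery and (by a single record) Glasgow Theses Service, while Durham e-Theses passes. Rows with missing values are simply skipped rather than guessed at.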

However, the table does suggest some naming conventions for institutional repositories. Some repositories place the institution’s name at the beginning of the repository name (“University of Birmingham Research Archive, E-prints Repository”, “University of Liverpool Research Archive” and “LSE Research Online”), others place it at the end (“EPrints at the Centre for Scientific Computing, University of Warwick”, “Electronics & Computer Science EPrints Service – University of Southampton” and “Computer Laboratory Technical Reports – Cambridge University”), and a large number use a partial form of the institution’s name (e.g. “Edinburgh Research Archive”, “Glasgow DSpace Service” and “Manchester eScholar Services”).
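The three patterns can be detected mechanically, which also shows why a name-based search misses some repositories. This is a minimal sketch, assuming a hand-maintained list of full and short forms of each institution’s name; the `classify` function and its category labels are hypothetical and not something CORE provides.

```python
from __future__ import annotations

import re


def classify(repo_name: str, full_forms: list[str], short_forms: list[str]) -> str:
    """Classify where (if anywhere) a repository name mentions its host institution.

    Returns "start" or "end" if a full institutional name begins or ends the
    repository name, "partial" if only a short form appears as a whole word,
    and "none" otherwise.
    """
    name = repo_name.lower()
    for form in full_forms:
        f = form.lower()
        if name.startswith(f):
            return "start"
        if name.endswith(f):
            return "end"
    for form in short_forms:
        # Whole-word match, so "Leeds" would not match inside "leedsmet".
        if re.search(rf"\b{re.escape(form.lower())}\b", name):
            return "partial"
    return "none"


examples = [
    ("University of Birmingham Research Archive, E-prints Repository",
     ["University of Birmingham"], ["Birmingham"]),
    ("EPrints at the Centre for Scientific Computing, University of Warwick",
     ["University of Warwick"], ["Warwick"]),
    ("Edinburgh Research Archive", ["University of Edinburgh"], ["Edinburgh"]),
    ("e-Prints Soton", ["University of Southampton"], ["Southampton"]),
]
for name, full, short in examples:
    print(f"{classify(name, full, short):8} {name}")
```

The last example comes out as "none": unless “Soton” is added to the list of short forms, a search on “Southampton” has no way of finding the main repository.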

But of greater interest are the institutional repositories which have been harvested by CORE but are missing from this search, such as “e-Prints Soton” and the “White Rose E-theses Online” and “White Rose Research Online” repositories, which are used by the universities of Leeds, York and Sheffield.

Whilst the ownership of a repository will be apparent to end users who access the service via its main entry point (perhaps from the institution’s Library Web site), in a number of cases this information is not apparent when the repository has been harvested and accessed through other systems such as, in this case, the interface developed by the CORE project.

In light of the findings from this survey of Russell Group universities, I would make the following simple recommendation:

The name of an institutional repository should contain the name of the host institution.

To illustrate the need for such a recommendation, here is a list of repositories which have been harvested by CORE:

Access to Research Resources for Teachers – Department of Computer Science E-Repository – Enlighten – Modern Languages Publications Archive – Online Publications Store – Open Research Online – Pharmacy Eprints

If you are unfamiliar with these repositories, would you be able to guess who owns them?

Or, to put it another way, meaningful metadata is important for repositories!


This entry was posted on 21 Feb 2013 at 1:46 pm and is filed under openness, Repositories.

10 Responses to “Naming Conventions For Institutional Repositories: Lessons from CORE”

starchim01 said

21 Feb 2013 at 6:08 pm
Reblogged this on startachim blog.

Naming Conventions For Institutional Repositories: Lessons from CORE | Open is mightier | Scoop.it said

22 Feb 2013 at 1:27 pm
[…] The CORE (COnnecting REpositories) Project Whilst preparing a follow-up post on institutional repositories I started to explore the data which has been collected by the JISC-funded CORE project. Th… […]

Ken Chad (@KenChad) said

22 Feb 2013 at 3:38 pm
I’ve used the Bielefeld BASE (http://www.base-search.net/about/en/) system a fair bit to get open access material from repositories. The analytics from BASE are a bit different, but I see they have White Rose and include the constituent universities in their listing. I wondered how you think BASE compares with CORE? Perhaps CORE and BASE should work together?

- Petr Knoth said
  
  28 Feb 2013 at 8:28 pm
  CORE and BASE are slightly different products. While BASE is more a metadata search engine for institutional repositories, CORE is more an aggregator for Open Access content (including full-texts). We provide a comparison in the “CORE: Three Access Levels to Underpin Open Access” paper published in D-Lib: http://www.dlib.org/dlib/november12/knoth/11knoth.html
  
Brian Kelly (UK Web Focus) said

25 Feb 2013 at 8:38 am
Hi Ken
Thanks for the information about Bielefeld BASE. However, my interest is in profiling use of repositories across UK HEIs. The statistics page only gives the overall numbers of open access items globally, and the search page only allows for searches on title, author or subject.
Regarding your suggestion that “CORE and BASE should work together”, from the CORE blog it seems that the project is now complete, as a final post was published in July 2012.

- Petr Knoth said
  
  28 Feb 2013 at 8:25 pm
  This was only the final blog post for the ServiceCORE project (required by JISC). The work on CORE is an ongoing effort as part of a number of projects. We know the project will continue for a number of years (and hopefully indefinitely :)
  
Nick Sheppard (@mrnick) said

28 Feb 2013 at 4:49 pm
Ha, maybe it was a blessing that we could never come up with a nice acronym like everyone else!

Realise you are probably aware but Leeds, York and Sheffield all come under the White Rose consortium of course – White Rose Research Online (WRRO) which doesn’t affect your basic point.

CORE is still being developed I think – try @petrknoth on twitter.

Petr Knoth said

28 Feb 2013 at 8:18 pm
The Repository Analytics tool is a prototype which we started developing in the ServiceCORE project. The accuracy of the content statistics is something we should be improving as part of the Open Access Repository Registry to be developed in collaboration with UK RepositoryNet+ (project to start soon). The dashboard is intended to be used by repository managers to look up their repository (the names are copied from OpenDOAR) and check that not only metadata, but also content, can be harvested from repositories. This is necessary to help repositories ensure they are providing open access to content, not just open access to metadata.

The statistics should not be interpreted as the number of items in those repositories, but rather as the number of items (full-text items) that can be harvested from those repositories using OAI-PMH. Please also note that the dashboard is a work in progress, so the stats might not be completely accurate yet.
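Since harvesting is done over OAI-PMH, it is worth noting that every OAI-PMH endpoint must answer the Identify verb with a response containing a repositoryName element; this is one place where a repository declares its own name to harvesters (CORE’s dashboard, as noted above, takes its names from OpenDOAR instead). The sketch below is a minimal illustration using only the Python standard library; the endpoint URL and the sample response, including the name “Research Online”, are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace used by all OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def identify_url(base_url: str) -> str:
    """Build an OAI-PMH Identify request URL for a repository's base URL."""
    return f"{base_url}?{urlencode({'verb': 'Identify'})}"


def repository_name(identify_xml: str) -> str:
    """Extract the <repositoryName> from an OAI-PMH Identify response."""
    root = ET.fromstring(identify_xml)
    elem = root.find("oai:Identify/oai:repositoryName", OAI_NS)
    if elem is None or not (elem.text or "").strip():
        raise ValueError("Identify response has no repositoryName")
    return elem.text.strip()


# Hypothetical Identify response from a hypothetical repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2013-02-21T12:00:00Z</responseDate>
  <request verb="Identify">https://repository.example.ac.uk/oai</request>
  <Identify>
    <repositoryName>Research Online</repositoryName>
    <baseURL>https://repository.example.ac.uk/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>admin@example.ac.uk</adminEmail>
    <earliestDatestamp>2000-01-01</earliestDatestamp>
    <deletedRecord>persistent</deletedRecord>
    <granularity>YYYY-MM-DD</granularity>
  </Identify>
</OAI-PMH>"""

print(identify_url("https://repository.example.ac.uk/oai"))
print(repository_name(SAMPLE))
```

A harvester that only sees “Research Online” here learns nothing about the host institution, which is the same problem the post describes for the CORE listing.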

There is a huge discrepancy in the way repositories expose metadata about their content through OAI-PMH, which dramatically influences how harvestable the content is. Taking into account only EPrints repositories from the UK (which are typically quite good at referencing full texts), the average repository has harvestable full texts for about 27.6% of its metadata records (but the median repository for only about 13%).
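The gap between the average (27.6%) and the median (13%) is what you would expect if a few repositories expose full texts for most of their records while many expose very few. The sketch below illustrates this with made-up numbers, not the figures behind the comment, and assumes “average” means the mean of per-repository ratios; `fulltext_ratio` is a hypothetical helper.

```python
from statistics import mean, median


def fulltext_ratio(records: int, fulltexts: int) -> float:
    """Fraction of a repository's metadata records that yielded a harvestable full text."""
    if records <= 0:
        raise ValueError("need at least one metadata record")
    return fulltexts / records


# Hypothetical repositories: (metadata records, harvestable full texts).
# Two expose most of their full texts; three expose very few.
repos = [(1000, 900), (2000, 1700), (5000, 400), (3000, 300), (4000, 200)]
ratios = [fulltext_ratio(r, f) for r, f in repos]

print(f"mean {mean(ratios):.1%}, median {median(ratios):.1%}")
# prints "mean 39.6%, median 10.0%"
```

The two well-behaved repositories pull the mean up, while the median reflects the typical repository, which is why the median is arguably the more telling figure for repository managers.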

The aim of Repository Analytics is to help repository managers identify possible issues. The statistics can also be collected through the CORE API (http://core.kmi.open.ac.uk/api/doc – API stats methods), for those who do not like UIs. I originally created some recommendations on the CORE website to increase the harvestability of content from repositories (http://core.kmi.open.ac.uk/intro/core_recommendations). Over the last month I have been thinking about how they could be simplified even further, and I have just submitted a paper about this to OR 2013. Happy to send it to those interested. Will make it publicly available if accepted :)

Naming Conventions For Institutional Repositories: Lessons from CORE | Repositorios recursos educativos digitales abiertos | Scoop.it said

7 Mar 2013 at 10:15 am
[…] The CORE (COnnecting REpositories) Project Whilst preparing a follow-up post on institutional repositories I started to explore the data which has been collected by the JISC-funded CORE project. Th… […]

Naming Conventions For Institutional Repositori... said

24 May 2014 at 12:30 am
[…] The CORE (COnnecting REpositories) Project Whilst preparing a follow-up post on institutional repositories I started to explore the data which has been collected by the JISC-funded CORE project. Th… […]


UK Web Focus (Brian Kelly)

Innovation and best practices for the Web
