UK Web Focus

Innovation and best practices for the Web

Archive for June 6th, 2011

A Pilot Survey of the Numbers of Full-Text Items in Institutional Repositories

Posted by Brian Kelly on 6 June 2011

Background

A recent post on How Do We Measure the Effectiveness of Institutional Repositories? sought to address the question of “What makes a good repository?” which was raised on the JISC-Repositories JISCMail list. The post outlined possible metrics which could be used for identifying the effectiveness of institutional repositories based on the intended purposes of a repository. In the post I suggested that if the purpose of a repository was to ensure the long-term preservation of resources, then there was a need to measure the number of full-text items in the repository – after all if the full text of a paper is not available the repository won’t be doing a very good job in the preservation of such resources!

The interest in this topic was revisited yesterday in a Twitter discussion which began with the suggestion from @PaulWalk that “I’ve thought we should use RepUK to measure actual persistence in repositories”. But in order to measure the persistence of the actual resource we need to be able to distinguish the persistence of the full-text item itself from the mere persistence of the item’s URI. How might one do this?

Initial Experimentation

Following a discussion with Les Carr at the JISC 2011 conference I discovered that the ePrints advanced search interface can be used to retrieve information on both the numbers of items containing the full text and those that do not. In order to see if this approach could be used I looked at UKOLN’s items in Opus, the University of Bath’s institutional repository. From this I found that there were a total of 344 items, of which 146 were full-text items (including published and confidential items) and 198 were metadata-only items. In other words, 42% of the items contain the full text.

In order to see if this use of the ePrints advanced search could be applied in a similar fashion to another repository I looked at the ECS ePrint Repository at the University of Southampton. This time I found that out of a total of 15,532 items the departmental repository contained 8,439 items with the full text and 7,093 metadata-only items – this time 54.3% of items contain the full text.
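The percentages quoted above are simply the number of full-text items divided by the total number of items. As a minimal sketch of that arithmetic (the function name `full_text_percentage` is my own, and the ECS figures used are the corrected ones of 8,439 full-text and 7,093 metadata-only items):

```python
def full_text_percentage(full_text, metadata_only):
    """Return the percentage of items in a repository that have the full text."""
    total = full_text + metadata_only
    if total == 0:
        raise ValueError("repository has no items")
    return 100.0 * full_text / total


# Figures taken from the two test cases described in this post.
ukoln = full_text_percentage(146, 198)      # UKOLN's items in Opus
ecs = full_text_percentage(8439, 7093)      # ECS repository, Southampton

print(f"UKOLN items in Opus: {ukoln:.1f}%")
print(f"ECS repository: {ecs:.1f}%")
```

The same calculation could be applied to any repository for which both counts can be obtained.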

But are these initial findings typical across the sector?

Survey Across Russell Group University Repositories

We might expect the 20 research-intensive Russell Group Universities to be playing a leading role in the use of institutional repositories, with either institutional mandates (as in the case of Southampton University) or institutional research culture helping to ensure that significant numbers of full-text items are deposited. But is this really the case? In order to investigate whether the approach described could be applied more widely, a survey was carried out across Russell Group Universities.

Using the list of repositories taken from the OpenDOAR directory I found that 3 of the Russell Group Universities seem to use the DSpace repository software, and the advanced search function in DSpace does not appear to allow searching to be restricted to full-text and metadata-only records.

Subsequent investigation of the advanced search capabilities of the remaining 17 institutions showed that only two seemed to provide the advanced search function which I used on the University of Bath and ECS, University of Southampton repositories. However there is a RESTful interface to the search and so the search parameters used to search the University of Bath repository were used across the other ePrints repositories. The following searches were carried out:

Query 1: Total Number of Items

http://eprint.domain/cgi/search/quicksearch?screen=Public%3A%3AEPrintSearch&basic_merge=ALL&basic=web&full_text_status=public&full_text_status=restricted&full_text_status=none&groups_merge=ALL&satisfyall=ALL&order=-date%2Fcreators_name%2Ftitle&_action_search=Search

Query 2: Full text deposited (but access may be restricted)

http://eprint.domain/cgi/search/quicksearch?screen=Public%3A%3AEPrintSearch&basic_merge=ALL&basic=web&full_text_status=public&full_text_status=restricted&groups_merge=ALL&satisfyall=ALL&order=-date%2Fcreators_name%2Ftitle&_action_search=Search

Query 3: No full text available:

http://eprint.domain/cgi/search/quicksearch?screen=Public%3A%3AEPrintSearch&basic_merge=ALL&basic=web&full_text_status=none&groups_merge=ALL&satisfyall=ALL&order=-date%2Fcreators_name%2Ftitle&_action_search=Search
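The three queries differ only in which `full_text_status` values they include, so they can be generated from a single template. The following is a rough sketch rather than a tested harvesting tool: the function name `build_query_url` is my own, `http://eprint.domain` is the same placeholder host used in the queries, and, as with the queries themselves, a given repository may not support these parameters.

```python
from urllib.parse import urlencode


def build_query_url(base_url, statuses):
    """Build a quicksearch URL restricted to the given full_text_status values."""
    # A list of pairs is used so the repeated full_text_status key is preserved.
    params = [
        ("screen", "Public::EPrintSearch"),
        ("basic_merge", "ALL"),
        ("basic", "web"),
    ]
    params += [("full_text_status", status) for status in statuses]
    params += [
        ("groups_merge", "ALL"),
        ("satisfyall", "ALL"),
        ("order", "-date/creators_name/title"),
        ("_action_search", "Search"),
    ]
    return base_url + "/cgi/search/quicksearch?" + urlencode(params)


# The three searches described in this post.
QUERIES = {
    "Query 1: total number of items": ["public", "restricted", "none"],
    "Query 2: full text deposited": ["public", "restricted"],
    "Query 3: no full text available": ["none"],
}

for label, statuses in QUERIES.items():
    print(label)
    print(build_query_url("http://eprint.domain", statuses))
```

Replacing the placeholder host with the address of a real ePrints repository gives the URLs to try; the counts would still need to be read from the results pages.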

It was intended to use the survey methodology across the Russell Group universities which host an institutional repository based on the ePrints software. However it was not possible to get valid results for most of the repositories, and it was subsequently discovered that the ability to search by full-text status is an optional feature for ePrints repositories.

Rather than abandon this work I have decided to publish this post in order to encourage institutions which host an ePrints repository to implement this feature since I feel it would be beneficial to the repository community if we had a better picture of how institutions are using repositories to host full-text items.

The table below gives the results of the two test cases (from Bath and Southampton) together with details of the total number of items in the other repositories. If information on the numbers of full-text items becomes available I will update this post and annotate accordingly. [Note there was an error in the figures for the ECS repository. This has now been corrected in the table below.]

| Ref. No. | Institutional Repository Details | Query 1: Total Nos. of Items | Query 2: Total Nos. of Full-Text Items | Query 3: Total Nos. of Metadata-Only Items | Percentage of Full-Text Items |
|---|---|---|---|---|---|
| A | Institution: University of Bath; Repository used: Opus Repository; Summary: Uses ePrints. | 20,210 | 1,387 | 18,823 | 6.86% |
| B | Institution: ECS, University of Southampton; Repository used: eprint Repository; Summary: Uses ePrints. | 15,532 | 8,439 | 7,093 | 54.3% |
| TOTAL | | 35,742 | 9,826 | 25,916 | 27.4% |

The table below gives the results of the findings for what seems to be the main repository from Russell Group Universities. Note that the results were gathered using the public advanced search interface where this was available. If information on the numbers of full-text items becomes available I will update this post and annotate accordingly.

| Ref. No. | Institutional Repository Details | Query 1: Total Nos. of Items | Query 2: Total Nos. of Full-Text Items | Query 3: Total Nos. of Metadata-Only Items | Percentage of Full-Text Items |
|---|---|---|---|---|---|
| 1 | Institution: University of Birmingham; Repository used: eprint Repository; Summary: Three entries. Uses ePrints. | 411 | | | |
| 2 | Institution: University of Bristol; Summary: One entry. Uses DSpace. | | | | |
| 3 | Institution: University of Cambridge; Summary: Four entries. Uses DSpace. | | | | |
| 4 | Institution: Cardiff University; Repository used: ORCA; Summary: One entry. Uses ePrints. | 4,562 | | | |
| 5 | Institution: University of Edinburgh; Summary: Three entries. Uses DSpace. | | | | |
| 6 | Institution: University of Glasgow; Repository used: Enlighten; Summary: Three entries. Uses ePrints. | 40,803 | | | |
| 7 | Institution: Imperial College; Repository used: Spiral; Summary: Type not known. | Not determined | | | |
| 8 | Institution: King’s College London; Repository used: Department of Computer Science E-Repository; Summary: One entry. Uses ePrints. | 999 | | | |
| 9 | Institution: University of Leeds; Repository used: White Rose Research Online; Summary: Uses ePrints. Shared by Leeds, Sheffield and York. | 8,013 | | | |
| 10 | Institution: University of Liverpool; Repository used: Research Archive; Summary: One entry. | 698 | 641 | 57 | 93% |
| 11 | Institution: LSE; Repository used: LSE Research Online; Summary: Two entries. | 26,044 | 4,534 | 21,510 | 17.4% |
| 12 | Institution: University of Manchester; Repository used: MMS; Summary: One entry. | Not determined | | | |
| 13 | Institution: Newcastle University; Repository used: Newcastle Eprints; Summary: One entry. | Not determined | | | |
| 14 | Institution: University of Nottingham; Repository used: Nottingham Eprints; Summary: One entry. | 781 | | | |
| 15 | Institution: University of Oxford; Repository used: ORA; Summary: Five entries. | Not determined | | | |
| 16 | Institution: Queen’s University Belfast; Repository used: Queen’s Papers on Europeanisation & ConWEB; Summary: One entry. | Not determined | | | |
| 17 | Institution: University of Sheffield; Repository used: White Rose Research Online; Summary: See entry for Leeds. | 8,013 | | | |
| 18 | Institution: University of Southampton; Repository used: eprints.soton; Summary: 11 entries. | 60,438 | | | |
| 19 | Institution: University College London; Repository used: UCL Discovery; Summary: One entry. | 30,904 | | | |
| 20 | Institution: University of Warwick; Repository used: WRAP; Summary: Three entries. | 1,633 | | | |
| TOTAL | | 183,299 | 5,175 | 21,567 | |

At the time of writing we have to say that we do not know how many of the 183,299 items contain the full text. All we can say is that there are at least 5,175 full-text items (only 2.8% of the total) – and this is based on the assumption that a full-text item represents the content of the metadata item, rather than, for example, the PowerPoint slides used in the presentation of a paper.
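To make the "at least" explicit, the known counts can be turned into a lower and an upper bound on the proportion of full-text items. This is only an illustrative sketch: the function name `full_text_bounds` is my own, and the upper bound rests on the unrealistic assumption that every item of unknown status turns out to have the full text.

```python
def full_text_bounds(total_items, known_full_text, known_metadata_only):
    """Return (unknown item count, lower bound %, upper bound %) for full-text items."""
    unknown = total_items - known_full_text - known_metadata_only
    # Lower bound: none of the unknown items have the full text.
    lower = 100.0 * known_full_text / total_items
    # Upper bound: all of the unknown items have the full text.
    upper = 100.0 * (known_full_text + unknown) / total_items
    return unknown, lower, upper


# Totals from the Russell Group table in this post.
unknown, lower, upper = full_text_bounds(183_299, 5_175, 21_567)
print(f"Items with unknown full-text status: {unknown:,}")
print(f"Full-text share lies between {lower:.1f}% and {upper:.1f}%")
```

The width of that range is a measure of how little we currently know, which is the reason for encouraging repositories to expose these counts.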

An Opportunity for Developers

I should also like to point out that, as described on the DevCSI blog, the deadline for the Developer Challenge at Open Repositories 2011 (Austin, Texas) is Thursday 9 June. A CrowdVine page for the developer challenge describes how the Challenge is to “Show us the future of repositories”. Since “Remote presentations would be considered in exceptional circumstances” it strikes me that there might be an opportunity to submit an entry based on an analysis of the percentage of full-text items in repositories, but this would probably have to be done using an alternative approach. A suggestion for anyone who would like to submit an entry based on this idea could be:

The future of repositories is to preserve the full text of research papers for future generations. We can see how well we are doing in implementing this vision: xx% of repositories across the y sector already contain full-text items :-)

Or, if the results are disappointing:

The future of repositories is a gloomy one as only y% of repositories across the z sector contain full text items :-(

Alternatively we might conclude that new development is not required for those running ePrint repositories:

The future of repositories is reliant on the provision of evidence which can be used to inform policies and so ePrints repository managers should configure their services to provide the evidence described in this post!

Is that an unreasonable suggestion?


