Background
A recent post on How Do We Measure the Effectiveness of Institutional Repositories? sought to address the question of “What makes a good repository?” which was raised on the JISC-Repositories JISCMail list. The post outlined possible metrics which could be used for identifying the effectiveness of institutional repositories based on the intended purposes of a repository. In the post I suggested that if the purpose of a repository was to ensure the long-term preservation of resources, then there was a need to measure the number of full-text items in the repository – after all if the full text of a paper is not available the repository won’t be doing a very good job in the preservation of such resources!
The interest in this topic was revisited yesterday in a Twitter discussion which began with the suggestion from @PaulWalk that “I’ve thought we should use RepUK to measure actual persistence in repositories”. But in order to measure the persistence of the actual resource we need to be able to differentiate between the persistence of the full-text item and the resource itself, and not just the persistence of the URI of the item. How might one do this?
Initial Experimentation
Following a discussion with Les Carr at the JISC 2011 conference I discovered that the ePrints advanced search interface can be used to retrieve the number of items which contain the full text and the number which do not. In order to see if this approach could be used I looked at UKOLN’s items in Opus, the University of Bath’s institutional repository. I found a total of 344 items, of which 146 had the full text available (including published and confidential items) and 198 were metadata-only. In other words, 42% of the items contain the full text.
In order to see if this use of the ePrints advanced search could be applied in a similar fashion to another repository I looked at the ECS ePrint Repository at the University of Southampton. This time I found that out of a total of 15,532 items the departmental repository contained 8,429 items with the full text and 7,093 metadata-only items; this time 54.3% of items contain the full text.
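The arithmetic behind these two percentages can be sketched in a few lines of Python. The function name `full_text_share` is hypothetical and used only for illustration; the figures are the ones quoted above for UKOLN's items in Opus and for the ECS repository.

```python
def full_text_share(full_text_items, total_items):
    """Return the percentage of items which have the full text attached."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    return 100.0 * full_text_items / total_items


# Figures quoted in the post for the two test cases.
print(f"UKOLN items in Opus: {full_text_share(146, 344):.1f}%")    # 42.4%
print(f"ECS ePrint Repository: {full_text_share(8429, 15532):.1f}%")  # 54.3%
```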
But are these initial findings typical across the sector?
Survey Across Russell Group University Repositories
We might expect the 20 research-intensive Russell Group Universities to be playing a leading role in use of institutional repositories, with either institutional mandates (in the case of Southampton University) or institutional research culture helping to ensure that significant numbers of full-text items are deposited. But is this really the case? In order to investigate whether the approach described could be applied more widely, a survey was carried out across Russell Group Universities.
Using the list of repositories taken from the OpenDOAR directory I found that 3 of the Russell Group Universities seem to use the DSpace repository software, and the advanced search function in DSpace does not appear to allow searching to be restricted to full-text and metadata-only records.
Subsequent investigation of the advanced search capabilities of the remaining 17 institutions showed that only two seemed to provide the advanced search function which I used on the University of Bath and ECS, University of Southampton repositories. However there is a RESTful interface to the search, and so the search parameters used to search the University of Bath repository were used across the other ePrints repositories. The following searches were carried out:
Query 1: Total Number of Items
http://eprint.domain/cgi/search/quicksearch?screen=Public%3A%3AEPrintSearch&basic_merge=ALL&basic=web&full_text_status=public&full_text_status=restricted&full_text_status=none&groups_merge=ALL&satisfyall=ALL&order=-date%2Fcreators_name%2Ftitle&_action_search=Search
Query 2: Full text deposited (but access may be restricted)
http://eprint.domain/cgi/search/quicksearch?screen=Public%3A%3AEPrintSearch&basic_merge=ALL&basic=web&full_text_status=public&full_text_status=restricted&groups_merge=ALL&satisfyall=ALL&order=-date%2Fcreators_name%2Ftitle&_action_search=Search
Query 3: No full text available
http://eprint.domain/cgi/search/quicksearch?screen=Public%3A%3AEPrintSearch&basic_merge=ALL&basic=web&full_text_status=none&groups_merge=ALL&satisfyall=ALL&order=-date%2Fcreators_name%2Ftitle&_action_search=Search
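The three queries differ only in which `full_text_status` values are repeated in the query string, so they can be generated rather than typed by hand. The following is a minimal sketch, assuming Python's standard `urllib.parse` module; the function name `build_query` and the constant names are hypothetical, `eprint.domain` is the placeholder host used in this post, and the remaining parameters are copied verbatim from the URLs above. It only builds the URLs and does not contact any repository.

```python
from urllib.parse import urlencode

# Placeholder host from the post; substitute the host of a real ePrints repository.
BASE_URL = "http://eprint.domain/cgi/search/quicksearch"


def build_query(full_text_statuses):
    """Build a search URL restricted to the given full_text_status values."""
    params = [
        ("screen", "Public::EPrintSearch"),
        ("basic_merge", "ALL"),
        ("basic", "web"),
    ]
    # The same parameter name is repeated once per status value.
    params += [("full_text_status", status) for status in full_text_statuses]
    params += [
        ("groups_merge", "ALL"),
        ("satisfyall", "ALL"),
        ("order", "-date/creators_name/title"),
        ("_action_search", "Search"),
    ]
    return BASE_URL + "?" + urlencode(params)


queries = {
    "Query 1: total items": build_query(["public", "restricted", "none"]),
    "Query 2: full text deposited": build_query(["public", "restricted"]),
    "Query 3: no full text": build_query(["none"]),
}

for label, url in queries.items():
    print(label)
    print(url)
```

Passing a list of pairs to `urlencode` keeps the parameters in order and allows `full_text_status` to appear more than once. Whether a given repository returns valid results for these URLs depends on how its search is configured, as discussed below.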
It was intended to use the survey methodology across the Russell Group universities which host an institutional repository based on the ePrints software. However it was not possible to get valid results for most of the repositories, and it was subsequently discovered that the ability to search by full-text status is an optional feature for ePrints repositories.
Rather than abandon this work I have decided to publish this post in order to encourage institutions which host an ePrints repository to implement this feature since I feel it would be beneficial to the repository community if we had a better picture of how institutions are using repositories to host full-text items.
The table below gives the results of the two test cases (from Bath and Southampton) together with details of the total number of items in the other repositories. If information on the numbers of full-text items becomes available I will update this post and annotate accordingly. [Note there was an error in the figures for the ECS repository. This has now been corrected in the table below.]
| Ref. No. | Institutional Repository Details | Query 1: Total Nos. of Items | Query 2: Total Nos. of Full text Items | Query 3: Total Nos. of Metadata-Only items | Percentage of Full-Text Items |
| A | Institution: University of Bath Repository used: Opus Repository Summary: Uses ePrints. | 20,210 | 1,387 | 18,823 | 6.86% |
| B | Institution: ECS, University of Southampton Repository used: eprint Repository Summary: Uses ePrints. | | | | |
| TOTAL | | | | | |
The table below gives the results of the findings for what seems to be the main repository from Russell Group Universities. Note that the results were gathered using the public advanced search interface where this was available. If information on the numbers of full-text items becomes available I will update this post and annotate accordingly.
| Ref. No. | Institutional Repository Details | Query 1: Total Nos. of Items | Query 2: Total Nos. of Full text Items | Query 3: Total Nos. of Metadata-Only items | Percentage of Full-Text Items |
| 1 | Institution: University of Birmingham Repository used: eprint Repository Summary: Three entries. Uses ePrints. | 411 | | | |
| 2 | Institution: University of Bristol Summary: One entry. Uses DSpace | | | | |
| 3 | Institution: University of Cambridge Summary: Four entries. Uses DSpace. | | | | |
| 4 | Institution: Cardiff University Summary: 1 entry. Uses ePrints. Repository used: ORCA | 4,562 | | | |
| 5 | Institution: University of Edinburgh Summary: Three entries. Uses DSpace. | | | | |
| 6 | Institution: University of Glasgow Summary: Three entries. Uses ePrints. Repository used: Enlighten | 40,803 | | | |
| 7 | Institution: Imperial College Repository used: Spiral Summary: Type not known. | Not determined | | | |
| 8 | Institution: King’s College London Repository used: Department of Computer Science E-Repository Summary: One entry. Uses ePrints. | 999 | | | |
| 9 | Institution: University of Leeds Repository used: White Rose Research Online Summary: Uses ePrints. Shared by Leeds, Sheffield and York. | 8,013 | | | |
| 10 | Institution: University of Liverpool Summary: One entry. Repository used: Research Archive | 698 | 641 | 57 | 93% |
| 11 | Institution: LSE Summary: 2 entries. Repository used: LSE Research Online | 26,044 | 4,534 | 21,510 | 17.4% |
| 12 | Institution: University of Manchester Summary: One entry. Repository used: MMS | Not determined | | | |
| 13 | Newcastle University Summary: One entry. Repository used: Newcastle Eprints | Not determined | | | |
| 14 | Institution: University of Nottingham Summary: One entry. Repository used: Nottingham Eprints | 781 | | | |
| 15 | Institution: University of Oxford Summary: Five entries Repository used: ORA | Not determined | | | |
| 16 | Institution: Queen’s University Belfast Summary: One entry. Repository used: Queen’s Papers on Europeanisation & ConWEB | Not determined | | | |
| 17 | Institution: University of Sheffield Repository used: White Rose Research Online Summary: See entry for Leeds. | 8,013 | | | |
| 18 | Institution: University of Southampton Summary: 11 entries. Repository used: eprints.soton | 60,438 | | | |
| 19 | Institution: University College London Summary: 1 entry Repository used: UCL Discovery | 30,904 | | | |
| 20 | Institution: University of Warwick Summary: 3 entries Repository used: WRAP | 1,633 | | | |
| TOTAL | | 183,299 | 5,175 | 21,567 | |
At the time of writing we have to say that we do not know how many of the 183,299 items contain the full-text. All we can say is that there are at least 5,175 full-text items (or only 2.8%) – and this is based on the assumption that a full-text item represents the content of the metadata item, rather than for example, a PowerPoint slide used in the presentation of a paper.
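The totals and the 2.8% lower bound can be reproduced from the table with a short calculation. This is only a sketch: the dictionary names are hypothetical, the short institution labels are mine, and the figures are copied from the table, including the White Rose Research Online total, which is listed once for Leeds and once for Sheffield. The second percentage, restricted to the two repositories which reported full-text counts, does not appear in the table and is included purely as an illustration.

```python
# Query 1 totals for repositories where a figure was obtained.
total_items = {
    "Birmingham": 411, "Cardiff": 4562, "Glasgow": 40803, "King's": 999,
    "Leeds": 8013, "Liverpool": 698, "LSE": 26044, "Nottingham": 781,
    "Sheffield": 8013, "Southampton": 60438, "UCL": 30904, "Warwick": 1633,
}

# Query 2 results were only obtained for two repositories.
full_text_items = {"Liverpool": 641, "LSE": 4534}

all_items = sum(total_items.values())
known_full_text = sum(full_text_items.values())
reporting_items = sum(total_items[name] for name in full_text_items)

# Lower bound: known full-text items as a share of every item counted.
lower_bound = 100 * known_full_text / all_items
# Share within the repositories which actually reported full-text counts.
reporting_share = 100 * known_full_text / reporting_items

print(f"Items counted: {all_items:,}")
print(f"Known full-text items: {known_full_text:,}")
print(f"Lower bound across all items: {lower_bound:.1f}%")
print(f"Share within reporting repositories: {reporting_share:.1f}%")
```

The gap between the two percentages shows how much the overall picture depends on the repositories for which no full-text figures could be obtained.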
An Opportunity for Developers
I should also like to point out that, as described on the DevCSI blog, the deadline for the Developer Challenge at Open Repositories 2011 (Austin, Texas) is Thursday 9 June. A CrowdVine page for the developer challenge describes how the Challenge is to “Show us the future of repositories”. Since “Remote presentations would be considered in exceptional circumstances” it strikes me that there might be an opportunity to submit an entry based on an analysis of the percentage of full-text items in repositories, but this would probably have to be done using an alternative approach. A suggestion for anyone who would like to submit an entry based on this idea could be:
The future of repositories is to preserve the full text of research papers for future generations. We can see how well we are doing in implementing this vision which shows that xx% of repositories across the y sector already contain full-text items :-)
Or, if the results are disappointing:
The future of repositories is a gloomy one as only y% of repositories across the z sector contain full text items :-(
Alternatively we might conclude that new development is not required for those running ePrint repositories:
The future of repositories is reliant on the provision of evidence which can be used to inform policies and so ePrints repository managers should configure their services to provide the evidence described in this post!
Is that an unreasonable suggestion?