UK Web Focus

Innovation and best practices for the Web

A Pilot Survey of the Numbers of Full-Text Items in Institutional Repositories

Posted by Brian Kelly (UK Web Focus) on 6 June 2011

Background

A recent post on How Do We Measure the Effectiveness of Institutional Repositories? sought to address the question of “What makes a good repository?” which was raised on the JISC-Repositories JISCMail list. The post outlined possible metrics which could be used for identifying the effectiveness of institutional repositories based on the intended purposes of a repository. In the post I suggested that if the purpose of a repository was to ensure the long-term preservation of resources, then there was a need to measure the number of full-text items in the repository – after all if the full text of a paper is not available the repository won’t be doing a very good job in the preservation of such resources!

The interest in this topic was revisited yesterday in a Twitter discussion which began with the suggestion from @PaulWalk that “I’ve thought we should use RepUK to measure actual persistence in repositories”. But in order to measure the persistence of the actual resource we need to be able to differentiate between the persistence of the full-text item itself and the mere persistence of the item’s URI. How might one do this?

Initial Experimentation

Following a discussion with Les Carr at the JISC 2011 conference I discovered that the ePrints advanced search interface can be used to retrieve information on both the numbers of items containing the full text and those that do not. In order to see if this approach could be used I looked at UKOLN’s items in Opus, the University of Bath’s institutional repository. From this I found that there were a total of 344 items, of which 146 had the full text available (including published and confidential items) and 198 were metadata-only items. In other words, 42% of the items contain the full text.

In order to see if this use of ePrints’ advanced search could be applied in a similar fashion to another repository I looked at the ECS ePrint Repository at the University of Southampton. This time I found that out of a total of 15,532 items the departmental repository contained 8,439 items with the full text and 7,093 metadata-only items, so that 54.3% of items contain the full text. (These figures replace the 974, 861 and 113 items originally reported.)
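The percentage quoted for the UKOLN items can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function name `full_text_share` is hypothetical, and the consistency check (full-text plus metadata-only items must equal the total) is an assumption about how the three search results ought to relate, not something the repository software guarantees.

```python
def full_text_share(total, full_text, metadata_only):
    """Return the percentage of items that have the full text attached.

    Hypothetical helper: it also checks that the two partial counts
    add up to the reported total before computing the share.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if full_text + metadata_only != total:
        raise ValueError("full-text and metadata-only counts must sum to the total")
    return 100.0 * full_text / total


# UKOLN's items in Opus, as reported above:
# 344 items in total, 146 with full text, 198 metadata-only.
ukoln_share = full_text_share(total=344, full_text=146, metadata_only=198)
print(f"UKOLN items with full text: {ukoln_share:.1f}%")
```

The same function could be applied to the counts from any other repository, provided all three searches return results.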

But are these initial findings typical across the sector?

Survey Across Russell Group University Repositories

We might expect the 20 research-intensive Russell Group Universities to be playing a leading role in use of institutional repositories, with either institutional mandates (in the case of Southampton University) or institutional research culture helping to ensure that significant numbers of full-text items are deposited. But is this really the case? In order to investigate whether the approach described could be applied more widely the survey was carried out across Russell Group Universities.

Using the list of repositories taken from the OpenDOAR directory I found that 3 of the Russell Group Universities seem to use the DSpace repository software, and the advanced search function in DSpace does not appear to allow searching to be restricted to full-text or metadata-only records.

Subsequent investigation of the advanced search capabilities of the remaining 17 institutions showed that only two seemed to provide the advanced search function which I used on the University of Bath and ECS, University of Southampton repositories. However there is a RESTful interface to the search and so the search parameters used to search the University of Bath repository were used across the other ePrints repositories. The following searches were carried out:

Query 1: Total Number of Items

http://eprint.domain/cgi/search/quicksearch?screen=Public%3A%3AEPrintSearch&basic_merge=ALL&basic=web&full_text_status=public&full_text_status=restricted&full_text_status=none&groups_merge=ALL&satisfyall=ALL&order=-date%2Fcreators_name%2Ftitle&_action_search=Search

Query 2: Full text deposited (but access may be restricted)

http://eprint.domain/cgi/search/quicksearch?screen=Public%3A%3AEPrintSearch&basic_merge=ALL&basic=web&full_text_status=public&full_text_status=restricted&groups_merge=ALL&satisfyall=ALL&order=-date%2Fcreators_name%2Ftitle&_action_search=Search

Query 3: No full text available:

http://eprint.domain/cgi/search/quicksearch?screen=Public%3A%3AEPrintSearch&basic_merge=ALL&basic=web&full_text_status=none&groups_merge=ALL&satisfyall=ALL&order=-date%2Fcreators_name%2Ftitle&_action_search=Search
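The three queries differ only in which `full_text_status` values they include, so they can be generated from a single template. The sketch below uses Python's standard `urllib.parse.urlencode`. The function name `build_query` and the `QUERIES` dictionary are hypothetical, `eprint.domain` is the same placeholder host used in the queries above, and the `basic=web` search term is simply carried over from those queries.

```python
from urllib.parse import urlencode

# Placeholder host, as in the queries above; substitute a real repository.
BASE_URL = "http://eprint.domain/cgi/search/quicksearch"


def build_query(statuses, term="web", base_url=BASE_URL):
    """Build a quicksearch URL restricted to the given full-text statuses.

    Parameter names and values are copied from the queries in the post;
    the order is preserved so the output matches them exactly.
    """
    params = [
        ("screen", "Public::EPrintSearch"),
        ("basic_merge", "ALL"),
        ("basic", term),
    ]
    # One full_text_status parameter per status that should be included.
    params += [("full_text_status", status) for status in statuses]
    params += [
        ("groups_merge", "ALL"),
        ("satisfyall", "ALL"),
        ("order", "-date/creators_name/title"),
        ("_action_search", "Search"),
    ]
    return base_url + "?" + urlencode(params)


QUERIES = {
    "total": build_query(["public", "restricted", "none"]),
    "full_text": build_query(["public", "restricted"]),
    "metadata_only": build_query(["none"]),
}

for label, url in QUERIES.items():
    print(label, url)
```

Passing a different `base_url` for each repository would give the same three searches for every ePrints installation being surveyed.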

It was intended to use the survey methodology across the Russell Group universities which host an institutional repository based on the ePrints software. However it was not possible to get valid results for most of the repositories and it was subsequently discovered that searching by full-text status is an optional feature for ePrints repositories.

Rather than abandon this work I have decided to publish this post in order to encourage institutions which host an ePrints repository to implement this feature since I feel it would be beneficial to the repository community if we had a better picture of how institutions are using repositories to host full-text items.

The table below gives the results of the two test cases (from Bath and Southampton) together with details of the total number of items in the other repositories. If information on the numbers of full-text items becomes available I will update this post and annotate accordingly. [Note there was an error in the figures for the ECS repository. This has now been corrected in the table below.]

| Ref. No. | Institution | Repository used | Summary | Query 1: Total Nos. of Items | Query 2: Total Nos. of Full-Text Items | Query 3: Total Nos. of Metadata-Only Items | Percentage of Full-Text Items |
|---|---|---|---|---|---|---|---|
| A | University of Bath | Opus Repository | Uses ePrints. | 20,210 | 1,387 | 18,823 | 6.86% |
| B | ECS, University of Southampton | eprint Repository | Uses ePrints. | 15,532 | 8,439 | 7,093 | 54.3% |
| TOTAL | | | | 35,742 | 9,826 | 25,916 | 27.4% |

The table below gives the results of the findings for what seems to be the main repository from Russell Group Universities. Note that the results were gathered using the public advanced search interface where this was available. If information on the numbers of full-text items becomes available I will update this post and annotate accordingly.

| Ref. No. | Institution | Repository used | Summary | Query 1: Total Nos. of Items | Query 2: Total Nos. of Full-Text Items | Query 3: Total Nos. of Metadata-Only Items | Percentage of Full-Text Items |
|---|---|---|---|---|---|---|---|
| 1 | University of Birmingham | eprint Repository | Three entries. Uses ePrints. | 411 | | | |
| 2 | University of Bristol | | One entry. Uses DSpace. | | | | |
| 3 | University of Cambridge | | Four entries. Uses DSpace. | | | | |
| 4 | Cardiff University | ORCA | 1 entry. Uses ePrints. | 4,562 | | | |
| 5 | University of Edinburgh | | Three entries. Uses DSpace. | | | | |
| 6 | University of Glasgow | Enlighten | Three entries. Uses ePrints. | 40,803 | | | |
| 7 | Imperial College | Spiral | Type not known. | Not determined | | | |
| 8 | King’s College London | Department of Computer Science E-Repository | One entry. Uses ePrints. | 999 | | | |
| 9 | University of Leeds | White Rose Research Online | Uses ePrints. Shared by Leeds, Sheffield and York. | 8,013 | | | |
| 10 | University of Liverpool | Research Archive | One entry. | 698 | 641 | 57 | 93% |
| 11 | LSE | LSE Research Online | 2 entries. | 26,044 | 4,534 | 21,510 | 17.4% |
| 12 | University of Manchester | MMS | One entry. | Not determined | | | |
| 13 | Newcastle University | Newcastle Eprints | One entry. | Not determined | | | |
| 14 | University of Nottingham | Nottingham Eprints | One entry. | 781 | | | |
| 15 | University of Oxford | ORA | Five entries. | Not determined | | | |
| 16 | Queen’s University Belfast | Queen’s Papers on Europeanisation & ConWEB | One entry. | Not determined | | | |
| 17 | University of Sheffield | White Rose Research Online | See entry for Leeds. | 8,013 | | | |
| 18 | University of Southampton | eprints.soton | 11 entries. | 60,438 | | | |
| 19 | University College London | UCL Discovery | 1 entry. | 30,904 | | | |
| 20 | University of Warwick | WRAP | 3 entries. | 1,633 | | | |
| TOTAL | | | | 183,299 | 5,175 | 21,567 | |

At the time of writing we have to say that we do not know how many of the 183,299 items contain the full-text. All we can say is that there are at least 5,175 full-text items (or only 2.8%) – and this is based on the assumption that a full-text item represents the content of the metadata item, rather than for example, a PowerPoint slide used in the presentation of a paper.
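The lower bound quoted above can be recomputed from the table. The sketch below is illustrative only: the `ROWS` structure is a hypothetical encoding of the table, with `None` standing for a full-text count that could not be determined, and institutions with no total are left out. White Rose Research Online appears under both Leeds and Sheffield in the table, so it is counted twice here in order to match the published total.

```python
# (institution, total items, full-text items or None where not determined)
ROWS = [
    ("Birmingham", 411, None),
    ("Cardiff", 4562, None),
    ("Glasgow", 40803, None),
    ("King's College London", 999, None),
    ("Leeds (White Rose)", 8013, None),
    ("Liverpool", 698, 641),
    ("LSE", 26044, 4534),
    ("Nottingham", 781, None),
    ("Sheffield (White Rose)", 8013, None),  # same shared repository as Leeds
    ("Southampton", 60438, None),
    ("UCL", 30904, None),
    ("Warwick", 1633, None),
]

total_items = sum(total for _, total, _ in ROWS)

# Only repositories that reported a full-text count contribute here,
# so this is a lower bound rather than an estimate.
known_full_text = sum(full for _, _, full in ROWS if full is not None)
unknown = [name for name, _, full in ROWS if full is None]

lower_bound = 100.0 * known_full_text / total_items

print(f"Total items: {total_items:,}")
print(f"Known full-text items: {known_full_text:,}")
print(f"Lower bound on full-text share: {lower_bound:.1f}%")
print(f"Repositories with no full-text count: {len(unknown)}")
```

Because ten of the twelve rows have no full-text count, the true share could be far higher than this lower bound, which is the reason for asking repository managers to expose the data.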

An Opportunity for Developers

I should also like to point out that, as described on the DevCSI blog, the deadline for the Developer Challenge at Open Repositories 2011 (Austin, Texas) is Thursday 9 June. A CrowdVine page for the developer challenge describes how the Challenge is to “Show us the future of repositories”. Since “Remote presentations would be considered in exceptional circumstances” it strikes me that there might be an opportunity to submit an entry based on an analysis of the percentage of full-text items in repositories, but this would probably have to be done using an alternative approach. A suggestion for anyone who would like to submit an entry based on this idea could be:

The future of repositories is to preserve the full text of research papers for future generations. We can see how well we are doing in implementing this vision which shows that xx% of repositories across the y sector already contain full-text items :-)

Or, if the results are disappointing:

The future of repositories is a gloomy one as only y% of repositories across the z sector contain full text items :-(

Alternatively we might conclude that new development is not required for those running ePrint repositories:

The future of repositories is reliant on the provision of evidence which can be used to inform policies and so ePrints repository managers should configure their services to provide the evidence described in this post!

Is that an unreasonable suggestion?



14 Responses to “A Pilot Survey of the Numbers of Full-Text Items in Institutional Repositories”

  1. Those eprint searches are searching the number of records containing the word ‘web’. Why not remove that weird restriction?

  2. Chris Rusbridge said

    Why not use the ROAR service, which gives you figures for the total number of records, for your first column? It would at least fill it in! ROAR tracks daily uploads as well, so you can also get stats for average uploads (and also median, often very different, and inter-quartile range, which gives some idea of consistency).

    I suspect (but don’t know) that ROAR counts all the items in the repository, including dark items. This is because the ROAR counts for some repositories exceed the count available through a public inspection (eg DSpace @ Cambridge).

    • My initial aim was to make use of APIs in order to develop a more scalable approach for carrying out this analysis. However I discovered that this wasn’t possible due to the lack of data on full-text items. I continued to use the ePrints search approach so that it will provide a consistent approach if further data becomes available.

  3. There is a problem with using % of full text items as a measure of the success of the repository. If the repository’s main purposes were to enable open access to research and the preservation of that research then it would be a very valid measure. However, in many institutions the tool for reporting on research (Via RAE and REF) is the repository. Inevitably, this means the repository will include many metadata only records. Even if it were completely successful in getting academics to add full text (a big if) it still would not have 100% full text records due to the restrictions of copyright, commercial sensitivity and confidentiality issues.

  4. I echo Jackie’s comments. Whilst 100% full text archives across the sector would be an ideal, surely a steadily growing mixed-economy repository, with academics actively and regularly self-archiving, is superior to an ailing full text only archive with few deposits that fails to represent accurately an institution’s research.

  5. Kara said

    I support Jackie’s email above regarding the evolving nature and purposes for repositories.

    In terms of full-text collection, I’m not sure that measuring full-text in a particular repository is entirely a good indicator (we’re not the BL or the Bodleian and we’ll never capture everything in the time we have).

    We’re trusting external sites (PMC, PLoS, BMC) and see our purpose being to enable access, rather than gather all items in our repository. We can capture links to reliable OA copies of full-text in a specific related urls field.

    I know this is an essential element of institutional repositories (gathering all full-text) but we see it as an evolution of purpose. We need a few tweaks before using it for reporting full-text purposes. I might add that these links are a small portion of OA items for our repository, there’s still many items that aren’t otherwise openly available except via our repository activities.

    ps. can you clarify the second to last statement in the post please? ‘The future of repositories is reliant on the provision of evidence which can be used to policies and so ePrints repository managers should configure their services to provide the evidence describes in this post!’

  6. @Jackie, @Emma, @Kara
    Yes, I agree. I think in some quarters there is a strong belief that open access repositories *will* provide access to full-text. I feel there is a need to gather evidence in order to see if such assertions are true – and if this is not the case to develop a more nuanced view of the role of repositories – and you have provided some good examples of the way in which an IR will relate to a wider ecosystem. My view is that this should be based on evidence rather than assertion.

    @Emma “… a steadily growing mixed-economy repository, with academics actively and regularly self-archiving, is superior to an ailing full text only archive …” – I think that’s a very useful comment. I think this suggests that there is a need to monitor trends and not just a snapshot of the current state of play.

    @Kara regarding my last sentence: this was a suggestion for a submission to the OR11 developers challenge on the Future of Repositories, namely that the future should be evidence-based rather than advocacy-based. And if the evidence does not reflect the advocates’ view then there may be a need to accept the real world!

  7. Liverpool has 93% full text but only 698 items. I presume that doesn’t make it successful? The point is, for both your measures of success/impact (i.e. preservation and open access) it is the number of publications where full text is preserved and/or openly available as a percentage of the actual number of publications that were generated by the institution in any given period that is the only really useful measure? Not what you are measuring here.

    • I used the OpenDOAR list of repositories and where there were several listed for an institution I tried to choose one which appeared to host institutional research content. I also selected ePrints repositories as these can (potentially) provide data on the number of full-text items. In several instances there were multiple repositories so I could have chosen a niche repository containing a small number of items. In the case of Liverpool University, however, only one repository was listed. The data I’ve provided therefore would seem to provide an accurate picture based on the information provided on the OpenDOAR list. I would agree with you, however, that the number of items is not on a par with other institutions of similar size and status to Liverpool. Perhaps there is a need to relate such figures to the numbers of active researchers within the institution? Hmm, is such data available via DBPedia, I wonder? But before we get carried away with the potential of Linked Data we need to ensure that we have open data available from the institutional repositories!
