How Do We Measure the Effectiveness of Institutional Repositories?
Posted by Brian Kelly on 24 February 2011
The Need for Metrics
How might one measure the effectiveness of an institutional repository? An approach emerging from various activities I am involved in relating to evidence, value and impact is to identify the underlying purpose(s) of services and to gather evidence on how those purposes are being addressed.
The first step, then, is to identify the purposes of an institutional repository. Institutions may have a variety of different purposes (which is why, although gathering evidence can be important, drawing up league tables is often inappropriate). But let’s suggest that two key purposes may be: (1) maximising access to research publications and (2) ensuring long-term preservation of research publications. What measures may be appropriate for ensuring such purposes are being achieved?
For maximising access to research publications, two important measures will be the number of items in the repository and the number of accesses to those items. Since the numbers themselves have little meaning in isolation, there will be a need to measure trends over time, with an expectation of growth in the number of items deposited (which should slow down once legacy items have been uploaded and only new items are being deposited) and a continual increase in the overall traffic to the repository as the number of items grows and resource discovery services provide easier ways of finding such resources.
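As a minimal sketch of how such trends might be monitored, the following computes month-over-month growth rates from monthly totals which one might export from a repository's statistics module. The figures here are hypothetical sample data, not real repository statistics.

```python
# Hypothetical monthly totals exported from a repository statistics module.
monthly_deposits = [120, 150, 180, 210, 230, 242]     # items deposited per month
monthly_downloads = [900, 1100, 1400, 1800, 2300, 2900]

def growth_rates(series):
    """Relative month-over-month growth, e.g. 0.25 means +25%."""
    return [round((b - a) / a, 3) for a, b in zip(series, series[1:])]

# Deposit growth tails off (the expected slow-down after legacy uploads),
# while download growth remains broadly steady.
print(growth_rates(monthly_deposits))
print(growth_rates(monthly_downloads))
```

With this sample data the deposit growth rates fall steadily, illustrating the expected slow-down once legacy items have been uploaded.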
Access Statistics for Institutional Repositories
The relevance of such statistics is well understood: here at the University of Bath, the IRStats module for the ePrints repository service provides access to information such as details of all downloads, the overall number of downloaded items (100,003 at the time of writing), trends over time and various other summaries, as illustrated.
However it is important to recognise that such measures only indirectly indicate how well a repository may be doing in maximising access to research publications. In part, traffic may be generated by users following links from search engines such as Google (which is responsible for 38% of traffic to the University of Bath repository, with another 10.2% arriving via Google Scholar) to content of no interest to them. In addition, even if a relevant paper is found and read, the ideas it contains may not be of direct interest and may not inform subsequent research activities.
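Referrer figures such as those above can be derived from server logs. The following is a sketch, using hypothetical sample referrer values rather than real log data, of how the share of traffic per referring site might be computed:

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical Referer header values from a repository's access log.
referrers = [
    "https://www.google.com/search?q=repository+metrics",
    "https://www.google.com/search?q=open+access",
    "https://scholar.google.com/scholar?q=preservation",
    "",  # direct traffic carries no referrer
]

def referrer_shares(referrers):
    """Percentage of visits per referring host; empty referrers count as direct."""
    hosts = Counter(urlparse(r).hostname or "(direct)" for r in referrers)
    total = sum(hosts.values())
    return {host: round(100 * n / total, 1) for host, n in hosts.items()}

print(referrer_shares(referrers))
```

In practice one would feed this from the repository's parsed access logs or an analytics service rather than a hard-coded list.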
A citation to a resource will provide more tangible evidence of direct benefits of a repository to supporting research activities and work such as the MESUR metrics activity is looking to “investigate an array of possible impact metrics that includes not only frequency-based metrics (citation and hit counts), but also network-based metrics such as those employed in social network analysis and web search engines”. However in this post I will focus on evidence which can be easily gleaned from repositories themselves.
Whilst it is possible to point out various limitations in using such metrics the danger is that we lose sight of the fact that they can still have a role to play in providing a proxy indicator of value. So although repository items which are found and downloaded may not be of interest or may not be used, other items will be relevant and inform, either directly or indirectly, other research work. We might therefore assert that an increase in traffic may also have a positive correlation with an increase in use.
The Numbers of Items in Repositories
Measuring the numbers and growth in numbers of items in a repository would seem to be less problematic than access statistics. This measurement can reflect the effectiveness of a repository’s aims in supporting the preservation of research publications, as publications migrate from departmental Web sites or individuals’ personal home pages to a centrally managed environment. The growth in the numbers of items should also, of course, help in enhancing access to the papers too.
Repositories may, however, only provide access to the metadata about a paper and not access to the paper itself. This may be due to a number of factors including copyright restrictions, (perceived) difficulties in uploading documents or the unavailability of the documents.
There may also be a need to be able to differentiate between the total number of distinct items in a repository and the number of formats which may be made available. Storage of the original master format is often recommended for preservation purposes and where ease of reuse of the content may be required (e.g. merging various papers together and producing a table of contents can be much easier if the original files are available, rather than a series of PDFs, which can be more difficult to manipulate).
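The distinction between distinct items and format variants can be sketched as follows, assuming a hypothetical export of (item id, MIME type) pairs from the repository:

```python
from collections import defaultdict

# Hypothetical export: one row per stored file, keyed by repository item id.
holdings = [
    (101, "application/pdf"),
    (101, "application/msword"),  # original master format kept alongside the PDF
    (102, "application/pdf"),
]

formats_per_item = defaultdict(set)
for item_id, fmt in holdings:
    formats_per_item[item_id].add(fmt)

distinct_items = len(formats_per_item)
format_variants = sum(len(fmts) for fmts in formats_per_item.values())
print(distinct_items, format_variants)  # 2 distinct items, 3 stored formats
```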
Alternative formats for items may also help to enhance access for users of mobile devices or users with disabilities who may require assistive technologies to process repository items. This then leads to the question of not only the formats provided but also how those formats are being used: is a PDF easily processed by assistive technology or is it simply a scanned image which cannot be read by voice browsers? In addition, as suggested by preliminary research carried out by my colleagues Emma Tonkin and Andy Hewson described in a post on “Automated Accessibility Analysis of PDFs in Repositories“, might the cover pages automatically generated by repository systems create additional barriers to access to such resources?
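One crude heuristic for the scanned-image question is to check how much machine-readable text can be extracted from each page: a text-based PDF yields substantial text, while a scanned image yields little or none. The sketch below assumes the per-page text has already been extracted by some PDF library (e.g. pypdf's `PdfReader` with `extract_text()`); only the classification logic is shown, and the threshold is an arbitrary assumption.

```python
def looks_scanned(page_texts, min_chars_per_page=50):
    """Heuristic: flag a PDF as probably scanned if most pages yield
    less than min_chars_per_page characters of extractable text."""
    if not page_texts:
        return True  # no extractable text at all
    sparse = sum(1 for t in page_texts if len(t.strip()) < min_chars_per_page)
    return sparse / len(page_texts) > 0.5

print(looks_scanned(["", "", ""]))                 # image-only pages
print(looks_scanned(["Abstract: this paper " * 20]))  # ordinary text page
```

Such a check could be run across a whole repository to profile format quality, though borderline cases (short pages, cover sheets) would need manual review.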
Trends Across the Community
This post has outlined areas in which evidence should be gathered and used in order to be able to help demonstrate the value of an institutional repository service and help to ensure that a number of best practices are being addressed (and, if not, to be able to develop plans for implementing such best practices).
Although such work should be done within the context of an individual repository service there are also benefits to be gained from observing trends across the community. My colleague Paul Walk recently mentioned on the JISC-Repositories JISCMail list UKOLN’s development of a prototype harvesting and aggregation system for metadata from UK institutional repositories called ‘RepUK’. One aspect of this work is the aggregation of metadata records from institutional repositories and the visualisation of various aspects of data quality. Mark Dewey, lead developer for this work, has released an initial prototype tool which provides a visualisation of the growth in the number of records across the 133 repositories which have been harvested.
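Harvesters such as the one described above typically collect records over OAI-PMH, and growth over time can be derived from the datestamps in the harvested records. As a minimal sketch, the following parses datestamps from a hypothetical fragment of an OAI-PMH ListRecords response (the sample XML and repository name are invented for illustration):

```python
import xml.etree.ElementTree as ET

# Minimal, hypothetical OAI-PMH ListRecords response fragment.
SAMPLE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.ac.uk:1</identifier>
      <datestamp>2011-01-10</datestamp></header></record>
    <record><header><identifier>oai:example.ac.uk:2</identifier>
      <datestamp>2011-02-01</datestamp></header></record>
  </ListRecords>
</OAI-PMH>"""

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def record_datestamps(xml_text):
    """One datestamp per harvested record; bucketing these by month
    gives a growth-over-time series for the repository."""
    root = ET.fromstring(xml_text)
    return [d.text for d in root.findall(".//oai:datestamp", NS)]

print(record_datestamps(SAMPLE))
```

A real harvester would also follow OAI-PMH resumption tokens and handle deleted records, which this sketch omits.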
This post has suggested that metrics are needed in order to help to provide answers, perhaps indirectly, to questions regarding the effectiveness of institutional repositories as well as to support and inform the development of the repositories and the adoption of best practices. Of course measuring the effectiveness of institutional repositories will also require user surveys, but this post only considers quantitative approaches which are summarised in the table below.
| Metric | What It Indicates | Notes |
|---|---|---|
| Total usage | Provides an indication of a repository’s effectiveness in enhancing access to research papers. | Data may need to be carefully interpreted. |
| Number of items | Provides an indication of a repository’s effectiveness in both enhancing access to research papers and in ensuring their preservation. | It might be expected that growth will decrease after a backlog of papers has been uploaded. |
| Profiling alternative formats | May provide an indication that papers can be accessed by users with disabilities or by users of mobile devices. | Provision of multiple formats may enhance access and reuse. |
| Profiling format quality | Provides an indication that the formats provided are fit for purpose (e.g. PDFs are not just scanned images). | This may indicate problems with repository workflow, a need for education, etc. |
But what additional tools may be needed (I would welcome a mobile app for my iPod Touch along the lines of the stats app for WordPress blogs)? What advice is needed in interpreting the findings (and avoiding misinterpretations)? Your thoughts are welcomed.