UK Web Focus

Innovation and best practices for the Web


To What Extent Do Multiple Copies of Papers Affect Download Statistics?

Posted by Brian Kelly on 14 November 2011

Are Multiple Copies of Papers Bad For The Researcher?

If authors are encouraged to provide pre-prints of their papers in addition to the paper which is hosted at the publisher’s Web site, how might that affect the associated usage statistics?  If usage statistics are fragmented, how easy might it be to aggregate the statistics? And if doing this is difficult, does it matter?

This was a question I was asked recently.  In order to gain a better understanding of the issues I have analysed the usage statistics for the five most downloaded papers which I have uploaded to Opus, the University of Bath institutional repository.  This exercise helped me to understand that the issue is more complicated than I initially appreciated.  The data for my papers is summarised below.


Paper 1
Title: Library 2.0: balancing the risks and benefits to maximise the dividends
Journal/Event: Program: Electronic Library & Information Systems, 43, 2009
Opus statistics: 1,516
UKOLN Web site statistics: 190 views, consisting of 14 (.doc file views in 2011) + 129 (.doc file views in 2009) + 47 (HTML file views in 2009)
Publisher’s information: [Paper] – Usage statistics not available
No. of citations: 8 citations according to Google Scholar
Other known copies: There are 210 records listed in a Google Scholar search, which include links to versions on Opus and the UKOLN Web site.
Notes: Two versions of the paper were published. The initial paper was presented at the Building Bridges 2009 conference and was subsequently republished in Program.

Paper 2
Title: From Web Accessibility to Web Adaptability
Journal/Event: Disability and Rehabilitation: Assistive Technology, 4, 2009
Opus statistics: 491
UKOLN Web site statistics: 0 views
Publisher’s information: [Paper] – Usage statistics not available
No. of citations: 6 citations according to Google Scholar
Other known copies: David Sloan’s list of publications (PDF file available). There are 10 records listed in a Google Scholar search, which include links to versions on Opus and the UKOLN Web site.
Notes: This paper was embargoed and so was not released until 18 months after publication.

Paper 3
Title: Implementing a Holistic Approach to E-Learning Accessibility
Journal/Event: ALT-C, 2005
Opus statistics: 409
UKOLN Web site statistics: 4,021 views, consisting of 295 (HTML views in 2011) + 557 (HTML views in 2010) + 592 (HTML views in 2009) + 1,009 (HTML views in 2008) + 861 (HTML views in 2007) + 707 (HTML views in 2006) + 635 (HTML views in 2005)
Publisher’s information: Not available on conference Web site
No. of citations: 20 citations according to Google Scholar
Other known copies: There are 8 records listed in a Google Scholar search, which include links to versions on Opus and the UKOLN Web site together with a copy of the MS Word file hosted by the MediaLT organisation in Norway.
Notes: This paper was awarded the prize for Best Research Paper at the ALT-C 2005 conference.

Paper 4
Title: Developing A Holistic Approach For E-Learning Accessibility
Journal/Event: Canadian Journal of Learning and Technology, 30 (3), 2004
Opus statistics: 404
UKOLN Web site statistics: 498 views, consisting of 188 (HTML views in 2011) + 310 (HTML views in 2010)
Publisher’s information: [Publisher’s copy] – Usage statistics not available
No. of citations: 36 citations according to Google Scholar
Other known copies: There are 6 records listed in a Google Scholar search, which include links to metadata records on Opus and the UKOLN Web site.
Notes: This paper was available on the UKOLN Web site for a significant period of time.

Paper 5
Title: Empowering users and their institutions: A risks and opportunities framework for exploiting the potential of the social web
Journal/Event: CULTURAL HERITAGE online conference Web site, 2009
Opus statistics: 356
UKOLN Web site statistics: 0 views
Publisher’s information: [Publisher’s copy] – Usage statistics not available
No. of citations: 1 citation according to Google Scholar
Other known copies: There are 3 records listed in a Google Scholar search (which has one link to a copy on the UKOLN Web site) and 12 in a second Google Scholar search, which include links to copies on the conference Web site.
Notes: This paper was not made available on the UKOLN Web site. The publisher’s copy consists of two large PDF files containing all papers presented at the conference.  Also note that this was a recent paper, by which time it had been decided to only publicise the copy on the institutional repository.

In total there have been 3,176 views of these five papers from the institutional repository and 4,709 views from the UKOLN Web site. Reviewing this evidence, it seems that the copies made available on the UKOLN Web site in 2004 and 2005 have had significant numbers of downloads from the Web site, exceeding (significantly so in one case) the number of downloads from the Opus repository.
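For anyone wishing to check these totals, the arithmetic can be reproduced with a short script. This is only a sketch: the figures are the per-paper totals stated in the summary above, while the variable names and the data layout are my own assumptions rather than anything exported from Opus or the UKOLN statistics package.

```python
# Per-paper view counts as stated in the summary above:
# (Opus repository views, UKOLN Web site views).
# The dictionary layout is illustrative, not an export format.
views = {
    "Paper 1": (1516, 190),
    "Paper 2": (491, 0),
    "Paper 3": (409, 4021),
    "Paper 4": (404, 498),
    "Paper 5": (356, 0),
}

opus_total = sum(opus for opus, _ in views.values())
ukoln_total = sum(ukoln for _, ukoln in views.values())

print(f"Opus total: {opus_total:,}")    # Opus total: 3,176
print(f"UKOLN total: {ukoln_total:,}")  # UKOLN total: 4,709

# Papers whose Web site copy attracted more views than the repository copy.
ahead_on_ukoln = [p for p, (opus, ukoln) in views.items() if ukoln > opus]
print(ahead_on_ukoln)  # ['Paper 3', 'Paper 4']
```

The two papers picked out by the final comparison are the 2004 and 2005 papers.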

It should also be noted that, as described in a blog post entitled Scridb Seems to be Successful in Enhancing Access to Papers, papers hosted on the Scribd document sharing service do seem to attract a very large number of downloads, as shown below.

Discussion

If download statistics are used to complement citation statistics in order to provide some indication of the value of research publications, it would appear that there will be pressures either to ensure that content is hosted only in a single location or to ensure that download statistics from multiple repositories can be aggregated.

However it does not seem clear how one might aggregate usage statistics from a diversity of services. I have been able to publish the statistics for files hosted on the UKOLN Web site as I have access to the usage statistics, but this is clearly not a scalable solution.  Similarly, for the papers I have described, I have not been able to find any statistics for the copy hosted on the publisher’s site.
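To illustrate why aggregation is awkward, here is a minimal sketch of how counts from several hosts might be combined for a single paper. The record layout and the aggregate function are hypothetical, and the figures are those given for Paper 4 above. The point it tries to make is that a source which does not publish statistics should be recorded as unknown rather than counted as zero.

```python
from collections import defaultdict

# Hypothetical records: (paper, source, views).
# None means the source does not publish usage statistics.
records = [
    ("Paper 4", "Opus", 404),
    ("Paper 4", "UKOLN Web site", 498),
    ("Paper 4", "Publisher", None),
]

def aggregate(records):
    """Sum the known view counts per paper and list sources with no data."""
    known = defaultdict(int)
    missing = defaultdict(list)
    for paper, source, count in records:
        if count is None:
            missing[paper].append(source)
        else:
            known[paper] += count
    return dict(known), dict(missing)

known, missing = aggregate(records)
print(known)    # {'Paper 4': 902}
print(missing)  # {'Paper 4': ['Publisher']}
```

Any total produced in this way is a lower bound, since the publisher’s copy and any copies hosted by co-authors or third parties are not counted.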

One might then conclude that the recommendation should be that research papers should only be hosted in a single location. But is this a realistic approach?  I have always been keen on maximising access to my papers. Initially this was done by hosting the papers on the UKOLN Web site, before the University of Bath provided an institutional repository.  Although the papers are now hosted on the repository, and this is now the preferred location, I am reluctant to delete the original copy since this may cause long-established links to the paper to break and thus  cause access problems for users following such links. Similarly I would be reluctant to stop co-authors hosting a copy of the paper on their own repository. Indeed, since I seek to make use of Creative Commons licences to encourage reuse where possible it would seem to go against the grain to try to control such reuse in order simply to enhance metrics.

This, it seems to me, is the crux of the matter.  If the aim of research papers is to have an impact and open access can enhance this goal, then surely we need to accept the fragmentation of resources, including research publications. Looking at the metrics for the papers listed above, it does seem that where a paper is available from multiple locations this enhances the numbers of downloads and subsequent citations, although I would welcome a more rigorous analysis.

However such speculations are based on a very small sample and very subjective opinions. In addition the analysis of the usage statistics for the UKOLN Web site seems surprising, with figures displayed primarily for the HTML versions of papers and not the MS Word and PDF versions. This may be due to the usage statistics package not displaying findings for resources for which there have only been a small number of downloads.  However if this is the case it seems to suggest the advantages of providing a research paper in HTML format as well as MS Word and PDF.

But how typical are these findings, I wonder?  And what do people think about the tensions between maximising access to papers by setting them free and being able to better understand their usage by providing papers in a more managed environment?
