UK Web Focus (Brian Kelly)

Innovation and best practices for the Web

What Can We Learn From Download Statistics for Institutional Repositories?

Posted by Brian Kelly on 6 Jul 2011

Gathering Quantitative Evidence

I am involved in work looking at ways in which evidence-based approaches can make use of metrics in order to understand best practices and demonstrate impact. A series of surveys has been carried out which sought to gather quantitative evidence of the use of a variety of services and, by publishing the findings on this blog, has encouraged discussion of the approaches.

This work complements the report on Splashes and Ripples: Synthesizing the Evidence on the Impacts of Digital Resources carried out by the Oxford Internet Institute and described on their blog which focussed on “synthesizing the evidence available under the JISC digitisation and eContent programmes to better understand the patterns of usage of digitised collections in research and teaching“.

Although my work has avoided addressing the complexities of metrics for research, a recent survey entitled A Pilot Survey of the Numbers of Full-Text Items in Institutional Repositories sought to profile the institutional repositories hosted by Russell Group Universities in order to gain a better understanding of patterns of usage related to deposits of full-text items – which would appear to be of importance if a repository is to have a role to play in the digital preservation of research papers.

Surveying the numbers of downloads of papers from a repository is clearly a flawed approach if one is attempting to determine the quality, impact and value of research. But are there other insights to be gained from examining download statistics for an institutional repository? This latest survey, which is being carried out a few days before a workshop on “Metrics and Social Web Services: Quantitative Evidence for their Use and Impact“, will seek to understand whether new insights can be gained from a lightweight survey of the most popular downloads from the University of Bath’s Opus institutional repository.

Survey of Downloads

The University of Bath’s institutional repository, which I’ll refer to by the name “Opus”, is, like many UK University repositories, based on the ePrints software. A stats module, IRStats, seems to be provided as standard with ePrints although, as discussed in a previous post, the data which is gathered can be configured by the repository manager.

Opus currently has a total of 136,347 downloads since its launch in 2005. Looking at the histogram of monthly downloads we can see a slow growth for five months after the launch and then a plateau. Zooming in on the graph we can see growth in the numbers of downloads taking place in October/November 2009 and 2010 – and we might reasonably expect a similar pattern to be repeated when the next academic year begins.
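Seasonal patterns like the October/November growth can be checked programmatically if per-download data can be exported from the stats module. A minimal sketch of the aggregation, assuming a hypothetical list of dated download records (the actual IRStats export format is not shown here):

```python
from collections import Counter
from datetime import date

# Hypothetical per-download records: (download_date, item_id).
# Real data would need to be parsed from the repository's stats export.
downloads = [
    (date(2009, 9, 14), "eprint-101"),
    (date(2009, 10, 2), "eprint-101"),
    (date(2009, 10, 21), "eprint-205"),
    (date(2009, 11, 5), "eprint-101"),
]

# Aggregate into monthly totals, keyed by (year, month).
monthly = Counter((d.year, d.month) for d, _ in downloads)

for (year, month), count in sorted(monthly.items()):
    print(f"{year}-{month:02d}: {count}")
```

Comparing the same months across successive years in such a table would show whether the start-of-academic-year spike recurs.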

But who are the authors of the most downloaded papers, and might we be able to discover techniques which can help to ensure that papers are downloaded?

Looking at the top ten downloaded items we find that the conference proceedings of the 11th International Conference on Non-conventional Materials and Technologies, NOCMAT 2009, are in top place with 28,449 downloads – an order of magnitude more than the item in second place.

The next most popular item is The use of QR codes in Education: A getting started guide for academics (2,514 downloads) by Andy Ramsden, former head of the e-Learning Unit, who used to work in the office down the corridor from me. Andy has two other papers in the top ten, related to his elearning interest in QR codes (1,161 downloads) and Twitter (805 downloads). I am in third place, with my paper on Library 2.0: balancing the risks and benefits to maximise the dividends having 1,419 downloads. The other most downloaded papers seem to be PhD theses, with the exception of my UKOLN colleague Alex Ball, whose project report on Review of the State of the Art of the Digital Curation of Research Data is in tenth place (with 745 downloads).

Is there a pattern emerging, I wonder, or are these just one-off examples? It would be interesting to see what the evidence from a wider profile of downloads may indicate. Looking at the top ten author pages we find A. Ramsden has had 7,760 items downloaded; B. Kelly (6,758); A.D Brown (3,323); A. Ball (2,267); S. J. Culley (2,250); S. Deneulin (1,900); S. Abdullah (1,535); E. Dekoninck (1,469); E. W. Elias (1,469); J. Millar (1,439) and L. Jordan (1,161). [Note that the items do not seem to total correctly in all cases so I will omit the links until I’ve tried to resolve this].
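Discrepancies of the kind noted above can be surfaced systematically by summing the per-item counts for each author and comparing the result with the total reported on the author page. A minimal sketch, using made-up item records rather than real repository data (the few figures quoted in this post are reused purely for illustration):

```python
from collections import defaultdict

# Hypothetical item-level records: (author, item_title, downloads).
# Real data would come from the repository's stats module.
items = [
    ("A. Ramsden", "QR codes guide", 2514),
    ("A. Ramsden", "QR codes paper", 1161),
    ("A. Ramsden", "Twitter paper", 805),
    ("B. Kelly", "Library 2.0", 1419),
]

# Sum per-item downloads for each author...
computed = defaultdict(int)
for author, _, n in items:
    computed[author] += n

# ...and compare with the totals reported on the author pages.
reported = {"A. Ramsden": 7760, "B. Kelly": 6758}

for author, total in reported.items():
    diff = total - computed[author]
    if diff:
        print(f"{author}: reported {total}, items sum to {computed[author]} (diff {diff})")
```

A non-zero difference would simply mean the author has further items not in the top-ten list, or that the stats module counts differently at the two levels – which is the ambiguity flagged in the note above.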

Discussion

As mentioned previously it is important to note that download figures are not a measure of quality – it would probably be timely to point out that the numbers of readers of the News of the World demonstrate that quite clearly! However if we also acknowledge that researchers do have a responsibility to get their message across, then researchers will (or should) have an interest in maximising the numbers of appropriate readers of their papers – and it is important, I feel, to highlight the need to engage with appropriate readers.

From the survey it seems that the authors who have papers in the top ten institutional downloads are also successful in having their other papers downloaded in significant numbers. Perhaps having an office on level 5 of the Wessex House building may be a reason for the popularity of the papers! On the other hand it may be that the three of us who shared the same corridor discussed dissemination strategies or perhaps, and more likely, are simply working in an area (related to digital libraries) in which potential readers of our papers are more likely to access digital repositories.


6 Responses to “What Can We Learn From Download Statistics for Institutional Repositories?”

  1. Steve Hitchcock said

    There is a suggestion in your commentary that there may be a degree of random chance in becoming one of the most downloaded papers from, in this case, the Opus repository. Or more likely you would argue that there may be a range of factors at work, but we don’t know which ones apply in specific cases. There is evidence elsewhere of a correlation between downloads and later citations, and we think we understand citations better than downloads, for now. A number of papers in my open access impact bibliography show this, beginning, I think, with the paper by my colleague Tim Brody http://eprints.ecs.soton.ac.uk/10713/

    So these download figures are meaningful, even if we are not yet sure what they mean. I like your speculations, about corridors and locations for example, but by asking the questions about high downloads I think you may find the answers are more profound.

    • Thanks for the comment. On further reflection I feel that the following factors are relevant.

      • The quality of the paper!
      • SEO factors – how Google-friendly the paper is (in comparison with other resources in the same repository)
      • SMO factors – how well the papers may be promoted across one’s professional social networks
      • The subject area (some papers are likely to be more ‘popular’ than others)
      • The size of the repository (it will be much easier to be in the top downloads if the number of full-text items is small)
      • Personal/departmental factors – note that UKOLN has a long-standing engagement with repository developments and best practices for digital preservation so it is unsurprising that such a high proportion of our papers are available as full-text items in the repository.

      Thanks for the link to the paper on Earlier Web Usage Statistics as Predictors of Later Citation Impact – it is good to see evidence which suggests the correlation between downloads and citations.

      • Steve Hitchcock said

        Brian, I’m glad you put the quality factor top. It is clearly the most important. The next two are about raising awareness of papers, while your remaining factors are relative and somewhat specific in location.

        So, I’m interested in the first two types of factor. My experience as development editor with the JoDI e-journal a few years ago was that when we emailed announcements of new issues, the pattern of downloads – the relative numbers of downloads for papers in the issue – was established within hours and days, and then maintained going forward. In other words, there is something inherent in the papers. In terms of awareness, within the issue all papers are equal due to the same marketing. In terms of quality all papers are supposedly equal at the outset when all you have are a title and abstract. And yet patterns are established almost instantly, and if Tim Brody is correct, will be reinforced subsequently by citation effects. Bear in mind that these effects will be most marked for the most popular, most downloaded papers.

        If this is the case what matters most is making the paper available as widely as possible. IRs do this in principle through OA, but what they lack is immediate awareness of new papers. Unlike JoDI they lack the email announcement; unlike arXiv they lack the daily subject alerts of new papers that establish the patterns of downloads for new papers. I am certain that IRs are capable of releasing the immediate potential of the best papers to be downloaded and used more widely if we could provide that elusive instant alerting and awareness service.

      • Hi Steve

        It does seem to me that as well as the quality of the article, the reputation of the author could also be a factor; hence flagging personal factors in my final bullet point.

        Regarding the SEO factors, I wonder whether file formats affect the visibility of the resources and hence the numbers of downloads? For example, will an MS Word file be invisible to Google; will an HTML file be more ‘visible’ than a PDF resource (especially if a PDF cover page adversely influences Google ranking)?

        Note that for my most downloaded paper, Library 2.0: Balancing the Risks and Benefits to Maximise the Dividends, about half of the 1,273 downloads to date occurred when it was deposited in August 2009. This seems to have been the result of a blog post published on 11 August 2009. This to me suggests that, in some cases, social media optimisation (SMO) approaches can enhance the numbers of downloads to a significant extent.

  2. Dorothea said

    Not sure “all papers are equal due to the same marketing,” honestly. How many paper authors have blogs? Twitter streams (probably not when Steve was editing JoDI, but almost certainly a factor today)? Journal marketing is not the only marketing available these days.

    Bias disclosure: A paper of mine is beyond doubt the top download from the IR I (for two more days) run. I’m hesitant to call quality the main or sole factor in its success. When I tossed the preprint up, I had a well-read blog.

  3. […] knew this would happen and I’m pleased to see that it has! Surveys of the numbers of downloads of papers from a […]
