UK Web Focus (Brian Kelly)

Innovation and best practices for the Web

Archive for the ‘Repositories’ Category

Should We Boycott Academia.edu?

Posted by Brian Kelly on 9 Dec 2015

The “Why Are We Not Boycotting Academia.edu?” Event

I recently came across a tweet which announced an event which addressed the question “Why Are We Not Boycotting Academia.edu?”. As described on the Eventbrite booking page:

With over 36 million visitors each month, the San Francisco-based platform-capitalist company Academia.edu is hugely popular with researchers. Its founder and CEO Richard Price maintains it is the ‘largest social-publishing network for scientists’, and ‘larger than all its competitors put together’. Yet posting on Academia.edu is far from being ethically and politically equivalent to using an institutional open access repository, which is how it is often understood by academics. Academia.edu’s financial rationale rests on the ability of the venture-capital-funded professional entrepreneurs who run it to monetize the data flows generated by researchers. Academia.edu can thus be seen to have a parasitical relationship to a public education system from which state funding is steadily being withdrawn. Its business model depends on academics largely educated and researching in the latter system, labouring for free to help build its privately-owned for-profit platform by providing the aggregated input, data and attention value.

The abstract concluded by summarising questions which would be addressed at the event, including:

  • Why have researchers been so ready to campaign against for-profit academic publishers such as Elsevier, Springer, Wiley-Blackwell, and Taylor & Francis/Informa, but not against for-profit platforms such as Academia.edu, ResearchGate and Google Scholar?
  • Should academics refrain from providing free labour for these publishing companies too?
  • Are there non-profit alternatives to such commercial platforms academics should support instead?

The event was organised by The Centre for Disruptive Media and took place at Coventry University on 8 December 2015 from 3-6pm. Unfortunately I was not able to attend the event, but as this is an area of interest to me I thought I would publish this post, in which I argue that rather than boycotting Academia.edu we should make use of it (and similar services), complementing institutional repository services with such services.

Background to My Interests

Over a year ago I was invited to give a talk on “Using Social Media to Build Your Academic Career” at a symposium in Brussels on “How to Build an Academic Career” for the five Flemish universities. Over the past two years I have also given modified versions of the talk at the annual DAAD conference, the IRISS Research Unbound conference and for the iSchool@northumbria’s Research Seminar Series. As can be seen from the accompanying screenshot of one of the slides in the presentations, I summarised the benefits which can be gained from making use of Academia.edu, based on personal experiences, and recommended best practices.

My advice, as well as that provided by librarians and research support staff who promote use of social media by early career researchers, would appear to be in conflict with the general theme of the “Why Are We Not Boycotting Academia.edu?” event. But rather than address the issue of whether universities should own the online services they use, I will present evidence of how existing services are being used and the implications of such usage patterns.

What Does The Evidence Suggest?

Personal Experiences

The slides for my talk on “Using Social Media to Build Your Academic Career” are available on Slideshare. In the talk I described the benefits of making one’s research content available in popular places, rather than restricting access to niche web sites such as institutional repositories. In particular I described the SEO benefits which can be gained by using popular sites which contain links to research papers hosted on an institutional repository. This advice was based on findings published in a paper by myself and Jenny Delasalle which asked “Can LinkedIn and Academia.edu Enhance Access to Open Repositories?” and which was presented at the Open Repositories 2012 conference. I Googled for the paper using the search term “linkedin researchgate opus” in order to find the copy of the paper hosted on Opus, the University of Bath institutional repository; however the first hit was for the copy hosted by Researchgate. This suggested that hosting a research paper on a popular service such as Academia.edu or, in this case, Researchgate would provide better discoverability via Google than use of an institutional repository.

But since Google personalises its results based on previous searches, a more objective tool would be DuckDuckGo, which does not keep a record of previous searches. In this case the search for “linkedin researchgate opus” found the paper hosted on Researchgate in second place. Using the full title of the paper, as shown by the DuckDuckGo search for “Can LinkedIn and Academia.edu Enhance Access to Open Repositories?”, the order of the search results was (1) the paper hosted by Academia.edu; (2) the paper hosted by the Opus institutional repository; (3) the slides hosted on Slideshare and (4) the paper hosted by Researchgate.

Personally, therefore, I have found benefits through use of Academia.edu and Researchgate in helping to raise the visibility of my research papers. But how popular is Academia.edu across the UK research sector?

Institutional Evidence

In order to answer this question a survey of Academia.edu use across the 24 Russell Group universities was carried out on 26 November 2015. The findings are given in the following table, with the link in the final column enabling the current results to be determined.

Ref. no. Institution No. of People Link
 1 University of Birmingham      5,408 [Link]
 2 University of Bristol      5,759 [Link]
3 University of Cambridge    12,770 [Link]
4 Cardiff University      5,372 [Link]
5 Durham University      5,198 [Link]
6 University of Exeter      5,346 [Link]
7 University of Edinburgh      9,252 [Link]
8 University of Glasgow      6,094 [Link]
9 Imperial College London      3,943 [Link]
10 King’s College London      8,568 [Link]
11 University of Leeds      8,396 [Link]
12 University of Liverpool      4,911 [Link]
13 London School of Economics       6,184 [Link]
14 University of Manchester     11,249 [Link]
15 Newcastle University     4,756 [Link]
16 University of Nottingham      7,963 [Link]
17 University of Oxford    19,709 [Link]
18 Queen Mary, University of London      4,083 [Link]
19 Queen’s University Belfast      2,639 [Link]
20 University of Sheffield      4,821 [Link]
21 University of Southampton       5,646 [Link]
22 University College London    13,481 [Link]
23 University of Warwick      6,457 [Link]
24 University of York      5,297 [Link]
Total 173,301


  • This information was collected on 9 December.
  • The figures were obtained by entering the name of the institution and using the highest number listed. As can be seen from the accompanying image, there may be other variants of the institution’s name, so the figures shown will give an under-estimation of the number of entries related to the institution (the figure given in the table is for the largest variant of the institution’s name, i.e. 4,744 in this example).

Note that a post entitled A Survey of Use of Researcher Profiling Services Across the 24 Russell Group Universities, published in August 2012, summarised usage of several researcher profiling services (Researchgate, ResearcherID, LinkedIn and Google Scholar Citations as well as Academia.edu). The survey found 33,812 users of Academia.edu from the Russell Group universities, which suggests an increase of over 400% in just over 3 years.

Also note that a survey carried out in February 2013, which compared take-up of Academia.edu and Researchgate and was described in a post entitled Profiling Use of Third-Party Research Repository Services, found that Researchgate appeared to have entries for 426,414 researchers from Russell Group universities, compared with 39,546 for Academia.edu.
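The scale of the growth between these surveys can be checked with a few lines of arithmetic. The Python sketch below uses only the totals reported in this post (33,812 Academia.edu users in August 2012, 173,301 in the table above, and the February 2013 Researchgate and Academia.edu counts). Since the per-institution figures are acknowledged to be under-estimates, the results are indicative rather than precise.

```python
# Academia.edu user counts for the 24 Russell Group universities, as reported
# in the August 2012 survey and in the late-2015 table above.
users_2012 = 33_812
users_2015 = 173_301

growth = users_2015 / users_2012          # ratio of the new count to the old one
percent_increase = (growth - 1) * 100     # increase relative to the 2012 baseline

# February 2013 comparison of Researchgate and Academia.edu entries.
researchgate_2013 = 426_414
academia_2013 = 39_546
rg_ratio = researchgate_2013 / academia_2013

print(f"2012 to 2015: {growth:.2f} times as many users ({percent_increase:.0f}% increase)")
print(f"Feb 2013: Researchgate had {rg_ratio:.1f} times as many entries as Academia.edu")
```

A ratio of just over five corresponds to an increase of just over 400%, since the percentage increase is measured relative to the original count.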


My personal experiences, together with the institutional evidence of Academia.edu use, suggest that the service is popular with the user community. But what of the issues raised at yesterday’s meeting?

Researchgate is a social networking site

It seems to me that it will be difficult to find funding for the development of large-scale non-profit alternatives to commercial services such as Academia.edu and Researchgate. And even if funding to develop and maintain the technical infrastructure were available, it may prove difficult to get researchers to see the benefits and make their research content available on a new, unproven service, especially in light of evidence such as that provided in a paper on “Open Access Meets Discoverability: Citations to Articles Posted to Academia.edu” which described how:

Based on a sample size of 34,940 papers, we find that a paper in a median impact factor journal uploaded to Academia.edu receives 41% more citations after one year than a similar article not available online, 50% more citations after three years, and 73% after five years.

Coincidentally, a week ago I came across a tweet by Jon Tennant which stated that:

Reminder: @ResearchGate and @academia are networking sites, not #openaccess repositories

As shown in the accompanying screenshot, this tweet contained an image which highlighted some concerns regarding use of Academia.edu and Researchgate. However the first part of the tweet highlighted an important aspect of these services which is typically not provided by institutional repositories: @ResearchGate and @academia are networking sites.

It is worth expanding on this summary slightly, based on the evidence given above:

@ResearchGate and @academia are popular networking sites, with content likely to be more easily found using Google than content hosted on institutional repositories. 

In addition the services may also enhance the visibility of resources hosted on institutional repositories:

Providing links from @ResearchGate and @academia to content hosted on institutional repositories should provide SEO benefits, and make the content of institutional repositories more easily found using search engines such as Google.

This was the conclusion based on a survey published in the paper which asked “Can LinkedIn and Academia.edu Enhance Access to Open Repositories?”. Revisiting my personal experience, the accompanying image of the Opus top author statistics for December 2015 shows that, according to the University of Bath’s Opus repository usage service, my papers are the most-viewed of all researchers (and, interestingly, my former UKOLN colleagues Alex Ball and Emma Tonkin are to be found in the top 5 researchers based on download statistics).

This, of course, does not necessarily provide evidence of the quality of the papers; rather, as described in the paper cited above, it suggests that providing in-bound links from popular services will enhance the Google ranking of papers hosted by the repository.


Rather than developing open alternatives to Academia.edu and Researchgate, my feeling is that the existing infrastructure of institutional repositories and services such as Academia.edu and Researchgate can be used in conjunction: the institutional repository provides robust and secure management of content, while the researcher profiling services provide SEO benefits in addition to the community benefits these social networking services can provide for researchers.

Such use of multiple services will also help address the risk of cessation of services, which is often highlighted as a risk of use of commercial services where there is no formal contractual agreement. It should be noted, of course, that sectoral not-for-profit services may also be closed down, as has happened with the Jorum OER repository service, whose closure Jisc announced in June 2015.

Of course, when researchers leave their host institution they may wish to continue to have full read/write access to their publications. Storing copies of the papers in the commercial services themselves will provide continued access once they have left their host institution and can no longer manage their publications in the institutional repository; this, incidentally, was the approach I took after leaving UKOLN, University of Bath in July 2013.

I’d be interested to hear your thoughts on the relevance of commercial research profiling/repository services, whether the sector should look into providing open alternatives and the strategies needed to ensure that such approaches would be successful.



Posted in Evidence, Repositories | Leave a Comment »

Links From Wikipedia to Russell Group University Repositories

Posted by Brian Kelly on 28 Aug 2014

Wikipedia as the Front Matter to all Research

A session at the recent Wikimania conference provided an opportunity for discussion on the topic “The fount of all knowledge – wikipedia as the front matter to all research”. The abstract describes how:

This discussion focuses on how Wikipedia could become the entry or discovery point to all significant research for the general public, and for scholars who are working just outside of the topic of interest. For most people, even researchers from closely related areas, summaries and explanations of a piece of research can be a crucial means both to discover and to begin to get into a new piece of research.

Currently overviews of research topics are supported through two mechanisms: reviews and “front matter” content. A review is a systematic summary of a field, written by an expert. These go out of date quickly, particularly in rapidly moving areas of research. Front matter is “News and Views” pieces, often found at the “front” of scientific journals that explain newly published research and put it in context. This often includes a discussion of explaining how the research is an important advance and its broader societal implications.

Both of these functions could easily be provided in a more up to date and scalable manner by tapping into a global community of experts. Wikipedia articles are often the top web search result for initial queries in many research areas and these articles are a major source of traffic for scientific journals. As the first port of call for many users of research and a significant discovery route the potential for Wikipedia as a form of dynamic, expertly curated “front matter” for the whole research literature is substantial. This facilitated discussion session will focus on how this role could be enhanced, what is currently missing and what risks exist in taking this route.

Reading this I wondered about the extent to which Wikipedia articles currently link to papers hosted in institutional repositories.

In order to explore this question I made use of Wikipedia’s External links search tool to count the number of links from Wikipedia pages to the institutional repositories provided by the Russell Group universities.

The survey was carried out on 28 August 2014 using this search tool. Note that the current findings can be obtained by following the link in the final column.

Table 1: Numbers of Links from Wikipedia to Repositories Hosted at Russell Group Universities


Ref. no. Institution (repository) Nos. of links from Wikipedia View results
1 University of Birmingham 2 [Link]
2 University of Bristol (ROSE) 6 [Link]
3 University of Cambridge 82 [Link]
4 Cardiff University (ORCA) 1 [Link]
5 University of Durham (DRO) 109 [Link]
6 University of Edinburgh 55 [Link]
7 University of Exeter 17 [Link]
8 University of Glasgow 120 [Link]
9 Imperial College 5 [Link]
10 King’s College London (King’s Research Portal) 45 [Link]
11 University of Leeds (White Rose repository) 65 [Link]
12 University of Liverpool 1 [Link]
13 London School of Economics 186 [Link]
14 University of Manchester 74 [Link]
15 Newcastle University 4 [Link]
16 University of Nottingham 10 [Link]
17 University of Oxford (ORA) 19 [Link]
18 Queen Mary, University of London (QMRO) 15 [Link]
19 Queen’s University Belfast 3 [Link]
20 University of Sheffield (White Rose repository, also used by Leeds and York; see the Leeds entry for the statistics) (65) [Link]
21 University of Southampton 134 [Link]
22 University College London 98 [Link]
23 University of Warwick (WRAP) 57 [Link]
24 University of York (White Rose repository, also used by Leeds and Sheffield; see the Leeds entry for the statistics) (65) [Link]
Total 1,108


  • The URL of the repositories is taken from the OpenDOAR service.
  • Since the universities of Leeds, Sheffield and York share a repository the figures are provided in the entry for Leeds.
  • A number of institutions appear to host more than one research repository. In such cases the repository which appears to be the main research repository for the institution is used.


The Survey Methodology

It should be noted that this initial survey does not pretend to provide an answer to the question “How many research papers hosted by institutional repositories provided by Russell Group universities are cited in Wikipedia articles?” Rather, the survey reflects the use of this blog as an ‘open notebook’ in which the initial steps in gathering evidence are documented openly in order to solicit feedback on the methodology. This post also documents flaws and limitations in the methodology in order that others who may wish to use similar approaches are aware of the limitations. Possible ways in which such limitations can be addressed are given and feedback is welcomed.

In particular it should be noted that the search engine used in the survey covers all public pages on the Wikipedia web site and not just Wikipedia articles. It includes Talk pages and user profile pages.

In addition the repository web sites include a variety of resources and not just research papers; for example it was observed that some user profile pages for researchers provide links to their profile on their institutional repository.

It was also noticed that some of the files linked to from Wikipedia were listed in the search results as PDFs. Since it seems likely that PDFs hosted on institutional repositories which are referenced on Wikipedia will be research papers, a more accurate reflection of the number of repository research papers which are cited in Wikipedia may be obtained by filtering the findings to include only PDF results.

In addition, if the findings from the search tool were restricted to Wikipedia articles only (omitting Talk pages, user profile pages, etc.) we should get a better understanding of the extent to which Wikipedia is being used as the “front matter” to research hosted in Russell Group university institutional repositories.

If any Wikipedia developers would be interested in taking up this challenge, this could help to provide a more meaningful benchmark which could be useful in monitoring trends.
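The two refinements suggested above (counting only links from articles, and only links to PDF files) could perhaps be approximated using the list=exturlusage module of the MediaWiki Action API, which offers functionality similar to the External links search tool. The Python sketch below is an illustration under stated assumptions, not the method used to produce Table 1: the repository host name is a hypothetical placeholder, namespace 0 is used as a proxy for “articles”, and a URL ending in .pdf is treated as a research paper, which will miss PDFs served without that suffix.

```python
import json
import urllib.parse
import urllib.request

# English Wikipedia's MediaWiki Action API endpoint.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query(domain, protocol="https"):
    """Build parameters for the list=exturlusage module.

    eunamespace=0 restricts results to articles, so Talk pages and user
    pages are excluded. Links using another protocol (e.g. plain http)
    need a separate query with protocol="http".
    """
    return {
        "action": "query",
        "format": "json",
        "list": "exturlusage",
        "euquery": domain,       # host/path prefix to search for, without protocol
        "euprotocol": protocol,
        "eunamespace": "0",      # 0 = main (article) namespace
        "euprop": "title|url",
        "eulimit": "max",
    }


def fetch_links(domain, protocol="https"):
    """Yield one record per matching link, following API continuation."""
    params = build_query(domain, protocol)
    while True:
        url = API_URL + "?" + urllib.parse.urlencode(params)
        request = urllib.request.Request(
            url, headers={"User-Agent": "repository-link-survey-sketch/0.1"}
        )
        with urllib.request.urlopen(request) as response:
            data = json.load(response)
        yield from data.get("query", {}).get("exturlusage", [])
        if "continue" not in data:
            break
        params.update(data["continue"])  # carry the continuation token forward


def summarise(records):
    """Count all links and the subset whose path ends in .pdf (a heuristic)."""
    urls = [record["url"] for record in records]
    pdf_urls = [
        u for u in urls if urllib.parse.urlparse(u).path.lower().endswith(".pdf")
    ]
    return {"links": len(urls), "pdf_links": len(pdf_urls)}
```

For example, summarise(fetch_links("repository.example.ac.uk")) would return article-only link and PDF counts for that (hypothetical) repository host. Note that such a query reflects Wikipedia as it is when the query is run, not as it was on 28 August 2014, so it could support trend monitoring rather than reproduce the table above.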

Policy Implications of Encouraging Wikipedia to Act as the Front Matter to Research

There are risks when gathering such data that observers with vested interests will seek to make too much of the findings if they suggest a league table, particularly if there seem to be runaway leaders.

However as can be seen from the accompanying pie chart in this case no single institutional repository has more than 17% of the total number of links (and remember that these figures are flawed due to the reasons summarised above).
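The 17% figure can be checked against Table 1. The Python sketch below re-totals the link counts listed in the table (counting the shared White Rose repository once, as the table does) and computes the largest single repository’s share; it inherits all the methodological limitations described above.

```python
# Numbers of links from Wikipedia per repository, as listed in Table 1
# (28 August 2014). The White Rose repository shared by Leeds, Sheffield
# and York is counted once, so 24 institutions give 22 entries.
link_counts = [2, 6, 82, 1, 109, 55, 17, 120, 5, 45, 65, 1, 186,
               74, 4, 10, 19, 15, 3, 134, 98, 57]

total = sum(link_counts)
largest = max(link_counts)
largest_share = largest / total

print(f"Total links: {total}")
print(f"Largest single repository: {largest} links ({largest_share:.1%} of total)")
```

The counts sum to the table’s total of 1,108, and the largest entry (186 links) is just under 17% of that total.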

However there will be interesting policy implications if universities agree with the suggestion that Wikipedia can act as “the front matter to all research”, especially if links from Wikipedia to the institution’s repository result in increased traffic to the repository. Another way of characterising the proposal would be to suggest that Wikipedia can act as “the marketing tool to an institution’s research outputs”.

This could easily lead to institutions failing to abide by Wikipedia’s core principles regarding providing content updates from a neutral point of view and a failure to abide by the Wikimedia Foundation’s terms of use.

Earlier today I came across an article entitled “So who’s editing the SNHU Wikipedia page?” which described how analysis of editing patterns and deviations from the norm may be indicative of inappropriate Wikipedia editing strategies, such as pay-for updates to institutional Wikipedia articles.

The article also pointed out how the PR sector has responded to criticisms that PR companies have been failing to abide by the Wikimedia Foundation’s terms of use: Top PR Firms Promise They Won’t Edit Clients’ Wikipedia Entries on the Sly. The article describes the Statement on Wikipedia from participating communications firms which is hosted on Wikipedia. The following statement was issued on 10 June 2014:

On behalf of our firms, we recognize Wikipedia’s unique and important role as a public knowledge resource. We also acknowledge that the prior actions of some in our industry have led to a challenging relationship with the community of Wikipedia editors.

Our firms believe that it is in the best interest of our industry, and Wikipedia users at large, that Wikipedia fulfill its mission of developing an accurate and objective online encyclopedia. Therefore, it is wise for communications professionals to follow Wikipedia policies as part of ethical engagement practices.

We therefore publicly state and commit, on behalf of our respective firms, to the best of our ability, to abide by the following principles:

  • To seek to better understand the fundamental principles guiding Wikipedia and other Wikimedia projects.
  • To act in accordance with Wikipedia’s policies and guidelines, particularly those related to “conflict of interest.”
  • To abide by the Wikimedia Foundation’s Terms of Use.
  • To the extent we become aware of potential violations of Wikipedia policies by our respective firms, to investigate the matter and seek corrective action, as appropriate and consistent with our policies.
  • Beyond our own firms, to take steps to publicize our views and counsel our clients and peers to conduct themselves accordingly.

We also seek opportunities for a productive and transparent dialogue with Wikipedia editors, inasmuch as we can provide accurate, up-to-date, and verifiable information that helps Wikipedia better achieve its goals.

A significant improvement in relations between our two communities may not occur quickly or easily, but it is our intention to do what we can to create a long-term positive change and contribute toward Wikipedia’s continued success.

If we wish to see Wikipedia acting as the front matter to research provided by the university sector should we be seeking to develop a similar statement on how we will do this whilst ensuring that we act in accordance with Wikipedia’s policies and guidelines? Of course the challenge would then be to identify what the appropriate best practices should be.


Posted in Evidence, Repositories, Wikipedia | 2 Comments »

“Your SlideShare account has been suspended”

Posted by Brian Kelly on 1 Oct 2013

Loss of Access to Content Hosted on Slideshare

On Wednesday 25 September 2013 I received an email message which informed me that my SlideShare account had been suspended. The reason given for this was that:

SlideShare activity was flagged as inappropriate by our community. We looked into it and found at least one of your activities (i.e. uploads, comments, follows or favorites) to be in violation of SlideShare’s Terms of Service or Community Guidelines.

To make matters worse:

… your account lisbk has been suspended and marked for deletion.

I received the message at 9.50pm on Wednesday evening. The following morning I contacted the Slideshare Support Desk complaining about the loss of access to my slides (which meant that Web sites which had embedded the content contained a message saying the account had been suspended) and asking for the files to be restored. I received the following automated response:

Thank you for contacting SlideShare. This email is to confirm we have received your inquiry and will respond within one business day.

I failed to receive a reply, so yesterday evening I submitted another message to the support desk. Twelve hours later I received a reply:

Thank you for contacting us again about this issue. I sincerely apologize for the delay in getting back to you. It looks like the automated system has incorrectly marked your account. I have removed the suspension and your account should be working normally now. Thank you for your patience and understanding.

And now my Slideshare account has been restored. I was pleased to find that not only had the 148 slide decks been restored, but the usage statistics for the slides and my 315 followers had also been retained.

Lessons Learnt

I’m pleased that my Slideshare account has been restored with seemingly no data lost. All that seems to have been lost is 5 days’ access to the 148 slide decks which I have uploaded to the service. But this incident also gives rise to some concerns. Why did this happen? Could it happen again? Did I make a mistake in setting up my Slideshare account almost 7 years ago (my oldest slides, entitled Web 2.0: Addressing Institutional Barriers, were used in a talk given at the ILI 2006 conference and uploaded to Slideshare on 13 October 2006)?

Back in 2008/9 I was the lead author of a paper entitled “Library 2.0: balancing the risks and benefits to maximise the dividends”. The abstract described how:

The paper acknowledges that there are a variety of risks associated with such approaches. The paper describes the different types of risks and outlines a risk assessment and risk management approach which is being developed to minimize the dangers whilst allowing the benefits of Library 2.0 to be realized.

The risks and opportunities framework was subsequently developed further, and later in 2009 a paper entitled “Empowering Users and Institutions: A Risks and Opportunities Framework for Exploiting the Social Web” provided a diagram depicting the framework, as illustrated.

How might this have been applied in the specific context of use of Slideshare?

Intended use: Slideshare will be used to provide a copy of slides used in significant presentations so that (a) the slides can be embedded in blogs, web pages, etc; (b) comments on the slides can be given; (c) the slides can be accessed using a popular service in order to enhance access to the slides to help maximise the take-up of the ideas provided in the slides and (d) the slides can be ‘favourited’ in order to identify individuals with interests in the content.

Perceived benefits: Use of Slideshare  should help maximise access to the resources and provide commenting facilities which may be useful for reports on the impact of associated work.

Perceived risks: There may be risks that the Slideshare service is not sustainable and data lost. Spam comments may be made which would be time-consuming to delete. It was felt that the risks of loss of data were small since the Slideshare service appeared to be popular and sustainable.

Missed opportunities: Failing to use Slideshare would mean lost opportunities for reaching out to a large number of users.

Costs: The free version of Slideshare has been used. The only additional costs have been the time taken in uploading slides to the service and providing the relevant metadata.

Risk minimisation: The risks of data loss have been addressed by ensuring that the master copy of the slides is hosted on the UKOLN Web site.

Evidence base: The slide decks hosted on Slideshare have proved popular, with my three most popular slide decks having been viewed 24,536, 18,211 and 10,172 times. In addition a blog post entitled Evidence of Slideshare’s Impact highlighted the benefits of use of Slideshare for hosting slides for an event. It should be noted, however, that a post on Understanding the Limits of Altmetrics: Slideshare Statistics did point out the need to treat these statistics with some caution.

I therefore feel that Slideshare has provided a valuable return on my investment. However just because Slideshare has proved useful in the past does not necessarily mean that this will continue to be true. Back in May 2012 TechCrunch announced that LinkedIn Acquires Professional Content Sharing Platform SlideShare For $119M. A concern might be that following the take-over there has been a lack of investment in the company, with asset-stripping of intellectual property, technical expertise, usage data  or other valuable assets taking place prior to the closure of the service or significant changes in its terms and conditions.

However the usage figures provided by Quantcast, available from the TechCrunch page about SlideShare, show no cause for concern. So perhaps my experience was a one-off glitch. However the experience has led me to consider some additional risks which I hadn’t thought about previously:

Service makes mistakes: Although this mistake did not have any significant adverse effect, what would have happened if my account had been unavailable during a large event, such as the IWMW events, during which slides hosted on Slideshare are used as part of the event amplification?

Vexatious complaints: The automated email I received stated that my Slideshare content “was flagged as inappropriate by our community”. Could people submit anonymous complaints about content hosted on Slideshare, I wonder, leading to accounts being removed and an innocent Slideshare user having to make their case for the content to be restored?

Contentious content: Slideshare’s Community Guidelines state: “Don’t post content or comments about issues like child exploitation, animal abuse, drug abuse, bomb making etc. They will be removed and your account will get suspended.” But what if a lecturer is giving a talk about, say, drug abuse? The guidelines do not seem to provide any scope for flexibility.

I’d welcome feedback on my experiences. I’d also like to invite Slideshare to respond to  the concerns I’ve raised. As I have said, I’ve been a longstanding fan of the service; I would hope that Slideshare’s support desk will be proactive in responding to concerns.

NOTE: Shortly after publishing this post I received an email from Slideshare containing a summary of the statistics of use of the service. As illustrated, the figures provide an indication of significant levels of outreach for my slides (together with a small number of slides I have published on behalf of others). I hope that I can be reassured that Slideshare will continue to provide benefits for me and that my concerns will be addressed.

Posted in Repositories, Web2.0 | Tagged: | 15 Comments »

Jisc Feasibility Study on Digital Repository Infrastructure Solutions for ‘Unsupported’ Digital Assets

Posted by Brian Kelly on 18 Jul 2013

Jisc Feasibility Study on Identifying Digital Repository Infrastructure Solutions for Small-to-medium Digital Projects

Yesterday in a post entitled Linkage: Funding, licensing, managing research data, LTI, Google Analytics cohort analysis and more Martin Hawksey reflected on a Jisc call for a “Feasibility Study on digital repository infrastructure solutions for ‘unsupported’ digital assets“. The call document describes how:

The feasibility study is required to identify sustainable digital repository infrastructure solutions for digital assets from small-to-medium digital projects. These assets may originate from arts organisations, cultural heritage institutions, community groups and small organisations in the area of the arts, cultural heritage, medicine and science etc that may not have access to a sustainable digital repository infrastructure.  

The Jisc has invested large amounts of money in the development of a repository infrastructure for the sector. But what exactly do we mean by an institutional repository and what purposes do they serve? There’s a danger, I feel, in looking for answers to these questions from within the sector – the ‘echo chamber’ may well decide that institutional repositories can provide the functions required by Jisc funding and, surprise, surprise, they may do that well!

Looking at the ‘institutional repository’ article in Wikipedia we find that:

An institutional repository is an online locus for collecting, preserving, and disseminating – in digital form – the intellectual output of an institution, particularly a research institution.

The article goes on to describe how:

The four main objectives for having an institutional repository are:

  1. to provide open access to institutional research output by self-archiving it;
  2. to create global visibility for an institution’s scholarly research;
  3. to collect content in a single location;
  4. to store and preserve other institutional digital assets, including unpublished or otherwise easily lost (“grey”) literature (e.g., theses or technical reports).

But how does an institutional repository differ from a content management system (CMS)? According to Wikipedia a CMS:

is a computer program that allows publishing, editing and modifying content as well as maintenance from a central interface. Such systems of content management provide procedures to manage workflow in a collaborative environment.

The article summarises the main features of a CMS:

The core function and use of content management systems is to present information on websites. CMS features vary widely from system to system. Simple systems showcase a handful of features, while other releases, notably enterprise systems, offer more complex and powerful functions. Most CMS include Web-based publishing, format management, revision control (version control), indexing, search, and retrieval. The CMS increments the version number when new updates are added to an already-existing file. 

Is the distinction clear? Not to me, especially as the list of the features of a CMS goes on to describe how:

A CMS may serve as a central repository containing documents, movies, pictures, phone numbers, scientific data. CMSs can be used for storing, controlling, revising, semantically enriching and publishing documentation.

So perhaps, rather than exploring traditional repository software, the feasibility study should explore possible CMS solutions.

The External Repository

ResearchGate scores

But must an institutional repository be hosted within the institution? After all, if the point is “to create global visibility for an institution’s scholarly research”, might that not be achieved by using a popular Cloud-based repository service? The ‘Google juice’ which such services can provide will help address the limited Google juice available to institutional services (especially those of smaller institutions), which will have relatively small numbers of inbound links. Indeed, as described in a paper which asked “Can LinkedIn and Enhance Access to Open Repositories?”, even prestigious Russell Group universities do not have the Google ranking provided by services which are used globally.

Yesterday I received an email which suggested that such services are already being widely used. The message, from ResearchGate, began:

Brian, we’ve put together stats for thousands of institutions based on ResearchGate members.

Following the link from the email message I found a list of the UK institutions with the highest ResearchGate scores (as illustrated). The ResearchGate Score, incidentally, “measures reputation and impact based on how a researcher’s work is received by their peers. This list shows institutions by the sum of the RG Scores of their individual members using ResearchGate”. But rather than being sidetracked by a discussion about what such scores mean, for me the more relevant question is whether third-party repository services such as ResearchGate and have a role to play for small institutions which do not have the technical expertise to manage a conventional institutional repository service.

The Challenges Facing Small Projects and Unsupported Digital Assets

The development culture in large, well-funded research-led institutions has encouraged the use of in-house solutions, often based on open source software. But in light of funding difficulties in the sector and the growing maturity of Cloud-based services, it might be appropriate, especially for small to medium sized projects, to consider Cloud-based solutions. Clearly there will be a need to consider the sustainability of such services and possible changes to their terms and conditions, but such issues can be addressed. There is also a need to consider the sustainability of in-house solutions, an issue which is very relevant to the services and content provided by UKOLN in light of the significant downsizing the organisation will face in less than two weeks’ time!

The Invitation to Tender document describes how:

The candidate models and subsequent options/ recommendations will need to take into account the level of digital skills capabilities and capacity of small-to-medium projects. Options will need to be easy to implement for non-specialists contributors to any potential solution(s).

Ironically, the document itself illustrates a lack of skills in best practices for using Microsoft Word. As illustrated below, the author of the document created a line feed at the end of each line, rather than letting MS Word wrap lines automatically. This will result in ugly line breaks if the font face or size is changed in the document, and could cause problems if the document is converted into other formats. I also noticed that MS Word styles had not been used, which means that the document has no logical structure. As well as meaning that an automated table of contents could not be provided, this can also cause accessibility problems for people who use screen readers.

MS Word for Jisc ITT Document

Put simply, the document itself illustrates the challenges which may well be faced when content creators have limited digital skills capabilities.
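The two problems noted above (hard line feeds and unused styles) can be spotted automatically. The following Python sketch is a minimal, illustrative audit, not a tool used by Jisc or UKOLN: it opens a .docx file (a zip archive containing word/document.xml), counts paragraphs, counts paragraphs whose style ID begins with "Heading", and counts manual line breaks (w:br elements which are not page or column breaks). The function name audit_docx is hypothetical, and the sketch assumes English built-in style IDs such as Heading1; localised versions of Word may use different IDs. Since it is not clear whether the line feeds in the ITT were typed as paragraph marks or as manual line breaks, the sketch reports both: a document with many short paragraphs, or many manual breaks, and no heading styles is a likely candidate.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML (the main part of a .docx file).
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def audit_docx(source):
    """Report simple structural indicators for a .docx file.

    `source` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    paragraphs = 0
    heading_paragraphs = 0
    for p in root.iter(W + "p"):
        paragraphs += 1
        style = p.find(f"{W}pPr/{W}pStyle")
        # Assumes English built-in style IDs such as "Heading1", "Heading2".
        if style is not None and style.get(W + "val", "").lower().startswith("heading"):
            heading_paragraphs += 1

    # A <w:br/> with no w:type (or w:type="textWrapping") is a manual line
    # break; page and column breaks carry w:type="page" or "column".
    manual_line_breaks = sum(
        1
        for br in root.iter(W + "br")
        if br.get(W + "type", "textWrapping") == "textWrapping"
    )

    return {
        "paragraphs": paragraphs,
        "heading_paragraphs": heading_paragraphs,
        "manual_line_breaks": manual_line_breaks,
    }
```

For example, audit_docx("itt.docx") (a hypothetical file name) returns a small dictionary of counts which could be compared across documents submitted by different projects.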

I welcome this call as it encourages solutions which are applicable to the real-world environment, in which content providers may well create content which is not amenable to processing in the ways which the system designers may have expected.

Posted in Repositories | 1 Comment »

Developing the Repository Manager Community

Posted by Brian Kelly on 11 Jul 2013

OR 2013 paper

In a recent post on “SEO Analysis of Institutional Repositories: What’s the Back Story?” I summarised a paper which had been accepted for the Open Repositories 2013 conference. In addition to that paper, which was presented as a poster, a paper on “Developing the Repository Manager Community” was also accepted.

As described in the abstract:

This paper describes activities which have taken place within the UK institutional repository (IR) sector focusing on developing a community of practice through the sharing of experiences and best practice. This includes work done by the UK Council of Research Repositories (UKCoRR) and other bodies, together with informal activities, such as sharing the experience of organising Open Access Week events. The paper also considers future work to be undertaken by UKCoRR to continue developing the community.

Although I had two papers accepted for the conference, in light of the costs of travel to Prince Edward Island, the venue for OR 2013, I did not feel I could justify travelling to the conference three weeks before being made redundant. However Yvonne Budden, my co-author, will be presenting the paper later today.

I feel that the work described in the paper on the growth of a community of practice will become of greater importance in light of changes at Jisc and its move away from community-building through the funding of projects in areas such as institutional repositories. There will therefore be a need for bottom-up approaches to sustainable community-building, as described in the paper.

The paper is available in PDF format from the ResearchGate repository. In addition the slides used by Yvonne Budden in the presentation are available on Slideshare and embedded below.


Posted in Repositories | 1 Comment »

SEO Analysis of Institutional Repositories: What’s the Back Story?

Posted by Brian Kelly on 8 Jul 2013

OR 2013 poster

Over a year ago, following a staff appraisal, I agreed to become more involved in supporting the development of institutional repositories – an important area of work for UKOLN and JISC.

Following that decision I was lead author of two papers which were accepted at the Open Repositories 2012 conference: one, with Jenny Delasalle, which asked “Can LinkedIn and Enhance Access to Open Repositories?” and another, with Nick Sheppard, Jenny Delasalle, Mark Dewey, Owen Stephens, Gareth Johnson and Stephanie Taylor, on “Open Metrics for Open Repositories”.

The former paper concluded by stating that “Further work is planned to investigate whether such links are responsible for enhancing SEO rankings of resources hosted in institutional repositories“.

This follow-up work was carried out during autumn 2012 and the findings published in a series of guest blog posts during Open Access Week 2012. The paper which summarised this work and the findings was accepted by the programme committee for the OR 2013 conference and will be presented in a poster display at the OR 2013 conference which takes place this week. The poster is included in this post.

The paper, which is available from Opus, the University of Bath repository, in MS Word and PDF formats, concluded:

There is a pressing need to gain a better understanding of the SEO characteristics of current repository services in order to identify examples of best practices and flawed approaches. However since local factors are likely to impact the visibility to search engines of content hosted in institutional repositories it will be important to ensure that such local factors are understood. The work described in this paper describes a methodology for sharing institutional findings in order to inform practices across the repository community. We therefore invite other repository managers to work in a similar fashion, critique the methodology and tools we have described and share the findings for their repository.

I’d therefore invite repository managers to provide an SEO analysis for their local repository – and if you would like to publish your findings as a guest blog post, to follow on from the guest posts in which William Nixon, Yvonne Budden and Natalia Madjarevic reported on the findings at Glasgow University, Warwick University and LSE, feel free to get in touch.


Posted in Repositories | 3 Comments »


UKOLN Wins Award For Best Paper At International Conference

Posted by Brian Kelly on 17 Jun 2013

Award-Winning Paper

ELPub 2013 paper

On Friday I received an email message from my colleague Stephanie Taylor which informed me and my colleagues at UKOLN that a paper on “Cover sheets considered harmful” (PDF format) by Emma Tonkin, Stephanie and Greg Tourte had won an award for the best paper at the ELPub 2013 conference.

I’d like to echo the sentiments expressed by others in giving my congratulations to my UKOLN colleagues Emma and Stephanie and former UKOLN colleague Greg.

The award was very timely: it came less than two months before Jisc’s core funding of UKOLN ceases and the majority of staff are made redundant, and it provides a very relevant addition to Emma and Stephanie’s CVs.

The award follows on from a number of other papers which have received recognition at international conferences, including a paper by myself on “Implementing A Holistic Approach To E-Learning Accessibility”, which was judged to be the Best Research Paper at the international ALT-C 2005 conference; another on “Developing countries; developing experiences: approaches to accessibility for the Real World”, which was presented with the John M Slatin Award for the Best Communications paper at the W4A 2010 conference; and one on “Strategies for the Curation of CAD Engineering Models” by my colleagues Manjula Patel and Alex Ball, which was awarded a prize for the best peer-reviewed paper at the IDCC 2008 conference. All of the papers I have mentioned were co-authored with research colleagues based in other institutions, but in all cases the lead author was based at UKOLN.

It will be a loss to the sector when this expertise is lost. I would also add that it is not just the expertise possessed by individuals which will be lost, but also the synergies provided by researchers working closely with colleagues who may be focussed on project work or user engagement and dissemination activities. The loss of this proven level of research expertise will pose a particular challenge for Jisc, which is now promoting itself not so much as a funder of innovative IT developments but as a source of expertise in its own right. As the recently relaunched Jisc Web site states in unambiguous terms:

We are the UK’s expert on digital technologies for education and research

At a time in which there are increasing expectations in the higher education sector that assertions will be backed by evidence, it would be helpful to hear more about the background to this assertion!

Cover sheets considered harmful

But what of the paper which suggested that “Cover sheets [are] considered harmful“?

Back in July 2010 in a post on “Automated Accessibility Analysis of PDFs in Repositories” I mentioned a paper on “From Web Accessibility to Web Adaptability” (available in PDF and HTML formats) in which I suggested that institutions should:

run automated audits on the content of [PDF resources in] the repositories. Such audits can produce valuable metadata with respect to resources and resource components and, for example, evaluate the level of use of best practices, such as the provision of structured headings, tagged images, tagged languages, conformance with the PDF standard, etc. Such evidence could be valuable in identifying problems which may need to be addressed in training or in fixing broken workflow processes.”

My colleague Emma Tonkin picked up on that idea as it related to the JISC-funded FixRep project she was working on, which “aims to examine existing techniques and implementations for automated formal metadata extraction, within the framework of existing toolsets and services provided by the JISC Information Environment and elsewhere”. Since there were clear overlaps between metadata for resource discovery and metadata (or, indeed, data) to enhance access to resources, Emma and I discussed ways in which the FixRep project work could monitor the accessibility of resources hosted in institutional repositories. The project’s initial findings were published in a paper on “Supporting PDF accessibility evaluation: Early results from the FixRep project”. This paper was accepted by the “2nd Qualitative and Quantitative Methods in Libraries International Conference (QQML2010)”, which was held in Greece on 25-28 May 2010. A Slidecast (slides with accompanying audio) is available on Slideshare and embedded below.

Emma concluded in the presentation that “We may be ‘shooting ourselves in the foot’ with additions like after-the-fact cover sheets. This may remove original metadata that could have been utilised for machine learning.”
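The kind of automated audit suggested in “From Web Accessibility to Web Adaptability” can be approximated, very crudely, by scanning a PDF file for entries defined in the PDF specification: a /StructTreeRoot entry (a tagged, structured document), /MarkInfo with /Marked true, a document-level /Lang entry and a /Title entry. The Python sketch below is a hypothetical illustration, not the FixRep software, and the function and key names are my own. Because it only searches the raw bytes, it will miss entries held in compressed object streams (common since PDF 1.5), so a real audit should use a proper PDF parser. Running it on a paper before and after a cover sheet is added would give a rough indication of whether such markers survive the workflow.

```python
import re

# Byte patterns for catalog/info entries defined in the PDF specification.
# Heuristic only: entries inside compressed object streams are not visible.
PDF_MARKERS = {
    "tagged": re.compile(rb"/StructTreeRoot\b"),
    "marked_content": re.compile(rb"/Marked\s+true\b"),
    "language": re.compile(rb"/Lang\s*[(<]"),
    "title": re.compile(rb"/Title\s*[(<]"),
}


def audit_pdf_bytes(data: bytes) -> dict:
    """Return which accessibility/metadata markers appear in the raw PDF bytes."""
    return {name: pattern.search(data) is not None for name, pattern in PDF_MARKERS.items()}


def compare_audits(before: dict, after: dict) -> list:
    """List markers present before processing but missing afterwards."""
    return [name for name, present in before.items() if present and not after.get(name, False)]
```

Usage would be along the lines of audit_pdf_bytes(open("paper.pdf", "rb").read()), where paper.pdf is a hypothetical deposited file; a non-empty result from compare_audits would suggest that the workflow is discarding structure or metadata.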

Fast-forward to 2013: earlier this year Emma, Stephanie and Greg revisited the question of whether repository managers are shooting themselves in the foot in the ways in which cover sheets are provided. Again we saw the benefits of the synergies across UKOLN staff with a diversity of interests. In a post entitled “Why I’m Now Embedding ORCID Metadata in PDFs” I described the benefits which researchers can gain from embedding their ORCID researcher ID in the PDFs of their peer-reviewed papers. However the post referenced another post on “Reflections on the Discussion on the Quality of Embedded Metadata in PDFs” which reported on problems caused by repository workflow processes which meant that ORCID IDs (and other embedded metadata) were being lost.

Although this was a bug which has subsequently been fixed, Emma, Stephanie and Greg had an interest in addressing broader issues, including the assumption that end users value cover pages since they provide information on the origin of the paper. Is this really the case, or does the motivation for providing cover pages come primarily from institutions which wish to see the papers in their repositories branded? In addition, they investigated the mechanisms used for creating cover pages and whether such processes were interoperable with other requirements, such as text mining.
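One way to check whether embedded identifiers survive a repository workflow is to compare the metadata text before and after processing. The Python sketch below, with hypothetical function names, extracts ORCID iDs from two strings (for example XMP metadata pulled out of the PDF by some other tool, which is not shown) and validates each one using the ISO 7064 MOD 11-2 check digit that ORCID iDs carry, so that stray digit strings are ignored. It then reports the iDs which were present originally but are missing afterwards.

```python
import re

# Four blocks of four characters; the final character may be the check digit "X".
ORCID_PATTERN = re.compile(r"\b(\d{4}-\d{4}-\d{4}-\d{3}[\dX])\b")


def orcid_checksum_ok(orcid: str) -> bool:
    """Validate the final character of an ORCID iD (ISO 7064 MOD 11-2)."""
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def orcids_in(text: str) -> set:
    """Return the checksum-valid ORCID iDs found in a piece of text."""
    return {candidate for candidate in ORCID_PATTERN.findall(text) if orcid_checksum_ok(candidate)}


def lost_orcids(original: str, processed: str) -> set:
    """ORCID iDs present in the original metadata but missing after processing."""
    return orcids_in(original) - orcids_in(processed)
```

For instance, with a sample iD such as 0000-0002-1825-0097 in the original metadata and no iD in the cover-sheeted version, lost_orcids returns a set containing that iD.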

In response to a survey it seems that branding is the main motivation for use of cover sheets, closely followed by clarification of documents’ governance. However the need to support text mining and to maximise the benefits of indexing by search engines such as Google were identified by only one respondent.

The paper concluded by acknowledging the challenges faced by repository managers:

repository managers don’t make the rules; the repository manager is tasked with identifying and applying an appropriate compromise between the concerns of the different stakeholder groups involved, which is not a trivial undertaking.

I agree. We don’t need ‘experts’ who know whether cover sheets are desirable or not; rather we need experts who know about the potential benefits of cover sheets but also their limitations; we need software developers who know about the implementation of workflow processes which support both end users and other automated systems; we need researchers who can survey stakeholders and provide relevant statistical analyses of the findings and we need people with skills in supporting communities.

This award-winning paper was valuable because of the ways in which it gathered inputs from a variety of sources, synthesised the findings and provided a series of achievable recommendations. But what about the implementation challenges? Next week at IWMW 2013 Stephanie Taylor, together with Nick Sheppard (Leeds Metropolitan University), is facilitating a session on The Institutional Web Site and the Institutional Repository: Addressing Challenges of Integration. This might possibly provide an opportunity for exploring practices for integrating repositories with Google. If you’d like to attend this session, please book quickly (and note that day tickets for the IWMW 2013 event are available).




Posted in Repositories | Leave a Comment »

Naming Conventions For Institutional Repositories: Lessons from CORE

Posted by Brian Kelly on 21 Feb 2013

The CORE (COnnecting REpositories) Project

Whilst preparing a follow-up post on institutional repositories I started to explore the data which has been collected by the JISC-funded CORE project. The CORE (COnnecting REpositories) project aims to “facilitate free access to scholarly publications distributed across many systems“. The CORE Web site, which was developed at the Open University, provides access to four applications including:

Repository Analytics – A tool that enables to monitor the ingestion of metadata and content from repositories and provides a wide range of statistics.

I wanted to use this service to find information about the repositories provided by the 24 Russell Group universities. However, as can be seen from the accompanying screenshot, it was not easy to associate a repository with its host institution.

CORE project

The first four examples illustrate the difficulties I had in using the information. The first entry, for the Aberdeen University Research Archive, gives a clear indication of the host institution. The second example, Abertay Research Collections, is somewhat more obscure, unless you know that Abertay is the name of a Scottish university. However the next two examples, Access to Research Resources for Teachers and Advanced Knowledge Technologies EPrints Archive, give no clue as to the host institution.

This meant that browsing the list was not an effective way of finding the repositories for the Russell Group universities. In addition the search interface was misleading: a search for “Southampton” enabled me to find eCrystals – Southampton and Electronics & Computer Science EPrints Service – University of Southampton – but not the main repository which has the name e-Prints Soton.

Using CORE to Search for Russell Group University Repositories

Despite the limitations caused by the lack of institutional identifiers I felt it would be useful to discover information held about Russell Group university repositories, based on a search of the CORE system using the obvious name for the host institution. The following table summarises the findings for a survey carried out on 21 February 2013 using the search term given in the second column.

No. | Institution (search string) | Repository | Metadata Download | Metadata Readable | PDF Download
1 | Birmingham | University of Birmingham Research Archive, E-papers Repository | 937 | 928 | 103
1 | Birmingham | University of Birmingham Research Archive, E-prints Repository | 828 | 802 | 766
1 | Birmingham | University of Birmingham Research Archive, E-theses Repository | 2,559 | 2,513 | 2,133
2 | Bristol | Bristol Repository of Scholarly Eprints | – | 4 | –
3 | Cambridge | Computer Laboratory Technical Reports – Cambridge University | 3,252 | 520 | 440
3 | Cambridge | DSpace @ Cambridge | 216,718 | 192,129 | 2,847
4 | Cardiff | Online Research @ Cardiff | 31,274 | 1,647 | 1,555
5 | Durham | Durham e-Theses | 4,483 | 4,411 | 4,051
5 | Durham | Durham Research Online | 9,062 | 2,922 | 2,856
6 | Exeter | Exeter Research and Institutional Content archive | 2,547 | 2,334 | 4
7 | Edinburgh | Edinburgh DataShare | 75 | 75 | –
7 | Edinburgh | Edinburgh Research Archive | 5,769 | 5,395 | 1,583
8 | Glasgow | Glasgow DSpace Service | – | – | –
8 | Glasgow | Glasgow Theses Service | 2,682 | 2,683 | 2,356
9 | Imperial | Spiral – Imperial College Digital Repository | 8,097 | 8,094 | 4
10 | King’s College London (also used King’s and Kings) | None found | – | – | –
11 | Leeds | leedsmet open search (Incorrect institution) | (-) | (-) | (-)
11 | Leeds | Leodis – A photographic archive of Leeds | 57,998 | 57,998 | –
12 | Liverpool | Liverpool John Moores University Research Archive (Incorrect institution) | (-) | (-) | (-)
12 | Liverpool | University of Liverpool Research Archive | 885 | 810 | 517
13 | LSE | LSE Research Online | 33,959 | 6,520 | 6,463
13 | LSE | LSE Theses Online | 454 | 454 | 424
14 | Manchester | e-space at Manchester Metropolitan University (Incorrect institution) | (-) | (-) | (-)
14 | Manchester | Manchester eScholar Services | 119,854 | 119,854 | –
15 | Newcastle | Newcastle University E-Prints | – | – | –
16 | Nottingham | Nottingham ePrints | 1,084 | 1,026 | 990
16 | Nottingham | Nottingham eTheses | 1,843 | 1,793 | 1,757
17 | Oxford | Oxford University Research Archive | 16,215 | 3,745 | 98
18 | Queen Mary | None found | – | – | –
19 | Queen’s University Belfast | None found | – | – | –
20 | Sheffield | Sheffield Hallam University Research Archive (Incorrect institution) | (-) | (-) | (-)
21 | Southampton | eCrystals – Southampton | 602 | 602 | –
21 | Southampton | Electronics & Computer Science EPrints Service – University of Southampton | 15,835 | 8,947 | 7,071
22 | UCL | UCL Discovery | 0 | 245,407 | 2
23 | Warwick | EPrints at the Centre for Scientific Computing, University of Warwick | – | – | 360
23 | Warwick | Warwick Research Archives Portal Repository | 49,469 | 7,696 | 7,025
24 | York | York St John University ArchivalWare Digital Library (Incorrect institution) | 331 | 1 | –

Note that the Repository Analytics page does not appear to provide a formal definition of the data collected. However from hovering over the accompanying icon for the entries it appears that the Metadata Download column gives the number of metadata records, the Metadata Readable column gives the number of links extracted from the metadata and the PDF Download column the number of PDFs which were downloaded.


It is difficult to interpret the data given in the table: the entry for the UCL Discovery repository, for example, tells us that there are 0 metadata records, with 245407 links having been extracted from these records and 2 PDFs downloaded!
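Apparent inconsistencies of this kind can be flagged mechanically. The Python sketch below encodes my reading of the three columns (an assumption, since CORE does not document them) together with one further assumption: that each metadata record yields at most one link and each link at most one PDF. The class and function names are hypothetical. Applied to the figures in the table, it flags UCL Discovery and also Glasgow Theses Service, where 2,683 links are reported from 2,682 records.

```python
from typing import List, NamedTuple, Optional


class CoreRow(NamedTuple):
    """One row of the Repository Analytics table ('–' entries become None)."""
    repository: str
    metadata_download: Optional[int]  # assumed: number of metadata records
    metadata_readable: Optional[int]  # assumed: links extracted from the metadata
    pdf_download: Optional[int]       # assumed: PDFs downloaded


def anomalies(row: CoreRow) -> List[str]:
    """Flag counts that contradict the assumed one-record, one-link, one-PDF model."""
    issues = []
    records, links, pdfs = row.metadata_download, row.metadata_readable, row.pdf_download
    if records is not None and links is not None and links > records:
        issues.append("more links extracted than metadata records")
    if links is not None and pdfs is not None and pdfs > links:
        issues.append("more PDFs downloaded than links extracted")
    return issues
```

For example, anomalies(CoreRow("UCL Discovery", 0, 245_407, 2)) returns ['more links extracted than metadata records'], whereas the Durham e-Theses row produces an empty list.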

However the table does suggest patterns of naming conventions for institutional repositories, such as the institutional name being provided at the beginning (“University of Birmingham Research Archive, E-prints Repository“, “University of Liverpool Research Archive” and “LSE Research Online”) or end of the repository name (“EPrints at the Centre for Scientific Computing, University of Warwick“, “Electronics & Computer Science EPrints Service – University of Southampton” and “Computer Laboratory Technical Reports – Cambridge University“) together with a large number of examples which use a partial form of the institution’s name (e.g. “Edinburgh Research Archive”, “Glasgow DSpace Service” and “Manchester eScholar Services“).

But of greater interest are the institutional repositories which have been harvested by CORE but are missing from this search such as “e-Prints Soton” and the “White Rose E-theses Online” and “White Rose Research Online” repositories which are used by the universities of Leeds, York and Sheffield.

Whilst the ownership of a repository will be apparent to end users who access the service via the main entry point (perhaps from the institution’s Library Web site), in a number of cases such information is not apparent when the repository has been harvested and accessed using other systems such as, in this case, the interface developed by the CORE project.

In light of the findings from this survey of Russell Group universities, I would make the following simple recommendation:

The names of institutional repositories should contain the name of the host institution.

In order to illustrate the need for such a recommendation, here is a list of repositories which have been harvested by CORE:

Access to Research Resources for Teachers – Department of Computer Science E-Repository – Enlighten – Modern Languages Publications Archive – Online Publications Store – Open Research Online – Pharmacy Eprints

If you are unfamiliar with these repositories, would you be able to guess who owns them?

Or, to put it another way, meaningful metadata is important for repositories!
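The recommendation is easy to check automatically. The minimal Python sketch below (function names are hypothetical) tests whether a repository name contains any of the names by which its host institution might be known, using a case-insensitive substring match; the lists of institution names have to be supplied by hand. Note a limitation which substring matching shares with the CORE search: “Leeds” matches “leedsmet open search”, which belongs to a different institution.

```python
from typing import Dict, List


def names_host(repository_name: str, institution_names: List[str]) -> bool:
    """True if any supplied institution name appears in the repository name."""
    lowered = repository_name.lower()
    return any(name.lower() in lowered for name in institution_names)


def unlabelled(repositories: Dict[str, List[str]]) -> List[str]:
    """Repository names which do not mention any of their host institutions."""
    return sorted(
        repo for repo, institutions in repositories.items()
        if not names_host(repo, institutions)
    )


# Examples taken from the survey above.
survey = {
    "e-Prints Soton": ["Southampton"],
    "University of Liverpool Research Archive": ["Liverpool"],
    "Edinburgh Research Archive": ["Edinburgh"],
    "White Rose Research Online": ["Leeds", "Sheffield", "York"],
}

print(unlabelled(survey))  # ['White Rose Research Online', 'e-Prints Soton']
```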


Posted in openness, Repositories | 10 Comments »

Profiling Use of Third-Party Research Repository Services

Posted by Brian Kelly on 12 Feb 2013



How significant is use of third-party repository services?

In a recent post I explained “Why I’m Evaluating ResearchGate”. In the post I summarised the reasons why I felt that could provide an additional service for depositing research papers which would complement Opus, the University of Bath institutional repository. But what other services might also be relevant? And which services are hosting the largest numbers of research papers?

In order to seek answers to these questions, I used Google to provide a measure of the size of a number of hosting services for PDFs and the number of PDFs they host. The services I analysed were:

  • This site is described in Wikipedia as “a social networking site for scientists and researchers to share papers, ask and answer questions, and find collaborators. The site has been described as a mash-up of “Facebook, Twitter and LinkedIn” that includes “profile pages, comments, groups, job listings, and ‘like’ and ‘follow’ buttons”. Members are encouraged to share raw data and failed experiment results as well as successes, in order to avoid repeating their peers’ scientific research mistakes.
  • This site is described in Wikipedia as “a platform for academics to share research papers. It was launched in September 2008. Currently the site is approaching 2 million registered users.[2] The platform can be used to share papers, monitor their impact, and follow the research in a particular field.
  • This site is described in Wikipedia as “a desktop and web program for managing and sharing research papers,[2] discovering research data and collaborating online. It combines Mendeley Desktop, a PDF and reference management application (available for Windows, Mac and Linux) with Mendeley Web, an online social network for researchers.[3][4][5] Mendeley requires the user to store all basic citation data on its servers – storing copies of documents is at the user’s discretion”.
  • This site is described in Wikipedia as “based on the principle of social bookmarking [the service] is aimed to promote and to develop the sharing of scientific references amongst researchers. In the same way that it is possible to catalog web pages (with Furl and or photographs (with Flickr), scientists can share information on academic papers with specific tools (like CiteULike) developed for that purpose“.
  • This site is described in Wikipedia as “a document-sharing website that allows users to post documents of various formats, and embed them into a web page using its iPaper format“.

Many researchers will probably be familiar with the first four services listed. The fifth service,, is included in order to explore whether a general-purpose PDF repository service could have a role to play in supporting the sharing of research publications.

Findings for the Coverage of the Services

Google was used in order to provide an estimate of the coverage of the services, including the total number of resources which have been indexed by Google and the number of PDF files. The findings are given in the following table. Note that the figures were initially collected on 6 February 2013. In order to check the volatility of the findings the searches were repeated on 11 February.

Search for                    Search term     Results, 6 Feb 2013    Results, 11 Feb 2013
Total number of resources                            55,300,000              56,100,000
Total number of PDF files     filetype:pdf            2,980,000               2,910,000
Total number of resources                            12,500,000              12,400,000
Total number of PDF files     filetype:pdf                4,930                   4,740
Total number of resources                             3,310,000               3,150,000
Total number of PDF files     filetype:pdf                3,840                   4,020
Total number of resources                            35,600,000              35,700,000
Total number of PDF files     filetype:pdf                  244                      30
Total number of resources                            61,300,000             166,000,000
Total number of PDF files     filetype:pdf                    –             371,000,000
Total number of resources                            10,300,000              26,100,000
Total number of PDF files     filetype:pdf               48,800                  48,800

It seems that Scribd hosts a very large number of resources (although an initial finding of 3 PDF files on 6 February was discarded, as the result seemed to be unreliable).

However, since Scribd is a general-purpose repository service, ResearchGate, which also hosts a large number of PDF resources, was felt to be more relevant for researchers. In light of this confirmation of the popularity of ResearchGate, an additional survey was carried out to report on use of the service across Russell Group universities.

Findings for Institutional Use of and Researchgate

On 1 August 2012 a Survey of Use of Researcher Profiling Services Across the 24 Russell Group Universities was published on this blog. This survey has been repeated in order to detect changes in the use of ResearchGate. Since the original survey also provided an analysis of, this was also included in the current survey. The results are given in the following table. Note that the data is also available in Google Spreadsheets.

                                (members)             ResearchGate
                                Aug 2012  Feb 2013    Aug 2012                Feb 2013*
   Institution                                        Members  Publications   Members  Publications
1  University of Birmingham     1,210     1,562       782      19,515         1,439    22,068
2  University of Bristol        1,018     1,189       641      21,249         1,251    –
3  University of Cambridge      3,020     3,439       972      39,713         1,699    42,419
4  Cardiff University           906       1,071       646      9,596          1,272    10,696
5  Durham University            1,001     1,189       273      1,151          662      7,152
6  University of Exeter         919       1,106       269      5,150          652      6,191
7  University of Edinburgh      2,079     2,479       1,181    25,918         2,065    28,486
8  University of Glasgow        1,004     1,212       613      20,041         1,224    21,733
9  Imperial College             798       896         1,096    30,404         1,377    34,202
10 King’s College London        1,420     1,748       1,406    18,264         2,241    23,391
11 University of Leeds          1,657     1,871       848      16,944         1,455    –
12 University of Liverpool      866       989         582      16,475         1,146    18,749
13 London School of Economics   1,131     1,354       191      1,838          407      2,449
14 University of Manchester     2,279     2,590       1,113    25,139         2,188    29,675
15 Newcastle University         906       1,039       704      17,307         1,348    17,376
16 University of Nottingham     1,299     1,529       970      20,513         1,559    20,145
17 University of Oxford         3,842     4,469       1,221    38,224         1,967    39,861
18 Queen Mary                   715       849         228      5,232          898      –
19 Queen’s University Belfast   689       774         479      10,750         864      11,699
20 University of Sheffield      1,082     1,235       823      18,127         1,659    20,149
21 University of Southampton    1,083     1,265       670      16,887         1,371    18,325
22 University College London   2,776     3,162       1,624    35,035         2,878    38,550
23 University of Warwick        1,143     1,349       448      8,098          873      9,334
24 University of York           986       1,180       386      4,841          696      –
   TOTAL                        33,829    39,546      18,166   426,414        33,191   477,103
   Increase (%)                           16.9%                               82.7%    11.9%
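The Increase (%) figures can be reproduced from the TOTAL row with a few lines of arithmetic. A minimal sketch, assuming growth is measured relative to the August 2012 figure, i.e. (Feb 2013 - Aug 2012) / Aug 2012; the dictionary labels are illustrative, and "(members) column" refers to the first, unnamed column group:

```python
def percent_increase(before: int, after: int) -> float:
    """Growth from `before` to `after`, expressed as a percentage of `before`."""
    return (after - before) / before * 100


# TOTAL row of the survey table as (Aug 2012, Feb 2013) pairs.
# Labels are illustrative only; they are not names used in the survey data.
totals = {
    "(members) column": (33_829, 39_546),
    "ResearchGate members": (18_166, 33_191),
    "ResearchGate publications": (426_414, 477_103),
}

for label, (aug_2012, feb_2013) in totals.items():
    print(f"{label}: {percent_increase(aug_2012, feb_2013):.1f}%")
```

Stating the base matters when comparing growth figures: dividing the same change by the February 2013 figure instead gives a noticeably smaller percentage.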

Note: * As described in the previous survey, the number of members was obtained by entering the name of the institution in the search box.



Nos. of items deposited in Researchgate in Aug 2012 (blue) & Feb 2013 (red)


Nos. of Researchgate Members in Aug 2012 (blue) & Feb 2013 (red)

As illustrated in the accompanying diagrams, the number of researchers at Russell Group universities who have signed up for a ResearchGate account has grown significantly over the past six months, and now stands at over 33,000 users, a growth of 82.7%. The number of papers deposited by researchers at Russell Group universities has also grown, to a total of over 477,000 items. However, since this represents a growth of only 11.9% over six months, it suggests that new members are providing metadata records only and are not depositing the full text.

I therefore feel that the conclusions I reached in my post which explained Why I’m Evaluating ResearchGate were correct, and that ResearchGate is a service which I should use not only to provide a presence about my research activities but also to host my research papers. I do wonder, though, whether the large number of items which have been deposited in ResearchGate is due to promotion of the service within the Russell Group universities or represents a bottom-up approach, in which researchers have recognised the benefits of the service and recommended it to their peers.


Posted in openness, Repositories | Leave a Comment »

Why I’m Evaluating ResearchGate

Posted by Brian Kelly on 6 Feb 2013

A PDF Repository for my Research Publications

In a recent post which explained Why I’m Now Embedding ORCID Metadata in PDFs I described my intention to ensure that my research papers contain rich embedded metadata to help enhance the discoverability of the publications, ensure that authorship is asserted (by embedding the ORCID IDs of the authors of the papers) and ensure that embedded images contain descriptions which help ensure that the content can be understood by visually impaired readers. In addition I wish to ensure that the PDF is stored in PDF/A format, which is better suited to long-term preservation.

In light of discussions on the blog and on email I have decided to embed the ORCID IDs of co-authors of my peer-reviewed papers although, as suggested by Geoffrey Bilder, I will be embedding the HTTP URI version of the ORCID IDs rather than just the bare ORCID ID (such as 0000-0001-5875-8744). In addition I will also be embedding the DOI for papers which have been assigned a DOI.
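Both forms of identifier discussed above can be produced, and checked, mechanically. A minimal sketch, assuming the ISO 7064 MOD 11-2 check character that ORCID documents for the last character of an iD, and the `http://orcid.org/` prefix used for ORCID HTTP URIs at the time of writing (ORCID now recommends `https://orcid.org/`); the function names are hypothetical:

```python
import re

# Shape of a bare ORCID iD: four blocks of four, last character a digit or X.
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if `orcid` has the expected shape and a correct check character."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:-1]) == digits[-1]


def orcid_uri(orcid: str, base: str = "http://orcid.org/") -> str:
    """HTTP URI form of a bare ORCID iD; the `base` prefix is an assumption."""
    if not is_valid_orcid(orcid):
        raise ValueError(f"not a valid ORCID iD: {orcid}")
    return base + orcid


for orcid in ["0000-0001-5875-8744", "0000-0002-0834-273X", "0000-0002-6101-6861"]:
    print(orcid_uri(orcid))
```

The ORCID iDs quoted for all three co-authors in this post pass this check; validating before embedding guards against a mistyped digit propagating into every copy of a PDF.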

But I am now faced with the problem of where the paper should be hosted. This post summarises the processes I am using in the selection of an appropriate repository service to complement my institutional repository.

Selection Processes

As described previously, the workflow used to create cover sheets for items hosted in our repository means that metadata embedded in PDFs is lost. Although we’re having discussions with repository staff about this, it occurred to me that I now have an ideal opportunity to make use of a third-party repository service.

In the past I have normally deposited papers in my institutional repository and used third-party services such as ResearchGate to host the metadata, with links being provided to the full text of the papers hosted in the institutional repository. The main reason for doing this was to ensure that usage statistics for accesses to the full text were available in a single location rather than being fragmented across a range of services. There was a need to minimise the effort in collating such statistics for the production of the evidence reports on our work which our funders have required in the past. However, in light of the recent announcement of the cessation of core funding for UKOLN, this is no longer a priority! Indeed it is now important to ensure that ideas described in peer-reviewed papers are widely disseminated.

Using ResearchGate

Having recognised the value of hosting PDF copies of my papers on a third-party repository service the question then was which one to select. The key criteria used in the selection were:

  • Easy to upload files.
  • Popular with readers.
  • Resource is easily found using Google.
  • PDF files preserved intact.
  • Service appears to be viable.

On 25 December 2012 I received an automated email from ResearchGate which informed me that “28 of your colleagues from University of Bath have joined ResearchGate in the last month”. On 24 January 2013 an automated message announced “44 of your colleagues recently joined ResearchGate”. As illustrated, the University of Bath’s entry on ResearchGate shows that there are currently researchers from 26 departments who have uploaded a total of 7,263 publications. It seems ResearchGate is growing in popularity, at least at the University of Bath.

On 20 December 2012 I was notified of the numbers of views of my papers (or, more accurately, the numbers of views of the metadata for my papers): “Your published research was viewed 1,678 times in 2012” so perhaps ResearchGate is popular beyond the University of Bath!

In light of the apparent popularity of the service I decided to upload one of my papers to the service: the PDF copy of the paper on “Developing A Holistic Approach For E-Learning Accessibility“.

It was trivial to upload the paper, especially as the associated metadata had been created previously. I then downloaded the PDF and was able to confirm that the metadata was still embedded in the PDF resource.

The paper can be accessed from ResearchGate and the user interface is shown below. I’ll leave others to judge the usability of the service.


Page on ResearchGate for one of my papers

But in addition to users who are linked directly to the paper or who find resources using the ResearchGate Web site’s browse and search functionality, what about the discoverability of the resources using Google?

ResearchGate, Google and Embedded Metadata

The PDF version of the paper now contains content which will not be widely used elsewhere: a combination of the authors’ names and their ORCID IDs. A Google search for “Brian Kelly ORCID: 0000-0001-5875-8744”, “Lawrie Phipps ORCID: 0000-0002-0834-273X” or “Elaine Swift ORCID: 0000-0002-6101-6861” should initially find information about the paper hosted on the UKOLN Web site, the UK Web Focus blog and other services which may be used by the co-authors, although not the institutional repository, as this does not currently provide ORCID information (understandably, as ORCID is so new).

I have therefore provided links to the following Google searches which I will monitor to see when Google has indexed the PDFs hosted on ResearchGate:

Search term                                 Findings                                                                                       Date
Brian Kelly ORCID: 0000-0001-5875-8744      Large number of hits from UK Web Focus blog together with ORCID, UKOLN and Slideshare Web sites    27 Jan 2013
Lawrie Phipps ORCID: 0000-0002-0834-273X    5 hits (ORCID and UKOLN Web sites and UK Web Focus blog)                                          6 Feb 2013
                                            4 hits (ORCID Web site and UK Web Focus blog)                                                      27 Jan 2013
Elaine Swift ORCID: 0000-0002-6101-6861     3 hits (ORCID and UKOLN Web sites and UK Web Focus blog)                                          6 Feb 2013
                                            2 hits (ORCID Web site and UK Web Focus blog)                                                      27 Jan 2013
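The monitoring searches listed above are simple phrase queries, so links to them can be generated rather than typed. A minimal sketch, assuming Google's public `https://www.google.com/search?q=` URL form and exact-phrase quoting; the optional `filetype:pdf` restriction mirrors the operator used elsewhere in this post. Only URL construction is shown; fetching or counting results is not attempted:

```python
from urllib.parse import urlencode

SEARCH_BASE = "https://www.google.com/search"  # assumed public search URL


def search_url(phrase: str, pdf_only: bool = False) -> str:
    """Build a Google search link for an exact phrase, optionally limited to PDFs."""
    query = f'"{phrase}"'  # double quotes request an exact-phrase match
    if pdf_only:
        query += " filetype:pdf"
    # urlencode escapes quotes and colons and turns spaces into '+'
    return SEARCH_BASE + "?" + urlencode({"q": query})


monitored = [
    "Brian Kelly ORCID: 0000-0001-5875-8744",
    "Lawrie Phipps ORCID: 0000-0002-0834-273X",
    "Elaine Swift ORCID: 0000-0002-6101-6861",
]
for phrase in monitored:
    print(search_url(phrase))
```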

It appears that, over a period of a week, the ORCID metadata is being found in citation records hosted on the UKOLN Web site, together with the citation records already indexed on the ORCID Web site and this blog, but not yet in the PDF files hosted on ResearchGate. Might this be because Google is not indexing the site? To answer this question, Google was used to provide information on the total number of resources on the service and the total number of PDF files. The results are given below.

Purpose                              Search term     Nos. of results              Date
Total number of resources on site                    24,100,000 – 55,300,000*     6 Feb 2013
Total number of PDF files on site    filetype:pdf    2,980,000                    6 Feb 2013

* The number of search results has fluctuated between 24,100,000 and 55,300,000 over the last few days.

It seems that a large number of PDF files hosted on ResearchGate have been indexed by Google, but that it takes longer than a week for new resources to be indexed and to be found using a Google search.

Sustainability of the Service

What Does The Evidence Say?

The home page for the service displays a graphic (to users who are not logged in) of the number of users of the service. It seems that 2.4 million users have subscribed. Given the likely number of researchers worldwide, this does appear to be a significant number.

But what else do we know about the service and the company which provides the service? TechCrunch provides a handful of posts about the company together with the following summary:

ResearchGate is the leading social network for scientists. It offers tools and applications for researchers to interact and collaborate. ResearchGate offers a social, crowdsourced platform designed for researchers. The platform provides a global scientific web-based environment in which scientists can interact, exchange knowledge and collaborate with researchers of different fields.

The results of ResearchGate’s new search engine, called ReFind, are not merely based on keywords, but selected in an intelligent way based on semantic, contextual correlations.

The article also provides a graph showing the number of users over the past year, based on figures provided by Compete.

As can be seen, the number of unique visitors seems to be growing significantly, from 61,640K in December 2011 to 236,170K in December 2012.

I also used MajesticSEO to report on the SEO characteristics of the service (note that a free subscription is required in order to view the findings). As can be seen, there are 7,459 domains which have links to the site and a total of 177,945 backlinks. Although such figures need to be regarded with caution (for example, they can be skewed significantly by link spam), the number of links from educational domains (3,241) and the number of educational domains (551) may be more appropriate measures, due to the difficulty of creating educational domains to host link farms. This snapshot may therefore provide a useful baseline for measuring changes in the link popularity of the service.
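The distinction drawn here between all links and links from educational domains can be sketched as a filter over a list of backlink URLs. Both the input list and the suffixes treated as educational (`.edu`, `.ac.uk`) are assumptions for illustration; MajesticSEO's own classification may differ, and distinct host names are counted as a simple stand-in for domains:

```python
from urllib.parse import urlparse

# Assumed suffixes treated as educational; other national academic domains
# would need adding for a fuller picture.
EDU_SUFFIXES = (".edu", ".ac.uk")


def is_educational(host: str) -> bool:
    """True if the host name ends with one of the assumed educational suffixes."""
    return host.lower().endswith(EDU_SUFFIXES)


def educational_link_counts(backlinks: list[str]) -> tuple[int, int]:
    """Return (educational links, distinct educational host names)."""
    hosts = [urlparse(url).hostname or "" for url in backlinks]
    edu_hosts = [h for h in hosts if is_educational(h)]
    return len(edu_hosts), len(set(edu_hosts))


# Hypothetical backlink list, for illustration only.
sample = [
    "http://www.bath.ac.uk/page1",
    "http://www.bath.ac.uk/page2",
    "http://web.mit.edu/page",
    "http://spam.example.com/page",
]
print(educational_link_counts(sample))  # (3, 2)
```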

Terms and Conditions

Looking at the ResearchGate terms and conditions, I found no suggestion that the company claims rights to sell my data or my attention data to others (although I haven’t studied the terms and conditions in great detail). Although some may welcome this, others may wonder what the company’s business model is. An article entitled ResearchGate Wants To Be Facebook For Scientists, published by Forbes in March 2012, described how:

ResearchGate will also be looking into ways to monetize its platform. The “no-brainer” way to do that, in Madisch’s words, is to provide job boards for scientists looking for jobs. Universities and companies would pay the site to place listings. The company is also looking for ways to partner with other companies that manufacture and sell biotech lab equipment, as well as several other different programs.”

Perhaps this is an appropriate business model which will be accepted by researchers who normally shy away from free services on the grounds that “If You’re Not Paying for It; You’re the Product”.

Interest in UK HE Sector

Although ResearchGate seems to be growing in popularity globally (and at the University of Bath), is there any evidence of interest within the UK’s higher education community? For me this is not necessarily a significant issue (it can be fine to be an early adopter) but it would be interesting to see what others in my community are saying about the service.

Using a Google search for “researchgate terms and conditions” I found that the DCC have provided a summary of ResearchGate in its list of resources for digital curators, with a similar resource being provided by the University of Edinburgh’s College of Humanities and Social Science. A Google search for “researchgate UK” finds a number of additional resources from the sector, including pages provided by the University of Leeds (PDF format), the University of Leicester, the University of Liverpool (PDF format) and the University of Gloucester, together with blog posts at the University of Loughborough and the University of Warwick.

My Decision

In light of these figures and my experiences in using the service I am happy to use the service to provide additional exposure to my research papers which complements the master copy of papers which are hosted on my institutional repository. Are other researchers making similar decisions or are alternative services felt to provide better options?


Posted in openness, Repositories | 19 Comments »

Why I’m Now Embedding ORCID Metadata in PDFs

Posted by Brian Kelly on 28 Jan 2013

“Every PDF needs a title”

The day after publishing a post on Reflections on the Discussion on the Quality of Embedded Metadata in PDFs I received a tweet from @community which alerted me to a blog post on SEO Action for PDF files on the Adobe blog. The post describes an extension for use in Acrobat X Pro which automates setting the properties of a PDF file in accordance with guidelines which can enhance the discoverability of PDF files by Google. The guidelines, which had been published way back in August 2009, were based on experiments which demonstrated improvements in Google’s indexing of PDF files. The article’s main conclusion was that “Every PDF needs a title”:

In terms of PDF files, the blue underlined text in Google’s search results comes from one of two places. First, Google looks in the “Title” document information field. If it finds nothing, Google’s indexer tries to guess the document’s title by scanning the text on the first few pages. This usually doesn’t work, producing incorrect and improperly formatted results.

In addition to this advice, the article also suggested use of other metadata fields including author, subjects and keywords.

Metadata For Peer-Reviewed Papers

Although I ensure that I provide the correct title for my peer-reviewed papers when I create them in MS Word I was unsure whether I included the names of the co-authors or made use of other metadata fields.

On Friday 25 January 2013 I decided to update the metadata for one of my papers, “Developing A Holistic Approach For E-Learning Accessibility”, which was the first paper that Lawrie Phipps, Elaine Swift and I wrote, back in 2004.

I added a number of tags to the paper and used the Comments field to provide the abstract. In addition the publication details were added to the Status field.

Whilst updating the metadata it occurred to me that it would be useful to include the ORCID IDs of the authors, as these will be less volatile than the authors’ email addresses (one of the co-authors was based at the University of Bath when the paper was published but subsequently moved to Nottingham Trent University).

In addition to the resource discovery metadata for the paper I also remembered that I should ensure that images in the paper contained appropriate alt text so that image descriptions are available to those who may make use of a screen reader. Fortunately we had done this for the paper, but I have to admit that this isn’t necessarily done for all of my research papers.

Having updated the metadata for the paper and embedded images I then created the PDF from MS Word. I noticed that the Save As PDF option in MS Word enabled a number of options to be specified, including Save As ISO-19005 (PDF/A).

As described in Wikipedia, PDF/A is “an ISO-standardized version of the Portable Document Format (PDF) specialized for the digital preservation of electronic documents”. The article goes on to explain that “PDF/A differs from PDF by omitting features ill-suited to long-term archiving, such as font linking (as opposed to font embedding)”.

Since the digital preservation of peer-reviewed publications is important I ensured that I saved the paper in PDF/A format, using the Save As option illustrated.

Approaches to Metadata Embedded in PDFs

What practices should be used in providing the metadata to be created in the original authoring tool (MS Word, in my case) which will then be available in the PDF version of the paper? Here’s a summary of the approaches I have used:

Title: The title of the paper

Tags: My preferred tags about the content and my organisation.

Comments: The abstract of the paper, normally taken from the abstract provided in the paper.

Author: First Name Surname (ORCID: ORCID ID) e.g. Brian Kelly (ORCID: 0000-0001-5875-8744)

The title field will be obvious. The tags reflect keywords which I feel will enhance access to the document (and I choose fewer than five). I am using the comments field to host the abstract for the paper. Finally, the author field contains the full name followed by ORCID: ORCID ID (in brackets). I feel that this is a pragmatic approach to ensuring that the significant information which will be indexed by Google is found in the metadata fields which are available through my authoring tool (MS Word).
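The Author convention described above is regular enough to be generated and parsed back mechanically, which bears on whether indexing tools could separate the name from the identifier. A minimal sketch with hypothetical function names, assuming multiple authors are joined with `; ` (the post does not specify a separator):

```python
import re

# One author entry: "First Surname (ORCID: 0000-0000-0000-000X)"
AUTHOR_RE = re.compile(
    r"^(?P<name>.+?) \(ORCID: (?P<orcid>\d{4}-\d{4}-\d{4}-\d{3}[\dX])\)$"
)
SEPARATOR = "; "  # assumed separator between authors


def format_author_field(authors: list[tuple[str, str]]) -> str:
    """Build the Author field from (name, ORCID iD) pairs."""
    return SEPARATOR.join(f"{name} (ORCID: {orcid})" for name, orcid in authors)


def parse_author_field(field: str) -> list[tuple[str, str]]:
    """Split the Author field back into (name, ORCID iD) pairs; raise if malformed."""
    pairs = []
    for part in field.split(SEPARATOR):
        m = AUTHOR_RE.match(part.strip())
        if m is None:
            raise ValueError(f"unrecognised author entry: {part!r}")
        pairs.append((m["name"], m["orcid"]))
    return pairs


authors = [
    ("Brian Kelly", "0000-0001-5875-8744"),
    ("Lawrie Phipps", "0000-0002-0834-273X"),
    ("Elaine Swift", "0000-0002-6101-6861"),
]
field = format_author_field(authors)
print(field)
print(parse_author_field(field))
```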

But could this cause problems? Might Google think my name is Mr Orcid or Mr 0000-0001-5875-8744? Might other indexing and aggregation tools have problems, as I am misusing the semantics of these metadata fields? My feeling is that Google will be capable of understanding the content and that it is better to have such quality metadata (which I have chosen) than no metadata. But are other researchers embedding ORCID IDs in PDFs? In order to answer this question I have used Google’s advanced search capability to search for “ORCID” in PDF resources across a number of domains, as summarised below.

Domain     Results     Date
All        3,840       28 Jan 2013
           109         28 Jan 2013
           0           28 Jan 2013

These numbers are low – and when you realise that the results include PDFs which contain the string “ORCID” in the text of the pages (as illustrated) it seems clear that there is little evidence that ORCID IDs are being embedded in PDFs yet.

So before I embed ORCID IDs in my other papers I would welcome feedback on this proposal. Is it desirable to include the ORCID IDs of authors in the PDF versions of papers? If so, is the approach I have taken to be recommended to others? Or might it be desirable to provide richer structured metadata in PDF files, using the XMP (Extensible Metadata Platform) standard? But if this is felt to be desirable, how would it fit into the workflow, given that it appears difficult to persuade authors to provide metadata for their papers in any case?
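For comparison, a minimal sketch of the kind of structured metadata XMP could carry: the core RDF/XML of an XMP block with Dublin Core `dc:title` (a language alternative) and `dc:creator` (an ordered list), built with Python's standard `xml.etree.ElementTree`. The `<?xpacket?>` wrapper, embedding the block in a PDF, and any property for ORCID iDs are deliberately left out, as they depend on tooling and schema choices not covered here:

```python
import xml.etree.ElementTree as ET

NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Serialise with the prefixes conventionally used in XMP.
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix: str, local: str) -> str:
    """Qualified name in ElementTree's {namespace}local form."""
    return f"{{{NS[prefix]}}}{local}"


def xmp_packet(title: str, creators: list[str]) -> str:
    """Return an x:xmpmeta element carrying dc:title and dc:creator, as a string."""
    meta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(meta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})

    # dc:title is a language alternative (rdf:Alt) with a default entry.
    alt = ET.SubElement(ET.SubElement(desc, q("dc", "title")), q("rdf", "Alt"))
    ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"}).text = title

    # dc:creator is an ordered list (rdf:Seq), so author order is preserved.
    seq = ET.SubElement(ET.SubElement(desc, q("dc", "creator")), q("rdf", "Seq"))
    for name in creators:
        ET.SubElement(seq, q("rdf", "li")).text = name

    return ET.tostring(meta, encoding="unicode")


print(xmp_packet(
    "Developing A Holistic Approach For E-Learning Accessibility",
    ["Brian Kelly", "Lawrie Phipps", "Elaine Swift"],
))
```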


Posted in Identifiers, Repositories | 6 Comments »

A Tribute to Aaron Swartz: Lets Make #pdftribute Trend

Posted by Brian Kelly on 13 Jan 2013

I’m sure many readers of this blog will have heard the news of the untimely death of Aaron Swartz. As described on the BBC News Web site:

Aaron Swartz, a celebrated internet freedom activist and early developer of the website Reddit, has died at 26.

The activist and programmer took his life in his New York apartment, a relative and the state medical examiner said. His body was found on Friday.

A sad day, especially for those who share Aaron Swartz’s commitment to openness and admire his commitment to the development of tools, services and standards, such as RSS, which have helped to make open access to resources accessible on a global basis.

Earlier today I came across a tweet which encouraged academics to show their support for Aaron’s work:

Please share: Academics posting their papers online in tribute to Aaron Swartz using hashtag #pdftribute.

I would like to endorse this proposal. I have created a Storify summary of the #pdftribute tweets, which contains over 500 posts since the call was made just over 3 hours ago.

We have seen that initial tweet being widely retweeted but, as @neuroconscience (Micah Allen) has suggested:

Folks as exciting as #pdftribute is we need less links talking about it and more actual paper posting.

But what could be said in 140 characters?

Within my Twitter stream I have already seen tweets from those involved in supporting their institutional repository including @SarahNicholas:

Cardiff academics! Post your articles to @CardiffOrca #openaccess #pdftribute

and @glamlaflib (Sue House):

Glamorgan academics can deposit their articles & papers here (if you retained the copyright) #pdftribute

I have also seen @openscience endorsing @jambina’s reminder of the role which can be played by librarians:

Librarians: always friends in #openaccess #openscience MT @jambina: librarians can help you free your work. we are on your side #pdftribute

Meanwhile @MrGunn describes services which can be used:

@opendna @venturejessica @Aine Mendeley can push into to local repository via Symplectics Elements, other routes can be made with Open API.

Of course many researchers are demonstrating their commitment to providing open access to their research papers:

Others, such as @mlterpstra (ML Terpstra) make the case for open data policies:

#public funded #academia should have a #opendata policy for their scientific papers #Aaron #pdftribute. Lets call it #AaronsLaw? @birgittaj

whilst others provide a more political view:

@MarietjeD66 @mikebutcher Let this be the start of the end of the ridiculous copyright laws. #pdftribute #AaronSwarz

Would you like to join in by giving your views or ensuring that your Twitter community is aware of how you have made your research papers openly available?

Note archives of the #pdftribute tweets are available at and


Posted in openness, Repositories | 2 Comments »

Reflections on the Discussion on the Quality of Embedded Metadata in PDFs

Posted by Brian Kelly on 11 Jan 2013

The Quality of Metadata Embedded in PDFs

The recent post on Embedded Metadata in PDFs Hosted in Institutional Repositories: An Inside-Out & Outside-In View generated a fair amount of discussion, with ~17 comments on the post itself but, perhaps more significantly, a more interactive discussion on Twitter, with relevant contributions being made by @mrnick, @neilstewart, @rmounce, @carusb, @pj_webster, @emmatonkin, @MikeTaylor and @wrap_ed, with other Twitter users sharing links to the posts with their communities.

Whilst some people may still feel that discussions should take place on one centralised system (e.g. a mailing list), in reality this is an unrealistic expectation. In the real world, discussions based on ideas which may have originated online will be dispersed across offices and common rooms in institutions around the world, to say nothing of other discussions which may take place in pubs and coffee rooms as well as whilst travelling. Conversations about interesting ideas will be distributed; we have to accept that. However it can be helpful to aggregate valuable comments which may be fragmented across a variety of communication channels. Since I felt that the Twitter discussions about the post were particularly interesting I have created a Storify summary entitled The Quality of Embedded Metadata in PDFs (Jan 2013). Note that this complements the Topsy summary, which lists the tweets containing links to the blog post.

Note that in the comments on the blog post Nick Sheppard suggested that a forthcoming UK RepNet event might provide an opportunity to discuss the issues which were raised in more depth:

I wonder if some of these issues might be relevant within the context of the UK RepNet project which is holding a meeting in London on 21st Jan –

I will therefore provide a summary of the main issues which were discussed on the blog and on Twitter.

The Context

The initial post was written in response to a post by Ross Mounce in which he asked PDF metadata – why so poor? and a follow-up post a week later on PDF metadata: different tool, same story. Ross’s post was based on an analysis of the metadata embedded in PDFs hosted by scholarly publishers. Ross’s second post succinctly summarised his work:

So a week ago, I investigated publisher-produced Version of Record PDFs with pdfinfo and the results were very disappointing. Lots of missing metadata was found and one could not reliably identify most of these PDFs from metadata alone, let alone extract particular fields of interest.

I wondered whether PDFs hosted in institutional repositories also suffered from poor quality or missing embedded metadata. I examined some papers I had deposited in the University of Bath repository and found that metadata which was contained in the original PDF file I uploaded to the repository was missing from the PDF which users can download. I surmised that the metadata had been lost in the workflow when a cover sheet was added to the paper.
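Checks like these can be scripted. A minimal sketch, assuming the `pdfinfo` command-line tool (distributed with Poppler and Xpdf) is installed; it prints one `Key: value` pair per line, and the sketch reports which of the Title, Author, Subject and Keywords fields are absent or empty. The function names and the sample output are hypothetical:

```python
import subprocess

FIELDS_OF_INTEREST = ("Title", "Author", "Subject", "Keywords")


def parse_pdfinfo(output: str) -> dict[str, str]:
    """Turn pdfinfo's 'Key:   value' lines into a dictionary."""
    info = {}
    for line in output.splitlines():
        # Split at the first colon only, so values may themselves contain colons.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def missing_fields(info: dict[str, str]) -> list[str]:
    """Fields of interest that are absent or empty."""
    return [f for f in FIELDS_OF_INTEREST if not info.get(f)]


def check_pdf(path: str) -> list[str]:
    """Run pdfinfo on a file (requires pdfinfo on PATH) and list missing fields."""
    result = subprocess.run(["pdfinfo", path], capture_output=True, text=True, check=True)
    return missing_fields(parse_pdfinfo(result.stdout))


# Hypothetical pdfinfo output, for illustration only.
sample = """Title:          Developing A Holistic Approach For E-Learning Accessibility
Author:         Brian Kelly (ORCID: 0000-0001-5875-8744)
Creator:        Microsoft Word
"""
print(missing_fields(parse_pdfinfo(sample)))  # ['Subject', 'Keywords']
```

Running `check_pdf` on both the file uploaded to a repository and the copy users download would show directly whether a cover-sheet step has stripped the embedded fields.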

My post referenced a post by Lorcan Dempsey entitled Discovery vs discoverability … in which he explored the idea of the “inside-out and outside-in library”. This seemed very relevant to this scenario, as both Ross and I were concerned primarily with the implications of missing metadata for systems which may be used outside the repository context: in Ross’s case this related to text mining of large collections of PDFs, whereas my interest focussed on reuse of PDFs in other repositories.

The Discussion

The initial comment on the blog post by Ingmar Koch illustrated how embedded PDF metadata can be (mis-)used by external systems. Ingmar described how “the company that designed the document templates for most of the government agencies added a title and author in the template-file. The result is that thousands of online government documents (.pdf and .doc) are titled “at opinio facillime sumitur” and are written bij M. Hes.” This example provides a vivid illustration of how metadata embedded in PDFs is being used by Google. However this example might also be used to demonstrate the poor quality of embedded metadata.

In light of such examples Neil Stewart therefore asked “does it matter if the rare and patchy instances of author-created metadata gets over-written or otherwise distorted?”, since “the structured metadata provided at Eprint/DSpace/other repository software record level does the job here (as opposed to metadata embedded within the PDF itself)”.

But surely we cannot argue that since some resources may contain poor quality metadata we should delete all metadata! I would argue that there is a need to educate authors on the importance of appropriate metadata, which includes showing how such metadata can be used by services outside the host institution. Neil recognised the validity of this point when he acknowledged that “not every service will use OAI-PMH or web crawling, some might parse the objects themselves”.

The discussion then moved on to Twitter and initially addressed the relevance of cover sheets, since these appear to cause problems in workflows which take place outside the institutional repository.

Ross Mounce asked:

why do IRs need 2 slap on cover page anyway? Perhaps they should just embed additional provenance metadata @briankelly @mrnick @neilstewart

Neil Stewart provided one use case for cover sheets:

@rmounce @briankelly @mrnick viewed as a way of advertising provenance (proper citation), branding as from home inst but agreed!

However Ross re-iterated his criticisms of cover sheets:

Cover-pages from a user-POV r just plain annoying. If provenance must be visibly embedded why not overlay? @neilstewart @briankelly @mrnick

Others, such as Chris Rusbridge, agreed with this view:

@mrnick @ukcorr @rmounce @briankelly @stevehit I agree with Ross that it’s BAD practice, from my POV

The discussion then moved on to problems which may occur when a paper is downloaded, with Nick Sheppard providing a good example of how PDFs may end up containing multiple cover sheets if they are taken from one repository and deposited (by, for example, a co-author) in another repository:

@neilstewart Um, can also lead to cover page disasters like this (scroll down)…@rmounce @briankelly

I then highlighted a paper by my colleague Emma Tonkin which showed that problems with poor quality metadata went beyond the individual examples provided on Twitter:

@carusb @mrnick @rmounce My colleague @emmatonkin analysed PDF metadata a few years ago:

The paper (PDF format) described how:

Many repositories … have developed or identified a means of adding a cover sheet to each document within the repository. This has potential for positive impact, for example, as a means of clearly indicating the provenance of an item (Puplett, 2008). As can be seen in Fig. 7, Google Scholar does not necessarily recognise the cover sheet for what it is, and this has negative implications for effective indexing and retrieval.

and went on to conclude:

However, the addition of a cover sheet has caused a number of issues beyond those that are usually encountered with the PDF format (ie. font problems, file corruption, etc). This limits the ability for automated processes to make use of this information, and could therefore be said on the level of automated indexing and other software access (such as conversion) to be a retrograde step. If this becomes common practice it may be necessary to review both the assumptions under which automated systems are developed, and perhaps the rationale that lead us to make use of cover sheets in this context.


The paper on Supporting PDF accessibility evaluation: early results from the FixRep project was written in 2010 by my colleagues Emma Tonkin and Andy Hewitt and presented at the 2nd Qualitative and Quantitative Methods in Libraries International Conference (QQML2010). The concluding sentence in the paper highlighted work which needs to be addressed:

it may be necessary to review both the assumptions under which automated systems are developed, and perhaps the rationale that lead us to make use of cover sheets in this context

The paper identified the benefits of cover sheets but also the problems they can cause for automated activities which may take place outside of the institutional repository environment.

But should repository managers and developers necessarily devote resources to addressing potential problems which may arise downstream of the repository environment? In a comment on Ross Mounce’s blog the point was made that publishers will need a sound business case before doing so:

“Why would publishers add metadata? Because their customers – libraries, governments, research funders (in the case of Open Access PDFs ) should demand it.” I’m not seeing a compelling business case here. High-quality metadata would be nice, but can anybody argue that their research is being hampered by a lack of such metadata? Could someone working in publishing make a case to their boss that adding such metadata would generate more revenue, web traffic, manuscript submissions (insert whatever metric matters)?

In the context of institutional repositories, perhaps the approach to be taken would be to ensure that embedded metadata is preserved, and that the training and advice provided by repository support staff make authors aware of the ways in which embedded metadata can be used, even if such reuse takes place outside of the institutional repository.

The discussion also highlighted the need for enhanced workflow practices for merging cover pages with the original content, and also for enabling users (and automated tools) to access the original source paper in addition to the version containing provenance information designed for consumption by users.

Do any institutional repositories currently provide solutions to these requirements? In addition, I am interested in how many institutional repositories provide cover pages, and whether those that do so use a repository plugin, some other automated technology or manual processes. Two polls on these questions are embedded in this post, but if the situation is more complex than can be summarised in the poll, feel free to add a comment.

Footnote (added 12 January 2013): A tweet from @community alerted me to the blog post on SEO Action for PDF files on the Adobe blog. This describes an “Action” for use in Acrobat X Pro that automates setting the properties of a PDF file in accordance with guidelines which can enhance the discoverability of PDF files by Google.


Posted in Repositories | 5 Comments »

Embedded Metadata in PDFs Hosted in Institutional Repositories: An Inside-Out & Outside-In View

Posted by Brian Kelly on 4 Jan 2013

PDF Metadata – Why Is it So Poor?

“PDF metadata – why so poor?” asked Ross Mounce in a blog post published on New Year’s Eve.

In the post Ross expressed surprise that although “with published MP3 files of audio you get rather good metadata … the results from a little preliminary survey of academic publisher PDF metadata” were poor: “Out of the 70 PDFs I’ve published (meta)data on over at Figshare, only 8 of them had Keywords metadata embedded in them“.

This made me wonder about the quality of the metadata for papers I have uploaded to Opus, the University of Bath repository.

I looked at a paper on A Challenge to Web Accessibility Metrics and Guidelines: Putting People and Processes First which is available in Opus in PDF and MS Word formats.

I first used Adobe Acrobat to display the metadata for the original source PDF file, prior to uploading it to the repository. As can be seen from the accompanying screen shot, the metadata included the title, the author details (with the email address for one of the authors) and two keywords.

However, looking at the display for the PDF downloaded from the repository we find that no metadata is available!

This PDF differs from the original source in that a cover page is added dynamically by the repository in order to provide appropriate institutional branding. It would appear that in the creation of the new PDF resource, the original metadata is lost.
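One way to check for this kind of loss is to compare the embedded metadata of the source PDF with that of the copy downloaded from the repository. The Python sketch below is an illustration only, not a description of how Opus or any repository works: it assumes the document information dictionary is stored as an uncompressed object with plain literal-string values, and the function names are hypothetical. A real check would be better served by a proper PDF parsing library.

```python
import re

# Entries commonly found in a PDF document information dictionary.
FIELDS = (b"Title", b"Author", b"Subject", b"Keywords")


def naive_info_fields(pdf_bytes):
    """Return {field: value} for non-empty literal-string entries in raw PDF bytes.

    Naive by design: ignores compressed object streams, hex strings (<...>),
    escaped parentheses and text-encoding details.
    """
    found = {}
    for name in FIELDS:
        match = re.search(rb"/" + name + rb"\s*\(([^)]*)\)", pdf_bytes)
        if match and match.group(1).strip():
            found[name.decode("ascii")] = match.group(1).decode("latin-1")
    return found


def lost_fields(original_bytes, repository_bytes):
    """List fields present in the original PDF but missing from the repository copy."""
    before = naive_info_fields(original_bytes)
    after = naive_info_fields(repository_bytes)
    return sorted(field for field in before if field not in after)


# Hypothetical usage with tiny synthetic fragments rather than real files.
original = b"1 0 obj << /Title (A Challenge to Web Accessibility Metrics and Guidelines) /Author (Brian Kelly) /Keywords (Web accessibility) >> endobj"
repository_copy = b"1 0 obj << /Producer (hypothetical cover-sheet tool) >> endobj"
print(lost_fields(original, repository_copy))  # ['Author', 'Keywords', 'Title']
```

With real files, the byte strings would simply come from reading the source PDF and the downloaded repository copy.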

Looking at the metadata created in the original source document, an MS Word file, we can see that the authors’ names had been concatenated into a single field. We can also see that although the title of the paper was given correctly, poor keywords had been included, which did not reflect the keywords given in the paper itself (Web accessibility, disabled people, policy, user experience, social inclusion, guidelines, development lifecycle, procurement).

I suspect that I am not alone in spending little time ensuring that appropriate metadata is embedded in the master source of a peer-reviewed paper. I had also not previously considered how such metadata might be lost in the workflow processes when uploading to an institutional repository: after all, surely the important metadata is added when the paper is deposited into the repository?

Ross’s blog post made me check the embedded metadata – and I discovered that the correct metadata is still included in the MS Word file which was uploaded to the repository along with the PDF copy.

Does the loss of the metadata embedded in the PDF matter? After all, surely people will use the search facilities provided in the repository in order to find papers of interest?

But people will not necessarily visit a repository to find papers of interest. A post which described A Survey of Use of Researcher Profiling Services Across the 24 Russell Group Universities showed that on 1 August 2012 there were over 18,000 users of ResearchGate in the 24 Russell Group universities and judging by the messages along the lines of “28 of your colleagues from University of Bath have joined ResearchGate in the last month. Why not follow them today?” which I am currently receiving, use of this service is growing.

As can be seen from the screenshot of my ResearchGate profile, the service provides access to PDF copies of my papers. I normally simply provide a link to the PDF hosted in the repository, but the example illustrated contains a copy of the original PDF which was uploaded to the service by one of the co-authors.

In the case of most of my papers it is clear from the thumbnail of the PDF that the paper contains the coversheet provided by the repository.

Researchgate Paper (hosted in Opus)


We can see that the PDF copy of a paper hosted in a repository should not be regarded as a final destination; rather the PDF may be surfaced in other environments.

It will therefore be important to ensure that workflow processes do not degrade the quality of the PDF. It will also be important to ensure that authors are made aware of how embedded metadata may be used by services beyond the institutional repository. But to what extent do repository managers feel they have a responsibility to advise on practices which will enhance the discoverability of content on services hosted outside the institution?

In a paper which asked “Can LinkedIn and Enhance Access to Open Repositories?” Jenny Delasalle and I commented on how “commercial publishers are encouraging authors to use social media to drive traffic to papers hosted on publishers’ web sites” and provided examples of such approaches from Taylor and Francis, Springer, Sage and Oxford Journals. As an example, Taylor and Francis describe how they are “committed to promoting and increasing the visibility of your article and would like to work with you to promote your paper to potential readers” and go on to document services which can help achieve this goal.

In a blog post which discussed the ideas described in the paper, I explained how we had failed to find significant evidence of similar approaches being employed by repository managers:

It was interesting that in Jenny’s research she found that a number of commercial publishers encourage their authors to use services such as LinkedIn and to link to their papers hosted behind the publishers paywalls – and yet we are not seeing institutional views of the benefits of coordinated use of such services by their researchers. Institutional repository managers, research support staff and librarians could be prompting their institutions to make the most of these externally provided services, to enhance the visibility of their researchers’ work in institutional repositories.

But that paper was limited to the use of third-party services to provide access routes to research papers. What of the bigger picture, in which institutional workflow processes should be designed to enhance discoverability?

The ‘inside-out and outside-in library’

On Wednesday in a post entitled Discovery vs discoverability … Lorcan Dempsey explored the idea of the “inside-out and outside-in library“. In the post Lorcan described how:

Throughout much of their existence, libraries have managed an outside-in range of resources: they have acquired books, journals, databases, and other materials from external sources and provided discovery systems for their local constituency over what they own or license.

However in a digital and network world, there have been two major changes, which shift the focus towards inside-out:

First access and discovery have now scaled to the level of the network: they are web scale. If I want to know if a particular book exists I may look in Google Book Search or in Amazon, or in a social reading site, in a library aggregation like Worldcat, and so on. … Secondly the institution is also a producer of a range of information resources: digitized images or special collections, learning and research materials, research data, administrative records (website, prospectuses, etc.), faculty expertise and profile data, and so on.

Lorcan goes on to describe the challenge facing libraries:

How effectively to disclose this material is of growing interest across libraries or across the institutions of which the library is a part. This presents an inside-out challenge, as here the library wants the material to be discovered by their own constituency but usually also by a general web population.

I would suggest that institutional repositories could usefully adopt the approach taken by Taylor and Francis:

“[The institution is] committed to promoting and increasing the visibility of your article and would like to work with you to promote your paper to potential readers.”

But rather than simply encouraging researchers to add links from popular services such as LinkedIn and ResearchGate to papers deposited in the repository, might the institutional goal be better served by encouraging researchers to make the content of their papers available in such third-party services (subject to copyright considerations)? The institutional repository would then provide both a destination and a component in a workflow, with papers being surfaced in services such as ResearchGate, as I have illustrated above.

If such an approach were to be embraced, there would be a need to ensure that embedded metadata was not corrupted by repository workflow processes. If, however, the repository is regarded as the sole access point, there would be little motivation to address such limitations in the workflow.

Or to put it another way, repository managers will need to manage content hosted within the institution in ways which also support the use of that content by services over which they have no control.

To a certain extent, this has already been accepted: repositories were designed to have “cool URIs” which can help resources to be discovered by Google. I am suggesting that there is a need to observe usage patterns which indicate emerging ways in which users are finding content. The growing numbers of email alerts from ResearchGate suggest that it may be a service to monitor, with Ross Mounce’s recent post on the quality of metadata embedded in PDFs suggesting one area in which there will be a need to revisit existing workflow processes.

PS. Ross Mounce described “a little preliminary survey of academic publisher PDF metadata” and has published the data on Figshare. Has anyone harvested the metadata embedded in PDFs hosted on repositories and published the findings?
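As a possible starting point for such a survey, the Python sketch below counts how many PDFs in a collection carry a Keywords entry in their XMP metadata packet, echoing the Keywords count in Ross’s survey. It is a hedged illustration using only the standard library, with hypothetical function names: it assumes the XMP packet is stored uncompressed (common, but not guaranteed), it inspects only the first packet in each file, and it ignores keywords held only in the document information dictionary.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDF = "http://ns.adobe.com/pdf/1.3/"
START, END = b"<x:xmpmeta", b"</x:xmpmeta>"


def xmp_root(pdf_bytes):
    """Parse the first uncompressed XMP packet in raw PDF bytes, or return None."""
    start = pdf_bytes.find(START)
    end = pdf_bytes.find(END, start)
    if start == -1 or end == -1:
        return None
    try:
        return ET.fromstring(pdf_bytes[start:end + len(END)])
    except ET.ParseError:
        return None  # malformed packet: treat as having no usable metadata


def has_keywords(pdf_bytes):
    """True if any rdf:Description carries a non-empty pdf:Keywords value."""
    root = xmp_root(pdf_bytes)
    if root is None:
        return False
    for desc in root.iter("{%s}Description" % RDF):
        # XMP allows pdf:Keywords either as an attribute or as a child element.
        if (desc.get("{%s}Keywords" % PDF) or "").strip():
            return True
        element = desc.find("{%s}Keywords" % PDF)
        if element is not None and (element.text or "").strip():
            return True
    return False


def survey(pdfs):
    """pdfs maps file name -> raw bytes; returns (files with keywords, total files)."""
    with_keywords = sum(1 for data in pdfs.values() if has_keywords(data))
    return with_keywords, len(pdfs)
```

For a local collection, the mapping could be built by reading each downloaded PDF with `pathlib.Path.read_bytes()`; the resulting counts could then be published alongside the data, as Ross did on Figshare.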


Posted in Repositories, Web2.0 | 21 Comments »

Reflections on the “Top 10 Tips on How to Make Your Open Access Research Visible Online”

Posted by Brian Kelly on 13 Dec 2012

Top 10 Tips on How to Make Your Open Access Research Visible Online

Yesterday I received an email which informed me that my contribution to the Jisc Inform online newsletter (issue 35, December 2012) had been published. The article on Top 10 Tips on How to Make Your Open Access Research Visible Online is based on a blog post originally published on the Networked Researcher blog, which was tweaked slightly and republished on the Jisc blog. The version published in the Jisc Inform newsletter includes a series of images to accompany each of the ten tips.

The tips were originally developed to accompany a series of presentations given at the universities of Exeter, Salford and Bath during Open Access Week. These presentations were based on the experiences gained in the use of social media to help maximise access to peer-reviewed publications. In particular, the tips documented the experiences of using social media services such as blogs, Twitter and Slideshare to help maximise the readership of a paper entitled “A Challenge to Web Accessibility Metrics and Guidelines: Putting People and Processes First“.

The Complexities Behind the Tips

It is interesting to see how the advice initially given in a one-hour seminar can be distilled into a series of top tips. The sceptic may be dismissive of the value of reducing the complexities of open practices for researchers to a series of top tips. However, at the recent SpotOn 2012 conference, sessions such as How to do Smart Journalism on Complex Science emphasised the value of science writers who are able to communicate complex scientific ideas in ways which can be understood by the general public. The challenge, however, is to ensure that those with a deeper interest in the complexities are able to access resources which provide more in-depth discussions.

In the case of the Top 10 Tips on How to Make Your Open Access Research Visible Online, more detailed information was provided in the slides of the original talk. In addition, as illustrated, the slides also contain links to further information. In the example shown, the large number of views of the slides during the week of the conference provides evidence of the value of being proactive in ensuring that the co-authors of the paper provided links to the presentation on their blogs and Twitter channels.

The limitations of Slideshare statistics were mentioned, but the slide also contains a link to the usage statistics, which showed that the accompanying paper was, at the time, the most downloaded of my peer-reviewed papers deposited in the University of Bath repository this year.

In addition to the more detailed information provided in the slides, during the presentation itself I expanded on a number of issues, including responding to questions raised during the talk. A post has been published on the JISC-funded Open Exeter blog about the Open Access Week @ Exeter which includes a series of videos of the invited presentations. The video of my talk is available on YouTube and embedded below. I hope this additional information complements the top 10 tips published in Jisc Inform.


Posted in openness, Repositories | 6 Comments »

Guest Post: Reflections on Open Access Week 2012 at the University of Oxford

Posted by Brian Kelly on 4 Dec 2012

During Open Access Week a series of guest blog posts were published on this blog in which three repository managers shared their findings of SEO analyses of their institutional repositories.

As a follow-up to those posts, which were motivated by a commitment to openness and sharing which is prevalent in the repository community, this post by Catherine Dockerty (Web and Data Services Manager, Radcliffe Science Library) and Juliet Ralph (Bodleian Libraries Life Sciences Librarian) provides a summary of the activities behind the Open Access Week event at the University of Oxford.

Open Access Week at Oxford

Open Access Week 2012 saw a determined effort from the Bodleian Libraries of Oxford University to shine a light on developments in Open Access with a full week-long programme of events. This was prompted by the need to assess the state of play in Open Access (OA) which, for major research institutions such as Oxford, is particularly urgent in the wake of the publication of the Finch Report. It was the second year we had participated in Open Access Week: last year we held a single event, and we wanted to do a lot more this time round.

What We Were Trying To Do

We had a number of specific things we wanted to achieve through our programme:

  • Increasing the knowledge of library staff. All reader-facing staff will potentially deal with enquiries relating to Open Access.
  • Assembling and showcasing the expertise of Bodleian Libraries staff in Open Access. Readers need to know what we can do for them.
  • Raising awareness of publishing options to academic researchers.
  • Promoting submission to Oxford’s institutional repository ORA (Oxford Research Archive). Oxford currently has mandatory deposit for doctoral theses, but not for research papers.
  • Highlighting Oxford’s progress in the field of Open Data.

What We Did

We put together a programme of talks and other activities, most of which were lunchtime sessions and took place at the Radcliffe Science Library, one of the Bodleian Libraries and Oxford University’s main library for the sciences and engineering. The majority of speakers were library staff. The focus was on science, but events covering law and medicine were included and there were attendees from the humanities and social sciences.

An evening session, “Bodley’s ‘Republic of [Open] Letters’”, was hosted by the Oxford Open Science Group and highlighted the DaMaRO Project, which is developing a research data management policy and data archiving infrastructure for Oxford.

The presentations are available online.

Wikipedia Editathon

Ada Lovelace by Margaret Carpenter, 1836

The final event of the Open Access Week programme was a Wikipedia “Editathon” on the theme Women in Science. The event was organised as a collaboration between the Bodleian Libraries and Oxford University’s IT Services, and was a follow-up to the Ada Lovelace Day event held at the Royal Society the previous week. This tied in neatly with Open Access Week, as we were able to highlight open access sources for use in updating articles. Our event was publicised at the Royal Society event and on the Ada Lovelace Day Wikipedia page.

Having an Oxford-based Wikipedia event was also an opportunity to encourage academics and students to get involved in editing Wikipedia, which is reliant on expert contributors to add high quality articles and improve existing ones. Wikipedia has a readership vastly exceeding that of any academic journal, and presents an opportunity for academics to have an impact on a wider audience.

Juliet Ralph (Bodleian Libraries Life Sciences Librarian) kicked off the proceedings with a talk introducing Wikipedia and outlining the format of the session. Online resources for editing articles were suggested, focusing on open access. The fact that the Royal Society was providing free access to all its publications until 29th November 2012 was highlighted. A collection of printed reference materials from the RSL’s collection was also provided.

A list of articles for adding/updating was provided as guidance to participants, but this was not intended to be prescriptive. The list was the same one as used at the Royal Society event, updated to reflect all the work done that day.

We were very pleased that Oxford-based Wikipedians James and Harry Burt were able to attend and assist the assembled editors. They also treated us to an impromptu presentation on their work as long-time Wikipedia editors.

Online participation via Twitter was encouraged using the hashtag #WomenSciWP (the same as for the Royal Society event). Note that a Twubs archive of the tweets is available. The event was also live-tweeted from the RSL’s Twitter feed (@radcliffescilib).

By the end of the session two new articles had been created and 12 updated. Attendees were mainly research staff and postgraduate students from the fields of science and medicine. Also present were two archivists from the Saving Oxford Medicine project, who wrote a blog post about the event.

Special thanks to:

  • James and Harry Burt for presenting and for the help they gave to other participants.
  • Izzie McMann and Karen Langdon (Radcliffe Science Library staff) for assisting participants on the day.
  • Janet McKnight (IT Services) and Alison Prince (Bodleian Libraries Web Manager) for help in organising and publicising the event.
  • Andrew Gray (British Library Wikipedian in Residence) and Daria Cybulska (Wikimedia UK) for publicising the Editathon and supplying learning materials for the session.


We certainly achieved the aim of increasing the knowledge of OA issues among library staff within the sciences, several of whom attended more than one event. In future we will aim to actively promote the staff development benefits of participating to all Bodleian Libraries staff, not just those in the sciences. Our collaborations with the Open Science Group and IT Services were successful, and we hope to work together with them on future events.

We fulfilled all our original intentions to some extent, but some events were poorly attended despite being widely publicised, although they were positively received by those who did attend.

The timing of Open Access Week is a problem for Oxford as the start of the academic year is later than for most UK universities, which means the new term is just getting underway in earnest and there are many other events to compete with. Staff time in planning events is also in short supply as reader-facing staff will have been prioritising inductions for new students over the previous weeks.

The Wikipedia event was a success (well attended with positive feedback) and we would certainly hold a similar event in the future, although not necessarily as part of Open Access Week. The fact that it was a hands-on session went down well, and the Women in Science theme attracted interest.

Next Time

Holding events at lunchtime was evidently not popular and we may decide to move them to an afternoon slot (colleagues who run user education programmes had a higher take-up when they did this). We may also move the sessions out of the library into academic departments or colleges, and hold events at other times of year.

We will be making a concerted effort to involve well-known speakers, rather than relying heavily on library staff.

We will be looking to encourage other OA events in Oxford and elsewhere, and we will also think about using online chat as well as Twitter for online participation. The planning starts now!


Catherine Dockerty is the Web and Data Services Manager at the Radcliffe Science Library at Oxford University, where her role is to manage online content, social media and communications, and to support colleagues in serving the University’s teaching and research in the sciences. She has spent 13 years working in various reader services roles at Oxford University, and has also worked in the civil engineering industry and the book trade.

Juliet Ralph is the Subject Librarian for Life Sciences and Medicine in the Bodleian Libraries at Oxford, where she has worked for over 15 years. She is one of many librarians involved in providing support for research at Oxford, including Open Access.

Posted in Guest-post, openness, Repositories | 1 Comment »

Open Practices for Open Repositories

Posted by Brian Kelly on 29 Oct 2012


Open Access Week, which took place last week, was a busy period for me. Not only did I give talks on how social media can enhance access to research papers hosted in institutional repositories at the universities of Exeter, Salford and Bath, I also wrote accompanying posts which were published on the Networked Researcher and JISC blogs. But perhaps more importantly, last week I coordinated the publication of three guest posts on this blog: SEO Analysis of WRAP, the Warwick University Repository; SEO Analysis of LSE Research Online; and SEO Analysis of Enlighten, the University of Glasgow Institutional Repository.

Sharing of Repository Practices and Experiences

The background to this work lies in the two papers I co-authored for the Open Repositories OR 2012 conference. In the paper on “Open Metrics for Open Repositories” (available in PDF and MS Word formats), my co-authors Nick Sheppard, Jenny Delasalle, Mark Dewey, Owen Stephens, Gareth Johnson and Stephanie Taylor and I concluded with a call for repository managers, developers and policy makers to be pro-active in providing open access to metrics for open repositories. In the paper which asked “Can LinkedIn and Enhance Access to Open Repositories?“, also available in PDF and MS Word formats, Jenny Delasalle and I described how popular social media services which are widely used by researchers can have a role to play in enhancing the visibility of papers hosted in repositories. However, although LinkedIn and appeared to be widely used, we concluded by describing how “further work is planned to investigate whether such links are responsible for enhancing SEO rankings of resources hosted in institutional repositories“.

This work began with a post which described the findings of a MajesticSEO Analysis of Russell Group University Repositories. This post made use of the MajesticSEO service, which can report on SEO ranking factors for Web sites, and provided initial findings of a survey of institutional repositories hosted by the 24 Russell Group universities.

This initial post was intended to explore the capabilities of the tool and gauge the level of interest in further work. In response to the post the question was asked “Are [the findings] correlated with amount of content, amount of full-text (or other non-metadata-only) content, breadth or depth of subject matter, what?” These were valid questions and were addressed in the more detailed follow-up surveys, which were provided by repository managers at the universities of Warwick, Glasgow and LSE who have the contextual knowledge needed to provide answers to such questions.

In this initial series of guest blog posts, William Nixon concluded with the remarks:

This has been an interesting, challenging and thought-provoking exercise with the opportunity to look at the results and experiences of Warwick and the LSE who, like us reflect the use of Google Analytics to provide measures of traffic and usage.

The overall results from this work provide some interesting counterpoints and data to the results which we get from both Google Analytics and IRStats. These will need further analysis as we explore how Majestic SEO could be part of the repository altmetrics toolbox and how we can leverage its data to enhance access our research.

I feel the exercise has been valuable for the three contributors. But I also feel that the open descriptions of their experiences in using the MajesticSEO tool, of the findings and of the interpretation of the findings will be of value to the wider repository community, who may also have an interest in gaining a better understanding of the ways in which repository resources are found by users of popular search engines, such as Google. There will also be a need to have a better understanding of the tools used to carry out such analyses. How, for example, will SEO analysis tools address link farms and other ‘black hat’ SEO techniques which may provide significant volumes of links to resources which may, in reality, be ignored by Google?

William Nixon’s post concluded by pointing out the need for:

further analysis as we explore how Majestic SEO could be part of the repository altmetrics toolbox and how we can leverage its data to enhance access our research.

I suspect the University of Glasgow will not be alone in wishing to explore the potential of SEO analysis tools which can help in understanding current patterns of traffic to repositories and in shaping practices to enhance such traffic. I hope the work which has been described by Yvonne Budeden, Natalia Madjarevic and William Nixon has been useful to the repository community in summarising their initial experiences.

I should also add that Jenny Delasalle and I are giving a talk at the ILI 2012 conference which will ask “What Does The Evidence Tell Us About Institutional Repositories?” We are currently finalising the slides for the talk, which are available on Slideshare. There is still an opportunity for us to update the slides, which might include a summary of plans for future work in this area, so we would very much welcome your feedback and suggestions. Perhaps you might be willing to publish a guest post on this blog which builds on last week’s posts?

Posted in openness, Repositories | 5 Comments »

SEO Analysis of LSE Research Online

Posted by ukwebfocusguest on 24 Oct 2012


The second in the series of guest blog posts giving a summary of an SEO analysis of a repository hosted at a Russell Group university is provided by Natalia Madjarevic, the LSE Research Online Manager. As described in the initial post, the aim of this work is to enable repository managers to openly share their experiences of using MajesticSEO, a freely-available SEO analysis tool, to analyse their institutional repositories.

SEO Analysis of LSE Research Online

This post takes an in-depth look at a search engine optimisation (SEO) analysis of LSE Research Online, the institutional repository of LSE research outputs. This builds on Brian Kelly’s post published on this blog in August 2012 on MajesticSEO Analysis of Russell Group University Repositories.

The London School of Economics and Political Science


LSE is a specialist university with an international intake and a global reach. Its research and teaching span the full breadth of the social sciences, from economics, politics and law to sociology, anthropology, accounting and finance. Founded in 1895 by Beatrice and Sidney Webb, the School has a reputation for academic excellence. The School has around 9,300 full-time students from 145 countries and a staff of just under 3,000, with about 45 per cent drawn from countries outside the UK. In 2008, the RAE found that LSE has the highest percentage of world-leading research of any university in the country, topping or coming close to the top of a number of rankings of research excellence. LSE came top nationally by grade point average in Economics, Law, Social Policy and European Studies, and 68% of the submitted research outputs were ranked 3* or 4*.

LSE Research Online – a short history

LSE Research Online (LSERO) was set up in 2005 as part of the SHERPA-LEAP project. The aim of the project was to create EPrints repositories for each of the seven partner institutions, of which LSE was one, and to populate those repositories with full-text research papers. In June 2008 the LSE Academic Board agreed that records for all LSE research outputs would be entered into LSE Research Online. We have no full-text mandate but authors are encouraged to provide full-text deposits of journal articles in pre-publication form, clearly labelled as such, alongside references to publications. Research outputs included in LSE Research Online appear in LSE Experts profiles automatically, thereby reusing data collected by LSE Research Online.

LSE Research Online is to be the main source of bibliographic information for the Research Excellence Framework (REF) in 2014. This has further increased the impetus for deposit and the visibility of the repository in the School, and we have repository champions in departments across the School.

LSE Research Online size and a brief look at usage statistics

As of September 2012, LSE Research Online contains around 33,696 records, with 7,050 full-text items. We include a variety of item types such as articles, book chapters, working papers, data sets, blogs and conference proceedings. We most recently began collecting LSE blogs to create a permanent home for this important content. We began tracking LSERO site usage with Google Analytics in 2007 and the site has received 2,268,135 visits since then. According to Google Analytics, 76.55% (1,748,725 total visits) of traffic to LSE Research Online comes from searches. Only 16.13% of traffic is from referrals and 7.14% from direct traffic. We also use Analog server statistics to monitor downloads: between May 2007 and September 2012 there were 5,266,871 downloads in total.

Expectations of the survey

Before running the Majestic SEO report, I expected we would see plenty of traffic from Google and backlinks (i.e. incoming links) from as, understandably, these are key sources of traffic to LSERO and are indicated as such on Google Analytics. Google Analytics also points to referrals from Wikipedia and Google Scholar, and most recently, our Summon implementation which includes LSERO content. However, I was intrigued as to how LSERO would fare in an SEO analysis.

Majestic SEO survey results

The data was generated with a free Majestic SEO account on 24 September 2012, using the ‘fresh’ index option. A summary of the results is shown below: there are 1,285 referring domains and 8,856 external backlinks. Note that the current findings can be viewed if you have a MajesticSEO account (which is free to obtain).

Figure 1: Majestic SEO analysis summary for

This includes 408 educational referring backlinks. If we look at backlinks in more detail, patterns begin to emerge:

Figure 2: Top 5 Backlinks

This shows that most of the top backlinks come from Wikipedia pages linking to LSERO content, and yet Wikipedia is ranked only as the sixth most popular source of traffic in Google Analytics.

Top referring domains, sorted by matched links, can be found in the table shown below:

Referring domain Matched links Alexa rank Citation Flow Trust Flow
WordPress 14,502 21 95 93
Blogspot 11,239 5 97 94
Wikipedia 349 8 97 98
Flickr 272 33 98 96
Google 225 1 99 99

Table 1: Top 5 Referring Domains

Flickr makes a surprise appearance, with WordPress and Blogger dominating the top of the table.

Top 5 items sorted by Majestic’s flow metrics can be found here:

Figure 3: Top 5 Resources in Repository (sorted by flow metrics)

Perhaps more indicative are the top 5 linked resources sorted by number of backlinks, which can be found in the table shown below:

Ref no. URL Ext. BackLinks Ref. Domains CitationFlow TrustFlow
1 501 83 45 41
2 417 69 28 19
3 225 4 27 32
4 130 46 30 25
5 112 54 22 23

Table 2: Top 5 Linked Resources in Repository (sorted by no. of links)

These pages are:

  1. The LSE Research Online homepage.
  2. A PDF of a research paper on climate policy.
  3. The record for a paper on teenagers’ use of social networking sites.
  4. The record for a paper on climate policy.
  5. The record for a paper on open source software.


Looking in more detail at the top backlinks to the repository, as listed in Figure 2, we can see that Wikipedia accounts for four of the five top pages. This includes the Wikipedia page on Free Software, which links back to a Government report on the cost of ownership of open source software. The Wikipedia pages on the European Commission and Proportional Representation are ranked second and third respectively. The Proportional Representation page links back to the full text of a 2010 workshop paper: Review of paradoxes afflicting various voting procedures where one out of m candidates (m ≥ 2) must be elected. The fifth, and the only top backlink not from Wikipedia, is an AIDS education site which links back to the record of an early LSERO paper: Peer education, gender and the development of critical consciousness : participatory HIV prevention by South African youth.

In Table 1, the Top 5 Referring Domains to LSE Research Online are WordPress, Blogspot, Wikipedia, Flickr and Google. We can see the dominance of international social platforms here, with WordPress (14,502 links) and Blogspot (11,239 links), followed by Wikipedia (349 links), Flickr (272 links) and, finally, a search engine, Google (225 links).

In Figure 3, Top 5 Resources in Repository (sorted by flow metrics), we can see several links to LSERO information pages, including the home page and the feed of latest additions. There are, however, several direct links to full-text papers, including an Economic History Working Paper, A dreadful heritage: interpreting epidemic disease at Eyam, 1666-2000. Sorting this data by number of backlinks, as shown in Table 2, the top item is the LSERO homepage with 501 backlinks. The second item is the PDF of one of our most downloaded papers of all time: The Hartwell Paper.


So what can I draw from the results of the Majestic SEO report of LSE Research Online? Analysing the top referring domains according to the Majestic report, it seems reasonable to suggest that adding links to repository content on blogging platforms such as WordPress and Blogspot may result in an increased SEO ranking. We often link to LSERO content in various LSE Library blogs hosted on Blogspot, including New Research Selected by LSE Library. Flickr is also listed as a top referring domain by Majestic SEO, but running a Google search for “” retrieves zero results. It’s difficult to ascertain how MajesticSEO gets this result when Google does not confirm the findings; perhaps it uses very different algorithms to Google? The MajesticSEO top referring domains indicate that blogging platforms are the main referring domains to LSERO content. However, according to our Google Analytics stats, 76.55% of traffic to LSERO is from searches. Furthermore, the Majestic report indicates that there are 349 matched links to LSERO content on Wikipedia. “Running the search “” in you get (on 11 October 2012) “About 92 results”. From the last page of the results, by repeating the search to include omitted results, Google ends up with 80 hits.” Searching for in retrieves 83 hits. How does MajesticSEO retrieve such varying results?

Looking at backlinks, it’s important to note that the majority of top backlinks refer to papers that have the full text attached, and often link directly to the full-text PDF, of course resulting in a direct download. In addition, the Top 5 Resources in Repository (sorted by external backlinks), as seen in Table 2, tally with our consistently popular papers according to Google Analytics and our Analog statistics.

Links to repository content from domains such as Wikipedia and blogging platforms appear to have a positive impact on the relevancy ranking of LSERO content in search engine results. This is in addition to direct hits on the links themselves, which add directly to the site’s visitors and thus to the dissemination of LSE research outputs. However, whether we can draw firm conclusions from the Majestic report remains to be seen, particularly given how much its results differ from those found on Google.

Thanks to my colleague Peter Spring for his advice when writing this post.

About the Author

Natalia Madjarevic is the manager of LSE Research Online, LSE Theses Online and LSE Learning Resources Online, the repositories of The London School of Economics and Political Science.

Natalia is also the Academic Support Librarian for the Department of Economics and LSE Research Lab. Before joining LSE in 2011, Natalia worked at libraries including UCL, The Guardian and Queen Mary, University of London. Her professional interests include Open Access, research support, REF, bibliometrics and digital developments in libraries.

Posted in Evidence, Guest-post, Repositories | 4 Comments »

SEO Analysis of WRAP, the Warwick University Repository

Posted by ukwebfocusguest on 23 Oct 2012

SEO Analysis of a Selection of Russell Group University Repositories

A post published in August 2012 on a MajesticSEO Analysis of Russell Group University Repositories highlighted the importance of search engine optimisation (SEO) for enhancing access to research papers and provided summary statistics of the SEO rankings for 24 Russell Group university repositories. It is part of a series of articles on different repositories.

This work adopted an open practice approach in which the initial findings were published at an early stage in order to solicit feedback on the value of such work and the methodology used. There was much interest in this initial work, especially on Twitter. Subsequent email discussions led to a number of repository managers at Russell Group universities agreeing to publish more detailed findings for their repositories, together with contextual information about the institution and the repository to which I, as a remote observer, would not be privy.

We agreed to publish these findings on this blog during Open Access Week. I am very grateful to the contributors for finding time to carry out the analysis and publish the findings at the start of the academic year, a very busy period for those working in higher education.

The initial post was written by Yvonne Budden, the repository manager for WRAP, the Warwick Research Archives Project. It is appropriate that this selection of guest blog posts begins with a contribution about the Warwick repository, as Jenny Delasalle (a colleague of Yvonne’s at the University of Warwick) and I will be giving a talk on “What Does The Evidence Tell Us About Institutional Repositories?” at the ILI 2012 conference to be held in London next week.

SEO Analysis of the University of Warwick’s Research Repositories

The following summary of a MajesticSEO survey of the University of Warwick’s research repositories, together with background information about the university and the repository environment has been provided by Yvonne Budden.

A Little Background on Warwick

The University of Warwick is one of the UK’s leading universities with an acknowledged reputation for excellence in research and teaching, for innovation and for links with business and industry. Founded in 1965 with an initial intake of 450 undergraduates, Warwick now has in excess of 22,000 students and employs close to 5,000 staff. Of those staff just fewer than 1,400 are academic or research staff. Warwick is a research intensive institution and our departments cover a wide range of disciplines, including medicine and WMG, a specialist centre dedicated to innovation and business engagement. In the 2008 RAE nineteen of our departments were ranked in the top ten for their unit of assessment and 65% of the submitted research outputs were ranked 3* or 4*.

University of Warwick’s Research Repositories

Warwick’s research repositories began in the summer of 2008 with the Warwick Research Archives Project (WRAP), a JISC-funded project that created a full-text, open access archive for the University. Funding for WRAP was taken on by the Library, and in April 2011 we launched the University of Warwick Publications service, which was designed to ‘fill the gaps’ around the WRAP content with a comprehensive collection of work produced by Warwick researchers. The services work on the same technical infrastructure, but WRAP remains distinct and exposes only the full-text open access material held. The system runs on the most recent version of the EPrints repository software, using a number of plugins for export, statistics monitoring and, most recently, to assist in the management of the REF2014 submission. To date we do not have a full-text mandate for WRAP, and engagement with both WRAP and the Publications service varies across departments. Deposit to the services is highly mediated through the repository team, so engagement is not necessarily reflected in the number of papers available per department, especially as some departments benefit more than others from the service’s policy of pro-active acquisition of new material where licences allow. I would judge that our best engagement in terms of full-text deposit comes from Social Science researchers, but we also have some strong champions in the Medical School, History, Life Sciences and Psychology.

Size and Usage Statistics

At the end of August 2012 WRAP contained 6,554 full-text items covering a range of item types: journal articles, theses, conference papers, working papers and more. The Publications service contained a further 40,753 records. In terms of usage, since its launch the system has seen 900,997 visits according to Google Analytics, an average of just over 18,000 a month over the 50 months it has been active. To track downloads we use the EPrints plugin IR Stats, which counts file downloads made either directly or through the repository interface. IR Stats will only count one download per twenty-four hours from each source, but will count multiple downloads if an item has multiple files attached. Over the life of WRAP the files held have been downloaded a grand total of 730,304 times, with 49.08% of downloads coming from Google or Google Scholar.
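The counting rule described above (at most one download per twenty-four hours from each source, counted separately for each file attached to an item) can be sketched as follows. This is a minimal illustration, not IR Stats’ actual code: the event format, the use of an IP address as the ‘source’, the hypothetical `count_downloads` function and the choice of a rolling 24-hour window (rather than calendar days) are all assumptions.

```python
from datetime import datetime, timedelta

# Assumption: a rolling 24-hour window measured from the last counted
# download. IR Stats may implement the rule differently.
WINDOW = timedelta(hours=24)


def count_downloads(events):
    """Hypothetical sketch of the IR Stats-style counting rule.

    events: iterable of (timestamp, source, file_id) tuples, where 'source'
    is assumed to be something like a client IP address. A repeat download
    of the same file from the same source within WINDOW is ignored; a
    different file on the same item counts as a separate download.
    """
    last_counted = {}  # (source, file_id) -> time of last counted download
    total = 0
    for ts, source, file_id in sorted(events):  # process in time order
        key = (source, file_id)
        previous = last_counted.get(key)
        if previous is None or ts - previous >= WINDOW:
            last_counted[key] = ts
            total += 1
    return total


# Illustrative events only (made-up source address and file names).
t0 = datetime(2012, 8, 1, 9, 0)
events = [
    (t0, "192.0.2.1", "paper.pdf"),                        # counted
    (t0 + timedelta(hours=2), "192.0.2.1", "paper.pdf"),   # same file and source within 24h: ignored
    (t0 + timedelta(hours=2), "192.0.2.1", "data.zip"),    # second file on the same item: counted
    (t0 + timedelta(hours=25), "192.0.2.1", "paper.pdf"),  # more than 24h later: counted
]
print(count_downloads(events))  # 3
```

Keying on the (source, file) pair is what makes an item with several attached files register several downloads, as the paragraph above notes.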

Expectations of the Survey

Going into the survey using the MajesticSEO system, I wasn’t sure what to expect from the results, as the majority of the work we’ve done so far with statistics has been with Google Analytics and the IR Stats package. Looking at the referral sources in our Google Analytics output, I can identify a number of sources from which I might expect to see backlinks into the system, including our Business School ( and the Bielefeld Academic Search Engine (BASE), as well as a number of smaller sources. The Warwick Blogs service seems to have fallen out of favour over the past few years, with the number of hits from there dropping as people move to other platforms. Above all, I’m most curious to see if the SEO analysis can help with the work I am doing in promoting the use of WRAP and the material within it. If this work can help me create the kinds of ‘interest stories’ that persuade researchers to deposit, it could become another valuable source of information. We are also looking at expanding the range of metrics we have access to, looking at the IRUS project as well as the forthcoming updated version of IR Stats, recently demonstrated at Open Repositories 2012.

Our Survey Results

The data for this survey was generated on the 10th September 2012 using the ‘fresh index’ option, although the images were captured on 19 October. The current results can be found if you have a MajesticSEO account (which is free to obtain). The summary for the site is given below showing 413 referring domains and 2,523 backlinks.

Figure 1: MajesticSEO analysis summary for

At first glance this seems rather low in terms of backlinks; it also shows a fairly low number of educational domains linking to us. The top five backlinks into the system can be seen below, ranked by default by a combination of Citation Flow and Trust Flow:

Figure 2: Top 5 Backlinks

Interestingly, this lists some of the popular referrers we see in Google Analytics driving traffic to us, but not some others I might have expected to see. The top referring domains are shown below:

Figure 3: Top Referring Domains

This is the only place in the results where Google features at all. The top five pages, as ranked by the flow metrics, show a fairly distinct anomaly, as two of the pages do not list any flow metric information despite this supposedly being the method by which they are ranked:

Figure 4: Findings Ranked by Flow Metrics

The top five pages as sorted by number of backlinks can be seen in the table below:

Ref No. URL Ext. Backlinks Ref. Domains Citation Flow Trust Flow
1 228 1 14 0
2 177 23 37 37
3 91 31 15 13
4 82 4 11 9
5 46 4 17 2

Table 1: Top 5 Pages, Sorted By Number of Links

These five items are as follows:

  1. A research paper on the impact of cotton in poor rural households in India.
  2. The WRAP homepage.
  3. A PDF of an economics working paper on currency area theory.
  4. A PDF of an economics working paper on happiness and productivity.
  5. The record for a PhD thesis on women poets.


The top ten backlinks into the WRAP system include a range of sources: this blog, two Wikipedia pages and two referrals from the PhilPapers repository, which monitors journals, personal pages and repositories for Philosophy content. We also see two pages that collect literature on health topics linking back to us, a maths blog and the newsletter of the British Centre for Science Education.

Interestingly, in Figure 3 there is no mention of the University of Warwick or any of its related domains ( for the Business School, for instance). I assume this is because MajesticSEO excludes ‘self’ links, so as WRAP is a Warwick subdomain, many of the links I am aware of are excluded. This may also account for the lack of any backlinks from the Warwick Blogs service. Many of the domains listed here are blog platforms of one form or another, which may be because of the database-driven architecture of these platforms and the way the MajesticSEO system reads those links. For example, if a researcher puts a link to his most recent paper in WRAP on the frame of his blog and this propagates onto every post in the blog, does this count as a single link or as many? We are also seeing links from sources such as the BBC and Microsoft where, again, it would be nice to be able to see who was linking to what, and from where, in these domains.
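The difference between counting backlinks and counting referring domains, and the effect of excluding ‘self’ links, can be illustrated with a short sketch. It is not how MajesticSEO is documented to work: the hypothetical `link_counts` function assumes that a link is a self link when its source host is the site’s own domain or a subdomain of it, that each remaining link is one backlink, and that the referring domain is simply the source hostname. The example URLs are made up.

```python
from urllib.parse import urlparse


def link_counts(links, site_domain):
    """Return (external_backlinks, referring_domains) for (source, target) URL pairs.

    Hypothetical rules, for illustration only:
    - a link whose source host is site_domain or one of its subdomains is a
      'self' link and is excluded from both counts;
    - every other link counts as one backlink;
    - the referring domain is the source URL's hostname.
    """
    backlinks = 0
    domains = set()
    for source_url, _target_url in links:
        host = (urlparse(source_url).hostname or "").lower()
        if host == site_domain or host.endswith("." + site_domain):
            continue  # self link: excluded
        backlinks += 1
        domains.add(host)
    return backlinks, len(domains)


# A (made-up) blog with a link to a paper in its template, repeated on
# three posts, plus one link from another Warwick host.
links = [
    ("https://example-researcher.wordpress.com/2012/01/post-a/", "http://wrap.warwick.ac.uk/1/"),
    ("https://example-researcher.wordpress.com/2012/02/post-b/", "http://wrap.warwick.ac.uk/1/"),
    ("https://example-researcher.wordpress.com/2012/03/post-c/", "http://wrap.warwick.ac.uk/1/"),
    ("https://www2.warwick.ac.uk/fac/soc/wbs/", "http://wrap.warwick.ac.uk/2/"),
]
print(link_counts(links, "warwick.ac.uk"))  # (3, 1)
```

Under these assumptions a link in a blog’s frame counts as many backlinks but only one referring domain, and the Warwick-hosted link disappears from both figures, which is one possible reading of the pattern described above.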

The top pages, as listed by number of backlinks in Table 1, show a trend for linking directly to the full-text files we hold in WRAP. This ties in nicely with the fact that item three is the most downloaded paper in WRAP over the lifetime of the repository, with 9,162 downloads to the end of August 2012. So in this case we can draw a tentative line between the number of downloads and the number of backlinks. However, we can’t follow this theory through, especially as the top paper linked to externally (Paper 1 in Table 1) has been downloaded only a fraction as many times as the currency working paper. When ranked by flow metrics, as in Figure 4, the pages largely follow the results seen for the Opus repository at Bath and link to pages about the repository. The exceptions are the two anomalous results which, despite having no Citation Flow or Trust Flow scores, are ranked second and third by flow metrics.
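One way to test the ‘tentative line’ between backlinks and downloads across a whole repository, rather than for a single paper, would be a rank correlation over all items. The sketch below computes Spearman’s rank correlation (the Pearson correlation of the ranks, with tied values given their average rank). The function names are hypothetical, the per-item counts would have to be exported from MajesticSEO and IR Stats, and the numbers in the usage example are toy values chosen only to exercise the code, not WRAP data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rank correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when all values are tied")
    return cov / math.sqrt(sxx * syy)


# Toy numbers only, to show the call.
toy_backlinks = [5, 12, 3, 40, 8]
toy_downloads = [50, 90, 20, 60, 70]
print(round(spearman_rho(toy_backlinks, toy_downloads), 2))  # 0.7
```

A value near 1 across many items would support the link between backlinks and downloads; a single outlier such as Paper 1 would then matter much less than it does when only one or two papers are compared.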


I think when looking at metrics the most important thing for a repository manager to do is to build stories around the metrics, as these help researchers to engage with the figures. Was this spike in downloads because the paper was featured at a conference, or because an author moved to a new institution, or for some other reason? What can I show my users that will help them decide to use us over other options, and to spend scarce time and resources maintaining a blog or Twitter account? The issue I have with the data we have discovered is that the number of backlinks into a repository will never conclusively prove that a paper will get more downloads, as ably illustrated by the example above. Many researchers are not interested in the fuzzy conclusions we can draw at this point; they want to see clear, conclusive proof that links = downloads = citations.

I also think that search engine performance is an increasingly difficult area to be conclusive about, especially now that users can ‘train’ their Google results to prefer the links they click on most often. This was recently a cause of concern for us, as it was reported that the EPrints repository of our Department of Computer Science (DCS) was overtaking our Google ranking and that WRAP no longer featured until page two of the results. This wasn’t the case: because the user reporting this to us was heavily involved in the area of computer science, his Google rankings had come to prefer the DCS repository to the WRAP one, as its results were more relevant to his interests. In the same way, when I search for ‘RSP’ my top result is now the Repositories Support Project, and not RSP the engineering company or the Peterborough health and safety firm, as it was initially.

We always need to be conscious of what researchers want from metrics and whether it is possible for us to give it to them. As with any metrics, we have to be explicit about what we are saying and what can be inferred from it. If we as users of metrics don’t understand how the metrics are being developed or how the search engines’ ranking algorithms work, we won’t be able to confidently predict what we can do to improve them. It may also come down to the way researchers are using these services and for what purpose, which may be why we are not seeing any evidence of the use of services like and LinkedIn. I would imagine that if researchers are using services to showcase their work to prospective employers and other researchers, they may prefer to link to the publisher’s version of their work rather than the repository version. I suspect the interest story from the SEO data may be more about ‘who’ is linking to their work than where they are linking from, which is detail we cannot, and possibly should not be able to, provide.

About the Author

Yvonne Budden (@wrap_ed), the University of Warwick’s E-Repositories Manager is responsible for WRAP, the Warwick Research Archive Portal and is the current Chair of the UK Council for Research Repositories (UKCoRR).


Posted in Evidence, Guest-post, Repositories | 3 Comments »

Open Practices for the Connected Researcher

Posted by Brian Kelly on 22 Oct 2012

Today sees the start of Open Access Week, #OAWeek. As described on the Open Access Week Web site:

Open Access Week, a global event now entering its sixth year, is an opportunity for the academic and research community to continue to learn about the potential benefits of Open Access, to share what they’ve learned with colleagues, and to help inspire wider participation in helping to make Open Access a new norm in scholarship and research.

I am participating in Open Access Week by sharing my experiences of making use of the Social Web to maximise access to papers hosted in institutional repositories. Tomorrow (Tuesday 23 October 2012) I am giving a talk on “Open Practices for the Connected Researcher” in a seminar which is part of a series of Open Access Week events which are taking place at the University of Exeter.

On Thursday, as described in a news item published by the University of Salford, I am the invited guest speaker for an Open Access event which will take place at the Old Fire Station at the University of Salford, where I will give a talk on “Open Practices and Social Media for the Connected Researcher”.

The following day I will be giving a talk on “Open Access and Open Practices For Researchers” at the University of Bath. This event, which marks the launch of a Social Media programme for Researchers, will include, in addition to my talk, a presentation from Ross Mounce, a PhD student and Open Knowledge Foundation Panton Fellow at the University of Bath, who will talk about the need for true Open Access (as originally defined), why it matters and the plethora of options we have for OA publishing.

In addition to such ‘real-world’ activities in support of Open Access Week I am also taking part in the Networked Researcher Blogging Unconference and earlier today published the launch post for the unconference.

My slides for tomorrow’s talk are available on Slideshare.

Posted in openness, Repositories | 3 Comments »

“If a Tree Falls in a Forest”

Posted by Brian Kelly on 6 Sep 2012

If a paper is deposited in an institutional repository and nobody notices, can the associated work be seen to have any relevance? I wondered about this recently after looking at the download statistics for my papers hosted in Opus, the University of Bath repository. Normally I’m interested in the reasons for popular downloads (such as evidence suggesting that large numbers of downloads are due to the ‘Google juice’ provided by links from popular Web sites). However, as part of the preparation for a talk on “Open Practices for the Connected Researcher” I’m giving at the University of Exeter during Open Access Week, I was interested in lessons to be learnt from papers which hardly anyone downloads.

In my case the papers nobody cares about are an article published in LA Record in 1997, a paper on Collection Level Description also published in 1999 which I had forgotten about until I rediscovered it a few years ago and uploaded to the repository, the final report for the QA Focus project and a peer-reviewed paper on Using Context to Support Effective Application of Web Content Accessibility Guidelines.

It was the peer-reviewed paper I was most interested in. This paper, written by myself, David Sloan, Helen Petrie, Fraser Hamilton and Lawrie Phipps and published in the Journal of Web Engineering (JWE), has only been downloaded twice. Clearly nobody is being deafened by the impact of this paper challenging the status quo!

Given that my papers have been downloaded a total of 13,104 times from the repository, what are the reasons for the lack of interest in this paper?

The obvious starting point would be the content. But this paper was a follow-up to previous papers on Web accessibility which have been well read and widely cited, and interest in our papers in this area has continued.

Looking at the email folder about this paper it seems that the first version of the paper was submitted to the publishers in July 2005. I seem to recall that we were invited to submit a paper based on an updated version of a paper on Forcing Standardization or Accommodating Diversity? A Framework for Applying the WCAG in the Real World by the same authors which had been presented at the W4A 2005 conference.

We received positive comments from the reviewers in August 2005 and responded with appropriate updates to the paper. But then everything went quiet. It wasn’t until August 2006 that we received the final proofs of the paper, and September 2006 that we received confirmation that the paper had been accepted; it was published in the Journal of Web Engineering, Vol. 5 No. 4 in December 2006. This was 17 months after we had submitted the first version of the paper!

By this time my co-authors and I had forgotten about the paper, and the ideas we described had been superseded by a paper on Contextual Web Accessibility – Maximizing the Benefit of Accessibility Guidelines presented at the W4A 2006 conference in May 2006.

Looking at the download statistics for my papers, it seems that I began depositing items in the Opus repository in October 2008. My first set of papers was deposited by repository staff based on the links available from the UKOLN Web site. However, it would appear that the JWE paper had not been uploaded, probably because I had failed to include it in my list of publications due to its long gestation period. A few months ago I noticed that the paper was missing from the repository, so on 17 May 2012 I uploaded it.

The reason for the lack of downloads is now clear: the paper wasn’t available until recently! And by the time the paper was available the ideas were no longer current.

What are the lessons which can be learnt which I can share in my talk on “Open Practices for the Connected Researcher“? I would suggest:

Repository items need to be made publicly available when the ideas are current. Depositing old papers may be useful for preserving the content and for record-keeping purposes, but not if the aim is to maximise the impact of the ideas.

Of course there is a bigger question about the value of peer-reviewed papers. In his 1,000th blog post Tony Hirst gave his reflections on The Un-academic. Tony pointed out that “Formal academic publications are a matter of record, and as such need to be self-standing, as well as embedded in a particular tradition” and contrasted this with blog posts which are “deliberately conversational: the grounding often coming from the current conversational context – recent previous posts, linked to sources, comments – as well as discussions ongoing in the community that the blog author inhabits and is known to contribute to“.

Tony argued for the value of blogs in supporting the research process by pointing out that blog posts can provide:

“a contribution to a daily ongoing communication with a community that often mediates its interests through the sharing of links (that is, references); in part it’s a contribution of ideas at a finer resolution than a formal academic reference, and in completely different style to them, to the free flow of ideas that can be found through the searchable and sharable world wide web.”

Since 2005 my colleagues and I have had peer-reviewed papers published at the W4A conferences in 2005, 2006, 2007, 2008, 2010 and 2012. This is part of the “annual ongoing communication with a community that often mediates its interests through the sharing of links (that is, references)”. However sometimes this process goes wrong, as described in this post. The long time it can take for research work to be published doesn’t mean that the process of research publication is fundamentally flawed. However I think this example does illustrate the need for researchers to make a “contribution to a daily ongoing communication with a community that often mediates its interests through the sharing of links“.

Tony’s blog post concludes by referencing a number of recent posts by Alan Levine (@CogDog) in which he has shared his thinking on blogging: The question should be: why are you NOT blogging?, Every box you type in can be a doorway to creativity and, in a roundabout way, Gotta know when to walk. Alan’s first post provides his reflections on his blogging activities since he started on 19 April 2003. This long post is worth reading, but can be summarised very succinctly:

So here is why I blog. It is foolish and informationally selfish, not to.

Perhaps that should be the key message I give in my talk in Exeter during Open Access Week. Oh, and having reflected on the paper which nobody reads, I have decided that if a peer-reviewed paper is not read, this is a failure. The time my co-authors and I spent writing the paper could have been more productively spent on other work. And no, unlike blog posts, where writing down ideas may be a useful process in itself, peer-reviewed papers aren’t intended as an aid to self-reflection.


Posted in openness, Repositories | 1 Comment »

MajesticSEO Analysis of Russell Group University Repositories

Posted by Brian Kelly on 29 Aug 2012

Investigation of SEO Rankings of Institutional Repositories

There is a need “to investigate whether links [from popular social media services] are responsible for enhancing SEO rankings of resources hosted in institutional repositories” concluded the paper by Jenny Delasalle and myself which asked “Can LinkedIn and Academia.edu Enhance Access to Open Repositories?“.

The importance of SEO rankings for surfacing content hosted in institutional repositories can be gauged from the responses to the query I asked on the JISC-Repositories JISCMail list: “Does anyone have any statistics on the proportion of traffic which arrives at institutional repositories from Google?”. I asked a similar question on Twitter and found that mature research repositories seem to get 50-80% of their traffic from Google. This is consistent with the findings reported by Les Carr for the University of Southampton back in 2006: “the majority of repository use, if I can equate eprint downloads with repository use, is due to external web search engines (64%)“. Indeed, since it has been reported that direct downloads of PDFs hosted in repositories may not be recorded unless Google Analytics has been configured appropriately, such figures may be an underestimate!
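To make the idea of “proportion of traffic from Google” concrete, the sketch below computes it from a list of referrer URLs. It assumes a hypothetical export in which each visit is represented by its referrer URL (an empty string for a direct visit), and it treats any hostname containing a `google` label as Google; real repository statistics packages record and classify referrers in their own ways. As noted above, downloads that bypass the analytics code, such as direct PDF requests, would not appear in such a log at all.

```python
from urllib.parse import urlparse


def google_share(referrers):
    """Fraction of visits whose referrer hostname contains a 'google' label.

    `referrers` is a list of referrer URLs (hypothetical export format);
    an empty string represents a direct visit with no referrer.
    """
    if not referrers:
        return 0.0
    google_hits = 0
    for ref in referrers:
        host = urlparse(ref).hostname or ""
        # Matches google.com, google.co.uk, scholar.google.com, ...
        if "google" in host.split("."):
            google_hits += 1
    return google_hits / len(referrers)


# Hypothetical sample of four visits.
visits = [
    "https://www.google.co.uk/search?q=web+accessibility",
    "https://scholar.google.com/scholar?q=repositories",
    "https://example.org/reading-list",
    "",  # direct visit
]
print(google_share(visits))  # 0.5
```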

In light of the importance of Google in supporting repositories in their mission of making research papers easily accessible to others it will be useful to gain a better understanding of the factors which contribute to supporting the discoverability of the content hosted in institutional repositories.

The survey described in this post reports on summary SEO findings for the 24 Russell Group universities. The aims of the survey are to provide a benchmark for comparisons with surveys which may be carried out in the future, to attempt to identify any interesting usage patterns which may help to enhance the effectiveness of institutional repositories and to identify the highest ranked domains which provide links to institutional repositories.

Survey Using MajesticSEO

The data was collected on 27-28 August 2012 using the MajesticSEO service. Note that the current findings can be obtained by following the link in the final column. The findings can be viewed if you have signed up to the free service.

Table 1: MajesticSEO Findings for Repositories Hosted at Russell Group Universities
Institutional Repository Details Referring
Top Five Domains & Numbers of Links View Results
Repository used: eprint Repository
116 499 146 16 6,424 4,658 200 82 67
Institution: University of Bristol
Repository used: ROSE
159 691 144 21 7,871 6,692 273 98 89
Repository used: Dspace @ Cambridge
86 7,339 283 97 33,276 17,241 1,771 449 442
Institution: Cardiff University
Repository used: ORCA
22 58 9 4 1,874 883 250 85 60
Institution: University of Durham
Repository used: DRO
297 1,281 27 12 5,430 3,020 145 76 45
Repository used: ERA
747 3,943 247 71 14,380 9,845 470 401 296
Institution: University of Exeter
Repository used: ERIC
Note: Repository sub-domain not used. See footnote 2.
198 958 175 18 1,125 1,115 45 43 42
Institution: University of Glasgow
Repository used: Enlighten
4,868 423 62 5,880 5,087 322 178 135
Institution: Imperial College
Repository used: Spiral
139 702 329 11 3,363 1,883 121 119 65
37 2,552 2,275 169 160 139
Institution: University of Leeds
Repository used: White Rose Research Online
700 4,847 1,354 2 44 23 13 8 5
Repository used: Research Archive
297 147 8 4,057 2,461 97 55 53
Repository used: LSE Research Online
1,365 9,771 549 80 14,449 11,550 343 262 244
Repository used: eScholar
Note: Repository sub-domain not used. See footnote 3.
(5) (29) – [Link]
Institution: Newcastle University
Repository used: Newcastle Eprints
30 215 85 5 6,425 3,929 221 116 87
Repository used: Nottingham Eprints
359 1,594 328 57 5,410 3,856 148 77 66
Institution: University of Oxford
Repository used: ORA
299 1,116 94 35 42,008 39,798 1,437 548 504
Repository used: QMRO
27 449 350 6 4,722 1,221 259 219 89
Note: Repository sub-domain not used. See footnote 4.
(9) (14) – – [Link]
Repository used: DCS Publications Archive
Note: Repository sub-domain not used. See footnote 5.
Note: The University of Sheffield also uses the White Rose repository which is also used by Leeds and York. See the Leeds entry for the statistics.
(2) (3) – – [Link]
Repository used: eprints.soton
46,176 33,524 123 4,384 2,568 264 138 89
Repository used: UCL Discovery
13,978 492 24 16,009 15,633 860 406 250
Institution: University of Warwick
Repository used: WRAP
2,476 278 20 9,412 7,601 217 179 122
Institution: University of York
Repository used: YODL
Note: Repository sub-domain not used. See footnote 6.
Note: The University of Sheffield also uses the White Rose repository which is also used by Leeds and York. See the Leeds entry for the statistics.
(3) (5) – – [Link]
Range  14 – 1,369  37 – 46,176  9 – 33,524  2 – 123


  1. The list of repositories is taken from OpenDoar.
  2. The ERIC repository at the University of Exeter is hosted at Since the repository home page is a redirect from it was possible to analyse the SEO rankings and get appropriate results.
  3. The eScholar repository at the University of Manchester is hosted at  Figures for this home page are given but since the domains with incoming links may refer to pages hosted on the domain, these figures are not given in order to avoid skewing the findings.
  4. The Queen’s University Belfast repository is hosted at Figures which are available for this home page are given but since the domains with incoming links may refer to pages hosted on the domain, these figures are not given in order to avoid skewing the findings.
  5. The DCS repository at the University of Sheffield is hosted at Figures which are available for this home page are given but since the domains with incoming links may refer to pages hosted on the domain, these figures are not given in order to avoid skewing the findings.
  6. The YODL repository of the University of York is hosted at Figures which are available for this home page are given but since the domains with incoming links may refer to pages hosted on the domain, these figures are not given in order to avoid skewing the findings.

Table 2 gives the total number of links from the high-ranking domains which are listed in the survey, together with the Alexa ranking for these domains. Note that Google has the highest Alexa ranking and is listed at number 1. Figure 1 shows the significance of links from blog platforms compared with the other most highly-ranked domains.

Figure 1: Histogram of number of incoming links from top domains

Table 2: Nos. of Links from High-Ranking Domains
No. Domains No. of links Alexa Ranking
1 Blogspot  176,625       5
2 WordPress  153,809     21
3 Wikipedia     7,230       8
4 BBC     2,811     36
5 Google    1,447       1
6 Ask       769     46
7 YouTube       460       3
8 Guardian       334    187
9 Reddit       261    143
10       259    259
11 Typepad       250   212
12 CNN      135     43
13 Microsoft       89     26
14 Sourceforge       67    139
15 Ning       42    256
16 Oxford University         5 6,764
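The significance of blog platforms can be checked directly against Table 2. The sketch below copies the table’s link counts and sums those for the blog-hosting platforms. Treating Typepad as a blog platform alongside Blogspot and WordPress is an assumption, and the row 10 domain, whose name is missing from the table, is simply labelled as unknown.

```python
# (domain, number of links, Alexa ranking) as listed in Table 2.
TABLE_2 = [
    ("Blogspot", 176_625, 5),
    ("WordPress", 153_809, 21),
    ("Wikipedia", 7_230, 8),
    ("BBC", 2_811, 36),
    ("Google", 1_447, 1),
    ("Ask", 769, 46),
    ("YouTube", 460, 3),
    ("Guardian", 334, 187),
    ("Reddit", 261, 143),
    ("(row 10, domain not recorded)", 259, 259),
    ("Typepad", 250, 212),
    ("CNN", 135, 43),
    ("Microsoft", 89, 26),
    ("Sourceforge", 67, 139),
    ("Ning", 42, 256),
    ("Oxford University", 5, 6_764),
]

# Assumption: these three entries are counted as blog platforms.
BLOG_PLATFORMS = {"Blogspot", "WordPress", "Typepad"}


def blog_share(rows):
    """Return (blog links, total links, blog share) for the listed domains."""
    total = sum(links for _, links, _ in rows)
    blogs = sum(links for name, links, _ in rows if name in BLOG_PLATFORMS)
    return blogs, total, blogs / total


blogs, total, share = blog_share(TABLE_2)
print(f"{blogs:,} of {total:,} links ({share:.1%})")
```

On the figures in Table 2 this gives 330,684 of 344,593 links, about 96%. Note that this covers only the domains listed in the table, not every inbound link to the repositories.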


In a previous post I suggested that since LinkedIn is so widely used across Russell Group universities, encouraging researchers to provide links to their papers hosted in their institutional repository would enhance the visibility of papers to Google, especially since LinkedIn has such a high Alexa ranking (it is currently listed at number 13 in the global ranking order).

However LinkedIn does not appear to have a significant presence according to the findings provided by MajesticSEO (although the free version only lists the top five domains).

Based on the information obtained in the survey it would appear that two blog platforms, Blogspot and WordPress, are primarily responsible for driving traffic to institutional repositories, having both high Alexa rankings and large numbers of links to the repositories.

Following these two platforms, but a long way behind, we find Wikipedia and the BBC and then, perhaps somewhat confusingly, Google itself (perhaps links from Google Scholar). The presence of media sites such as the BBC, CNN and the Guardian suggests that researchers (or their media advisers) are doing a good job in ensuring that these organisations provide links to original research papers when stories about university research are being covered in the media.

But perhaps the most noticeable finding is that only one university Web site – Oxford’s – is included in the lists of the top five domains across all of the Russell Group universities. The low Alexa ranking (6,764) for the Oxford University Web site in comparison with the other sites listed (which have Alexa rankings ranging from 1 to 259) suggests that links from university Web sites, even those of prestigious universities such as Oxford, will not have a significant impact on Google search results. It should also be noted that links from the University of Oxford Web site will not provide SEO benefits to the University of Oxford’s repository, which is hosted in the same domain.
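When analysing referring domains it can help to flag links that come from the repository’s own site, since such links are unlikely to provide SEO benefit. The sketch below uses a deliberately naive rule, comparing the last three labels of each hostname, which suits UK academic hosts of the form *.ac.uk but would be wrong for many other domains; a robust version would use the Public Suffix List. The hostnames are illustrative only and are not the actual Oxford repository address.

```python
def site_key(host, levels=3):
    """Naive 'same site' key: the last `levels` labels of a hostname.

    Assumption: three labels fit UK academic hosts such as www.ox.ac.uk;
    a robust implementation would consult the Public Suffix List instead.
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-levels:])


def is_internal_link(referring_host, repository_host):
    """True if the referring host appears to be part of the repository's own site."""
    return site_key(referring_host) == site_key(repository_host)


# Illustrative hostnames only.
print(is_internal_link("www.ox.ac.uk", "repository.ox.ac.uk"))   # True
print(is_internal_link("www.bbc.co.uk", "repository.ox.ac.uk"))  # False
```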

Limitations of this Survey

It should be noted that these conclusions are based on just one SEO tool and only a small selection of the findings is available. A more comprehensive survey would make use of the licensed version of the service, and make use of other SEO tools to compare the findings.

In addition Google does not publish the algorithms used to rank its search results, so there can be no guarantee that the findings provided by SEO tools will relate directly to users’ experiences of using Google.

In order to relate these findings to the ways users access resources hosted on a repository there will be a need to examine usage statistics for repositories. It would be interesting to see if the downloads for the most popular items show any correlation with links from the services listed above.

Survey Paradata: The findings given in Table 1 were collected on 27-28 August 2012 using the free version of MajesticSEO. The Alexa rankings listed in Table 2 were obtained from the Alexa survey and collected on 28 August 2012. Where the findings from MajesticSEO were incomplete, due to the repository not being hosted at the root of a repository sub-domain, this information was recorded and any data collected was not included in further analysis.
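The inclusion rule described in the paradata (analyse a repository only if it sits at the root of its own sub-domain) is simple enough to express in code. The sketch below applies it to a dictionary of repository home-page URLs; the names and URLs shown are hypothetical placeholders, not the addresses surveyed.

```python
from urllib.parse import urlparse


def at_subdomain_root(home_page):
    """True if the repository home page is the root of its host (path '' or '/')."""
    return urlparse(home_page).path in ("", "/")


def split_survey(repositories):
    """Separate repositories into those analysed and those only recorded.

    `repositories` maps a repository name to its home-page URL.
    """
    analysed = {name: url for name, url in repositories.items() if at_subdomain_root(url)}
    recorded_only = sorted(name for name in repositories if name not in analysed)
    return analysed, recorded_only


# Hypothetical URLs for illustration.
sample = {
    "Repository A": "https://eprints.example.ac.uk/",
    "Repository B": "http://research.example.ac.uk",
    "Repository C": "https://www.example.ac.uk/library/repository/",
}
analysed, recorded_only = split_survey(sample)
print(sorted(analysed))   # ['Repository A', 'Repository B']
print(recorded_only)      # ['Repository C']
```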


Posted in Evidence, Repositories | 15 Comments »

Where Do You Go To (My Lovely)?

Posted by Brian Kelly on 24 Aug 2012

Where Do Visitors To This Blog Go To?

Where do visitors to this blog go when they click on a link published in a blog post? When I looked at the click statistics for the past year I was surprised that the top ten destinations, with just one exception, were the home pages of a number of UK universities: Abertay, Aston, Cambridge, Bangor, Buckingham, Glasgow, ECA, Exeter and Falmouth. I subsequently found that these were nine of the 26 institutions which had been hyperlinked in a post on Best UK University Web Sites – According to Sixth Formers published in 2010.

Apart from the links followed from this single post the other top web sites visited in the past year are:,,,, and

What does this evidence tell us? Suggestions for the popularity of these Web sites are given below:

  • I normally provide a link to tweets which I cite. This enables me to find the original source if I wish to make use of it in the future. In addition it helps people reading the post to see the source, see the context and find out more about the Twitter user. It would appear that this decision has proved useful, as people do seem to be clicking on links to tweets.
  • This initially appeared to be an anomaly. However I subsequently realised that a post giving Thoughts on Google Scholar Citations, published a few days after Google’s announcement that Google Scholar Citations Open To All, had proven very popular after I had left a comment on Google’s blog post linking to my post.
  • It is pleasing to see that links to Opus, the University of Bath’s institutional repository, feature so highly. These are primarily to copies of my peer-reviewed papers. Interestingly a recent paper by Jenny Delasalle and myself asked Can LinkedIn and Academia.edu Enhance Access to Open Repositories? Although we feel the answer is “yes” it would appear that this blog also has a significant role to play in enhancing access to such papers.
  • As might be expected there are significant numbers of visits to the Web site for UKOLN’s annual Institutional Web Management Workshop, IWMW, since the event is featured on this blog when we issue the call for submissions, when we open the event for bookings, and when we publish reflections on the event.
  • The reason for the significant number of visits to the Computer Weekly Web site is simple: visitors will have read the post in which I announced that this blog had been short-listed for the Computer Weekly IT Blogger of the Year award. Since I was the runner-up I know that large numbers must have followed the link and voted for this blog :-)
  • As might be expected there are significant numbers of visits to the UKOLN Web site, which hosts many of the resources for work which I write about on this blog.
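Nine university home pages appear separately in the click statistics because clicks are grouped by destination. As a sketch, assuming a hypothetical raw list of clicked URLs (the blog platform’s own click report may group links differently), grouping by hostname could look like this:

```python
from collections import Counter
from urllib.parse import urlparse


def top_destinations(clicked_urls, n=10):
    """Count outbound clicks per destination host; return the n most common.

    A leading 'www.' is dropped so that www.example.ac.uk and example.ac.uk
    are counted together. Requires Python 3.9+ for str.removeprefix.
    """
    counts = Counter()
    for url in clicked_urls:
        host = urlparse(url).hostname
        if host:  # skip malformed entries with no hostname
            counts[host.removeprefix("www.")] += 1
    return counts.most_common(n)


# Hypothetical click log.
clicks = [
    "http://www.abertay.ac.uk/",
    "http://www.abertay.ac.uk/",
    "https://twitter.com/example/status/1",
    "not a url",
]
print(top_destinations(clicks, n=2))  # [('abertay.ac.uk', 2), ('twitter.com', 1)]
```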

Redesign of the Blog’s Sidebar

It should be noted that visitors do not only follow links provided in blog posts; the blog’s sidebars and navigation bar also provide additional content and links to resources.

Some time ago I came across Markosweb, which provides information about Web sites including the UK Web Focus blog. I was particularly interested in the heat map for the blog. As described on the Web site:

Heatmap – An F-shaped principle of how web-pages are read: two horizontal strips and one vertical. Using this principle we’ve suggested where your visitors’ eyes will first be directed to on the main page.

This data can help you in placing the most important site’s blocks in the hottest places. This will help you to increase the site’s traffic and raise profitability.
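The F-shaped principle quoted above can be illustrated with a toy grid. The sketch below is not Markosweb’s model, whose details are not described here; it simply marks the top row as hottest, a shorter second horizontal strip as warm, and the left-hand column as warm, which is enough to show why content placed near the top of a left-hand sidebar is likely to be seen.

```python
def f_pattern_heat(rows, cols):
    """Toy F-pattern grid: 2 = hottest, 1 = warm, 0 = cool.

    Assumptions: row 0 is read across its full width, row 2 across roughly
    half its width, and column 0 is scanned from top to bottom.
    """
    grid = [[0] * cols for _ in range(rows)]
    for c in range(cols):
        grid[0][c] = 2                       # first horizontal strip
    if rows > 2:
        for c in range((cols + 1) // 2):
            grid[2][c] = max(grid[2][c], 1)  # shorter second strip
    for r in range(rows):
        grid[r][0] = max(grid[r][0], 1)      # vertical strip down the left
    return grid


for row in f_pattern_heat(4, 4):
    print(row)
```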

The left-hand sidebar provides information about the blog which I feel is important. However, as shown in the accompanying image of the heat map for a previous design of the blog, although the blog’s search box is likely to be used by people who wish to search for additional posts, the email subscription sign-up area was a waste of space, as this is something people will only do once, if at all.

In light of the suggestion that the heat map can help me to locate important content, I updated the design of the sidebar in March 2012. The blog now has a Featured Paper area beneath the search box (as illustrated) which summarises a paper and provides links to the paper. The featured paper is updated every couple of weeks.

It was not clear to me whether the redesign had any effect on users’ behaviour. Having for the first time analysed the statistics for users’ clicks, it would appear that this redesign has helped to raise the visibility of my papers (although it should be noted that the clicks may also have come from links provided in blog posts).

What Does the Evidence Tell Us?

Jenny and I are presenting a talk at the Internet Librarian International (ILI 2012) conference, to be held in London on 30-31 October, which will try to answer the question: What does the evidence tell us about institutional repositories? The evidence from analysis of the blog’s statistics tells us that the blog delivers significant traffic to the University of Bath’s repository. Given the significant relationship between this blog and the Opus repository it will be interesting to see if the links from this blog have any impact on the repository’s search engine rankings and the visibility of the repository itself, as well as my papers, for researchers who make use of Google to search for relevant information.

Perhaps my post which asked Can LinkedIn and Academia.edu enhance access to open repositories?, which was republished yesterday on the LSE Impact of Social Sciences blog, gave an incomplete view of the importance of social media for researchers seeking to maximise the impact of their work. Maybe it would be a mistake to ignore the importance of researchers’ blogs, not just as open notebooks for sharing ideas at an early stage and inviting feedback, but also to support the dissemination of existing published work?


Posted in Evidence, Repositories | 1 Comment »

Academia.edu Announces Analytics! But How Should Researchers Interpret the Findings?

Posted by Brian Kelly on 16 Aug 2012

Catching up with overnight tweets on a wet morning at the bus stop

At 7.30 am I was waiting in the rain for my bus to work. As normal I was catching up with the tweets I’d received overnight and had downloaded to my iPod Touch before leaving home. One of the tweets which was particularly interesting was from @KeitaBando. I met Keita, Digital Repository Librarian and Coordinator for Scholarly Communication for the My Open Archive service, at the Open Repositories OR 2012 conference recently, following his poster presentation on Current and Future Effects of Social Media-Based Metrics on Open Access and IRs. Keita’s tweet announced news of relevance to many who attended the OR12 conference: Academia.edu Blog: Announcing @academia Analytics

Since one of the papers I had submitted to the OR 2012 conference asked “Can LinkedIn and Academia.edu Enhance Access to Open Repositories?” this announcement was of particular interest to me.

The blog post Announcing Academia.edu Analytics described how:

Today we are announcing the release of Academia.edu’s Analytics Dashboard [which] allows academics to view the real-time impact of their research.

The development is based on the changing environment provided by the Web:

Increasingly, the primary consumption experience for scientific content is the web, and yet scientists have not generally been aware of the metrics around this consumption. If you ask a Harvard biology professor with 200 publications how many downloads she experienced in the last 30 days, typically she will not know. Academia.edu’s Analytics Dashboard is changing this. It allows an academic to understand in sophisticated detail how their research is being used by the academic community. It shows them countries that are sending them the most traffic, search engines and other sites that are sending them the most traffic, and overall profile views and document views.

What does the new service tell me about my papers? It seems that on 11 August 2012 there were 5 views of my items available on Academia.edu, and over the last 230 days there had been a total of 9 views of information about my papers and 11 views of my profile on Academia.edu.

Since the analytics service “allows academics to view the real-time impact of their research” we can explore the individual visits:

and then no other activities until 22.00 on 11 August when someone from Argentina read information about the paper on Open Metrics for Open Repositories.

Clearly such numbers are underwhelming! This would seem to provide evidence that the answer to the question Jenny Delasalle and I posed in our paper “Can LinkedIn and Academia.edu Enhance Access to Open Repositories?” would be “No” in the case of Academia.edu.

Since the metadata I have uploaded to Academia.edu provides a link to papers hosted on Opus, the University of Bath repository, it will be interesting to compare these figures with the numbers of downloads of papers hosted on Opus over a similar period.

Since the Opus service provides statistics on a monthly basis it was not possible to make a direct comparison. However, looking at the download statistics for my papers during July 2012, I found that there had been a total of 679 downloads; the two most downloaded items which, as might be expected, were my two most recent papers, accounted for 184 of these.
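A rough way to put the two sets of figures side by side is to convert each into a daily rate. This is only indicative: the Academia.edu figure counts views of information about papers over 230 days, while the Opus figure counts downloads during July 2012 (31 days), so the two measures are not strictly comparable.

```python
def per_day(count, days):
    """Average events per day over a reporting period."""
    return count / days


academia_views_per_day = per_day(9, 230)   # views of paper information on Academia.edu
opus_downloads_per_day = per_day(679, 31)  # downloads of the author's papers from Opus, July 2012
ratio = opus_downloads_per_day / academia_views_per_day
top_two_share = 184 / 679                  # share of July downloads for the top two papers

print(f"Academia.edu: {academia_views_per_day:.3f} views/day")
print(f"Opus: {opus_downloads_per_day:.1f} downloads/day (about {ratio:.0f}x)")
print(f"Top two papers: {top_two_share:.0%} of Opus downloads")
```

On these figures Opus recorded roughly 22 downloads a day against about 0.04 views a day on Academia.edu, and the two newest papers account for about 27% of the July downloads.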

From these personal experiences we might conclude that Academia.edu is not a significant driver of traffic to my papers and that it is therefore questionable whether it is worth creating a profile in the service and adding links to one’s papers. I think it would be a mistake to draw such conclusions, for the following reasons:

  • These experiences may not be replicated by others.
  • I have chosen to replicate my research profile across a number of services, including Mendeley, LinkedIn and ResearchGate as well as Academia.edu. I would expect some of these services to be widely used, while others are less well used.
  • Using a variety of researcher profiling services with links to my papers will enhance the ‘Google juice’ for the papers (and the repository). Use of these services can therefore enhance the discoverability of the papers for people who use Google – and this is likely to be the majority of people!
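The “Google juice” argument rests on the idea that links from other sites raise a page’s standing in search results. The sketch below implements the classic PageRank calculation published by Brin and Page in 1998, on a small hypothetical link graph. Google’s current ranking uses many other, unpublished signals, so this illustrates the principle rather than predicting real search positions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Classic PageRank computed by power iteration.

    `links` maps each page to the pages it links to. A page with no outgoing
    links (a 'dangling' page) spreads its score evenly over all pages.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical graph: three pages link to paper A; only one links to paper B.
graph = {
    "linkedin_profile": ["paper_a"],
    "academia_profile": ["paper_a"],
    "blog_post": ["paper_a", "paper_b"],
}
scores = pagerank(graph)
print(sorted(scores, key=scores.get, reverse=True)[:2])  # ['paper_a', 'paper_b']
```

The paper with more inbound links ends up with the higher score, which is the intuition behind creating profiles that link back to repository copies.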

I’d be interested to hear about other people’s experiences of Academia.edu. Is anybody finding that their pages on the service are being well used?


Posted in Evidence, Repositories | 1 Comment »

Pure and Impure Thoughts

Posted by Brian Kelly on 25 Jul 2012

Thoughts on the Pure CRIS

The University of Bath recently announced that “the University’s new Current Research Information System, Pure is now available to all academic and research staff“.  The announcement went on to describe how:

Pure provides a single location for staff to store information about their research, such as publications, collaborations, research projects and grants etc and the associations between them. Having been entered into Pure once, data can be used for a variety of purposes, including creating CVs and bibliographies and later this year automatic population of personal web pages. Pure is designed to make it as easy as possible to keep information about research up-to-date, providing ongoing visibility of research activities at the University.
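The “enter once, use for a variety of purposes” idea can be illustrated with a single record rendered in two formats. Pure’s actual data model and output styles are not described here; the `Publication` class, the two formatting functions and the sample record below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    """Hypothetical minimal publication record."""
    authors: list
    title: str
    venue: str
    year: int


def cv_line(pub):
    """Short form suitable for a CV."""
    return f"{pub.year}: {pub.title} ({pub.venue})"


def bibliography_entry(pub):
    """Author-date form suitable for a bibliography."""
    return f"{', '.join(pub.authors)} ({pub.year}). {pub.title}. {pub.venue}."


paper = Publication(["A. Researcher", "B. Colleague"], "An Example Paper", "Example Conference", 2012)
print(cv_line(paper))             # 2012: An Example Paper (Example Conference)
print(bibliography_entry(paper))  # A. Researcher, B. Colleague (2012). An Example Paper. Example Conference.
```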

As soon as I saw this announcement I logged on to the service and viewed my papers, which had been deposited using the University of Bath’s ePrints service, Opus.

I have to admit that I was impressed with the interface, which is much cleaner than the Opus interface I have had to use previously. In addition to the listing of my papers, illustrated, the editing interface was also much easier to use and, as illustrated, I am able to update the metadata from a single page – a simple task which was cumbersome when I had to use the ePrints service.

In addition to the simple list display of my papers as illustrated there is also an option to view a graph of connections with co-authors.

From my initial use of Pure I felt that the service provided a valuable development to the University’s ePrints service, with improved editing and display features.

I was very pleased with the service and was glad that I had chosen to use it as soon as I saw the announcement that the service had been launched.

Impure Thoughts

Quality Validated Metadata or Instantly Updated Access to Content?

Further use of Pure, however, revealed a number of limitations. The reverse date order display of items within years is a minor glitch (as shown in the initial image, my first paper published in 2012 is displayed after two other papers published this year). However I have to admit that I was annoyed when I found that items I had edited were deleted from Opus, with a 404 Item not found error message being displayed. It seems that items are deleted if they are edited and are not available until the edits have been validated. In my case, there was a delay of several days before the items were reinstated, due to a combination of annual leave and sickness. This seems to me to be an inappropriate policy decision, especially when, as in my case, the items are new and, during this period, are more likely to be read. I’m pleased that my concerns have been acknowledged by Bath repository staff, who have agreed to revisit this policy. I am highlighting this issue here as it appears likely that others may well encounter the tension between the repository managers’ desire to ensure that they possess high-quality validated metadata (especially in the run-up to the REF) and the desire of researchers to maximise access to their research. Such concerns were highlighted in a recent post on the JISC-Repositories JISCMail list when Stevan Harnad argued that:

What OA IRs need today, urgently, is not cataloguers to monitor quality, nor IP specialists to monitor rights, etc. etc. No intermediary is needed between the author and the IR “monitor”, retard, block or otherwise impede deposits (though help is always welcome to encourage depositors and facilitate and speed their deposits!).

What OA IRs need urgently today instead of needless, costly and counterproductive monitoring and mediation is effective Green OA mandates (ID/OA). That is what will generate deposits (and further minimize the negligible cost per paper deposited).

The problem of IRs today is not fraudulent researchers depositing bogus content, it is legitimate researchers failing to deposit OA’s target content (refereed research publications).
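One way to reconcile validated metadata with continuous access is to keep the last approved version of an item public while an edit awaits checking, rather than withdrawing the item. The sketch below is a hypothetical illustration of that policy; the `RepositoryItem` class is invented for this example and does not describe how Opus, ePrints or Pure actually handle edits.

```python
class RepositoryItem:
    """Hypothetical deposit record that keeps the last approved version public."""

    def __init__(self, metadata):
        self.public = dict(metadata)  # what visitors see
        self.pending = None           # edits awaiting validation

    def edit(self, **changes):
        """Record changes for review without touching the public version."""
        base = self.pending if self.pending is not None else self.public
        self.pending = {**base, **changes}

    def approve(self):
        """Publish the pending edits once they have been validated."""
        if self.pending is not None:
            self.public, self.pending = self.pending, None

    def view(self):
        # Always returns a record, so an item under review never becomes a 404.
        return self.public


item = RepositoryItem({"title": "Draft title"})
item.edit(title="Final title")
print(item.view()["title"])  # Draft title
item.approve()
print(item.view()["title"])  # Final title
```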

Private or Public Content?

The issue of rapidly updated versus validated content is one topic for discussion with colleagues at Bath. However the feature which surprised me most was that the information about my papers is only available to me. In retrospect I should have realised that the prime function of a CRIS (Current Research Information System) is for internal management and reporting purposes. As the Pure web site describes:

[Pure] covers Grant applications, Research Income, Projects, Research Outputs, Research staff, Organisational units, External collaborations, and more. It is achieved by integrating Pure with local systems while also capturing data by work processes that ensure quality and completeness.

This makes Pure a single authoritative source of quality-assured information about an institution’s research affairs. Information is available at the desired level of granularity in real-time.

This is the main role which is envisaged locally:

Jane Millar, Pro-Vice-Chancellor for Research, and Project Sponsor, said that the implementation of Pure was a major step forward in how we handle information about our research here at the University: “In an increasingly competitive environment it is essential that we have up to date and accurate information about the excellent research undertaken here at the University. Pure will provide this data and help us to comply with external reporting requirements such as REF2014“.

However the Pure web site goes on to add that “Pure is also a tool for researchers, PIs, and departmental managers – clear and recognised value is provided for these users, which furthers user-acceptance and -uptake; mission-critical factors in any CRIS project“.

This is, however, where I have my reservations. Although Pure provides the ability to publish a CV of one’s research publications and, as illustrated, I can currently create a CV in PDF or MS Word formats, this functionality does not appear to provide the social function of establishing connections with one’s peers that is provided by services such as LinkedIn or Academia.edu. In addition, it seems unlikely that a researcher profiling service which is co-located on the same institutional domain as the institutional repository will provide the ‘Google juice’ to one’s research papers which LinkedIn and Academia.edu appear to provide. It should also be noted that, as described in posts on What I Like and Don’t Like About IamResearcher.com, Thoughts on Google Scholar Citations and Will the Real Scott Wilson Please Stand Up, Please Stand Up, services such as LinkedIn, Academia.edu and IamResearcher appear to provide richer interfaces and visualisations than Pure provides.


It would, however, be inappropriate to criticize Pure for not providing the same quality of visualisation of one’s co-author network as Microsoft Academic Search, for example, does.

Microsoft Academic Search’s visualisation of my co-authors is illustrated. However Microsoft Academic Search also thinks I am an expert in Psychiatry and Psychology! The service has confused me with B D Kelly who is an expert in these areas and, despite updating my profile, I have been unable to decouple my research publications from B D Kelly’s.

Pure aims to provide  an authoritative list of research publications by researchers within the institution which will be needed to support institutional reporting requirements.

However an individual researcher may have different requirements – and if a key aim is to enhance access to one’s research papers I am still convinced that use of social media services such as LinkedIn and Academia.edu will provide benefits which aren’t provided by a Current Research Information System.

Posted in Repositories | 1 Comment »

Making An Impression; Making Connections

Posted by Brian Kelly on 12 Jul 2012

Social Media: For Ourselves and For Our Customers

A recent post entitled IWMW 2012: The Feedback summarised the feedback we had received for the recent IWMW 2012 event. In addition to this summary, more detailed information was sent to the individual speakers and workshop facilitators on their talks and workshop sessions. Such feedback can be valuable in either showing the value of the contribution made at the event or providing suggestions on how the talk could be improved if repeated in future.

We published the feedback two weeks after the event as it is important that such information is available while the event is fresh in people’s memories. But, of course, there can be other ways of getting feedback. At the UCISA User Support Services Conference, which took place a few days ago at the impressive Crewe Hall Hotel, I was pleased to receive feedback on Twitter on the talk I gave on “Social Media: For Ourselves and For Our Customers”, which has been summarised on Storify. The feedback included:

  • Excellent presentation, you gave me a lot of new ideas for how I can communicate with my staff and customers. Thanks!
  • Brilliant presentation from @briankelly – good to have a push to tweet a bit more!
  • Brilliant talk from @briankelly – typically informative, insightful, and full of #lolz…
  • Also really enjoyed @briankelly talk about social media. Engaging. Had a chuckle. And I think he likes a real ale so is in my good books

together with an example of an action taken as a result of the talk:

  • Inspired to send my first tweet

Beyond the tweets, a post entitled What a difference a day makes published on the Musings from the frontline blog described how

Today we sat and listened to people who had not only aspired to do things differently and better but, most importantly, had achieved it.

and went on to conclude:

So, thank you @heloukee @maffrigby @briankelly and #ussc12 for the inspiration. You have provided the relationship counselling that I needed and me and conferences are now blissfully happy together again (for now anyway…

It’s About Links; It’s About Connectedness!

The topic of my talk was the importance of social networks in facilitating more effective collaborative working by making use of the existing social networking infrastructure. Although this is a subject I have spoken about previously, as described recently in a post on It’s About Links; It’s About Connectedness!, I was fortunate to see Cameron Neylon’s opening plenary talk at the Open Repositories 2012 conference. As described in the live blog of the closing session for the conference given by Peter Burnhill:

we need to think about connectivity, as flagged by Cameron. And these places ie Twitter and Facebook… We don’t own them but we need to be [in] them, to make sure that citations come back to us from here.

The importance of use of such social media services to provide links to papers hosted in open repositories was also highlighted by Peter Burnhill in his observation that:

And there was talk of citation… LinkedIn, etc. is all about linking back to research to data

It was pleasing to see that the ideas described in a paper by myself and Jenny Delasalle which asked “Can LinkedIn and Enhance Access to Open Repositories?” had been highlighted in the conference conclusions. But these particular ideas were just a simple example of the bigger picture provided by Cameron Neylon on the importance of networks which, on a global scale, can enable researchers to address difficult research topics which cannot be tackled by a single researcher or research group.

The Video For Connecting, For Sharing

Cameron’s talk, which is available on YouTube and embedded below, makes the point about the importance of connectivity (the social web) and ease-of-use (the lack of ‘friction’ needed to embed social web tools in workflow practices) very eloquently and is well worth viewing (and I’d like to give my thanks to the OR 12 organisers for publishing this video recording so quickly – and also for making it available on YouTube so it can be embedded in this blog).

It would, however, be a mistake to regard social networks as being purely a tool for scientific researchers, just as some people mistakenly feel that social networks are just for young people or for purely ‘social’ purposes, a confusion caused by the different meanings of the term ‘social’. As I described in my talk, for which a video recording is also available, social networks can also be valuable for those working in support services – and institutions should gain benefits from use of social networking services across teaching and learning, research, marketing and support areas if they are regarded as valuable tools rather than treated with suspicion, as is currently the case in some areas.

Another important point made by Cameron is the importance of openness, both for facilitating connections and for minimising the friction caused by licensing barriers. The videos of Cameron’s talk and my talk provide another example of the ways in which connections can be made, and knowledge and ideas shared, by facilitating access to videos of talks at conferences. As I have described in previous talks on amplified events, such approaches can help the ideas shared at conferences escape the constraints of space and time. Many thanks to the OR 2012 and UCISA conference organisers for providing the live video streams (escaping the constraints of space) and providing rapid access, with few barriers, to the recordings of the talks (escaping the constraints of time). Long may this continue – and if you are considering organising an amplified event, the recent “Event Amplification Report” may be of interest.


Posted in Repositories, Web2.0 | Tagged: | 1 Comment »

Open Metrics for Open Repositories

Posted by Brian Kelly on 10 Jul 2012

Later today Nick Sheppard will present a paper entitled “Open Metrics for Open Repositories” at the Open Repositories 2012 conference.

This paper, which was written by myself, Nick, Jenny Delasalle, Mark Dewey, Owen Stephens, Gareth Johnson and Stephanie Taylor, describes the importance of metrics for institutional repositories for a number of stakeholders, including funders at a national level, developers of services which may aggregate repository content, librarians and research support units within institutions, as well as individual researchers and their departmental colleagues.

In light of the diverse requirements for metrics across these stakeholder communities, in the paper we argue that such metrics should be provided as open data. This would appear to be particularly relevant in the context of open repositories – we are aware of the tensions regarding open access to research publications due to the complexities of quality assurance processes and business models for funding peer-reviewing, but such considerations should not act as a barrier for providing access to the variety of usage statistics and related data associated with repositories.
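As a minimal sketch of what publishing usage statistics as open data might involve, the Python below serialises per-item download counts as CSV using only the standard library. The field names (`item_id`, `month`, `downloads`) and the sample records are hypothetical; they are not taken from any repository platform or from the paper.

```python
import csv
import io

# Hypothetical per-item usage records; the field names are illustrative only
# and do not correspond to any particular repository platform's schema.
USAGE = [
    {"item_id": "item-001", "month": "2012-06", "downloads": 120},
    {"item_id": "item-002", "month": "2012-06", "downloads": 45},
]

FIELDS = ["item_id", "month", "downloads"]


def usage_to_csv(rows):
    """Serialise usage records as CSV text, e.g. for publishing as open data."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(usage_to_csv(USAGE))
```

A plain format such as CSV is one way of lowering the barrier to reuse for the different stakeholders listed above.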

Our paper is available from the University of Bath institutional repository. In addition, as Nick has described in a post on the Repository News blog at Leeds Metropolitan University, the presentation will be given in the RF1: Pecha Kucha – Repository Tools and Approaches session, which starts at 15.30 today (Tuesday 10 July). Note that the Twitter hashtag for the conference is #or2012, so follow this tag in your Twitter client at around this time to follow the discussion about the paper. The slides Nick will use in the presentation are available on Slideshare and embedded below.


Posted in Repositories | 3 Comments »

Paper Accepted for OR12: Can LinkedIn and Enhance Access to Open Repositories?

Posted by Brian Kelly on 3 Jul 2012

I’m pleased to say that a paper by myself and Jenny Delasalle, Academic Services Manager (Research) at the University of Warwick, which asked “Can LinkedIn and Enhance Access to Open Repositories?” has been accepted for the Open Repositories conference, OR 2012.

This paper, which is available from the University of Bath institutional repository, is based on work initially published on this blog.

A blog post entitled “How Researchers Can Use Inbound Linking Strategies to Enhance Access to Their Papers”, published on 2 March 2012, described an Inbound linking strategy to get to the top listing on google fast. It occurred to me that my willingness to make use of researcher profiling services such as, ResearcherID, Scopus, ResearchGate, Mendeley, Microsoft Academic Search and Google Scholar Citations may have helped to enhance the visibility of my research papers which are hosted in the University of Bath repository. The blog post went on to describe how I found that I was a co-author of 15 of the 20 most downloaded items from my department in the repository.

More recent investigations reveal that, as illustrated, I have the largest number of downloads of any author at the University of Bath! This was recently brought to the attention of the PVC for Research who, in a departmental meeting, informed me that a University of Bath Research Group had discussed these figures and asked me to share the approaches with other researchers at Bath. In response I mentioned that the approaches I’d taken, the evidence I’d gathered, the hypothesis I had proposed for explaining the evidence, possible alternative hypotheses, the limitations of the approaches, the implications of the findings and areas for further work had been submitted to the Open Repositories 2012 conference – and if the paper was accepted the findings would be available to all, and not just researchers at my host institution.

The paper explores other possible reasons for the high visibility of these papers – and one possibility worthy of further investigation is the provision of many papers in HTML format and not just PDF and MS Word. However, use of popular researcher profiling services such as LinkedIn and is felt to be worth recommending to researchers (a) to ensure that their research papers can be more easily found by their peers on these services and (b) so that links to the papers in their institutional repository can enhance the visibility of the papers to Google, as well as enhancing the Google ranking of the repository itself.

Of course, it probably needs to be said that the number of downloads is not necessarily an indicator of quality. However, the converse is also true: just because a paper in a repository is seldom viewed does not indicate that it must be a great paper! I am quite happy to promote the use of such approaches, since increased numbers of views, especially by the target communities, can help practitioners to embed the ideas described in the papers and can increase the likelihood that the papers will be cited by other researchers. In my case I’m pleased that, according to Google Scholar Citations, my most cited papers have been cited 87, 67, 54 and 40 times.

My co-author Jenny Delasalle has been investigating use of researcher profiling services at the University of Warwick, her host institution. Interestingly, Jenny found that a number of commercial publishers encourage their authors to use services such as LinkedIn and to link to their papers hosted behind the publishers’ paywalls – and yet we are not seeing institutions recognise the benefits of coordinated use of such services by their researchers. Institutional repository managers, research support staff and librarians could be prompting their institutions to make the most of these externally provided services, to enhance the visibility of their researchers’ work in institutional repositories.

Surely it is time for the research community to develop inbound linking strategies for their research work, especially as this can be done so simply. Indeed, the OR12 conference organisers have invited us to summarise the ideas in a poster and a one-minute presentation. The ideas have been summarised in four strips using the Pixton cartoon generation tool.


I’m not sure if it will be possible to use PowerPoint during the one-minute madness but I have prepared some slides which are available on Slideshare and embedded below.

NOTE: A one minute summary of this paper was given on the opening day of the OR 12 conference. A video recording of the summary is available on Vimeo and embedded below.

Also note that a slightly modified version of this post was published on the LSE Impact of Social Sciences blog on Thursday 23 August 2012. You can also view the statistics for access to the post via the URL.


Posted in Evidence, Repositories, Web2.0 | 7 Comments »

Profiling Staff and Researcher Use of Cloud Services Across Russell Group Universities

Posted by Brian Kelly on 5 Mar 2012

Personal Benefits of Maximising Inbound Links to Research Papers

A recent post on this blog which described How Researchers Can Use Inbound Linking Strategies to Enhance Access to Their Papers reviewed personal experiences of the benefits of making use of third party services to provide inbound links to research publications.

In the post I suggested that the large numbers of downloads of my papers from the University of Bath institutional repository may be due to the enhanced Google juice provided by having links to my papers from such services. The purpose of the post was to suggest that researchers may benefit from increased access to their research publications if they are proactive in using such services. My speculations may, of course, be incorrect; the downloads may be due to the quality of the papers rather than the numbers of inbound links, for example :-). In addition, downloads themselves are not, of course, an indication of quality. However, since the papers, which have been through some form of peer review, will not have any influence if they are never read, I am happy to regard such approaches as helping to increase the numbers of people reading the papers, which may, or may not, lead to some form of subsequent ‘impact’. Note that the slideshow on “Metrics: The New Black?” by Kristen Fisher Ratan, which is available on Slideshare, explores such considerations in more detail.

Profiling Institutional Use of Such Services

I recently came across the Libresearch blog, which is provided by Jenny Delasalle who, on her @JennyDelasalle Twitter profile, describes herself as a “Research support Librarian: interested in bibliometrics, copyright, scholarly communications, and all sorts!” I read her posts on topics including Webometrics and altmetrics: digital world measurements, Warwick people on external profile sites and 1,670 Warwick people on. In the latter two posts she documented evidence of take-up of a number of third-party services by researchers at the University of Warwick. One of her posts included a reference to one of my posts which profiled Russell Group university use of Google Scholar Citations. I am now able to build on Jenny’s work by using some of the survey methodology techniques she has helpfully documented in her blog to gather evidence of take-up, across the twenty Russell Group universities, of popular third-party services which provide links to research publications.

Having read the post on Warwick people on external profile sites it occurred to me that such institutional profiling work would benefit from being seen in a wider context. I therefore used the methodologies documented by Jenny in her blog post to gather similar information across the twenty Russell Group universities.

The findings are given in the following table. Note that the data for the Academia, LinkedIn and ResearcherID was collected on 1 March 2012 and the data for Google Scholar Citations on 3 March 2012.

Ref. No. | Institution | Academia | LinkedIn (Followers) | LinkedIn (Current) | ResearcherID | Google Scholar
1 | University of Birmingham | 1,473 | 4,161 | 2,855 | 77 | 77
2 | University of Bristol | 1,603 | 3,687 | 3,167 | 231 | 55
3 | University of Cambridge | 5,287 | 7,371 | 6,919 | 400 | 83
4 | Cardiff University | 1,456 | 3,558 | 3,087 | 442 | 38
5 | University of Edinburgh | 3,341 | 5,947 | 5,536 | 241 | 75
6 | University of Glasgow | 1,572 | 3,147 | 3,646 | 27 | 70
7 | Imperial College | 1,383 | 7,615 | 6,306 | 399 | 78
8 | King’s College London | 2,182 | 5,078 | 25 | 64 | 35
9 | University of Leeds | 2,706 | 5,251 | 5,954 | 198 | 39
10 | University of Liverpool | 1,292 | 3,325 | 4,330 | 148 | 26
11 | London School of Economics | 1,909 | 6,907 | 1,914 | 36 | 37
12 | University of Manchester | 3,603 | 6,517 | 7,425 | 278 | 74
13 | Newcastle University | 1,509 | 3,583 | 3,001 | 173 | 94
14 | University of Nottingham | 2,022 | 5,107 | 6,010 | 315 | 52
15 | University of Oxford | 6,723 | 7,771 | 8,751 | 346 | 128
16 | Queen’s University Belfast | 1,100 | 1,978 | 1,989 (previously 5) | 88 | 24
17 | University of Sheffield | 1,701 | 4,171 | 5,269 | 255 | 36
18 | University of Southampton | 1,738 | 4,176 | 4,642 | 255 | 52
19 | University College London | 4,587 | 9,034 | 6,334 | 673 | 160
20 | University of Warwick | 1,770 | 3,667 | 2,855 | 199 | 34
TOTAL | | 48,957 | 102,051 | 90,015 (previously 88,031) | 5,599 | 1,267
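The column totals can be re-checked with a few lines of Python. The figures below are transcribed from the table, using 1,989 for the Queen’s University Belfast LinkedIn (Current) value. The names `ROWS`, `STATED_TOTALS` and `column_totals` are illustrative. As transcribed, every column matches its stated total except ResearcherID, which sums to 4,845 rather than 5,599. This suggests that one of the ResearcherID entries or the stated total may have been mistyped.

```python
# Figures transcribed from the Russell Group table (collected 1 and 3 March 2012).
# Column order matches COLUMNS below.
COLUMNS = ("Academia", "LinkedIn (Followers)", "LinkedIn (Current)",
           "ResearcherID", "Google Scholar")

ROWS = {
    "University of Birmingham": (1473, 4161, 2855, 77, 77),
    "University of Bristol": (1603, 3687, 3167, 231, 55),
    "University of Cambridge": (5287, 7371, 6919, 400, 83),
    "Cardiff University": (1456, 3558, 3087, 442, 38),
    "University of Edinburgh": (3341, 5947, 5536, 241, 75),
    "University of Glasgow": (1572, 3147, 3646, 27, 70),
    "Imperial College": (1383, 7615, 6306, 399, 78),
    "King's College London": (2182, 5078, 25, 64, 35),
    "University of Leeds": (2706, 5251, 5954, 198, 39),
    "University of Liverpool": (1292, 3325, 4330, 148, 26),
    "London School of Economics": (1909, 6907, 1914, 36, 37),
    "University of Manchester": (3603, 6517, 7425, 278, 74),
    "Newcastle University": (1509, 3583, 3001, 173, 94),
    "University of Nottingham": (2022, 5107, 6010, 315, 52),
    "University of Oxford": (6723, 7771, 8751, 346, 128),
    "Queen's University Belfast": (1100, 1978, 1989, 88, 24),
    "University of Sheffield": (1701, 4171, 5269, 255, 36),
    "University of Southampton": (1738, 4176, 4642, 255, 52),
    "University College London": (4587, 9034, 6334, 673, 160),
    "University of Warwick": (1770, 3667, 2855, 199, 34),
}

# Totals as printed in the TOTAL row of the table.
STATED_TOTALS = dict(zip(COLUMNS, (48957, 102051, 90015, 5599, 1267)))


def column_totals(rows):
    """Sum each column across all institutions."""
    return {name: sum(values[i] for values in rows.values())
            for i, name in enumerate(COLUMNS)}


if __name__ == "__main__":
    for name, total in column_totals(ROWS).items():
        note = "" if total == STATED_TOTALS[name] else " (differs from stated total)"
        print(f"{name}: {total:,}{note}")
```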


As described in an article on Using LinkedIn For SEO:

Your profile can be an excellent source of SEO friendly links because:

    • LinkedIn has great authority in Google
    • Your website links can be given unique anchor text with the dofollow attribute
    • Your LinkedIn profile can have highly relevant content relative to the websites you own
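“Dofollow” is informal shorthand rather than an HTML attribute. A link is generally treated as followed by search engines unless its `rel` attribute includes `nofollow`. The sketch below uses Python’s standard `html.parser` module to list each link on a page with its anchor text and whether it carries `nofollow`. The class name `LinkAudit`, the sample HTML and the URLs are hypothetical, and the sketch does not attempt to model how any particular search engine weighs links.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Collect (href, anchor text, followed?) for every <a> element fed in."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []
        self._followed = True
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attributes = dict(attrs)
            # rel may hold several space-separated tokens, e.g. "nofollow noopener".
            rel_tokens = (attributes.get("rel") or "").lower().split()
            self._href = attributes.get("href")
            self._text = []
            self._followed = "nofollow" not in rel_tokens
            self._in_link = True

    def handle_data(self, data):
        # Only text inside an open <a> element counts as anchor text.
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.links.append((self._href, "".join(self._text).strip(), self._followed))
            self._in_link = False


SAMPLE = (
    '<p><a href="https://repository.example.org/paper-1/">My paper</a> '
    '<a rel="nofollow" href="https://example.org/">Other site</a></p>'
)

if __name__ == "__main__":
    audit = LinkAudit()
    audit.feed(SAMPLE)
    for href, text, followed in audit.links:
        print(href, repr(text), "followed" if followed else "nofollow")
```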

It might be reasonable to assume that use of the LinkedIn service comes mainly from staff and research students. In light of the popularity of the service, might we find that encouraging researchers to provide links to copies of their papers hosted in their institutional repository will provide benefits not only for the individual researcher but also for the repository service itself, through the increased numbers of inbound links?

The DirectionsSEO site provides information on 5 Inbound Link Analysis Tools which may help to provide evidence of the value of inbound links. Initial experimentation with the service suggests, however, that has the highest SEO ranking of domains linking to the University of Bath Opus repository service. But before concluding that researchers should be blogging about their research publications on the platform, I’d welcome feedback on the suggestion that the next stage for maximising access to research publications should be based on inbound linking strategies rather than further developments to institutional services.
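Inbound link analysis tools commonly summarise backlinks by referring domain. The following is a minimal Python sketch of counting links per referring domain for a repository host. It assumes a list of (source URL, target URL) pairs exported from such a tool. The export format, the function name `referring_domains` and the sample URLs are all hypothetical and do not reflect any tool listed by DirectionsSEO.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical backlink export: (page containing the link, page linked to).
BACKLINKS = [
    ("https://profiles.example.com/user/a", "https://repository.example.org/101/"),
    ("https://profiles.example.com/user/b", "https://repository.example.org/102/"),
    ("https://blog.example.net/post", "https://repository.example.org/101/"),
    ("https://blog.example.net/post", "https://elsewhere.example.net/"),
]


def referring_domains(backlinks, target_host):
    """Count inbound links to target_host, grouped by the linking page's host."""
    counts = Counter()
    for source_url, target_url in backlinks:
        if urlparse(target_url).hostname == target_host:
            counts[urlparse(source_url).hostname] += 1
    return counts


if __name__ == "__main__":
    for domain, n in referring_domains(BACKLINKS, "repository.example.org").most_common():
        print(domain, n)
```

Grouping by domain stops a single site with many links from dominating the picture. Real SEO tools also weight each domain by their own authority scores, which this sketch ignores.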

Paradata: As described in a post on Paradata for Online Surveys, blog posts which contain live links to data will include a summary of the survey environment in order to help ensure that survey findings are reproducible, with potentially misleading information being highlighted.

The data for the Academia, LinkedIn and ResearcherID was collected on 1 March 2012 and the data for Google Scholar Citations on 3 March 2012.

The values for Google Scholar Citations for the universities of Birmingham and Newcastle include ‘UK’ in the search field in order to avoid including information from US and Australian universities with the same name.

It should also be noted that I was logged into the services when I gathered the information.

In addition, the low values for LinkedIn followers for King’s College London and Queen’s University Belfast are felt to be due to the apostrophe used in the institutions’ names. For example, a search (carried out on 6 March 2012) on LinkedIn for King’s College London gives 3,418 hits but a search for Kings College London gives 294 hits.
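One way to make such counts less sensitive to punctuation is to search for each spelling variant of an institution’s name and record the results separately. The sketch below generates the straight-apostrophe, typographic-apostrophe and no-apostrophe forms of a name. `name_variants` is a hypothetical helper, not a feature of LinkedIn or any other service.

```python
# Map typographic single quotation marks to a plain apostrophe.
_TO_STRAIGHT = str.maketrans({"\u2019": "'", "\u2018": "'"})


def name_variants(name):
    """Return the spelling variants of a name that differ only in apostrophes."""
    base = name.translate(_TO_STRAIGHT)
    variants = {
        base,                          # King's College London
        base.replace("'", "\u2019"),   # King’s College London
        base.replace("'", ""),         # Kings College London
    }
    return sorted(variants)


if __name__ == "__main__":
    for variant in name_variants("King\u2019s College London"):
        print(variant)
```

Whether the hit counts for the variants can simply be added together depends on whether the service returns overlapping results for them, which would need checking.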

Posted in Evidence, Repositories | 6 Comments »

How Researchers Can Use Inbound Linking Strategies to Enhance Access to Their Papers

Posted by Brian Kelly on 2 Mar 2012

The Value of Inbound Links to Resources

Via Smartr, the iPod Touch app I use to read articles which have been posted by Twitter followers, this morning I came across a link provided by a tweet which described an Inbound linking strategy to get to the top listing on google fast. The post described how the author, a web manager at Florida International University:

… developed a strategy I would make inbound links to the FIU President’s Council site from places I can control a few of these places include FIU News, Alumni Association, FIU A to Z index, blogs that have comments open, etc. and on all those I make links using the words FIU President’s Council that link directly to the sites homepage.
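The strategy quoted above relies on consistent anchor text. As a hypothetical illustration, assume the inbound links have already been collected as (anchor text, URL) pairs. A few lines of Python can then show which anchor texts point at a given page. The sketch collapses whitespace so that trivially different texts are counted together. The function name and sample links are made up for illustration.

```python
from collections import Counter

# Hypothetical inbound links collected from pages the site owner controls.
LINKS = [
    ("FIU President's Council", "https://council.example.org/"),
    ("FIU  President's Council", "https://council.example.org/"),
    ("click here", "https://council.example.org/"),
    ("News", "https://news.example.org/"),
]


def anchor_text_profile(links, target_url):
    """Tally the anchor texts (whitespace-normalised) of links pointing at target_url."""
    return Counter(" ".join(text.split()) for text, href in links if href == target_url)


if __name__ == "__main__":
    for text, n in anchor_text_profile(LINKS, "https://council.example.org/").most_common():
        print(f"{n} x {text}")
```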

The importance of providing links to a resource in order to maximise access to the resource is well understood – particularly, it seems, by spammers.  But how could such well-established techniques be used in an ethical way by researchers?

The answer, it seems to me, is quite simple. Researchers do have access to a wide range of web services which can legitimately provide links to their research publications. This is an approach I have been using for several years. A summary of the numbers of publications listed in the services I use is given in the following table.

Service | My Account | Summary
Microsoft Academic Search | My details | 39*
Google Scholar Citations | My details | 82
ResearcherID | My details | 10
Scopus | My details | 23
 | My details | 50
ResearchGate | My details | 110
Mendeley | My details | 23

* Microsoft Academic Search automatically includes papers from people with the same name. These need to be manually excluded, and there is a delay before updates are validated. The service currently lists 286 papers, including many from medical researchers of the same name; however, only 39 papers have been claimed as authored by me.

It should also be noted that a number of the services provide links to the research papers (which in my case are normally hosted on the University of Bath institutional repository), whereas other services only provide the metadata.

Evidence of Enhanced Access

There is a cost to registering for such services and uploading details of one’s papers. However, in practice I have found that it does not take a significant amount of time to upload the relevant information, and the services can provide useful information, such as helping to visualise one’s professional network and, as illustrated (taken from Mendeley), growth in the number of citations, downloads, followers, etc.

But although individual researchers may or may not find such information of interest or value, there remains a question as to whether there is any tangible evidence of growth in downloads resulting from a policy of increasing the numbers of links to such resources.

A possible answer to that question may be found from an analysis of the download statistics for items stored on Opus, the University of Bath institutional repository.

To allow comparisons, an image of the 20 most downloaded items provided by staff at UKOLN is shown.

From this list we can see that I am a co-author of 15 of the top 20 items.

There may be several explanations for this:

Quality of the papers: Although two of my papers are the highest ranked papers published in the W4A conference series, I am quite happy to say that I am convinced that my colleagues have produced papers of much greater research value.

Social media optimisation: The paper on Library 2.0: balancing the risks and benefits to maximise the dividends is the second most downloaded single paper from the University of Bath repository. The popularity of this paper was due to the large numbers of downloads shortly after the availability of the paper had been announced on this blog. Although I am convinced that use of social media can also enhance access to peer-reviewed papers, several of the other popular papers in the above list were published between 2004 and 2007, before Twitter and before I was making significant use of the blog.

To conclude, I believe that adding information about one’s research publications to services such as, ResearchGate, Microsoft Academic Search and Google Scholar Citations can increase the visibility of the papers to Google, as well as to users of the services, which may then lead to increased numbers of downloads, citations and take-up of the ideas described in the papers.

Do you agree?

Posted in Evidence, Papers, Repositories | 8 Comments »