UK Web Focus (Brian Kelly)

Innovation and best practices for the Web

MajesticSEO Analysis of Russell Group University Repositories

Posted by Brian Kelly on 29 Aug 2012

Investigation of SEO Rankings of Institutional Repositories

There is a need “to investigate whether links [from popular social media services] are responsible for enhancing SEO rankings of resources hosted in institutional repositories” concluded the paper by Jenny Delasalle and myself, which asked “Can LinkedIn and Academia.edu Enhance Access to Open Repositories?”.

The importance of SEO rankings for surfacing content hosted in institutional repositories can be gauged from the responses to the query I asked on the JISC-Repositories JISCMail list: “Does anyone have any statistics on the proportion of traffic which arrives at institutional repositories from Google?”. I asked a similar question on Twitter and found that mature research repositories seem to get about 50-80% of their traffic from Google. This aligns with the findings reported by Les Carr for the University of Southampton back in 2006: “the majority of repository use, if I can equate eprint downloads with repository use, is due to external web search engines (64%)”. Indeed, since it has been reported that direct downloads of PDFs hosted in repositories may not be recorded unless Google Analytics has been configured appropriately, such figures may be an underestimate!

In light of the importance of Google in supporting repositories in their mission of making research papers easily accessible to others it will be useful to gain a better understanding of the factors which contribute to supporting the discoverability of the content hosted in institutional repositories.

The survey described in this post reports on summary SEO findings for the 24 Russell Group universities. The aims of the survey are to provide a benchmark for comparisons with surveys which may be carried out in the future, to attempt to identify any interesting usage patterns which may help to enhance the effectiveness of institutional repositories and to identify the highest ranked domains which provide links to institutional repositories.

Survey Using MajesticSEO

The data was collected on 27-28 August 2012 using the MajesticSEO service. Note that the current findings for each repository can be obtained by following the link in the final column of Table 1; they can be viewed if you have signed up to the free service.

Table 1: MajesticSEO Findings for Repositories Hosted at Russell Group Universities

| Ref. No. | Institutional Repository Details | Referring Domains | External Backlinks | Educational Backlinks | Educational Domains | Top Five Domains & Numbers of Links | View Results |
|---|---|---|---|---|---|---|---|
| 1 | Repository used: eprint Repository | 116 | 499 | 146 | 16 | blogspot.com: 6,424; wordpress.com: 4,658; wikipedia.com: 200; bbc.co.uk: 82; sourceforge.net: 67 | [Link] |
| 2 | Institution: University of Bristol. Repository used: ROSE | 159 | 691 | 144 | 21 | wordpress.com: 7,871; blogspot.com: 6,692; wikipedia.org: 273; bbc.co.uk: 98; guardian.co.uk: 89 | [Link] |
| 3 | Repository used: Dspace @ Cambridge | 86 | 7,339 | 283 | 97 | blogspot.com: 33,276; wordpress.com: 17,241; wikipedia.org: 1,771; google.com: 449; bbc.co.uk: 442 | [Link] |
| 4 | Institution: Cardiff University. Repository used: ORCA | 22 | 58 | 9 | 4 | wordpress.com: 1,874; blogspot.com: 883; typepad.com: 250; bbc.co.uk: 85; guardian.com: 60 | [Link] |
| 5 | Institution: University of Durham. Repository used: DRO | 297 | 1,281 | 27 | 12 | wordpress.com: 5,430; blogspot.com: 3,020; wikipedia.org: 145; bbc.co.uk: 76; ask.com: 45 | [Link] |
| 6 | Repository used: ERA | 747 | 3,943 | 247 | 71 | blogspot.com: 14,380; wordpress.com: 9,845; wikipedia.org: 470; google.com: 401; bbc.co.uk: 296 | [Link] |
| 7 | Institution: University of Exeter. Repository used: ERIC. Note: Repository sub-domain not used. See footnote 2. | 198 | 958 | 175 | 18 | wordpress.com: 1,125; blogspot.com: 1,115; bbc.co.uk: 45; wikipedia.org: 43; ning.com: 42 | [Link] |
| 8 | Institution: University of Glasgow. Repository used: Enlighten | 680 | 4,868 | 423 | 62 | blogspot.com: 5,880; wordpress.com: 5,087; wikipedia.org: 322; bbc.co.uk: 178; cnn.com: 135 | [Link] |
| 9 | Institution: Imperial College. Repository used: Spiral | 139 | 702 | 329 | 11 | wordpress.com: 3,363; blogspot.com: 1,883; bbc.co.uk: 121; guardian.co.uk: 119; wikipedia.org: 65 | [Link] |
| 10 | | 14 | 37 | | | blogspot.com: 2,552; wordpress.com: 2,275; bbc.co.uk: 169; wikipedia.org: 160; reddit.com: 139 | [Link] |
| 11 | Institution: University of Leeds. Repository used: White Rose Research Online | 700 | 4,847 | 1,354 | 2 | blogspot.com: 44; wordpress.com: 23; wikipedia.org: 13; google.com: 8; ox.ac.uk: 5 | [Link] |
| 12 | Repository used: Research Archive | 66 | 297 | 147 | 8 | blogspot.com: 4,057; wordpress.com: 2,461; wikipedia.org: 97; bbc.co.uk: 55; google.com: 53 | [Link] |
| 13 | Institution: LSE. Repository used: LSE Research Online | 1,365 | 9,771 | 549 | 80 | wordpress.com: 14,449; blogspot.com: 11,550; wikipedia.org: 343; google.com: 262; flickr.com: 244 | [Link] |
| 14 | Institution: University of Manchester. Repository used: eScholar. Note: Repository sub-domain not used. See footnote 3. | (5) | (29) | – | | | [Link] |
| 15 | Institution: Newcastle University. Repository used: Newcastle Eprints | 30 | 215 | 85 | 5 | blogspot.com: 6,425; wordpress.com: 3,929; wikipedia.org: 221; bbc.co.uk: 116; ask.com: 87 | [Link] |
| 16 | Repository used: Nottingham Eprints | 359 | 1,594 | 328 | 57 | blogspot.com: 5,410; wordpress.com: 3,856; wikipedia.org: 148; google.com: 77; guardian.co.uk: 66 | [Link] |
| 17 | Institution: University of Oxford. Repository used: ORA | 299 | 1,116 | 94 | 35 | blogspot.com: 42,008; wordpress.com: 39,798; wikipedia.org: 1,437; ask.com: 548; bbc.co.uk: 504 | [Link] |
| 18 | Repository used: QMRO | 27 | 449 | 350 | 6 | wordpress.com: 4,722; blogspot.com: 1,221; orange.fr: 259; wikipedia.org: 219; ask.com: 89 | [Link] |
| 19 | Institution: Queen’s University Belfast. Note: Repository sub-domain not used. See footnote 4. | (9) | (14) | – | – | | [Link] |
| 20 | Institution: University of Sheffield. Repository used: DCS Publications Archive. Note: Repository sub-domain not used. See footnote 5. Note: The University of Sheffield also uses the White Rose repository which is also used by Leeds and York. See the Leeds entry for the statistics. | (2) | (3) | – | – | | [Link] |
| 21 | Repository used: eprints.soton | 1,329 | 46,176 | 33,524 | 123 | blogspot.com: 4,384; wordpress.com: 2,568; wikipedia.org: 264; bbc.co.uk: 138; microsoft.com: 89 | [Link] |
| 22 | Repository used: UCL Discovery | 335 | 13,978 | 492 | 24 | blogspot.com: 16,009; wordpress.com: 15,633; wikipedia.org: 860; bbc.co.uk: 406; youtube.com: 250 | [Link] |
| 23 | Institution: University of Warwick. Repository used: WRAP | 433 | 2,476 | 278 | 20 | blogspot.com: 9,412; wordpress.com: 7,601; google.com: 217; wikipedia.org: 179; reddit.com: 122 | [Link] |
| 24 | Institution: University of York. Repository used: YODL. Note: Repository sub-domain not used. See footnote 6. Note: The University of Sheffield also uses the White Rose repository which is also used by Leeds and York. See the Leeds entry for the statistics. | (3) | (5) | – | – | | [Link] |
| Range | | 14 – 1,369 | 37 – 46,176 | 9 – 33,524 | 2 – 123 | | |

NOTE:

  1. The list of repositories is taken from OpenDOAR.
  2. The ERIC repository at the University of Exeter is hosted at https://eric.exeter.ac.uk/repository/. Since the repository home page is a redirect from https://eric.exeter.ac.uk/, it was possible to analyse the SEO rankings and get appropriate results.
  3. The eScholar repository at the University of Manchester is hosted at http://www.manchester.ac.uk/escholar/. The figures available for this home page are given in brackets, but since the domains with incoming links may refer to pages hosted on the manchester.ac.uk domain, the figures for those domains are not given in order to avoid skewing the findings.
  4. The Queen’s University Belfast repository is hosted at http://www.qub.ac.uk/schools/SchoolofPoliticsInternationalStudiesandPhilosophy/Research/PaperSeries/ConWEBPapers/. The figures available for this home page are given in brackets, but since the domains with incoming links may refer to pages hosted on the qub.ac.uk domain, the figures for those domains are not given in order to avoid skewing the findings.
  5. The DCS repository at the University of Sheffield is hosted at http://www.shef.ac.uk/dcs/research/publications. The figures available for this home page are given in brackets, but since the domains with incoming links may refer to pages hosted on the shef.ac.uk domain, the figures for those domains are not given in order to avoid skewing the findings.
  6. The YODL repository of the University of York is hosted at http://dlib.york.ac.uk/yodl/app/home/index. The figures available for this home page are given in brackets, but since the domains with incoming links may refer to pages hosted on the dlib.york.ac.uk domain, the figures for those domains are not given in order to avoid skewing the findings.

Table 2 gives the total number of links to the high-ranking domains which are listed in the survey, together with the Alexa ranking for these domains. Note Google.com has the highest Alexa ranking and is listed at number 1. Figure 1 shows the significance of links from blog platforms compared with the other most highly-ranked domains.

Figure 1: Histogram of number of incoming links from top domains

Table 2: Nos. of Links from High-Ranking Domains

| No. | Domains | No. of links | Alexa Ranking |
|---|---|---|---|
| 1 | Blogspot | 176,625 | 5 |
| 2 | WordPress | 153,809 | 21 |
| 3 | Wikipedia | 7,230 | 8 |
| 4 | BBC | 2,811 | 36 |
| 5 | Google | 1,447 | 1 |
| 6 | Ask | 769 | 46 |
| 7 | YouTube | 460 | 3 |
| 8 | Guardian | 334 | 187 |
| 9 | Reddit | 261 | 143 |
| 10 | Orange.fr | 259 | 259 |
| 11 | Typepad | 250 | 212 |
| 12 | CNN | 135 | 43 |
| 13 | Microsoft | 89 | 26 |
| 14 | Sourceforge | 67 | 139 |
| 15 | Ning | 42 | 256 |
| 16 | Oxford University | 5 | 6,764 |
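
For anyone wishing to repeat this exercise, the totals in Table 2 appear to be simple sums of the "top five domains" figures listed against each repository in Table 1. The following is a minimal sketch of that aggregation in Python. It uses only three of the Table 1 rows (Bristol, Durham and Imperial) to keep the example short, so the totals it prints are for those three repositories only and will not match Table 2. The function name and the dictionary layout are my own illustration and are not part of MajesticSEO.

```python
from collections import Counter

# Top-five referring domains for three rows of Table 1.
# The figures are copied from the table; the dictionary layout is illustrative.
TOP_FIVE = {
    "ROSE (University of Bristol)": {
        "wordpress.com": 7871, "blogspot.com": 6692, "wikipedia.org": 273,
        "bbc.co.uk": 98, "guardian.co.uk": 89,
    },
    "DRO (University of Durham)": {
        "wordpress.com": 5430, "blogspot.com": 3020, "wikipedia.org": 145,
        "bbc.co.uk": 76, "ask.com": 45,
    },
    "Spiral (Imperial College)": {
        "wordpress.com": 3363, "blogspot.com": 1883, "bbc.co.uk": 121,
        "guardian.co.uk": 119, "wikipedia.org": 65,
    },
}


def total_links_by_domain(top_five):
    """Sum the per-repository link counts for each referring domain."""
    totals = Counter()
    for domains in top_five.values():
        totals.update(domains)  # adds each domain's count to the running total
    return totals


if __name__ == "__main__":
    ranked = total_links_by_domain(TOP_FIVE).most_common()
    for position, (domain, links) in enumerate(ranked, start=1):
        print(f"{position:>2} {domain:<16} {links:>7,}")
```

Extending the dictionary to cover every row of Table 1 should give a listing comparable with Table 2, although spelling variants in the source data (for example guardian.com and guardian.co.uk) would need to be merged by hand.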

Discussion

In a previous post I suggested that since LinkedIn.com is so widely used across Russell Group Universities, encouraging researchers to provide links to their papers hosted in their institutional repository would enhance the visibility of papers to Google, especially since LinkedIn has such a high Alexa ranking (it currently is listed at number 13 in the global ranking order).

However, LinkedIn does not appear to have a significant presence in the findings provided by MajesticSEO (although the free version lists only the top five domains).

Based on the information obtained in the survey it would appear that two blog platforms, WordPress.com and Blogspot.com, are primarily responsible for driving traffic to institutional repositories, having both high Alexa rankings and large numbers of links to the repositories.

Following these two platforms, but a long way behind, we find Wikipedia and the BBC and then, perhaps somewhat confusingly, Google itself (perhaps links from Google Scholar). The presence of media sites such as the BBC, CNN and the Guardian suggests that researchers (or their media advisers) are doing a good job in ensuring that these organisations provide links to original research papers when stories about university research are being covered in the media.

But perhaps the most noticeable finding is that only one University Web site, Oxford’s, is included in the list of the top 5 domains across all of the Russell Group Universities. The low Alexa ranking (6,764) for the Oxford University Web site in comparison with the other sites listed (which have an Alexa ranking ranging from 1 to 259) suggests that links from university Web sites, even prestigious universities such as Oxford, will not have a significant impact on Google search results. It should also be noted that links from the University of Oxford Web site will not provide SEO benefits to the University of Oxford’s repository, which is hosted in the same domain (ox.ac.uk).
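
The point about links from within the same domain can be illustrated with a small check which separates internal links from external ones before any counting is done. This is a rough sketch only: the hostnames in the usage example are hypothetical, the function names are my own, and the short list of two-label UK suffixes is an assumption made for brevity. A proper analysis would use the Public Suffix List to work out where a site's domain begins.

```python
# Assumption: a hand-picked set of two-label suffixes; not a complete list.
TWO_LABEL_SUFFIXES = {"ac.uk", "co.uk", "org.uk", "gov.uk"}


def site_of(hostname):
    """Return a rough 'site' for a hostname, e.g. www.ox.ac.uk -> ox.ac.uk."""
    labels = hostname.lower().rstrip(".").split(".")
    suffix = ".".join(labels[-2:])
    # Keep one label in front of the suffix: three labels for *.ac.uk and
    # similar, two labels for hosts such as blogspot.com.
    keep = 3 if suffix in TWO_LABEL_SUFFIXES else 2
    return ".".join(labels[-keep:])


def is_external_link(referring_host, repository_host):
    """True if the referring host belongs to a different site from the repository."""
    return site_of(referring_host) != site_of(repository_host)


if __name__ == "__main__":
    # Hypothetical hostnames, used purely for illustration.
    repository = "repository.ox.ac.uk"
    for referrer in ("www.ox.ac.uk", "www.bbc.co.uk", "example.blogspot.com"):
        kind = "external" if is_external_link(referrer, repository) else "internal"
        print(f"{referrer:<22} {kind}")
```

Under these assumptions a link from www.ox.ac.uk to a repository on ox.ac.uk is classed as internal, while links from the BBC or from a Blogspot blog are classed as external.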

Limitations of this Survey

It should be noted that these conclusions are based on just one SEO tool and only a small selection of the findings are available. A more comprehensive survey would make use of the licensed version of the service, and make use of other SEO tools to compare the findings.

In addition, Google do not publish the algorithms by which their search results are ranked, so there can be no guarantee that the findings provided by SEO tools will relate directly to users’ experiences of using Google.

In order to relate these findings to the ways users access resources hosted on a repository there will be a need to examine usage statistics for repositories. It would be interesting to see if the downloads for the most popular items show any correlation with links from the services listed above.

Survey Paradata: The findings given in Table 1 were collected on 27-28 August 2012 using the free version of MajesticSEO. The Alexa rankings listed in Table 2 were obtained from the Alexa survey and collected on 28 August 2012. Where the findings from MajesticSEO were incomplete because the repository was not hosted on the root of a repository sub-domain, this was recorded and any data collected was not included in further analysis.



15 Responses to “MajesticSEO Analysis of Russell Group University Repositories”

  1. Dixon Jones said

    Hi Brian, I am a director here at MajesticSEO. Thanks for using our data for this research. If we can help with a follow up then I would be very happy to help. Track me down @Dixon_Jones on Twitter. In particular, looking at the comparative Trust Flow metrics would be an interesting comparison, since trust is a weighted metric that passes through multiple link iterations, meaning that large numbers of low quality links are less influential on the resulting 0-100 scores. By contrast, Citation Flow is a “Link heavy” metric… so urls or sites with high trust and low citation flow MIGHT indicate narrow centres of excellence.

    • Thanks for the comment. I’m now following you on Twitter (I’m @briankelly).

      Funnily enough, initially I did include the Trust Flow and Citation Flow values in the table. However, since the FAQ didn’t really provide a great explanation of what these terms mean and how they are obtained, I decided to omit this information so that the post was focussed on the figures which are easily understood.

      On further reflection I wonder if a significant number of links to the repositories are from link farms which are hosted on Blogspot or WordPress. I guess it would be useful to be able to filter out links which are felt to be from untrustworthy sources. Is that possible?

  2. dsalo said

    I don’t really know how to interpret these numbers! Are they correlated with amount of content, amount of full-text (or other non-metadata-only) content, breadth or depth of subject matter, what?

    • Hi Dorothea

      I’m working with a small number of repository managers who will be in a position to provide such contextual information. I prefer working in an open fashion, providing evidence as the work progresses, so that flaws in the methodology can be spotted at an early stage.

  3. […] Investigation of SEO Rankings of Institutional Repositories There is a need “to investigate whether links [from popular social media services] are responsible for enhancing SEO rankings of re…  […]

  4. […] Not about OERs but similar questions are of interest. Investigation of SEO Rankings of Institutional Repositories There is a need “to investigate whether links [from popular social media services] are responsible for enhancing SEO rankings of re… "The survey reports that two blogging platforms appear to be primarily responsible for providing the high ranking which may drive traffic to repositories." The two blogging platforms in question are WordPress.com and Blogger. One question, I suppose, is how much traffic would the publications get if they were exposed directly through these platforms compared to being in the repository. […]

  5. […] Traffic from Google searches varies from repository to repository but ranges between 50-80% are not uncommon [Ref 2] […]

  6. Ann said

    thanks

  7. […] what patterns of usage for searches for university Web sites do we find? In a recent survey of the search engine rankings, it was observed that only one institutional Web site (at the University of Oxford) was featured in […]

  8. […] post published in August 2012 on an MajesticSEO Analysis of Russell Group University Repositories highlighted the importance of search engine optimisation (SEO) for enhancing access to research […]

  13. It would be very useful to benchmark results and to analyse the new link flow metrics that were added from that point.
    This would give another dimension to the outcome and give a little more weighting to which link metrics added value.
