UK Web Focus

Innovation and best practices for the Web

Archive for October, 2012

“Making Sense of the Future” – A Talk at #ILI2012

Posted by Brian Kelly on 30 October 2012

Later today I’ll be giving a talk entitled “Making Sense of the Future” at the ILI 2012 (Internet Librarian International) conference which takes place in Olympia, London.

The talk is based on the work of the JISC Observatory and a paper entitled “What Next for Libraries? Making Sense of the Future” (available in PDF and MS Word formats) which was presented recently at the EMTACL12 (Emerging Technologies in Academic Libraries) conference held in Trondheim, Norway.

The talk highlights the danger that our expectations of future developments may be based on assumptions about the continuing importance of our profession. In reality, technological developments may challenge the profession, as those who work in the music industry are aware. We therefore need an evidence-based approach for detecting ‘weak signals’ of developments, complemented by an open discussion for validating the evidence-gathering methodologies, interpreting the implications of such signals and making plans for appropriate actions.

The slides are available on Slideshare and embedded below. In addition if you’d prefer a visual summary of the presentation, the context is provided above and the conclusions below.

Posted in Events | 2 Comments »

Open Practices for Open Repositories

Posted by Brian Kelly on 29 October 2012

Background

Open Access Week, which took place last week, was a busy period for me. Not only did I give talks on how social media can enhance access to research papers hosted in institutional repositories at the universities of Exeter, Salford and Bath, I also wrote accompanying posts which were published on the Networked Researcher and JISC blogs. But perhaps more importantly, last week I coordinated the publication of three guest posts on this blog: SEO Analysis of WRAP, the Warwick University Repository; SEO Analysis of LSE Research Online; and SEO Analysis of Enlighten, the University of Glasgow Institutional Repository.

Sharing of Repository Practices and Experiences

The background to this work was the two papers I co-authored for the Open Repositories OR 2012 conference. In the paper on “Open Metrics for Open Repositories” (available in PDF and MS Word formats) Nick Sheppard, Jenny Delasalle, Mark Dewey, Owen Stephens, Gareth Johnson, Stephanie Taylor and I concluded with a call for repository managers, developers and policy makers to be pro-active in providing open access to metrics for open repositories. In the paper which asked “Can LinkedIn and Academia.edu Enhance Access to Open Repositories?“, also available in PDF and MS Word formats, Jenny Delasalle and I described how popular social media services which are widely used by researchers can have a role to play in enhancing the visibility of papers hosted in repositories. Although LinkedIn and Academia.edu appeared to be widely used, we concluded by describing how “further work is planned to investigate whether such links are responsible for enhancing SEO rankings of resources hosted in institutional repositories“.

This work began with a post which described the findings of a MajesticSEO Analysis of Russell Group University Repositories. This post made use of the MajesticSEO service, which can report on SEO ranking factors for Web sites, and provided initial findings of a survey of the institutional repositories hosted by the 24 Russell Group universities.

This initial post was intended to explore the capabilities of the tool and gauge the level of interest in further work. In response to the post the question was asked: “Are [the findings] correlated with amount of content, amount of full-text (or other non-metadata-only) content, breadth or depth of subject matter, what?” These were valid questions, and they are addressed in the more detailed follow-up surveys provided by repository managers at the universities of Warwick, Glasgow and LSE, who have the contextual knowledge needed to answer them.

In this initial series of guest blog posts, William Nixon concluded with the remarks:

This has been an interesting, challenging and thought-provoking exercise with the opportunity to look at the results and experiences of Warwick and the LSE who, like us, use Google Analytics to provide measures of traffic and usage.

The overall results from this work provide some interesting counterpoints and data to the results which we get from both Google Analytics and IRStats. These will need further analysis as we explore how Majestic SEO could be part of the repository altmetrics toolbox and how we can leverage its data to enhance access to our research.

I feel the exercise has been valuable for the three contributors. But I also feel that the open descriptions of the experiences in using the MajesticSEO tool, the findings and the interpretation of those findings will be of value to the wider repository community, who may also have an interest in gaining a better understanding of the ways in which repository resources are found by users of popular search engines, such as Google. There will also be a need to have a better understanding of the tools used to carry out such analyses. How, for example, will SEO analysis tools address link farms and other ‘black hat’ SEO techniques which may provide significant volumes of links to resources which may, in reality, be ignored by Google?

William Nixon’s post concluded by pointing out the need for:

further analysis as we explore how Majestic SEO could be part of the repository altmetrics toolbox and how we can leverage its data to enhance access to our research.

I suspect the University of Glasgow will not be alone in wishing to explore the potential of SEO analysis tools which can help in understanding current patterns of traffic to repositories and in shaping practices to enhance such traffic. I hope the work which has been described by Yvonne Budden, Natalia Madjarevic and William Nixon has been useful to the repository community in summarising their initial experiences.

I should also add that Jenny Delasalle and I are giving a talk at the ILI 2012 conference which will ask “What Does The Evidence Tell Us About Institutional Repositories?” We are currently finalising the slides for the talk, which are available on Slideshare and embedded below. There is still an opportunity for us to update the slides, which might include a summary of plans for future work in this area. So we would very much welcome your feedback and suggestions. Perhaps you might be willing to publish a guest post on this blog which builds on last week’s posts?

Posted in openness, Repositories | Tagged: , | 5 Comments »

SEO Analysis of Enlighten, the University of Glasgow Institutional Repository

Posted by Brian Kelly on 25 October 2012

Background

In the third and final guest post published during Open Access Week William Nixon, Head of Digital Library Team at the University of Glasgow Library and the Service Development Manager of Enlighten, the University of Glasgow’s institutional repository service, gives his findings on use of  the MajesticSEO tool to analyse the Enlighten repository.


SEO Analysis of Enlighten, University of Glasgow

This post takes an in-depth look at a search engine optimisation (SEO) analysis of Enlighten, the institutional repository of the University of Glasgow. This builds on an initial pilot survey of institutional repositories provided by Russell Group universities described in the post on MajesticSEO Analysis of Russell Group University Repositories.

Background

University of Glasgow

Founded in 1451, the University of Glasgow is the fourth oldest university in the English-speaking world. Today we are a broad-based, research-intensive institution with a global reach, ranked in the top 1% of the world’s universities. The University is a member of the Russell Group of leading UK research universities. Our annual research grants and contracts income totals more than £128m, which puts us in the UK’s top 10 earners for research. Glasgow has more than 23,000 undergraduate and postgraduate students and 6,000 staff.

Enlighten

We have been working with repositories since 2001 (our first work was part of the JISC funded FAIR Programme) and we now have two main repositories, Enlighten for research papers (and the focus of this post) and a second for our Glasgow Theses.

Today we consider Enlighten to be an “embedded repository”, that is, one which has “been integrated with other institutional services and processes such as research management, library and learning services” [JISC Call, 10/2010]. We have done this in various ways including:

  • Enabling sign-on with institutional ID (GUID)
  • Managing author identities
  • Linking publications to funder data from Research System
  • Feeding institutional research profile pages

As an embedded repository Enlighten supports a range of activities: it serves our original Open Access aim of making as many of our research outputs freely available as possible, but it also acts as a publications database and supports the university’s submission to REF2014.

University Publications Policy

The University’s Publications Policy, introduced to Senate in June 2008, has two key objectives:

  • to raise the profile of the university’s research
  • to help us to manage research publications.

The policy (it is a mandate but we tend not to use that term) asks that staff:

  • deposit a copy of their paper (where copyright permits)
  • provide details of the publication
  • ensure the University is in the address for correspondence (important for citation counts and database searches)

Enlighten: Size and Usage

Size and coverage

In mid-October 2012 Enlighten had 4,700 full text items covering a range of item types including journal articles, conference proceedings, books, reports and compositions. Enlighten has over 53,000 records and the Enlighten Team work with staff across all four Colleges to ensure our publications coverage is as comprehensive as possible.

Usage

We monitor Enlighten’s usage primarily via Google Analytics for overall access (including numbers of visitors, page views, referrals and keywords) and the EPrints IRStats package for downloads. Daily and monthly download statistics are provided in records for items with full text and we provide an overall listing of download stats for the last one- and twelve-month periods.

Looking at Google Analytics for the period 1 Jan 2012 – 30 Sep 2012 (to tie in with this October snapshot) and the corresponding period in 2011, we had 201,839 unique visitors up to 30 Sept 2012 compared to 196,988 in 2011.

In the last year we have seen an increase in the number of referrals, and our search traffic is now around 62%. In 2012, 250,733 people visited the site: 62.82% was search traffic (94% of that from Google) with 157,503 visits, and 28.07% was referral traffic with 70,392 visits.

In 2011, 232,480 people visited the site: 69.97% was search traffic with 162,665 visits, and 18.98% came from referrals with 44,128 visits.
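As a quick cross-check (not part of the original figures), the visit counts quoted above can be reproduced from the published percentages with a few lines of Python; the small discrepancies against the reported figures simply reflect rounding in the Google Analytics percentages.

```python
# Sanity check of the Google Analytics figures quoted above. The totals and
# percentages are those reported in the post; small rounding differences
# against the reported visit counts are expected.

def breakdown(total_visits, search_pct, referral_pct):
    """Return approximate (search, referral) visit counts from percentages."""
    return round(total_visits * search_pct / 100), round(total_visits * referral_pct / 100)

print(breakdown(250_733, 62.82, 28.07))  # ~(157510, 70381); post reports 157,503 and 70,392
print(breakdown(232_480, 69.97, 18.98))  # ~(162666, 44125); post reports 162,665 and 44,128
```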

Expectations

Our experience with Google Analytics has shown that much of our traffic still comes from search engines, predominantly Google, but it has been interesting to note the increase in referral traffic, in particular from our local *.gla.ac.uk domain. This rise has coincided with the rollout of staff publication pages, which are populated from Enlighten and provide links to the records held in Enlighten.

After *.gla.ac.uk domain referrals our most popular external referrals come from:

  • Mendeley
  • Wikipedia
  • Google Scholar

We expected that these would feature most prominently in the Majestic results, along with Google itself.

Majestic SEO Survey Results

The data for this survey was generated on 22nd October 2012 using the ‘fresh index’; current data can be found from the Majestic SEO site with a free account. We do own the eprints.gla.ac.uk domain but haven’t added the code needed to create a free report. The summary for the site is given below, showing 632 referring domains and 5,099 external backlinks. Interestingly, it seems our repository is sufficiently mature for Majestic to also provide details for the last five years.

Since we were looking at eprints.gla.ac.uk rather than *.gla.ac.uk we anticipated that our local referrals wouldn’t feature in this report. As a sidebar a focus just on gla.ac.uk showed nearly 411,000 backlinks and over 42,000 referring domains.



Figure 1.  Majestic SEO Summary for eprints.gla.ac.uk

This includes 619 educational backlinks and 54 educational referring domains. This is a drop in the number of referring domains since Brian’s original post in August, which reported 680 and gave a breakdown of the Top Five Domains (and number of links) as:

  • blogspot.com: 5,880
  • wordpress.com: 5,087
  • wikipedia.org: 322
  • bbc.co.uk: 178
  • cnn.com: 135

These demonstrate a very strong showing for blog sites, news and Wikipedia.


Figure 2. Top 5 Backlinks

Referring domains was a challenge! We couldn’t replicate the Matched Links data which Warwick and the LSE used. Our default Referring Domains report is ordered by Backlinks (other options, including matches, are available), but none of our Site Explorer – Ref Domains options seemed able to replicate this. We didn’t use Create Report.

These Referring Domains ordered by Backlinks point us to full text content held in Enlighten from sites it’s unlikely we would have readily identified.

Figure 3a: Referring Domains by Backlinks


Figure 3b: Referring Domains by Matches (albeit by 1)

This report shows wikipedia.org at number one with the blog sites holding spots 2 and 3 and then Bibsonomy (social bookmark and publication sharing system) and Mendeley at 4 and 5.

An alternative view of the Referring Domains report by Referring Domain shows the major blog services and Wikipedia in the top 3, with two UK universities Southampton and Aberdeen (featuring again) in positions 4 and 5.

The final report is a ranked list of Pages, downloaded as a CSV file and then re-ordered by ReferringExtBackLinks (a sketch of this re-ordering step is given after the list of pages below).

URL ReferringExtBackLinks CitationFlow TrustFlow
http://eprints.gla.ac.uk 584 36 28
http://eprints.gla.ac.uk/58987/1/58987.pdf 198 18 15
http://eprints.gla.ac.uk/2081/1/languagepictland.pdf 77 10 9
http://eprints.gla.ac.uk/562 70 24 2
http://eprints.gla.ac.uk/431 69 23 2
http://eprints.gla.ac.uk/225/01/Thomas[1].pdf 61 0 0

Table 1: Top 5 pages, sorted by Backlinks

These pages are:

  • Enlighten home page
  • PDF for “Arguments For Socialism”
  • PDF for “Language in Pictland”
  • A chronology of the Scythian antiquities of Eurasia based on new archaeological and C-14 data [Full text record]
  • Some problems in the study of the chronology of the ancient nomadic cultures in Eurasia (9th – 3rd centuries BC) [Full text record]
  • PDF for “87Sr/86Sr chemostratigraphy of Neoproterozoic Dalradian limestones of Scotland and Ireland: constraints on depositional ages and time scales” [Full text record]
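As a rough illustration of the re-ordering step described above, the sketch below sorts a MajesticSEO ‘Pages’ export by the ReferringExtBackLinks column and prints the top entries. The column names follow Table 1 and the file name is hypothetical; the headings in an actual export may differ, so they may need adjusting.

```python
import csv

# Re-order a MajesticSEO 'Pages' CSV export by the ReferringExtBackLinks column,
# as described above. Column names follow Table 1; adjust the field name if the
# headings in the actual export differ. The file name is hypothetical.

def top_pages(csv_path, n=5, field="ReferringExtBackLinks"):
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    rows.sort(key=lambda r: int(r.get(field) or 0), reverse=True)
    return rows[:n]

for row in top_pages("enlighten_pages.csv"):
    print(row["URL"], row["ReferringExtBackLinks"])
```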

Summary

Focusing in more detail on the results in Figure 2, the top 5 backlinks, 4 out of the 5 are from Wikipedia; the first two are to the same paper but from different Wikipedia entries. It’s interesting to see that our third ranked backlink is from the ROARmap registry.

Looking at the top 5 pages ranked by backlinks, none of the PDFs, or the records which have PDFs, currently appear in our IRStats-generated list of most downloaded papers in the last 12 months. It is possible, however, even in this pilot sample, to draw a correlation between ranking and the availability of full text rather than merely a metadata record.

Discussion

While this initial work has focused on the Top 5, extending this to at least the Top 10 would be useful for further comparison. It was interesting to see that sites such as Mendeley appeared in variations of our Referring Domains reports, which correlates with our Google Analytics reports indicating that they are a growing source of referrals.

Looking at Figure 3a, a Google search on the first referring domain (by backlinks) reveals that the top Ref Domain, scientificcommons.org, has 136,000 results on Google for “eprints.gla.ac.uk”; salero.info didn’t match at all and abdn.ac.uk had 5 results.

Social media sites such as Facebook and Twitter don’t appear in these initial results; it may be that the volume is insufficient for them to be ranked here, or there may be terms of service issues. Google Analytics now provides some social media tools and we have been identifying our most popular papers from Facebook and Twitter.

This has been an interesting, challenging and thought-provoking exercise with the opportunity to look at the results and experiences of Warwick and the LSE who, like us, use Google Analytics to provide measures of traffic and usage.

The overall results from this work provide some interesting counterpoints and data to the results which we get from both Google Analytics and IRStats. These will need further analysis as we explore how Majestic SEO could be part of the repository altmetrics toolbox and how we can leverage its data to enhance access to our research.


About the Author

William Nixon is the Head of the Digital Library Team at the University of Glasgow Library. He is also the Service Development Manager of Enlighten, the University of Glasgow’s institutional repository service (http://eprints.gla.ac.uk). He has been working with repositories for over a decade and was the Project Manager (Service Development) for the JISC funded DAEDALUS Project that set up repositories at Glasgow using both EPrints and DSpace. William is now involved with the ongoing development of services for Enlighten and support for Open Access at Glasgow. Through JISC funded projects including Enrich and Enquire he has worked to embed the repository into University systems. This work includes links to the research system for funder data and the re-use of publications data in the University’s web pages. He was part of the University’s team which provided publications data for the UK’s Research Excellence Framework (REF) Bibliometrics Pilot. William is now involved in supporting the University of Glasgow’s submission to the REF2014 national research assessment exercise. Enlighten is a key component of this exercise, enabling staff to select and provide further details on their research outputs.

Posted in Evidence, Guest-post, openness | 2 Comments »

SEO Analysis of LSE Research Online

Posted by ukwebfocusguest on 24 October 2012

Background

The second in the series of guest blog posts which gives a summary of an SEO analysis of a repository hosted at a Russell Group university is provided by Natalia Madjarevic, the LSE Research Online Manager. As described in the initial post, the aim of this work is to enable repository managers to openly share their experiences in use of MajesticSEO, a freely-available SEO analysis tool to analyse their institutional repositories.


SEO Analysis of LSE Research Online

This post takes an in-depth look at a search engine optimisation (SEO) analysis of LSE Research Online, the institutional repository of LSE research outputs. This builds on Brian Kelly’s post published on this blog in August 2012 on MajesticSEO Analysis of Russell Group University Repositories.

The London School of Economics and Political Science

Background

LSE is a specialist university with an international intake and a global reach. Its research and teaching span the full breadth of the social sciences, from economics, politics and law to sociology, anthropology, accounting and finance. Founded in 1895 by Beatrice and Sidney Webb, the School has a reputation for academic excellence. The School has around 9,300 full time students from 145 countries and a staff of just under 3,000, with about 45 per cent drawn from countries outside the UK. In 2008, the RAE found that LSE has the highest percentage of world-leading research of any university in the country, topping or coming close to the top of a number of rankings of research excellence. LSE came top nationally by grade point average in Economics, Law, Social Policy and European Studies and 68% of the submitted research outputs were ranked 3* or 4*.

LSE Research Online – a short history

LSE Research Online (LSERO) was set up in 2005 as part of the SHERPA-LEAP project. The aim of the project was to create EPrints repositories for each of the seven partner institutions, of which LSE was one, and to populate those repositories with full-text research papers. In June 2008 the LSE Academic Board agreed that records for all LSE research outputs would be entered into LSE Research Online. We have no full-text mandate but authors are encouraged to provide full-text deposits of journal articles in pre-publication form, clearly labelled as such, alongside references to publications. Research outputs included in LSE Research Online appear in LSE Experts profiles automatically, thereby reusing data collected by LSE Research Online.

LSE Research Online is to be the main source of bibliographic information for the Research Excellence Framework (REF) in 2014. This has served to further increase the impetus for deposit and visibility of the repository in the School and we have various repository champions throughout the School across departments.

LSE Research Online size and a brief look at usage statistics

As of September 2012, LSE Research Online contains around 33,696 records, with 7,050 full-text items. We include a variety of item types such as articles, book chapters, working papers, data sets, blogs and conference proceedings. We most recently began collecting LSE blogs to create a permanent home for this important content. We began tracking LSERO site usage with Google Analytics in 2007 and the site has received 2,268,135 visits since this date. According to Google Analytics, 76.55% (1,748,725 total visits) of traffic to LSE Research Online comes from searches. Only 16.13% of traffic is from referrals and 7.14% from direct traffic. We also use analog server statistics to monitor downloads, and total downloads for May 2007 – Sept 2012 were 5,266,871.

Expectations of the survey

Before running the Majestic SEO report, I expected we would see plenty of traffic from Google and backlinks (i.e. incoming links) from lse.ac.uk as, understandably, these are key sources of traffic to LSERO and are indicated as such on Google Analytics. Google Analytics also points to referrals from Wikipedia and Google Scholar, and most recently, our Summon implementation which includes LSERO content. However, I was intrigued as to how LSERO would fare in an SEO analysis.

Majestic SEO survey results

The data was generated from Majestic SEO using a free account on 24th September 2012 using the ‘fresh’ index option. A summary of the results is shown below: there are 1,285 referring domains and 8,856 external backlinks. Note that the current findings can be viewed if you have a MajesticSEO account (which is free to obtain).

Figure 1: Majestic SEO analysis summary for eprints.lse.ac.uk

This includes 408 educational referring backlinks. If we look at the backlinks in more detail, patterns begin to emerge:


Figure 2: Top 5 Backlinks

This illustrates that a distinct majority of the top backlinks to LSERO content come from Wikipedia, and yet Wikipedia is only ranked as the sixth most popular source of traffic in Google Analytics.

Top referring domains, sorted by matched links, can be found in the table shown below:

Referring domains  Matched links  Alexa rank  Citation flow  Trust flow
wordpress.com 14502 21 95 93
blogspot.com 11239 5 97 94
wikipedia.org 349 8 97 98
flickr.com 272 33 98 96
google.com 225 1 99 99

Table 1: Top 5 Referring Domains

Flickr makes a surprise appearance, with WordPress and Blogger dominating the top of the table.

Top 5 items sorted by Majestic’s flow metrics can be found below:


Figure 3: Top 5 Resources in Repository (sorted by flow metrics)

Perhaps more indicative, the Top 5 linked resources sorted by number of backlinks can be found in the table shown below:

Ref no. URL Ext. BackLinks Ref. Domains CitationFlow TrustFlow
1 http://eprints.lse.ac.uk 501 83 45 41
2 http://eprints.lse.ac.uk/27939/1/HartwellPaper_English_version.pdf 417 69 28 19
3 http://eprints.lse.ac.uk/27072 225 4 27 32
4 http://eprints.lse.ac.uk/27939 130 46 30 25
5 http://eprints.lse.ac.uk/39826 112 54 22 23

Table 2: Top 5 Linked Resources in Repository (sorted by no. of links)

These pages are:

  1. The LSE Research Online homepage.
  2. A PDF of a research paper on climate policy.
  3. The record for a paper on teenager’s use of social networking sites.
  4. The record for a paper on climate policy.
  5. The record for a paper on open source software.

Summary

Looking in more detail at the top backlinks to the repository, as listed in Figure 2, we can see that Wikipedia represents four out of the five top pages. This includes the Wikipedia page on Free Software, which links back to a Government report on the cost of ownership of open source software. The Wikipedia pages on the European Commission and Proportional Representation are ranked second and third respectively. The Proportional Representation page links back to the full-text of a 2010 workshop paper: Review of paradoxes afflicting various voting procedures where one out of m candidates (m ≥ 2) must be elected. The fifth backlink, and the only one not from Wikipedia, is from avert.org, an AIDS education site which links back to the record of an early LSERO paper: Peer education, gender and the development of critical consciousness: participatory HIV prevention by South African youth.

In Table 1, the Top 5 Referring Domains to LSE Research Online are WordPress, Blogspot, Wikipedia, Flickr and Google. We can see the dominance of international social platforms here, with WordPress (14,502 links) and Blogspot (11,239 links), followed by Wikipedia (349 links), Flickr (272 links) and, finally, a search engine, google.com (225 links).

In Figure 3, Top 5 Resources in Repository (sorted by flow metrics), we can see several links to LSERO information pages including the home page and the feed of latest additions. There are, however, several direct links to full-text papers including an Economic History Working Paper on A dreadful heritage: interpreting epidemic disease at Eyam, 1666-2000. Sorting this data by number of backlinks, as shown in Table 2, the top item is the LSERO homepage with 501 backlinks. The second item is the PDF of one of our most downloaded papers of all time: The Hartwell Paper.

Discussion

So what can I draw from the results of the Majestic SEO report of LSE Research Online? Analysing the top referring domains according to the Majestic report, it seems reasonable to suggest that adding links to repository content on blogging platforms such as WordPress and Blogspot may result in an increased SEO ranking. We often link to LSERO content in various LSE Library blogs hosted on Blogspot, including New Research Selected by LSE Library. Flickr is also listed as a top referring domain according to the Majestic SEO but running a Google search for site:flickr.com “eprints.lse.ac.uk” retrieves zero results. It’s difficult to ascertain how MajesticSEO gets this result when Google does not confirm the findings – perhaps it uses very different algorithms to Google? The MajesticSEO top referring domains indicate that blogging platforms are the main referring domains to LSERO content. However, according to our Google Analytics stats, 76.55% of traffic to LSERO is from searches. Furthermore, the Majestic report indicates that there are 349 matched links to LSERO content on Wikipedia. “Running the search site:wikipedia.org “eprints.lse.ac.uk” in http://www.google.co.uk/ you get (on 11 October 2012) “About 92 results”. From the last page of the results, by repeating the search to include omitted results, Google ends up with 80 hits.” Searching for eprints.lse.ac.uk in http://en.wikipedia.org/wiki/Main_Page retrieves 83 hits. How does MajesticSEO retrieve such varying results?
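One way to cross-check the Wikipedia figures discussed above, independently of both Google's result counts and MajesticSEO, is the MediaWiki API's external URL usage list. The sketch below (Python, using the requests library) counts English Wikipedia pages containing links to eprints.lse.ac.uk; it reports the current state of Wikipedia rather than the October 2012 figures quoted above, and the exact counting behaviour of the API is an assumption worth verifying against its documentation.

```python
import requests

# Count English Wikipedia pages containing external links to a given domain,
# via the MediaWiki 'exturlusage' API. This is an independent cross-check of
# the site:wikipedia.org searches discussed above; counts reflect the current
# state of Wikipedia, not the October 2012 figures quoted in the post.

API = "https://en.wikipedia.org/w/api.php"

def wikipedia_linking_pages(domain):
    pages = set()
    params = {
        "action": "query",
        "list": "exturlusage",
        "euquery": domain,
        "eulimit": "500",
        "format": "json",
    }
    while True:
        data = requests.get(API, params=params, timeout=30).json()
        pages.update(item["title"] for item in data["query"]["exturlusage"])
        if "continue" not in data:
            return pages
        params.update(data["continue"])  # follow the continuation token

print(len(wikipedia_linking_pages("eprints.lse.ac.uk")))
```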

Looking at backlinks, it’s important to note that the majority of top backlinks refer to papers that have the full-text attached and often link directly to the full-text PDF, of course resulting in a direct download. In addition, the Top 5 Resources in Repository (sorted by external backlinks) as seen in Table 2 tallies with our consistently popular papers according to Google Analytics and our analog statistics.

The inclusion of repository links on domains such as Wikipedia and blogging platforms appears to have a positive impact on the relevancy ranking of LSERO content. This is not to mention direct hits on the links themselves, adding directly to the site’s visitors and thus to the dissemination of LSE research outputs. However, whether we can draw firm conclusions from the Majestic report remains to be seen, particularly given such differing results to those found on Google.

Thanks to my colleague Peter Spring for his advice when writing this post.


About the Author

Natalia Madjarevic is the manager of LSE Research Online, LSE Theses Online and LSE Learning Resources Online, the repositories of The London School of Economics and Political Science.

Natalia is also the Academic Support Librarian for the Department of Economics and LSE Research Lab. She joined LSE in 2011; prior to that she worked at libraries including UCL, The Guardian and Queen Mary, University of London. Her professional interests include Open Access, research support, REF, bibliometrics and digital developments in libraries.

Posted in Evidence, Guest-post, Repositories | 4 Comments »

SEO Analysis of WRAP, the Warwick University Repository

Posted by ukwebfocusguest on 23 October 2012

SEO Analysis of a Selection of Russell Group University Repositories

A post published in August 2012 on a MajesticSEO Analysis of Russell Group University Repositories highlighted the importance of search engine optimisation (SEO) for enhancing access to research papers and provided summary statistics of the SEO rankings for 24 Russell Group university repositories. That post is the starting point for this series of articles on individual repositories.

This work adopted an open practice approach in which the initial findings were published at an early stage in order to solicit feedback on the value of such work and the methodology used. There was much interest in this initial work, especially on Twitter. Subsequent email discussions led to a number of repository managers at Russell Group universities agreeing to publish more detailed findings for their repository, together with contextual information about the institution and the repository which I, as a remote observer, would not be privy to.

We agreed to publish these findings on this blog during Open Access Week. I am very grateful to the contributors for finding time to carry out the analysis and publish the findings during the start of the academic year – a very busy period for those working in higher education.

The initial post was written by Yvonne Budden, the repository manager for WRAP, the Warwick Research Archives Project. It is appropriate that this selection of guest blog posts begins with a contribution about the Warwick repository, as Jenny Delasalle, a colleague of Yvonne’s at the University of Warwick, and I will be giving a talk on “What Does The Evidence Tell Us About Institutional Repositories?” at the ILI 2012 conference to be held in London next week.


SEO Analysis of the University of Warwick’s Research Repositories

The following summary of a MajesticSEO survey of the University of Warwick’s research repositories, together with background information about the university and the repository environment has been provided by Yvonne Budden.

A Little Background on Warwick

The University of Warwick is one of the UK’s leading universities with an acknowledged reputation for excellence in research and teaching, for innovation and for links with business and industry. Founded in 1965 with an initial intake of 450 undergraduates, Warwick now has in excess of 22,000 students and employs close to 5,000 staff. Of those staff, just under 1,400 are academic or research staff. Warwick is a research-intensive institution and our departments cover a wide range of disciplines, including medicine and WMG, a specialist centre dedicated to innovation and business engagement. In the 2008 RAE nineteen of our departments were ranked in the top ten for their unit of assessment and 65% of the submitted research outputs were ranked 3* or 4*.

University of Warwick’s Research Repositories

Warwick’s research repositories began in the summer of 2008 with the Warwick Research Archives Project (WRAP), a JISC funded project that created a full text, open access archive for the University. WRAP funding was taken on by the Library and in April 2011 we launched the University of Warwick Publications service, which was designed to ‘fill the gaps’ around the WRAP content with a comprehensive collection of work produced by Warwick researchers. The services work on the same technical infrastructure but WRAP remains distinct and exposes only the full text open access material held. The system runs on the most recent version of the EPrints repository software, using a number of plugins for export, statistics monitoring and, most recently, to assist in the management of the REF2014 submission. To date we do not have a full text mandate for WRAP and engagement with both WRAP and the Publications service varies across the departments. Deposit to the services is highly mediated through the repository team and so engagement is not necessarily reflected in the number of papers available per department, especially as some departments benefit more from the service’s policy of pro-active acquisition of new material where licences allow. I would judge that our best engagement in terms of full text deposit comes from Social Science researchers but we also have some strong champions in the Medical School, History, Life Sciences and Psychology.

Size and Usage Statistics

At the end of August 2012 WRAP contained 6,554 full text items covering a range of item types: journal articles, theses, conference papers, working papers and more. The Publications service contained a further 40,753 records. In terms of usage, since its launch the system has seen 900,997 visits according to Google Analytics, an average of just over 18,000 a month over the 50 months it has been active. To track downloads we use the EPrints plugin IR Stats, which counts file downloads either directly or through the repository interface. IR Stats will only count one download per twenty-four hours from each source, but will count multiple downloads if an item has multiple files attached. Over the life of WRAP the files held have been downloaded a total of 730,304 times, with 49.08% of downloads coming from Google or Google Scholar.
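As an illustration of the counting rule described above (one download of a given file from a given source in any twenty-four hour period), the sketch below applies that rule to a few invented log entries. It is a simplification for explanatory purposes, not the actual IR Stats implementation.

```python
from datetime import datetime, timedelta

# Simplified illustration of the download-counting rule described above: a
# download of a given file from a given source is only counted once in any
# 24-hour period. This is a sketch of the rule, not the IR Stats code itself.

def count_downloads(log):
    """log: iterable of (timestamp, source, file_id) tuples, sorted by time."""
    last_counted = {}  # (source, file_id) -> timestamp of last counted download
    total = 0
    for ts, source, file_id in log:
        key = (source, file_id)
        if key not in last_counted or ts - last_counted[key] >= timedelta(hours=24):
            total += 1
            last_counted[key] = ts
    return total

sample = [  # invented entries for illustration
    (datetime(2012, 8, 1, 9, 0), "203.0.113.7", "WRAP_Oswald_twerp_882.pdf"),
    (datetime(2012, 8, 1, 11, 30), "203.0.113.7", "WRAP_Oswald_twerp_882.pdf"),  # within 24h: not counted
    (datetime(2012, 8, 2, 10, 0), "203.0.113.7", "WRAP_Oswald_twerp_882.pdf"),   # counted again
]
print(count_downloads(sample))  # 2
```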

Expectations of the Survey

Going into the survey using the MajesticSEO system I wasn’t sure what to expect from the results; the majority of the work we’ve done so far with statistics has been with Google Analytics and the IR Stats package. Looking at the referral sources in our Google Analytics output I can identify a number of sources from which I might expect to see backlinks into the system, including our Business School (wbs.ac.uk) and the Bielefeld Academic Search Engine (BASE), as well as a number of smaller sources. The Warwick Blogs service seems to have fallen out of favour over the past few years, with the number of hits from there dropping as people move to other platforms. Above all I’m most curious to see if the SEO analysis can help with the work I am doing in promoting the use of WRAP and the material within it. If this work can assist me in creating the kinds of ‘interest stories’ that help to persuade researchers to deposit, it could become another valuable source of information. We are also looking at expanding the range of metrics we have access to, looking at the IRUS project as well as the forthcoming updated version of IR Stats, recently demonstrated at Open Repositories 2012.

Our Survey Results

The data for this survey was generated on the 10th September 2012 using the ‘fresh index’ option, although the images were captured on 19 October. The current results can be found if you have a MajesticSEO account (which is free to obtain). The summary for the site is given below showing 413 referring domains and 2,523 backlinks.


Figure 1: MajesticSEO analysis summary for wrap.warwick.ac.uk

At first glance this seems rather low in terms of backlinks; it also shows a fairly low number of educational domains linking to us. The top five backlinks into the system can be seen below, ranked as standard by the system by a combination of citation and trust flow:


Figure 2: Top 5 Backlinks

Interestingly this lists some of the popular referrers we see in Google Analytics driving traffic to us, but not some others I might have expected to see. The top referring domains are shown below:

Figure 3: Top Referring Domains

This is the only place in the results where Google features at all. The top five pages, as ranked by the flow metrics show a fairly distinct anomaly, as two of the pages are not listing any flow metric information despite this supposedly being the method by which they are ranked:

Figure 4: Findings Ranked by Flow Metrics

The top five pages as sorted by number of backlinks can be seen in the table below:

Ref No. URL Ext. Backlinks Ref. Domains Citation Flow Trust Flow
1 http://wrap.warwick.ac.uk/2489 228 1 14 0
2 http://wrap.warwick.ac.uk 177 23 37 37
3 http://wrap.warwick.ac.uk/1539/1/WRAP_Horvath_twerp647.pdf 91 31 15 13
4 http://wrap.warwick.ac.uk/1335/1/WRAP_Oswald_twerp_882.pdf 82 4 11 9
5 http://wrap.warwick.ac.uk/1118 46 4 17 2

Table 1: Top 5 Pages, Sorted By Number of Links

These five items are as follows:

  1. A research paper on the impact of cotton in poor rural households in India.
  2. The WRAP homepage.
  3. A PDF of an economics working paper on currency area theory.
  4. A PDF of an economics working paper on happiness and productivity.
  5. The record for a PhD thesis on Women poets.

Summary

The top ten backlinks into the WRAP system include a range of sources: this blog, two Wikipedia pages and two referrals from the PhilPapers repository, which monitors journals, personal pages and repositories for Philosophy content. We also see two pages that collect literature on health topics linking back to us, a Maths blog and the newsletter of the British Centre for Science Education.

Interestingly, in Figure 3 there is no mention of the University of Warwick or any of its related domains (wbs.ac.uk for the Business School, for instance). I assume this is because MajesticSEO excludes ‘self’ links, so as WRAP is a Warwick subdomain a lot of the links I am aware of are being excluded. This may also account for the lack of any backlinks from the Warwick Blogs service. Many of the domains listed here are blog platforms of one form or another, which may be because of the database-driven architecture of these platforms and the way the MajesticSEO system reads those links. For example, if a researcher puts a link to their most recent paper in WRAP in the frame of their blog and this propagates onto every post in the blog, does this count as a single link or as many? We are also seeing links from sources such as the BBC and Microsoft, where, again, it would be nice to be able to see who was linking to what, and from where, in these domains.

The top pages, as listed by number of backlinks in Table 1, show a trend for linking directly to the files of the full text material we hold in WRAP. This ties in nicely with the fact that item three is the most downloaded paper in WRAP over the lifetime of the repository, with 9,162 downloads to the end of August 2012. So in this case we can draw a tentative line between the number of downloads and the number of backlinks. However we can’t follow this theory through, especially as the most linked-to paper, Paper 1 in Table 1, has been downloaded only a fraction of the number of times compared to the currency working paper. When listed by the flow metrics, as in Figure 4, the pages largely follow the results seen for the Opus repository at Bath and link to pages about the repository, apart from the two anomalous results which are ranked second and third despite having no citation or trust flow scores.
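If backlink and download figures were available for a larger set of items, the tentative relationship discussed above could be examined with a rank correlation. The sketch below uses the backlink counts from Table 1 alongside invented download figures, purely to show the kind of calculation involved; it is not based on real WRAP download data.

```python
# Sketch of how the tentative backlinks/downloads relationship discussed above
# could be tested across more items. The download figures below are invented
# for illustration; only the backlink counts echo Table 1.

def spearman_rho(xs, ys):
    """Spearman rank correlation (no tie correction), for small samples."""
    def ranks(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        r = [0.0] * len(vs)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

backlinks = [228, 177, 91, 82, 46]         # from Table 1
downloads = [350, 4200, 9162, 5100, 600]   # invented, for illustration only
print(round(spearman_rho(backlinks, downloads), 2))
# A negative value here would echo the point that more backlinks do not
# necessarily mean more downloads.
```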

Discussion

I think when looking at metrics the most important thing for a repository manager to do is to be able to build stories around the metrics, as these help researchers to engage with the figures. Was this spike in downloads because the paper featured in a conference, or an author moved to a new institution, or for some other reason? What can I show my users that is going to help them decide to use us over other options and to expend scarce time and resources maintaining a blog or Twitter account? The issue I have with the data we have discovered is that the number of backlinks into a repository will never conclusively prove that a paper will get more downloads, as ably illustrated by the example above. Many researchers are not interested in the fuzzy conclusions we can draw at this point; they want to see clear, conclusive proof that links = downloads = citations.

I also think that search engine performance is an increasingly difficult area to be really conclusive about, especially now that users can ‘train’ their Google results to prefer the links they click on most often. This was recently a cause of concern for us, as it was reported that our Department of Computer Science (DCS) EPrints repository was overtaking WRAP in the Google rankings and that WRAP didn’t feature until page two of the results. This wasn’t the case, but because the user reporting this to us was heavily involved in the area of computer science, his Google rankings had preferred the DCS repository to the WRAP one as the results were more relevant to his interests. In the same way, when I search for ‘RSP’ my top result is now the Repositories Support Project and not RSP the engineering company or the Peterborough health and safety firm, as it was initially.

We need always to be conscious of what researchers want from metrics and whether it is possible for us to give it to them. As with any metrics, we need to be explicit about what we are saying and what can be inferred from it. If we, as users of metrics, don’t understand how the metrics are derived or how the search engines’ ranking algorithms work, we won’t be able to confidently predict what we can do to improve them. It may also come down to the way researchers are using these services and for what purpose, which may be why we are not seeing any evidence of the use of services like Academia.edu and LinkedIn. I would imagine that if researchers are using such services to showcase their work to prospective employers and other researchers, they may prefer to link to the publisher’s version of their work rather than the repository version. I suspect the interest story from the SEO data may be more about ‘who’ is linking to their work rather than where they are linking from, which is detail we cannot, and possibly should not, be able to provide.


About the Author

Yvonne Budden (@wrap_ed), the University of Warwick’s E-Repositories Manager is responsible for WRAP, the Warwick Research Archive Portal and is the current Chair of the UK Council for Research Repositories (UKCoRR).

Email: Y.C.Budden@warwick.ac.uk

Posted in Evidence, Guest-post, Repositories | 3 Comments »

Open Practices for the Connected Researcher

Posted by Brian Kelly on 22 October 2012

Today sees the start of Open Access Week, #OAWeek. As described on the Open Access Week Web site:

Open Access Week, a global event now entering its sixth year, is an opportunity for the academic and research community to continue to learn about the potential benefits of Open Access, to share what they’ve learned with colleagues, and to help inspire wider participation in helping to make Open Access a new norm in scholarship and research.

I am participating in Open Access Week by sharing my experiences of making use of the Social Web to maximise access to papers hosted in institutional repositories. Tomorrow (Tuesday 23 October 2012) I am giving a talk on “Open Practices for the Connected Researcher” in a seminar which is part of a series of Open Access Week events which are taking place at the University of Exeter.

On Thursday, as described in a news item published by the University of Salford, I am the invited guest speaker for an Open Access event which will take place at the  Old Fire Station at the University of Salford where I will give a talk on “Open Practices and Social Media for the Connected Researcher“.

The following day I will be giving a talk on “Open Access and Open Practices For Researchers” at the University of Bath. This event, which marks the launch of a Social Media programme for Researchers, will include, in addition to my talk, a presentation from Ross Mounce, a PhD student and Open Knowledge Foundation Panton Fellow at the University of Bath, who will talk about the need for true Open Access (as originally defined), why it matters and the plethora of options we have for OA publishing.

In addition to such ‘real-world’ activities in support of Open Access Week I am also taking part in the Networked Researcher Blogging Unconference and earlier today published the launch post for the unconference.

My slides for tomorrow’s talk are available on Slideshare and are embedded below.

Posted in openness, Repositories | Tagged: | 3 Comments »

My Response to WAI’s Website Accessibility Conformance Evaluation Methodology 1.0 Working Draft

Posted by Brian Kelly on 19 October 2012

Last week in a post entitled W3C WAI Invite Feedback on Website Accessibility Conformance Evaluation Methodology 1.0 Working Draft I highlighted the publication of WAI’s  Website Accessibility Conformance Evaluation Methodology 1.0 working draft and encouraged readers to respond to the call for feedback.

The closing date for comments is tomorrow, 20 October 2012. I have submitted my comments which are given below.


Response to the WCAG-EM 1.0 Working Draft

The Web Accessibility Initiative’s work since its launch in 1997 [1] in providing guidelines which can help enhance the accessibility of Web resources for people with disabilities is to be valued.

However, as might be expected (and as is the case with many of the standards which have been developed over the years by the W3C), the various guidelines which have been produced by WAI have been shown to have limitations or proven inappropriate for use in a real-world context. Accessibility researchers and practitioners based primarily in the UK have been pro-active in identifying limitations of the WAI model and proposing ways in which the guidelines can be contextualised and used where appropriate. This work dates back to 2005, when a paper entitled “Forcing Standardization or Accommodating Diversity? A Framework for Applying the WCAG in the Real World” was presented at the W4A 2005 conference [2]. Further work included papers on Contextual Web Accessibility – Maximizing the Benefit of Accessibility Guidelines [3], Accessibility 2.0: People, Policies and Processes [4], One World, One Web … But Great Diversity [5], From Web Accessibility to Web Adaptability [6], Developing Countries; Developing Experiences: Approaches to Accessibility for the Real World [7] and A Challenge to Web Accessibility Metrics and Guidelines: Putting People and Processes First [8].

The abstract for our most recent paper [8] summarised the concerns we have regarding the WAI model (which is based on three sets of guidelines – WCAG, UAAG and ATAG):

This paper argues that web accessibility is not an intrinsic characteristic of a digital resource but is determined by complex political, social and other contextual factors, as well as technical aspects which are the focus of WAI standardisation activities. It can therefore be inappropriate to develop legislation or focus on metrics only associated with properties of the resource.

The authors describe the value of standards such as BS 8878 which focus on best practices for the process of developing web products and include a user focus.

I have concerns that the WAI’s Website Accessibility Conformance Evaluation Methodology 1.0 working draft [9] could be counter-productive if it is used by policy-makers to  mandate conformance with WCAG, rather than treating WCAG as a valuable set of guidelines whose use should be considered in context.

The WAI model itself provides one example of such contextual issues. WAI’s view of what it refers to as ‘universal accessibility‘ is that this requires conformance with the WCAG, UAAG and ATAG guidelines. Since browsers and authoring tools which conform with UAAG and ATAG are not ubiquitous, it is clear that the value of WCAG conformance alone will be limited. In addition, the ways in which Web content is created have changed drastically since WAI was launched and the WAI model developed. Email messages sent to WAI mailing lists, for example, will be Web content hosted in the WAI’s mailing list archive on the W3C Web site. It is unlikely that such content will conform with WCAG guidelines.

A recent post entitled “John hit the ball”: Should Simple Language Be Mandatory for Web Accessibility? [10] highlighted that WAI have acknowledged that conformance with the current WCAG guidelines will not, as some people mistakenly think, address all disabilities. However, as described in the post, providing additional guidelines for incorporation in a future version of WCAG would be inappropriate, as guidelines which mandate the use of simple language would not be welcomed by everybody, for reasons described in the post and in a more in-depth post on The complexities of simple: What simple language proponents should know about linguistics [11] by Dominik Lukes.

Beyond the limitations of the WAI model there are the contextual factors regarding the purposes of Web resources (which the WAI document highlights). The WAI model was developed at a time when the Web was being used primarily as an informational resource, although we were also seeing examples of commercial transactions being developed. But beyond the provision of information and the purchasing of products which are mentioned in the WAI document, there are also more complex areas such as learning and cultural appreciation for which there is a need to develop a better understanding of what is meant by such areas in a Web context.

It should also be noted that the clarity on the scope of Web resources provided in the WAI document may, ironically, lead to organisations failing to deploy Web resources which could provide accessibility benefits to some users, if those resources fail to conform fully with WCAG guidelines. This is likely to be particularly the case in the public sector, which may be required to provide Web sites which conform fully to WCAG guidelines.

In addition to the danger that this may lead to online resources not being deployed at all, there is also a need to consider the costs of providing resources which conform fully with WCAG guidelines, particularly at a time of economic constraints. To give a particular example, a paper entitled Supporting PDF accessibility evaluation: early results from the FixRep project [12] analysed the provision of metadata in PDFs of (typically) peer-reviewed papers hosted in a university’s institutional repository and concluded:

“This means that only 10% of all PDFs processed have any likelihood of conforming to accessibility guidelines, and even then we would require further content level analysis to evaluate the extent to which they do indeed conform.”

It is felt (although further research is needed) that these findings are likely to hold across institutional repositories more widely. Should we require that peer-reviewed papers not be hosted on institutional repositories unless they conform with WCAG guidelines? If such a decision is made, what will the financial implications be, and will “just-in-case accessibility” be an appropriate investment of scarce financial resources?
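As an illustration of the kind of first-pass check the FixRep work describes, the sketch below (assuming the pypdf library) tests whether a PDF declares a document title and whether it is marked as a tagged PDF, two rough indicators relevant to accessibility. This is not the FixRep tooling itself, the file name is hypothetical, and, as the quotation above notes, full conformance checking would still require content-level analysis.

```python
from pypdf import PdfReader

# Rough, first-pass checks of the kind the FixRep work describes: does the PDF
# declare a document title, and is it marked as a tagged PDF (a prerequisite
# for structural accessibility)? This is only a coarse screen, not a WCAG
# conformance check; content-level analysis would still be required.

def quick_pdf_check(path):
    reader = PdfReader(path)
    info = reader.metadata
    has_title = bool(info and info.title)
    mark_info = reader.trailer["/Root"].get("/MarkInfo") or {}
    is_tagged = bool(mark_info.get("/Marked", False))
    return {"title_present": has_title, "tagged": is_tagged}

print(quick_pdf_check("repository_paper.pdf"))  # hypothetical file name
```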

In light of such issues (which are discussed in more detail in the peer-reviewed papers which have been mentioned), what actions are appropriate for the Website Accessibility Conformance Evaluation Methodology 1.0 working draft? I would suggest that the document should explicitly mention the limitations of the WAI model (i.e. its dependencies on ATAG and UAAG); the need to address contextual factors; and the need to address accessibility issues in a broader context, including the context of use and purpose of the Web resource and the financial implications of conforming with the guidelines.

Finally, I would suggest that the document makes it clear that it would be inappropriate for policy-makers and legislators to enact legislation based solely on WCAG conformance. I would hasten to add that this is not to suggest that no interventions need to be made. Rather, I would propose that it would be more appropriate to develop policies and legislation based on the processes surrounding the development of Web products, as suggested in Accessibility 2.0: People, Policies and Processes [4]. In the UK, such approaches have been described in the British Standards Institution’s BS 8878 Web Accessibility Code of Practice, which is described at [13].

References

1. WAI Launch Agenda, WAI,  http://www.w3.org/WAI/References/agenda

2. Forcing Standardization or Accommodating Diversity? A Framework for Applying the WCAG in the Real World, Kelly, B., Sloan, D., Phipps, L., Petrie, H. and Hamilton, F. Proceedings of the 2005 International Cross-Disciplinary Workshop on Web Accessibility (W4A). ISBN: 1-59593-036-1.  http://opus.bath.ac.uk/438/

3.  Contextual Web Accessibility – Maximizing the Benefit of Accessibility Guidelines, Sloan, D., Kelly, B., Heath, A., Petrie, H. Fraser, H. and Phipps, L. WWW 2006 Edinburgh, Scotland 22-26 May 2006. Conference Proceedings, http://opus.bath.ac.uk/402/

4. Accessibility 2.0: People, Policies and Processes, Kelly, B., Sloan, D., Brown, S., Seale, J, Petrie, H., Lauke, P. and Ball, S. WWW 2007 Banff, Canada, 7-11 May 2007. http://opus.bath.ac.uk/398/

5. One World, One Web … But Great Diversity, Kelly, B., Nevile, L., Draffan, EA. and Fanou, S. WWW 2008 Beijing, China, 21-22 April 2008. Proceedings of the 2008 international cross-disciplinary conference on Web accessibility (W4A), Beijing, China. Pages 141-147, Year of Publication: 2008. ISBN:978-1-60558-153-8 DOI: http://doi.acm.org/10.1145/1368044.1368078

6. From Web Accessibility to Web Adaptability, Kelly, B., Nevile, L., Sloan, D., Fanou, S., Ellison, R. and Herrod, L.
Disability and Rehability: Assistive Technology, Volume 4, Issue 4, July 2009, pages 212 – 226. DOI: 10.1080/17483100902903408

7. Developing Countries; Developing Experiences: Approaches to Accessibility for the Real World, Kelly, B., Lewthwaite, S. and Sloan, D. W4A2010, April 26-27, 2010, Raleigh, USA. Co-Located with the 19th International World Wide Web Conference. Copyright 2010 ACM ISBN: 978-1-4503-0045-2
DOI: 10.1145/1805986.1805992

8. A Challenge to Web Accessibility Metrics and Guidelines: Putting People and Processes First,  Cooper, M., Sloan, D., Kelly, B. and Lewthwaite, S. W4A 2012, April 16-17, 2012, Lyon, France. Co-Located with the 21st International World Wide Web Conference. Copyright 2012 ACM ISBN 978-1-4503-1019-2

9. Website Accessibility Conformance Evaluation Methodology 1.0 working draft, WAI, 20 September 2012. http://www.w3.org/TR/2012/WD-WCAG-EM-20120920/

10. “John hit the ball”: Should Simple Language Be Mandatory for Web Accessibility?, Kelly, B., UK Web Focus blog, 19 Sept 2012, http://ukwebfocus.wordpress.com/2012/09/18/john-hit-the-ball-simple-language-mandatory-for-web-accessibility/

11. The complexities of simple: What simple language proponents should know about linguistics, Lukes, D. Metaphor Hacker blog, 28 September 2012, http://metaphorhacker.net/2012/09/the-complexities-of-simple-what-simple-language-proponents-should-know-about-linguistics/

12.  Supporting PDF accessibility evaluation: early results from the FixRep project. In: 2nd Qualitative and Quantitative Methods in Libraries International Conference (QQML2010), 2010-05-25 – 2010-05-28, Chania.  http://opus.bath.ac.uk/24958/

13. BS 8878 web accessibility standards (supersedes PAS 78) – all you need to know, Jonathan Hassell, http://www.hassellinclusion.com/bs8878/



Posted in Accessibility | 1 Comment »

Librarians, Change or be Irrelevant!

Posted by Brian Kelly on 16 October 2012

“Change or be Irrelevant!”

“Change or be Irrelevant” was the title of Lukas Koster’s blog post in which he gave his reflections on the EMTACL12 (Emerging Technologies in Academic Libraries) conference which was held recently in Trondheim.

The need to be able to adapt to the requirements of the rapidly changing technical and economic contexts faced by those working in higher education was highlighted by Karen Coyle in her invited plenary talk entitled “Think ‘Different’“. Lukas provided a useful summary of the talk:

‘Think “different”’ is what Karen Coyle told us, using the famous Steve Jobs quote. And yes, the quotes around “different” are there for a reason, it’s not the grammatically correct “think differently”, because that’s too easy. What is meant here is: you have to have the term “different” in your mind all the time. Karen Coyle confronted us with a number of ingrained obsolete practices in libraries.

But what are the technological developments which may have an impact on the academic library sector? In the closing talk at the conference I presented a paper on “What Next for Libraries? Making Sense of the Future” (available in PDF and MS Word formats) in which I described the evidence-based methodology used by the JISC Observatory team which aims to help organisations wishing to identify signals of technological developments which may have a significant impact on working practices.

In my talk (which is available on Slideshare) I reminded the audience of the inventions from the days of our youth which failed to live up to our expectations, including the monorail (which we’d use to travel to work), lunar bases (where we’d go for our holidays) and the jetpack. Patrick Hochstenbach picked up on my suggestion of the relevance of jetpacks for librarians in a cartoon in which he depicted a “super shush librarian” who makes sure that patrons aren’t making unnecessary noises in a distributed library environment :-)

Although some may be critical of the stereotype, I felt this provided a useful depiction of the way in which we expect inventions simply to automate existing practices, rather than transform them. This, therefore, illustrated the point I made about space travel: we may have expected the lunar landing which took place in 1969 to lead to further space exploration, including bases on the moon and possibly Mars. In reality, however, manned exploration beyond Earth orbit ceased, with the last manned mission to the moon taking place as long ago as December 1972. Rather than the manned space exploration we may have expected, we sent unmanned probes to Mars, the moon and around the solar system (indeed last week we heard the news that the deep space probe Voyager 1 had left the solar system).

Preparing For Change; Preparing to be Relevant

Recently the JISC Observatory has published a report on Preparing for Data-driven Infrastructure, and the final version of a report on Preparing for Effective Adoption and Use of eBooks in Education is due to be published in a few weeks' time. These are two of the areas which JISC Observatory team members have identified as likely to be significant for the higher education sector. I would also add that these are areas which will be relevant for those working in academic libraries. There should be no need to mention the importance of the Mobile Web, which was another area addressed in a JISC Observatory report on Delivering Web to Mobile.

The theme of preparing for change and preparing to be relevant is also being addressed at ILI 2012, the Internet Librarian International conference which takes place in London on 30-31 October. This year the event has the strapline “Re-imagine, Renew, Reboot: Innovating for Success“. I’ll be giving a talk on “Making Sense of the Future” which will explore the ideas described in this post and the paper presented at the EMTACL12 conference. For those who can’t attend, I’ve summarised the presentation in the following cartoon :-)


Posted in Events | Tagged: , , | Leave a Comment »

“Standards are voluntarily adopted and success is determined by the market”

Posted by Brian Kelly on 15 October 2012

Yesterday (Sunday 14 October) was World Standards Day. As described on Wikipedia “The aim of World Standards Day is to raise awareness among regulators, industry and consumers as to the importance of standardization to the global economy“. It is therefore timely to highlight Open Stand. As described on the Open Stand Web site:

On August 29th five leading global organizations jointly signed an agreement to affirm and adhere to a set of Principles in support of The Modern Paradigm for Standards; an open and collectively empowering model that will help radically improve the way people around the world develop new technologies and innovate for humanity.

The “Modern Paradigm for Standards” is shaped by adherence to five principles:

  1. Due process: Decisions are made with equity and fairness among participants. No one party dominates or guides standards development. Standards processes are transparent and opportunities exist to appeal decisions. Processes for periodic standards review and updating are well defined.
  2. Broad consensus: Processes allow for all views to be considered and addressed, such that agreement can be found across a range of interests.
  3. Transparency: Standards organizations provide advance public notice of proposed standards development activities, the scope of work to be undertaken, and conditions for participation. Easily accessible records of decisions and the materials used in reaching those decisions are provided. Public comment periods are provided before final standards approval and adoption.
  4. Balance: Standards activities are not exclusively dominated by any particular person, company or interest group.
  5. Openness: Standards processes are open to all interested and informed parties.

The “Modern Paradigm for Standards” itself is based on five key approaches:

  1. Cooperation
  2. Adherence to the principles listed above
  3. Collective empowerment
  4. Availability
  5. Voluntary Adoption

The Topsy tool provides a useful means of observing Twitter discussions about web resources. Looking at recent English-language tweets about the Web site we can see a useful summary:

5 organizations - #IETF #IEEE #W3C #IAB @Internet Society – issue joint statement on open Internet standards - http://t.co/cO2rQvGH

together with a summary of the aims of this initiative:

check out the uber standards org @openstand that will drive innovation globally through interoperability http://t.co/cKrkYnvr

and an acknowledgement that more work is needed if the goal of “driving innovation globally through interoperability” is to be realised:

OpenStand (http://t.co/2g50zvMc) is good politics; that it doesn’t go far enough just shows there’s still work to be done.

However it is the single sentence summary of what is meant by “Voluntary Adoption” which struck me as being of greatest interest:

Standards are voluntarily adopted and success is determined by the market.

In the past I think there has been a view that open standards exist independently of the market place, with public sector organisations, in particular, being expected to distance themselves from the market economy in the development and procurement of IT systems. However this statement of a “modern paradigm for standards” makes it clear that standards bodies such as the W3C, IETF, IEEE, IAB and the Internet Society are explicit that the success of open standards is dependent on acceptance of the standards across the market place. Back in September 2008 I highlighted the importance of market place acceptance of open standards:

many W3C standards …  have clearly failed to have any significant impact in the market place – compare, for example, the success of Macromedia’s Flash (SWF) format with the niche role that W3C’s SMIL format has.

and 2 months later a post entitled Why Did SMIL and SVG Fail? generated a discussion about criteria for identifying failed standards. Perhaps, as was suggested in the comments on the post, SMIL and SVG have merely had a very slow growth to reach market acceptance. But I can’t help but feel that if SMIL and SVG are belatedly felt to be successful standards, this will have been a result of the decision by Apple not to support Flash on the iOS platform for its mobile devices. This seems to provide a good example of the Open Stand principle that “Standards are voluntarily adopted and success is determined by the market“. We can now see parallels between the selection of third-party services to support institutional activities and the selection of open standards to support development activities. Interestingly, such issues were discussed at the CETIS meeting on “Future of Interoperability Standards” held in Bolton in January 2010. I hope that the Opportunities and Risks Framework For Standards which I presented at the meeting can provide an approach for helping to identify the standards which can achieve success in the market place.


Posted in standards | Leave a Comment »

W3C WAI Invite Feedback on Website Accessibility Conformance Evaluation Methodology 1.0 Working Draft

Posted by Brian Kelly on 8 October 2012

On Thursday 20 September 2012 the W3C WAI published the Website Accessibility Conformance Evaluation Methodology 1.0 working draft. The W3C invites comments on this working draft, which should be sent by 20 October 2012 to public-wai-evaltf@w3.org (note that a publicly visible mailing list archive is available).

This is a large document (31 pages when printed) and so I am highlighting it now to give those with responsibility for managing large-scale Web sites time to read the document and provide feedback. It should be noted that since institutions may have accessibility policies which claim conformance with WAI guidelines, it will be important that the conformance criteria are realistic and achievable, and that conformance does not add significant additional barriers to the provision of institutional Web sites.

It should be noted that the scale of university Web sites will provide particular challenges in achieving compliance. As described in the working draft:

A website may include areas with smaller collections of related web pages such as an online shop, an area for each department within the organization, a blog area, and other parts. In some situations such areas can be considered to be a full, self-enclosed website each. This methodology can be applied to such individual sub-sites (a website within another website) and to the main website in its entirety. However, this methodology may not be applied to a website excluding any of its parts. Excluding parts of the website from the scope of evaluation would likely conflict with the WCAG 2.0 conformance requirements full pages and complete processes, or significantly distort the evaluation results.

The document then goes on to depict a typical university Web site (an illustration is provided in the working draft) and explains how:

In the example above, none of the depicted parts may be excluded from the scope of evaluation in the context of this methodology, if it is to be applied to the university website. This includes any aggregated and embedded content such as online maps for the university campus and forms for credit card transactions, including when such parts originate from third-party sources.

Note that the document defines a website as “A coherent collection of one or more related web pages that together provide common use or functionality. It includes static web pages, dynamically generated web pages, and web applications“. The University of Bath Web site at http://www.bath.ac.uk/ is clearly one example of a coherent collection of related web pages. However it is less clear whether other Web services hosted on the same domain, such as the repository at http://opus.bath.ac.uk/, would also be regarded as part of the coherent set of related web pages. It might be safe to assume that this is the case, in which case accessibility conformance might need to apply to every page (including dynamic pages) hosted under *.bath.ac.uk. Therefore, content provided by third-party services, such as embedded YouTube videos, embedded RSS feeds and content included in Web pages using HTML iframe elements, JavaScript and other syndication technologies, would also be included.

This represents quite a challenge in ensuring that the content will conform with WCAG 2.0 guidelines! Especially when one considers that the WCAG guidelines are independent of the particular file formats used to host the content; so PDF, MS Word, MS PowerPoint, etc. files which have an institutional URL would also need to conform.
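To get a feel for how large the evaluation scope can become in practice, the following minimal sketch (in Python, using only the standard library) crawls a small sample of pages under an illustrative institutional domain and reports embedded third-party content and non-HTML resources that the methodology would bring into scope. The domain, starting URL and page limit are assumptions for illustration only; this is not part of the W3C methodology itself.

```python
# Minimal sketch: inventory pages under an (illustrative) institutional domain
# and flag embedded third-party content that would fall within the scope of a
# WCAG-EM style evaluation. Domain, start URL and page limit are assumptions.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

DOMAIN = "bath.ac.uk"            # illustrative institutional domain
START = "http://www.bath.ac.uk/"

class ScopeParser(HTMLParser):
    """Collects links to follow and embedded third-party content to report."""
    def __init__(self, base):
        super().__init__()
        self.base = base
        self.links, self.embedded = [], []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base, attrs["href"]))
        if tag in ("iframe", "embed", "object", "script") and attrs.get("src"):
            src = urljoin(self.base, attrs["src"])
            if DOMAIN not in urlparse(src).netloc:
                self.embedded.append((self.base, tag, src))  # third-party source

def crawl(start, limit=50):
    seen, queue, report = set(), deque([start]), []
    while queue and len(seen) < limit:
        url = queue.popleft()
        if url in seen or DOMAIN not in urlparse(url).netloc:
            continue
        seen.add(url)
        try:
            with urlopen(url, timeout=10) as resp:
                content_type = resp.headers.get("Content-Type", "")
                if "text/html" not in content_type:
                    # PDF, Word, PowerPoint etc. hosted under the domain
                    report.append((url, "non-HTML resource", content_type))
                    continue
                parser = ScopeParser(url)
                parser.feed(resp.read().decode("utf-8", errors="replace"))
        except OSError:
            continue
        report.extend(parser.embedded)
        queue.extend(parser.links)
    return report

if __name__ == "__main__":
    for item in crawl(START):
        print(item)
```

Even a sketch of this kind makes the point: the list of third-party embeds and non-HTML files grows quickly, and every item on it would need to be considered in a conformance claim.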

Revisiting WCAG 2.0 Guidelines

In order to illustrate the difficulties to be faced in conforming with WCAG 2.0 guidelines, consider the challenges in ensuring full conformance with Principle 3: Understandable – Information and the operation of user interface must be understandable.

On the surface this does not appear unreasonable. The WCAG 2.0 document then provides a more specific guideline: Guideline 3.1 Readable: Make text content readable and understandable. The difficulties start when you see the details:

3.1.4 Abbreviations: A mechanism for identifying the expanded form or meaning of abbreviations is available. (Level AAA)

3.1.6 Pronunciation: A mechanism is available for identifying specific pronunciation of words where meaning of the words, in context, is ambiguous without knowing the pronunciation. (Level AAA)

Yes, in order for an institutional Web site to conform to WCAG 2.0 at Level AAA, every Web page which contains an abbreviation must provide a mechanism for identifying the expanded form or meaning of the abbreviation, and a mechanism for identifying the specific pronunciation of words whose meaning, in context, is ambiguous without knowing the pronunciation! OK? Or perhaps I should have written “OK (orl korrect)?” as this is one of the possible origins of the abbreviation.
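As a rough illustration of why success criterion 3.1.4 is difficult to satisfy at scale, the sketch below (a hypothetical check, not an official WCAG test) scans an HTML fragment for abbreviation-like tokens and reports those which are not wrapped in an abbr element carrying a title attribute. The regular expression, and the assumption that a title attribute counts as a sufficient "mechanism", are simplifications for illustration.

```python
# Hypothetical check (not an official WCAG test): report abbreviations that
# lack an expansion mechanism such as <abbr title="...">.
import re
from html.parser import HTMLParser

ABBREVIATION = re.compile(r"\b[A-Z][A-Z0-9]+\b")   # crude: runs of capitals/digits

class AbbrCheck(HTMLParser):
    def __init__(self):
        super().__init__()
        self.expanded = set()       # tokens found inside <abbr title="...">
        self.found = set()          # all abbreviation-like tokens in the text
        self._in_abbr_title = None

    def handle_starttag(self, tag, attrs):
        if tag == "abbr":
            self._in_abbr_title = dict(attrs).get("title")

    def handle_endtag(self, tag):
        if tag == "abbr":
            self._in_abbr_title = None

    def handle_data(self, data):
        tokens = set(ABBREVIATION.findall(data))
        self.found |= tokens
        if self._in_abbr_title:
            self.expanded |= tokens

def unexpanded_abbreviations(html):
    checker = AbbrCheck()
    checker.feed(html)
    return sorted(checker.found - checker.expanded)

# Example: "OK" and "WCAG" are reported; "W3C" has an expansion mechanism.
sample = '<p>The <abbr title="World Wide Web Consortium">W3C</abbr> WCAG guidelines. OK?</p>'
print(unexpanded_abbreviations(sample))   # ['OK', 'WCAG']
```

Running such a check across every page of a large institutional Web site would produce a very long report, which is precisely the difficulty with claiming Level AAA conformance.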

I think it is safe to say that no institution should consider stating that its Web site conforms with WCAG AAA guidelines. But will it be possible for any large-scale Web site to conform fully with all WCAG guidelines, including those which are relevant to WCAG A conformance? I would have thought that any Web site which embeds content from third-party services will not be able to guarantee that the embedded content will be conformant.

Perhaps it is time to move away from stating conformance with WCAG guidelines and, instead, make use of alternative approaches, with BS 8878 providing an approach to consider for those based in the UK. What do you think? Is it realistic to expect that institutional Web sites will be able to conform to WCAG 2.0 guidelines?


Posted in Accessibility | 3 Comments »

What Next for Libraries? Making Sense of the Future

Posted by Brian Kelly on 2 October 2012

Tomorrow I’ll be giving an invited talk on “What Next for Libraries? Making Sense of the Future” at the Emerging Technologies in Academic Libraries 2012 Conference (emtacl12) which is being held at the Norwegian University of Science and Technology University Library in Trondheim, Norway.

The slides for the talk are available on Slideshare and are embedded below. In addition an accompanying paper is available on Opus, the University of Bath repository, in MS Word and PDF formats.


Posted in Events | Tagged: , | 1 Comment »

Analysis of Google Search Traffic Patterns to Russell Group University Web Sites

Posted by Brian Kelly on 1 October 2012

Background

How can we ensure that the wide range of information provided on university Web sites can be easily found? One answer is quite simple: ensure that such resources are easily found using Google. After all, when people are looking for resources on the Web they will probably use Google.

But what patterns of usage do we find for searches which lead to university Web sites? In a recent survey of search engine rankings it was observed that only one institutional Web site (at the University of Oxford) featured in the list of highly-ranked Web sites which can help drive traffic to the institutional repository. It was also noticed that this Web site had a significantly lower Alexa ranking (6,187) than the other 15 Web sites listed, such as WordPress.com, Blogspot.com and YouTube.com, which had Alexa rankings ranging from 1 to 256.

In order to gain a better understanding of how Google may rank search results for resources hosted on university Web sites, the findings of a survey are published below which provide graphs of recent search engine traffic and summarise the range of values found for the global and UK Alexa rankings and the Alexa ‘reputation’ scores across this sector.

About Alexa

From Wikipedia we learn that:

Alexa Internet, Inc. is a California-based subsidiary company of Amazon.com that is known for its toolbar and website. Once installed, the Alexa toolbar collects data on browsing behavior and transmits it to the website, where it is stored and analyzed, forming the basis for the company’s web traffic reporting. Alexa provides traffic data, global rankings and other information on thousands of websites, and claims that 6 million people visit its website monthly.

The article goes on to describe how:

Alexa ranks sites based on tracking information of users of its Alexa Toolbar for Internet Explorer and Firefox and from their extension for Chrome. 

This means that the Alexa findings should be treated with caution:

the webpages viewed are only ranked amongst users who have these sidebars installed, and may be biased if a specific audience subgroup is reluctant to do this. Also, the ranking is based on three-month data

Despite such limitations, the Alexa service can prove useful in helping those involved in providing large-scale Web sites to gain a better understanding of the discoverability of their Web site. The Alexa Web site describes how “Alexa is the leading provider of free, global web metrics. Search Alexa to discover the most successful sites on the web by keyword, category, or country“.

In light of the popularity of the service, and the fact that, despite being a commercial service, it provides open metrics, it is being used in this survey as part of an ongoing process which aims to provide a better understanding of the discoverability of resources on institutional Web sites.

Survey Using Alexa

The following definitions of the information provided by Alexa were obtained from the Alexa Web site:

The Global Alexa Traffic Rank is “An estimate of the site’s popularity. The rank is calculated using a combination of average daily visitors to the site and pageviews on the site over the past 3 months. The site with the highest combination of visitors and pageviews is ranked.”

The GB Alexa Traffic Rank is “An estimate of the site’s popularity in a specific country. The rank by country is calculated using a combination of average daily visitors to the site and pageviews on the site from users from that country over the past month. The site with the highest combination of visitors and pageviews is ranked #1 in that country.

The Reputation is based on the number of inbound links to the site: The number of links to the site from sites visited by users in the Alexa traffic panel. Links that were not seen by users in the Alexa traffic panel are not counted. Multiple links from the same site are only counted once. 

The graph showing traffic from search engines gives the percentage of site visits from search engines.

The average traffic is based on the traffic over the last 30 days.

The data was collected on 20 September 2012 using the Alexa service. Note that the current finding can be obtained by following the link in the final column.

The graphs for the traffic from search engines contain a snapshot taken on 20 September 2012 together with the live findings provided by the Alexa service. The range of findings for the Alexa rank and reputation is provided beneath the table.
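For anyone wishing to repeat the data collection, the following minimal sketch shows one way of recording the Alexa figures defined above for each institution. The field names and the two example records are illustrative assumptions, not the format used in the original survey.

```python
# Illustrative record structure for the Alexa figures described above.
# The example values are hypothetical and do not reproduce the survey data.
from dataclasses import dataclass
from typing import Optional

@dataclass
class AlexaRecord:
    institution: str
    global_rank: Optional[int]           # Global Alexa Traffic Rank
    gb_rank: Optional[int]               # GB Alexa Traffic Rank
    reputation: Optional[int]            # linking domains seen by the Alexa panel
    search_traffic_pct: Optional[float]  # % of visits arriving from search engines
    collected: str                       # date the snapshot was taken

records = [
    AlexaRecord("Example University A", 20000, 2000, 10000, 25.0, "2012-09-20"),
    AlexaRecord("Example University B", 50000, 5000, 6000, 18.0, "2012-09-20"),
]

for r in records:
    print(f"{r.institution}: global rank {r.global_rank}, "
          f"search traffic {r.search_traffic_pct}%")
```

Keeping the figures in a simple structure of this kind makes it straightforward to re-run the summary calculations when the live Alexa findings change.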

Table 1: Alexa Findings for Russell Group University Web Sites

(The original table also included, for each institution, a live "current findings" feed from the Alexa service and a graph of average traffic for 18 Aug - 17 Sep 2012; these embedded graphs and widgets are not reproduced below.)

Institution | Traffic from Search Engines (20 Sept 2012) | View Results
University of Birmingham | 19.7% | [Link]
University of Bristol | 22.8% | [Link]
University of Cambridge | 24.0% | [Link]
Cardiff University | 26.1% | [Link]
University of Durham | 25.1% | [Link]
University of Edinburgh | 26.9% | [Link]
University of Exeter | 26.7% | [Link]
University of Glasgow | - | [Link]
Imperial College | 31.0% | [Link]
King's College London | 19.9% | [Link]
University of Leeds | 31.7% | [Link]
University of Liverpool | 22.5% | [Link]
LSE | 22.5% | [Link]
University of Manchester | 25.8% | [Link]
Newcastle University | 15.7% | [Link]
University of Nottingham | 20.5% | [Link]
University of Oxford | 26.8% | [Link]
Queen Mary, University of London | 20.2% | [Link]
Queen's University Belfast | 14.1% | [Link]
University of Sheffield | 17.4% | [Link]
University of Southampton | 21.9% | [Link]
UCL | 26.7% | [Link]
University of Warwick | 29.6% | [Link]
University of York | 23.5% | [Link]

Survey Paradata

This survey was carried out using the Alexa service on Thursday 20 September. The Chrome browser running on a Windows 7 platform was used. The domain name used in the survey was taken from the domain name provided on the Russell Group University Web site. The snapshot of the traffic from search engines was captured on 20 September; a further column gives a live update of the findings from the Alexa service. Note that if the live update fails to work in the future this column will be deleted.

Summary

The Russell Group university Web sites have global Alexa rankings ranging from 6,318 to 75,000 and UK Alexa rankings ranging from 748 to 6,110. In comparison, in the global rankings Facebook is ranked at number 1, YouTube at 3, Wikipedia at 6, Twitter at 8, Blogspot at 11, WordPress.com at 22, and the BBC at 59.

The Russell Group university Web sites have “reputation” scores ranging from 4,183 to 43,917, which are based on the number of domains with links to the sites which have been followed in the past month. Although the algorithms used by Google to determine the ranking of search results are a closely-kept secret (and are liable to change to prevent misuse), the number of linking domains, together with the ranking of those domains, is used by Google in its algorithms for ranking search results. According to the survey, Google delivered between 14% and 31% of the traffic to the Web sites during August-September 2012.
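As a sanity check on that summary figure, the short sketch below takes the search-traffic percentages reported in Table 1 (the Glasgow figure was not available) and confirms that they span roughly 14% to 31%; the dictionary is simply a transcription of the table above.

```python
# Search-traffic percentages transcribed from Table 1 (20 September 2012 snapshot).
search_traffic = {
    "Birmingham": 19.7, "Bristol": 22.8, "Cambridge": 24.0, "Cardiff": 26.1,
    "Durham": 25.1, "Edinburgh": 26.9, "Exeter": 26.7, "Imperial College": 31.0,
    "King's College London": 19.9, "Leeds": 31.7, "Liverpool": 22.5, "LSE": 22.5,
    "Manchester": 25.8, "Newcastle": 15.7, "Nottingham": 20.5, "Oxford": 26.8,
    "Queen Mary": 20.2, "Queen's Belfast": 14.1, "Sheffield": 17.4,
    "Southampton": 21.9, "UCL": 26.7, "Warwick": 29.6, "York": 23.5,
}

low, high = min(search_traffic.values()), max(search_traffic.values())
print(f"Search engines delivered between {low}% and {high}% of visits")
# -> Search engines delivered between 14.1% and 31.7% of visits
```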

Caveat

In addition to the limitations of the data provided by Alexa summarised above, it should be noted that we should not expect institutions to seek to maximise their Alexa rankings purely for their own sake. We would not expect university Web sites to be as popular as global social media services. Similarly it would be unreasonable for the findings to be used in a league table. However universities may well be exploring SEO approaches, and perhaps commissioning SEO consultants to advise them. This post, therefore, aims to provide a factual summary of findings provided by a service which may be used for in-house analysis or by third parties who have been commissioned to advise on SEO strategies for enhancing access to institutional resources.

Discussion

This survey was published in September since we might expect traffic to grow from a lull during the summer vacation as students prepare to arrive at university. It will be interesting to see how the pattern changes over time and, since this page contains a live feed from Alexa, it should be easy to compare the current patterns across the Russell Group universities.

This initial survey has been carried out in order to provide a benchmark for further work in this area and to invite feedback. Further work is planned which will explore in more detail the Web sites which drive search engine traffic to institutional Web sites, in order to identify strategies which might be used to enhance search engine traffic.

It should be noted that this data has been published in an open fashion so that the methodology can be validated and the wider community can benefit from the findings, from open discussion of the approaches taken to the data collection, and from discussion of how such evidence might inform plans for enhancing the discoverability of content hosted on institutional Web sites. Feedback on these approaches would be appreciated.


Posted in Evidence, search | Leave a Comment »