UK Web Focus

Innovation and best practices for the Web

Archive for November, 2011

1,000 Posts On: Runner-Up In The IT Professional Blogger Award

Posted by Brian Kelly on 30 November 2011

This is the 1,000th blog post since the blog was launched 5 years ago, on 1st November 2006.  This anniversary therefore provides an ideal opportunity to announce the news that the UK Web Focus blog was the runner-up in the IT Professional Blogger of the Year category of the Computer Weekly Social Media Awards.

The winner of this category was Elizabeth Harrin for her blog A Girl’s Guide to Project Management. As described on the About page on her blog, Elizabeth also launched her blog in 2006. Judging by the frequency of her posts, Elizabeth is clearly passionate about her blog, and her page on Earning Disclosure shows that she takes an open and responsible approach to being honest with her readers. Elizabeth is a well-deserved winner of this award and I was pleased to have the opportunity to chat with her briefly last night.

For those who are unfamiliar with the UK Web Focus blog, it “functions as an open notebook which provides personal thoughts, reflections and observations on the role of the Web in higher and further education which I hope will inform readers and stimulate discussion and debate”.

Although the blog regularly addresses technical Web developments, another important area it covers is openness, in a broad sense, in supporting key institutional activities. As well as the papers I have written in this area (such as the paper on Openness in Higher Education: Open Source, Open Standards, Open Access and Let’s Free IT Support Materials!), the blog also embraces such values: content published on this blog is available under a Creative Commons licence (which, during Open Access Week 2011, was changed from CC-BY-SA to CC-BY) and comments are open on all 1,000 blog posts which have been published.

The approaches taken in providing this blog seem to be widely appreciated, as can be seen not only from the people who voted for the blog but also from comments I have received recently:

Your blog is an inspiration, long may it continue!

Well done by the way – I catch your bog in my rss reader and am flabberghasted that you can post so much (and all good) – I’m cheering for you.

I love your blog. You have a knack of finding the right subject and the right lessons from it. 

Your blog is an excellent way to keep myself informed about Web 2.0 and it’s good to have a HE perspective.

Many thanks for the comments and the votes :-) And note that if you’d like to see what happened at the awards ceremony, Elizabeth Harrin’s blog post “Thank you! I’m IT Professional Blogger of the Year” contains a brief video clip.

Posted in Blog | 1 Comment »

Paradata for Online Surveys

Posted by Brian Kelly on 29 November 2011

In a recent post on “Surveying Russell Group University Use of Google Scholar Citations” I used Google Scholar Citations’ search facility to audit the number of researchers in the twenty Russell Group Universities who have claimed profiles on the service.

Looking at my own host institution, which is a member of the 1994 Group, at the time of writing there are 33 profiles for a search for University of Bath (without quotes) but only 23 profiles for a search for “University of Bath” (with quotes). The findings therefore differ depending on the search syntax, such as whether or not the search term is enclosed in quotes.

There is therefore a need to be explicit about the way in which the searches are constructed in order to ensure that findings are reproducible. In previous surveys I have tried to document the survey methodology in the text of the blog posts, but it has occurred to me that the specific details may be overlooked. I therefore feel that future surveys should include explicit details of the survey paradata, a term which is defined in Wikipedia as “data about the process by which the survey data were collected”.

The blog posts I have published have, wherever possible, provided live links to the services used to gather the data. Such links may contain parameters which differ depending on factors such as the browser environment being used. The hyperlink used for the search described above, for example, is:

http://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=%22University+of+Bath%22&after_author=sDUAAPv___8J&astart=20

As described in the Google documentation:

The hl parameter specifies the interface language (host language) of your user interface. To improve the performance and the quality of your search results, you are strongly encouraged to set this parameter explicitly.
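To make the search syntax part of the recorded paradata, the search URL could be built from explicit parameters rather than copied from the browser. The following is a minimal Python sketch, assuming that the view_op, hl and mauthors parameter names seen in the link above continue to behave as they did at the time of writing; they are not a documented API, and the helper name author_search_url is purely illustrative. The pagination parameters after_author and astart are deliberately left out.

```python
from urllib.parse import urlencode

# Base URL and parameter names copied from the survey link above. They are an
# assumption about an undocumented interface and may change at any time.
BASE = "http://scholar.google.com/citations"

def author_search_url(institution: str, quoted: bool, hl: str = "en") -> str:
    """Build an author-search URL with every parameter stated explicitly."""
    term = f'"{institution}"' if quoted else institution
    params = {
        "view_op": "search_authors",  # the "Search Authors" option
        "hl": hl,                     # interface language, set explicitly
        "mauthors": term,             # the search term, with or without quotes
    }
    # urlencode percent-encodes the quotes as %22 and spaces as "+".
    return f"{BASE}?{urlencode(params)}"

print(author_search_url("University of Bath", quoted=False))
print(author_search_url("University of Bath", quoted=True))
```

The two calls differ only in whether the institution’s name is quoted, which is the difference behind the 33 versus 23 profiles reported above for the University of Bath.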

This is a simple query. However, the Google search box in my browser produces the following URL for a search for google scholar citations, which includes browser-specific parameters such as client=safari:

http://www.google.co.uk/search?client=safari&rls=en&q=google+scholar+citations&ie=UTF-8&oe=UTF-8&redir_esc=&ei=6lXSTrP8FoOh8gOkoJ0I
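Rather than relying on readers to spot such details, the parameters of any URL used in a survey could be extracted and recorded alongside the results. A minimal sketch using Python’s standard urllib.parse module follows; the function name url_paradata is hypothetical, and deciding which parameters (for example client=safari) reflect the browser rather than the search itself is left to the person recording the paradata.

```python
from urllib.parse import urlsplit, parse_qs

def url_paradata(url: str) -> dict:
    """Split a search URL into host, path and query parameters for recording."""
    parts = urlsplit(url)
    # keep_blank_values=True keeps empty parameters such as "redir_esc=".
    query = parse_qs(parts.query, keep_blank_values=True)
    params = {name: values[0] for name, values in query.items()}
    return {"host": parts.netloc, "path": parts.path, "params": params}

browser_url = ("http://www.google.co.uk/search?client=safari&rls=en"
               "&q=google+scholar+citations&ie=UTF-8&oe=UTF-8"
               "&redir_esc=&ei=6lXSTrP8FoOh8gOkoJ0I")

record = url_paradata(browser_url)
print(record["host"], record["path"])
for name, value in record["params"].items():
    print(f"{name} = {value!r}")
```

Note that parse_qs decodes the "+" signs, so the recorded search term reads as the words actually typed.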

In order to ensure that a rich description of the survey environment is available, my intention is that surveys published in future will contain survey paradata details along the lines illustrated in the following table, which describes the survey published in the recent post.

Details | Description | Data | Note
Search term | The official name of the host institution. | Column 1 of the results table | Name is not included in quotes.
Date | The date of the survey. | 24 November 2011 | If the survey is carried out across several days, this should be documented.
Search service | Google Scholar Citations service. | http://scholar.google.com/citations | If, for example, a UK version of the service is released, this should be documented.
Browser environment | Name & version of browser and platform. | Safari v 5.5.1 running on an Apple Macintosh | Include details of browser plugins if this is felt to be relevant.
Language | The default language (English) is used. | EN |
Search options | Search options selected. | Used the “Search Authors” option. | If additional search options are available they should be documented.
Location | Details of where the survey was carried out. | Search carried out in Bath, UK. |
User account | Information on whether the surveyor was logged in. | Search carried out whilst logged in to Google. |

Possible problem areas:
  • There may be name clashes (e.g. Newcastle University and University of Newcastle, New South Wales).
  • Searches may match email address fields as well as the name of the institution.
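The same table could also be kept in a machine-readable form so that the paradata can be published alongside the results. The sketch below assumes Python 3.9 or later; the SurveyParadata class, its field names and the choice of JSON are illustrative assumptions rather than an established schema.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class SurveyParadata:
    """Hypothetical record whose fields mirror the rows of the paradata table."""
    search_term: str
    date: str
    search_service: str
    browser_environment: str
    language: str
    search_options: str
    location: str
    logged_in: bool
    problem_areas: list[str] = field(default_factory=list)

survey = SurveyParadata(
    search_term="Institution name from column 1, not enclosed in quotes",
    date="24 November 2011",
    search_service="http://scholar.google.com/citations",
    browser_environment="Safari v 5.5.1 running on an Apple Macintosh",
    language="EN",
    search_options="Search Authors",
    location="Bath, UK",
    logged_in=True,
    problem_areas=[
        "Name clashes, e.g. Newcastle University and University of Newcastle, New South Wales",
        "Searches may match email address fields as well as the institution name",
    ],
)

# Serialise to JSON so the paradata can be published next to the survey results.
print(json.dumps(asdict(survey), indent=2))
```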

Any suggestions on things I may have missed?

Posted in Evidence | 7 Comments »

Google Scholar Citations and Metadata Quality

Posted by Brian Kelly on 28 November 2011

Back in 2005 Debra Hiom, Amanda Closier and I wrote a paper entitled “Gateway Standardization: A Quality Assurance Framework For Metadata” which was published in the Library Trends journal. The paper (which is available in MS Word and PDF format from the University of Bath repository) described the systematic approaches to ‘spring-cleaning’ metadata taken by SOSIG, which at the time was a subject gateway in the Resource Discovery Network. The approaches taken at SOSIG reflected a quality assurance framework which was being developed by the JISC-funded QA Focus project and which was described in a paper on “Developing a quality culture for digital library programmes”.

The quality assurance approaches for metadata which we described in the papers were focussed primarily on service providers. However, six years later, the quality of metadata for resource discovery is no longer of relevance only to service providers. In a Web 2.0 environment in which content providers can make their teaching and learning and research outputs available on a wide range of services without the mediation of information professionals, there is a need to ensure that a wider range of content providers are aware of the risk that poor-quality metadata can make valuable content difficult to find.

I became aware of such risks while carrying out the survey described in a recent blog post on “Surveying Russell Group University Use of Google Scholar Citations”. As mentioned in that post, I became aware of the dangers of over-counting the number of researchers who have claimed a profile by aggregating researchers from the University of Birmingham with those from the University of Alabama at Birmingham, or those from Newcastle University with those from the University of Newcastle, New South Wales.
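The over-counting can be illustrated with a small sketch. Google’s actual matching rules are not documented here, so the all_words_match function below simply assumes that an unquoted search matches any affiliation containing every word of the name; the affiliation strings are hypothetical examples, not data from the survey.

```python
import re

def normalise(affiliation: str) -> str:
    """Lower-case and replace punctuation with spaces so trivial variants compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", affiliation.lower()).strip()

def all_words_match(name: str, affiliation: str) -> bool:
    # Assumed approximation of an unquoted search: every word of the name appears somewhere.
    return set(normalise(name).split()) <= set(normalise(affiliation).split())

def exact_match(name: str, affiliation: str) -> bool:
    # Stricter check: the whole affiliation is the institution's name.
    return normalise(name) == normalise(affiliation)

def count(name: str, affiliations: list[str], matcher) -> int:
    return sum(1 for a in affiliations if matcher(name, a))

# Hypothetical affiliation strings, for illustration only.
affiliations = [
    "University of Birmingham",
    "University of Alabama at Birmingham",
    "Newcastle University",
    "University of Newcastle, New South Wales",
]

for name in ["University of Birmingham", "Newcastle University"]:
    loose = count(name, affiliations, all_words_match)
    strict = count(name, affiliations, exact_match)
    print(name, loose, strict)
```

Exact matching avoids the clashes but would miss legitimate variants and misspellings, so neither count on its own is definitive.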

On further investigation I discovered entries from researchers who had misspelt the name of their university as “univeristy”, a common typo which I have made myself. Currently it seems there are only 33 such misspellings.

In our paper we described how:

We have recommended to the JISC that those JISC-funded projects making significant use of metadata should address these issues as part of the project’s reporting procedures.

Whilst these issues are still valid for projects which have significant metadata requirements, we now also face the question of what approaches researchers can use when uploading information about their papers which may be harvested by a range of services. Unlike the full-time information management staff who may work on such services, these researchers aren’t in a position to implement metadata quality checking tools.

So what can individual researchers do to ensure that their papers don’t become difficult to find in tools such as Google Scholar Citations?

I have experimented with tools such as Collabgraph, a finalist in the Mendeley/PLoS API Binary Battle. This helped me to spot that a number of the papers in my Mendeley library had two sets of co-authors listed in a single string. This brought home to me the potential benefits of visualisations for spotting errors in textual data.

In addition to use of such tools, a recommendation I am making to colleagues is to create a profile and check your pages while the service is still new and there are only small numbers of users. This means, for example, that I can search for authors called “Kelly” and discover that there are currently only 26 entries and that there are no duplicate entries for me.

I can also search for my department, UKOLN, and check that the entries are correct. In this case we are fortunate in having a unique name for our department. However, in many other cases there may be legitimate variants: for example, I currently find seven entries for Computer Science, Southampton and 43 entries for ECS, Southampton, with the discrepancy due, in part, to many researchers having a foo@ecs.southampton.ac.uk email address.
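Some of these checks can be partly automated. The sketch below uses Python’s standard difflib module to flag words in an affiliation that are close to, but not the same as, an expected spelling (catching typos such as “univeristy”), and maps known legitimate variants such as ECS and Computer Science to one canonical form. The vocabulary, alias table and 0.8 similarity cutoff are illustrative assumptions that would need tuning for real data.

```python
import difflib
import re

# Hypothetical vocabulary of correctly spelt words expected in affiliations.
EXPECTED_WORDS = ["university", "college", "computer", "science", "southampton"]

# Hypothetical mapping of legitimate variants to a single canonical form.
ALIASES = {"ecs southampton": "computer science southampton"}

def words(affiliation: str) -> list[str]:
    return re.findall(r"[a-z]+", affiliation.lower())

def suspect_typos(affiliation: str, cutoff: float = 0.8) -> list[tuple[str, str]]:
    """Return (word, suggestion) pairs for near-misses of an expected word."""
    flagged = []
    for word in words(affiliation):
        if word in EXPECTED_WORDS:
            continue  # correctly spelt, nothing to report
        close = difflib.get_close_matches(word, EXPECTED_WORDS, n=1, cutoff=cutoff)
        if close:
            flagged.append((word, close[0]))
    return flagged

def canonical(affiliation: str) -> str:
    """Map known variants to one form so that they are counted together."""
    key = " ".join(words(affiliation))
    return ALIASES.get(key, key)

print(suspect_typos("Univeristy of Bath"))
print(canonical("ECS, Southampton") == canonical("Computer Science, Southampton"))
```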

As I started to reflect on ways in which errors could be introduced into such services and ways in which end users might search for resources, I realised that although early adopters can gain benefits from creating profiles in such services (by gaining additional exposure for their research and being able to spot errors more easily while only small numbers of profiles are available), at some point the bottom-up approach will suffer from limitations. What we really need is the centralised provision of quality-assured metadata about research publications. But services such as Google Scholar Citations won’t disappear in the short term (although, as with a range of other Google services, they could disappear in the future if they turn out not to be aligned with Google’s business interests). My conclusions: be an early adopter in order to provide another mechanism for making one’s research papers more visible, but be prepared to accept the risk that the benefits may not last forever.

Posted in Evidence, Repositories | 1 Comment »

Surveying Russell Group University Use of Google Scholar Citations

Posted by Brian Kelly on 24 November 2011

Measuring Take-up of Google Scholar Citations

A recent post gave some “Thoughts on Google Scholar Citations”. I concluded by suggesting that researchers could find it useful to claim their account on Google Scholar Citations and ensure that the details of their papers are accurate, but speculated on whether there would be barriers to researchers doing this. In order to investigate the level of usage of Google Scholar Citations in the UK higher education sector, a survey of its usage across the twenty Russell Group Universities has been carried out and the findings are published in this post. The institution’s name, as listed in the first column, was used as the search term. The number of entries gives the current number of researchers found, with a link provided to the current final page of results. In addition, in order to investigate whether the service is being used by new researchers (who are likely to have low numbers of citations) or by well-established researchers (with large numbers of citations), the table gives the numbers of citations for the three researchers with the most citations, with links to their profiles, together with the numbers of citations for the three researchers with the fewest citations. The results are given in the following table. The survey was carried out on Tuesday 22 November 2011.

Institution | Nos. of entries | Highest citations | Lowest citations
University of Birmingham | 33 * | (18,989)* – 5,817 – 5,770 – 4,243 | 13 – 15 – 16
University of Bristol | 40 | 21,761 – 9,223 – 8,271 | 0 – 0 – 6
University of Cambridge | 73 | 46,121 – 18,272 – 17,806 | 0 – 0 – 0
Cardiff University | 20 | 6,665 – 6,142 – 3,823 | 0 – 0 – 1
University of Edinburgh | 68 | 13,844 – 12,158 – 9,082 | 0 – 0 – 0
University of Glasgow | 64 | 13,228 – 11,718 – 5,773 | 0 – 0 – 1
Imperial College | 71 | 31,261 – 9,630 – 9,303 | 0 – 4 – 4
King’s College London | 23 | 6,052 – 6,030 – 4,513 | 0 – 0 – 0
University of Leeds | 30 | 12,686 – 6,780 – 6,732 | 0 – 1 – 4
University of Liverpool | 15 | 34,499 – 20,014 – 14,717 | 1 – 1 – 8
London School of Economics | 17 | 14,191 – 9,222 – 6,303 | 0 – 0 – 0
University of Manchester | 73 | 19,572 – 18,155 – 13,708 | 1 – 1 – 2
Newcastle University | 44 ** | 11,185 – 10,679 – 3,111 | 0 – 1 – 4
University of Nottingham | 40 | 11,506 – 9,084 – 5,661 | 0 – 0 – 0
University of Oxford | 109 | 25,363 – 24,311 – 16,639 | 0 – 0 – 0
Queen’s University Belfast | 15 | 2,357 – 1,913 – 1,667 | 1 – 24 – 37
University of Sheffield | 32 | 5,735 – 3,318 – 2,980 | 0 – 1 – 1
University of Southampton | 39 | 42,197 – 9,009 – 4,708 | 0 – 0 – 4
University College London | 145 | 31,440 – 30,842 – 20,058 | 0 – 0 – 0
University of Warwick | 23 | 3,194 – 2,923 – 1,850 | 0 – 0 – 0
Total | 974 * ** | |

* It was noted that the first entry for a search for the University of Birmingham referred to Mary Vignolo Wheatley from the University of Alabama at Birmingham. The number of Google Scholar Citations entries is therefore overstated for the University of Birmingham and potentially for the other institutions which are listed.

** I was informed after publication of this post that of the 44 entries listed for Newcastle, 11 are actually for the University of Newcastle, NSW, Australia. Such errors could creep in for other institutions for which there are name clashes (e.g. York University and New York University). This highlights the need for globally unique institutional identifiers, but such discussions are outside the scope of this post.

It was also noticed that the third entry for the University of Cambridge referred to Alan Turing, the English mathematician, logician, cryptanalyst and computer scientist who, as described in Wikipedia, lived from 1912 to 1954. Unsurprisingly, his Google Scholar Citations entry states that his email address has not been verified!
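Because totals are easy to get wrong when entries are corrected after publication, the arithmetic can be checked with a few lines of Python. This sketch simply re-adds the published counts and subtracts only the misattributions reported in the notes (one Birmingham entry and eleven Newcastle entries); any further name clashes would lower the corrected figure again, so it is an upper bound.

```python
# Nos. of entries per institution, as published in the survey table (22 November 2011).
entries = {
    "University of Birmingham": 33, "University of Bristol": 40,
    "University of Cambridge": 73, "Cardiff University": 20,
    "University of Edinburgh": 68, "University of Glasgow": 64,
    "Imperial College": 71, "King's College London": 23,
    "University of Leeds": 30, "University of Liverpool": 15,
    "London School of Economics": 17, "University of Manchester": 73,
    "Newcastle University": 44, "University of Nottingham": 40,
    "University of Oxford": 109, "Queen's University Belfast": 15,
    "University of Sheffield": 32, "University of Southampton": 39,
    "University College London": 145, "University of Warwick": 23,
}

# Entries known, from the notes, to belong to other institutions.
known_misattributed = {"University of Birmingham": 1, "Newcastle University": 11}

total = sum(entries.values())
corrected = {name: n - known_misattributed.get(name, 0) for name, n in entries.items()}

print(total)                    # 974
print(sum(corrected.values()))  # 962, an upper bound since other clashes may remain
```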

Discussion

In a recent discussion about Google Scholar Citations I was told about the difficulties in claiming authorship of papers after one has left one’s host institution and no longer has an institutional email address. In a second discussion I heard from someone who claimed his Google Scholar account shortly before leaving his host institution and provided an alternative email address which could be used once his institutional email account had been deleted. The first example highlights a potential difficulty in asserting authorship of papers after one has left the host institution and the second describes one way in which such problems can be addressed. It would therefore appear sensible for researchers to claim a Google Scholar account while they are in a position to associate it with papers published at their host institution. An interesting issue, therefore, will be who should take responsibility for advising researchers on best practices for using services such as Google Scholar Citations. Should the library include such advice in its training courses for new researchers?

Conclusions

A recent post by Wouter Gerritsma, subject librarian and bibliometrician at Wageningen UR Library, described “How Google Scholar Citations passes the competition left and right”. Wouter’s post concluded:

Google Scholar is only about five years old. Give them another five years and they will have changed the market for abstracting and indexing database totally. If only 20 percent of all scientists make their publication lists correct (also editing of the references which can be done to improve the mistakes Google has made) even without making them publically available, Google sits on a treasure trove of high quality metadata. Really interesting to see how this story will develop.

It will be interesting to see how this story develops. And as the launch of Google Scholar Citations was only announced a week ago today, we do have an opportunity to observe its take-up within our institutions from its early days. Monitoring the take-up of the service, the approaches taken in managing the information and the difficulties encountered in such management activities will be valuable, not least in developing plans for use with other services in this space. Hmm, I wonder if Google Scholar Citations has APIs which will enable such monitoring approaches to be implemented in a scalable way?

Posted in Evidence, Repositories | 15 Comments »

What Is Your Blog Community Talking About?

Posted by Brian Kelly on 23 November 2011

The Need for Better Blog Search

Quite a while ago I became somewhat frustrated with the limitations of WordPress.com’s search facility for searching this blog. I had hoped that there would be a Google search tool which could replace the search box at the top right of this blog’s Web site, but the limitations on the HTML code which can be included in blog widgets meant that this wasn’t an option. However whilst searching for alternatives I came across the Lijit search tool. Since I am not able to provide a search box for this tool it is instead provided as a link under the WordPress search box – and is probably little used. However in addition to providing a standard search for content posted on this blog  its key strength, for me, is its ability to search across my blogging community.

If, for example, I search for RDFa I find a conventional set of links to posts I have published about RDFa. But if I click on the Network tab I find details of posts published by contacts in my blog network, as illustrated.

Using a search for HTML5 I found that Anthony Leonard has published an interesting post on Fixing academic literature with HTML5 and the semantic web.

A search for “schema.org” reveals that Peter Sefton and the UK Access Management team have written several posts on this topic.

Similarly, a search for “JISC” finds posts published by my network on ‘JISC’ which might be of interest to those working in JISC, especially those with an interest in what people are saying about the organisation.

One of the interests I had in better searching was to be able to spot spam comments which I had failed to delete. A search for Viagra found only a legitimate post on “Dodgy Blog Link Spam”. However, searching across my network for this term, I found one blog which contained a large number of spam comments (I have informed the blog owner so hopefully the spam will be deleted shortly).

How Does It Work?

Initially I had thought that the Network search was based on harvesting blogs of people who have commented on my blog. However the FAQ states that

The Network tab contains all of the results found from the sites automatically detected from your blogroll, and any other site you’ve manually setup via the ‘Network’ section of Lijit.com.

This is somewhat strange as I no longer publish a blogroll. However, use of Lijit did make me realise that the people who have commented on my blog (which, looking at the WordPress administrator’s interface, I find includes Christopher Gutteridge, Andy Powell, Chris Rusbridge, Les Carr, Anthony Leonard and Martin Hawksey) are probably people whose posts I am likely to find of interest – after all, if they are motivated to comment on my posts we probably have shared interests.

As an experiment I have therefore revived the blogroll on this blog and populated it with the blogs provided by those listed above, together with other bloggers whose content I find particularly interesting and relevant to my interests. I hope that this will mean that when I search this blog for things I have written about in the past I’ll be able to see what my blogging peers have said on the same topic. And although this may be regarded as an ‘echo chamber’, for me this provides valuable personalised searching.

I should add that I removed the blogroll several years ago in order to try to minimise clutter in the blog’s sidebar, so I’m not convinced that having a long list of blogs in my blogroll is desirable for this blog. But I do wonder whether such an approach might be particularly useful for project blogs: a blogroll listing all of the blogs provided for a particular programme could both help end users with an interest in the programme to find other projects and provide a search facility across the blogs. It may be, of course, that others will have developed a more elegant solution for searching across a blog community, in which case I’d welcome links to such approaches.
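Lijit’s own implementation isn’t described beyond the FAQ quoted above, but the general idea of a blogroll-driven network search can be sketched. The Python example below assumes the blogroll is available as OPML (a common export format for lists of blogs) and that each listed blog’s RSS 2.0 feed has already been fetched; all blog names, URLs and post titles are hypothetical, and a real tool would also fetch, cache and index full post text.

```python
import xml.etree.ElementTree as ET

# Hypothetical blogroll exported as OPML.
BLOGROLL_OPML = """<opml version="1.0"><body>
  <outline text="Example blog A" xmlUrl="http://a.example.org/feed/"/>
  <outline text="Example blog B" xmlUrl="http://b.example.org/feed/"/>
</body></opml>"""

# Hypothetical RSS 2.0 feeds, assumed to have been fetched already, keyed by feed URL.
FEEDS = {
    "http://a.example.org/feed/": """<rss version="2.0"><channel>
      <item><title>Notes on HTML5</title><link>http://a.example.org/1</link></item>
      <item><title>An RDFa primer</title><link>http://a.example.org/2</link></item>
    </channel></rss>""",
    "http://b.example.org/feed/": """<rss version="2.0"><channel>
      <item><title>Thoughts on schema.org</title>
        <description>Embedding schema.org markup in repository pages</description>
        <link>http://b.example.org/1</link></item>
    </channel></rss>""",
}

def feed_urls(opml: str) -> list[str]:
    """Return the xmlUrl of every outline entry in the blogroll."""
    return [o.get("xmlUrl") for o in ET.fromstring(opml).iter("outline") if o.get("xmlUrl")]

def search_network(term: str, opml: str, feeds: dict[str, str]) -> list[tuple[str, str]]:
    """Case-insensitive search of item titles and descriptions across the blogroll's feeds."""
    needle = term.lower()
    hits = []
    for url in feed_urls(opml):
        if url not in feeds:  # skip blogs whose feed has not been fetched
            continue
        for item in ET.fromstring(feeds[url]).iter("item"):
            title = item.findtext("title", default="")
            text = (title + " " + item.findtext("description", default="")).lower()
            if needle in text:
                hits.append((title, item.findtext("link", default="")))
    return hits

print(search_network("html5", BLOGROLL_OPML, FEEDS))
print(search_network("schema.org", BLOGROLL_OPML, FEEDS))
```

Driving the search from the blogroll means that adding a project blog to the list is all that is needed to include it in the network search.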

Posted in Blog | 1 Comment »

Thoughts on Google Scholar Citations

Posted by Brian Kelly on 22 November 2011

Citation Analysis Services

I recently wrote a post entitled “Will the Real Scott Wilson Please Stand Up, Please Stand Up” in which I described my initial experiences with the Microsoft Academic Search service.  I have to admit that I was impressed by the user interface and how, for example, it depicted links with my co-authors.

Revisiting Microsoft Academic Search

The main limitation with the Microsoft Academic Search service was, I felt, the accuracy of the data and the need to get author buy-in in order that authors could claim their own papers and remove papers incorrectly attributed to them.  The information it has about me, for example, suggests that I have published 56 papers, including one dating back to 1979. In fact it should know about 30 of my papers, the earliest of which was published in 1994.

Several weeks ago I edited my publications list to remove papers written by other Brian Kellys. These edits have been accepted and when I sign in I get confirmation of the 38 papers I have confirmed authorship of and the 18 which have been removed from the list. However, the wiki-style approach to editing the content means that edits have to be confirmed and this does not appear to have happened. I therefore appear to be claiming more publications than is the case and, possibly, the citation statistics (G-Index=11 and H-Index=6) for my papers may be inaccurately calculated.

Google Scholar Citations

Whenever I come across a new service which appears to provide value I am also interested in seeing if there are alternative offerings. In part this is to ensure that I don’t find myself being locked into a single vendor. But in addition it can also help to see how other providers address the same area. As the Microsoft Academic Search service is based on harvesting metadata about papers hosted on institutional repositories, publishers Web sites and similar resources we should expect to see similar competing services.  I was therefore pleased when I received an email last week which announced that the Google Scholar Citations service, which I had signed up to during the beta testing, had been opened as a public service.

A post was published on the Google Scholar blog on Wednesday 16 November 2011 entitled “Google Scholar Citations Open To All” which described how:

You can quickly identify which articles are yours, by selecting one or more groups of articles that are computed statistically. Then, we collect citations to your articles, graph them over time, and compute your citation metrics – the widely used h-index; the i-10 index, which is simply the number of articles with at least ten citations; and, of course, the total number of citations to your articles. Each metric is computed over all citations and also over citations in articles published in the last five years.
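The metrics described in the quotation can be computed from a list of per-article citation counts. The sketch below is a minimal Python illustration using the standard definitions: the h-index (the largest h such that h articles have at least h citations each), the i10-index as defined in the quotation, and, for comparison with Microsoft Academic Search, the g-index (the largest g such that the top g articles together have at least g² citations). The citation counts are hypothetical. Since the formulas are the same, differences between the two services’ H-index figures for the same author presumably reflect the different citation data each has harvested.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h articles have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(ranked, start=1):
        if c < rank:
            break  # counts only fall from here on, so no larger h is possible
        h = rank
    return h

def i10_index(citations: list[int]) -> int:
    """Number of articles with at least ten citations."""
    return sum(1 for c in citations if c >= 10)

def g_index(citations: list[int]) -> int:
    """Largest g (up to the number of articles) whose top g articles have >= g*g citations."""
    ranked = sorted(citations, reverse=True)
    g, running_total = 0, 0
    for rank, c in enumerate(ranked, start=1):
        running_total += c
        if running_total >= rank * rank:
            g = rank
    return g

citations = [25, 8, 5, 3, 3, 1, 0]  # hypothetical per-article counts
print(h_index(citations), i10_index(citations), g_index(citations))
```

With these counts the h-index is 3, the i10-index 1 and the g-index 6, which shows how the g-index rewards a single highly cited article more than the h-index does.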

My Google Scholar Citations page is illustrated below. In comparison with my Microsoft Academic Search page this page appears somewhat limited in its functionality. It also has much less social connectivity, with links to only six of my co-authors who have registered for the service.

In addition to differences in the user interface and the social connections, Google Scholar Citations also differs in the papers it has analysed and the corresponding citation indices, giving an H-index of 11 (in comparison with Microsoft Academic Search’s H-index of 6). Google Citations also provides an I10-Index score of 12, whereas Microsoft Academic Search provides a G-Index score of 11.

Google Scholar Citations’ analysis of the papers indexed by Google Scholar seems to be based on a more accurate representation of my papers, possibly because I verified my papers some time ago. Google Scholar also includes a number of popular articles I wrote which haven’t been deposited in the University of Bath repository and therefore don’t seem to have been indexed by Microsoft Academic Search, such as the Ariadne article on “An accessibility analysis of UK university entry points”, for which there have been 28 citations. In addition, a paper on “Using networked technologies to support conferences” delivered at the EUNIS 2005 conference, which has been deposited in the University of Bath repository, has been indexed by Google Scholar but not by Microsoft Academic Search.

Whilst investigating Google Citations I came across a tweet from Les Carr who provided a link to his Google Citations page, which is illustrated below. This brought to my attention the 2006 paper on “Earlier web usage statistics as predictors of later citation impact”, which will be worth reading in light of Social Web developments since it was published.

In order to make some further comparisons between the coverage and citation analyses of Google Citations and Microsoft Academic Search I’ve summarised details for Les Carr together with the co-authors of my papers who have registered with Google Scholar Citations in the following table.

Name | Microsoft Academic Search (MAS) | Google Citations (GC) | Nos. of publications (MAS) | Nos. of publications (GC) | Nos. of citations (MAS) | Nos. of citations (GC) | G-Index (MAS) | I10-Index (GC) | H-Index (MAS) | H-Index (GC)
Brian Kelly | Link | Link | 56 | 83 | 153 | 498 | 11 | 12 | 6 | 11
David Sloan | Link | Link | 42 | 67 | 204 | 615 | 13 | 12 | 7 | 12
Jane Seale | Link | Link | 6 | 85 | 49 | 714 | 6 | 14 | 4 | 12
Helen Petrie | Link | Link | 106 | 172 | 569 | 1,397 | 22 | 34 | 15 | 18
Lorcan Dempsey | Link | Link | 10 | 110 | 29 | 1,139 | 5 | 30 | 1 | 19
Alastair Dunning | Link | Link | 3 | 13 | 8 | 29 | 2 | 1 | 2 | 3
Les Carr | Link | Link | 169 | 206 | 1,158 | 1,558 | 28 | 42 | 17 | 21

It should be noted that:

  • The Microsoft Academic Search entry for Jane Seale has her affiliation listed as the University of Southampton. She is now based at the University of Plymouth so her citation statistics may be split across two entries.
  • There are two Microsoft Academic Search entries for Lorcan Dempsey: entry 1  and entry 2.
  • There are two Microsoft Academic Search entries for Alastair Dunning: entry 1 and entry 2.

Discussion

I’m pleased that Google have provided an alternative to Microsoft for providing details of citations for research publications (there are similar services, of course, but I thought it would be worth focusing this post on a newly released service and providing comparisons with a service I described recently).

Microsoft Academic Search seems to have taken an approach of indexing as many research papers as it can find, associating the papers with authors and institutions. The Microsoft Academic Search entry point currently states that it provides access to “6,684,802 publications and 18,831,151 authors, 5,472 updated last week”. Papers are automatically assigned to organisations, with the details for the University of Bath providing the following information: Publications: 29,331; Citation Count: 131,732; H-Index: 96 and 1,638 authors. In addition, papers may also be assigned to departments, with the details for Bath/UKOLN providing the following information: Publications: 262; Citation Count: 932; H-Index: 15 and 245 authors.

The problem with such automated processing is that the data can be flawed. In contrast, Google Scholar Citations requires users to opt in before their papers are assigned to their Google account. This means, for example, that Google Scholar Citations currently has details for only 18 authors from the University of Bath.

It seems to me that, rather than the functionality of the services I’ve described, the main challenge will be getting buy-in from the authors whose papers have been indexed. They will be a significant user community for such services and may also have responsibility for cleaning up the data.

Some questions which came to mind when I was looking at these services:

  • What is being indexed? The Microsoft Academic Search service seems to have indexed primarily my peer-reviewed papers, drawing on those I have deposited in the University’s institutional repository and on publishers’ databases. The Google Scholar Citations service, in contrast, seems to have also included papers from the UKOLN Web site which I wouldn’t have classed as ‘papers’. I have removed papers which don’t fit in with my view of what should be included, but I appreciate that such definitions are likely to be very subjective.
  • Motivation to manage one’s content. What is the motivation to manage one’s content?  Since the automated harvesting and assignment of papers is liable to lead to errors, there will be a need for the data to be cleansed.  But what are the motivating factors for authors to do this?
  • Barriers to the management of one’s content.  Although authors may have motivating factors, such as ensuring that popular services provide an accurate view of their research publications, there may also be barriers to updating one’s data.  This might include the user interfaces provided by the services, the turnaround time for changes to be approved and the requirements for a Windows Live ID (in the case of Microsoft Academic Search) or a Google ID (in the case of Google Scholar Citations).

I recently came across a tweet from Guus van Brekkel (@digcmd) who described:

How Google Scholar Citations passes the competition left and right at WoW! Wouter on the Web bit.ly/uw8ppc

The tweet introduced me to the WoW!ter blog, written  by Wouter Gerritsma, subject librarian and bibliometrician at Wageningen UR Library. In the post Wouter gave his thoughts on the service:

 Google Scholar Citations really excels at finding publications you completely forgot about. 

and went on to make comparisons with other alternatives:

Google Scholar easily beats ResearcherID since it updates automatically and Scopus ID because you can make your list with citations publically available. To make your publication list openly available is really recommended to all scientists, it helps your personal branding.

although he admitted that:

there are disadvantages to Google Scholar as well. The most serious at this moment all kind of ghost citations.

Wouter concluded:

Google Scholar is only about five years old. Give them another five years and they will have changed the market for abstracting and indexing database totally. If only 20 percent of all scientists make their publication lists correct (also editing of the references which can be done to improve the mistakes Google has made) even without making them publically available, Google sits on a treasure trove of high quality metadata. Really interesting to see how this story will develop.

Perhaps the risk of failing to engage with the service and update the information which Google has will turn out to be the motivating factor for updating the content.  I’ve updated my content and started to email my co-authors so that they are listed. Have you updated your papers?  And if not, I’d be interested to know the reasons why not.



Posted in Evidence | 14 Comments »

UK Web Focus Blog Short-listed for Social Media Award

Posted by Brian Kelly on 16 November 2011

The Computer Weekly Social Media Awards

I’m pleased to say that the UK Web Focus blog has been short-listed for the Computer Weekly Social Media Awards. The blog has been nominated for the IT Professional blogger of the year category which is “for blogs that detail an individual perspective, not a company line, of life in the IT profession“. There are eleven blogs nominated for this category:

All of these blogs, which are summarised on the Computer Weekly Web site, seem to be provided by IT professionals who care about their work and are willing to share their thoughts, opinions and convictions with others. So why not use your opportunity to vote in these awards?  If you’d like to vote for a blog provided for the higher education sector, this blog might be the obvious one to vote for :-).

About The UK Web Focus Blog

If you haven’t come across this blog before I’ll provide a brief summary about the blog.

  • The blog was launched just over 5 years ago, on 1 November 2006.
  • Since the blog was launched there have been 991 posts published, an average of 3.8 posts per week (which includes a number of guest posts).
  • The blog author is Brian Kelly, who works for UKOLN’s Innovation Support Centre based at the University of Bath.  The UKOLN ISC is funded by the JISC and helps to support innovation within the UK’s higher and further education sector.
  • The blog addresses Web innovations and related ways in which networked services can be exploited across the sector.
  • In addition to covering Web developments, another important aspect of this blog is the commitment to openness as a way of helping embed innovation and best practices.
  • Blog posts are available under a Creative Commons licence – and slides hosted on Slideshare are also available under a similar licence.
  • In addition to publishing on this blog, Brian has also written over 50 peer-reviewed papers. Since the blog was launched many of the ideas, related to areas such as Web accessibility and Web preservation, have initially been published on this blog, encouraging feedback on the ideas before they are published in a peer-reviewed journal.
  • Writing so many posts means that errors are sometimes published. Blog posts may well contain typos – the most embarrassing was probably the time I wrote about “pee-reviewed papers”! But in addition to such typos, there may also be factual errors. Since all posts are open to comments, however, factual errors can be reported and posts corrected.
  • Surveys which have sought readers’ feedback on the blog have been published most years, such as this summary of an Analysis of the 2010 Survey of UK Web Focus Blog.

If you’ve found content published on this blog of interest I hope you will consider voting for this blog.  If the blog wins the award I will use this as an opportunity to promote the core values which underpin many of the posts which I’ve published: a combination of technical innovation and openness can help to enhance teaching and learning and research across the higher & further education sector.

Once again, here is the link to the voting form. Please consider voting: it only takes a few seconds to check my name and it could make all the difference… Voting closes on 25 November, please vote now!

Posted in Blog | Tagged: | 2 Comments »

Remote Participants Invited to Seminar on “The Benefits of Amplified Events”

Posted by Brian Kelly on 15 November 2011

On Thursday 17th November 2011 my colleague Marieke Guy is giving a talk on “The Benefits of Amplified Events” as part of the University of Bath’s Green Impact seminar series.  There will be a live stream of the talk, which is being provided by my colleagues Julian Prior and Marie Salter from the eDevelopment team in the Division for Lifelong Learning.

Marieke will be explaining the benefits of amplified events, including ways in which amplified events can help to maximise the impact of ideas presented at an event and provide access to people who are unable to physically attend. One additional important area, which is being addressed in our participation in the JISC-funded Green Events II project, is the environmental impact of events. Clearly avoiding the necessity to travel can provide environmental benefits, and I’m pleased that there has been participation from Spain, Denmark, the US, Canada and Australia at amplified events hosted by UKOLN.

But what of the environmental costs of the video streaming itself?  We would like to explore these issues by encouraging remote participants to record details of the bandwidth used in viewing a live video stream of forthcoming amplified events.

Thursday’s seminar will be streamed using the University of Bath’s Adobe Connect service, which can host up to 20 participants.  If you wish to view the live video stream please register on the EventBrite system. In addition we would like to invite people to give their feedback on the experience and, if possible, to provide statistics on the bandwidth usage. Ideally, it would be useful if remote participants could run simple network tests such as ‘traceroute’, or use the Firebug plugin for Firefox (which tracks data volumes and provides information on the IP addresses and domains used) together with the NetExport extension, which adds to Firebug the ability to ‘export’ its network log as a HAR file to your hard drive. If, however, you are not able to install these tools but have an interest in this topic, feel free to sign up – we’ll be asking you to describe your experiences, including any problems, which will help us to improve our amplification services and advise others on best practices.


Note (added on 16 Nov 2011): If you wish to take part in the exercise of monitoring network traffic for watching the video stream, once you have installed Firebug and the NetExport extension to Firebug you should use the following steps:

  1. Switch Firebug on for all pages using the Firebug icon in the top right corner of Firefox: click the down arrow and choose ‘On for all pages’.
  2. Click on the Net tab at the top of the Firebug pane that appears; this should bring up a new menu directly below it. This will probably already be set to ‘All’ (greyed out); if not, click on ‘All’.
  3. Navigate to the relevant Web page. Firebug will start logging all the connections/downloads on that page and will continue to log all activity from it (don’t navigate away from that page in that tab).
  4. When you want to save the log file, click on the ‘Export’ button on the lower Firebug menu, choose ‘Save as’ and save the ‘.har’ file to disk, from where it can be e-mailed to the event organisers!

Note, however, that it is currently unclear as to whether this technique will work with the Adobe Connect interface.
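Since a HAR file is a JSON document, the data volumes it records can be totalled with a short script. The following is a minimal Python sketch, assuming the field names of the HAR 1.2 format (log.entries, request.url, response.headersSize, response.bodySize and the optional serverIPAddress); the function names summarise_har and summarise_har_file are hypothetical, and the sketch has not been tried on a file exported while watching the Adobe Connect stream. It totals the bytes received from each host and lists the IP addresses seen, skipping sizes reported as -1 (unknown).

```python
import json
from collections import defaultdict
from urllib.parse import urlparse


def summarise_har(har):
    """Total response bytes per host in a parsed HAR log (hypothetical helper).

    HAR 1.2 reports headersSize and bodySize as -1 when unknown; those
    values are skipped rather than added to the total.
    """
    totals = defaultdict(int)
    ips = defaultdict(set)
    for entry in har["log"]["entries"]:
        host = urlparse(entry["request"]["url"]).hostname or "unknown"
        response = entry["response"]
        sizes = (response.get("headersSize", -1), response.get("bodySize", -1))
        totals[host] += sum(n for n in sizes if n > 0)
        # serverIPAddress is optional in the HAR format, so it may be absent.
        ip = entry.get("serverIPAddress")
        if ip:
            ips[host].add(ip)
    return dict(totals), {host: sorted(found) for host, found in ips.items()}


def summarise_har_file(path):
    """Load an exported .har file from disk and summarise it."""
    with open(path, encoding="utf-8") as f:
        return summarise_har(json.load(f))
```

For example, totals, ips = summarise_har_file("seminar.har") (the file name is illustrative) would give the number of bytes received from each host, which could then be reported alongside the HAR file itself.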

Posted in Events | Tagged: | 1 Comment »

To What Extent Do Multiple Copies of Papers Affect Download Statistics?

Posted by Brian Kelly on 14 November 2011

Are Multiple Copies of Papers Bad For The Researcher?

If authors are encouraged to provide pre-prints of their papers in addition to the paper which is hosted at the publisher’s Web site, how might that affect the associated usage statistics?  If usage statistics are fragmented, how easy might it be to aggregate the statistics? And if doing this is difficult, does it matter?

This was a question I was asked recently.  In order to try and gain a better understanding of the issues I have analysed the usage statistics for the five most downloaded papers which I have uploaded to Opus, the University of Bath institutional repository.   This exercise helped me to understand that the issue is more complicated than I initially appreciated.  The data for my papers is summarised below.


Paper 1
Library 2.0: balancing the risks and benefits to maximise the dividends
Journal/Event: Program: Electronic Library & Information Systems, 43, 2009
Opus statistics: 1,516
UKOLN Web site statistics: 190, consisting of 14 (.doc files viewed in 2011) + 129 (.doc files viewed in 2009) + 47 (HTML file viewed in 2009)
Publisher’s information: [Paper] – usage statistics not available
Citations: 8, according to Google Scholar
Other known copies: There are 210 records listed in a Google Scholar search, which include links to versions on Opus and the UKOLN Web site.
Notes: Two versions of the paper were published. The initial paper was presented at the Building Bridges 2009 conference and was subsequently republished in Program.
Paper 2
From Web Accessibility to Web Adaptability
Journal/Event: Disability and Rehabilitation: Assistive Technology, 4, 2009
Opus statistics: 491
UKOLN Web site statistics: 0 views
Publisher’s information: [Paper] – usage statistics not available
Citations: 6, according to Google Scholar
Other known copies: David Sloan’s list of publications (PDF file available). There are 10 records listed in a Google Scholar search, which include links to versions on Opus and the UKOLN Web site.
Notes: This paper was embargoed and so was not released until 18 months after publication.
Paper 3
Implementing a Holistic Approach to E-Learning Accessibility
Journal/Event: ALT-C, 2005
Opus statistics: 409
UKOLN Web site statistics: 4,021 views, consisting of 295 (HTML views in 2011) + 557 (HTML views in 2010) + 592 (HTML views in 2009) + 1,009 (HTML views in 2008) + 861 (HTML views in 2007) + 707 (HTML views in 2006) + 635 (HTML views in 2005)
Publisher’s information: Not available on the conference Web site
Citations: 20, according to Google Scholar
Other known copies: There are 8 records listed in a Google Scholar search, which include links to versions on Opus and the UKOLN Web site together with a copy of the MS Word file hosted by the MediaLT organisation in Norway.
Notes: This paper was awarded the prize for Best Research Paper at the ALT-C 2005 conference.
Paper 4
Developing A Holistic Approach For E-Learning Accessibility
Journal/Event: Canadian Journal of Learning and Technology, 30 (3), 2004
Opus statistics: 404
UKOLN Web site statistics: 498 views, consisting of 188 (HTML views in 2011) + 310 (HTML views in 2010)
Publisher’s information: [Publisher's copy] – usage statistics not available
Citations: 36, according to Google Scholar
Other known copies: There are 6 records listed in a Google Scholar search, which include links to metadata records on Opus and the UKOLN Web site.
Notes: This paper was available on the UKOLN Web site for a significant period of time.
Paper 5
Empowering users and their institutions: A risks and opportunities framework for exploiting the potential of the social web
Journal/Event: CULTURAL HERITAGE online conference Web site, 2009
Opus statistics: 356
UKOLN Web site statistics: 0 views
Publisher’s information: [Publisher's copy] – usage statistics not available
Citations: 1, according to Google Scholar
Other known copies: There are 3 records listed in one Google Scholar search (which has one link to a copy on the UKOLN Web site) and 12 in a second Google Scholar search, which include links to copies on the conference Web site.
Notes: This paper was not made available on the UKOLN Web site. The publisher’s copy consists of two large PDF files of all papers presented at the conference.  Also note that this was a recent paper, by which time it had been decided to publicise only the copy on the institutional repository.

In total there have been 3,176 views of these five papers from the institutional repository and 4,709 views from the UKOLN Web site. Reviewing this evidence, it seems that copies which were provided on the UKOLN Web site in 2004 and 2005 have had significant numbers of downloads from the Web site, exceeding (significantly so in one case) the numbers of downloads from the Opus repository.

It should also be noted that, as described in a blog post entitled Scribd Seems to be Successful in Enhancing Access to Papers, papers hosted on the Scribd document sharing service do seem to attract a very large number of downloads, as shown below.

Discussion

If download statistics are used to complement citation statistics in order to provide some indication of the value of research publications, it would appear that there will be pressure either to ensure that content is hosted only in a single location or to ensure that download statistics from multiple repositories can be aggregated.

However, it is not clear how one might aggregate usage statistics from a diversity of services. I have been able to publish the statistics for files hosted on the UKOLN Web site as I have access to its usage statistics, but this is clearly not a scalable solution.  Similarly, for the papers I have described I have not been able to find any statistics for the copies hosted on the publishers’ sites.
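Where the counts for each copy can be obtained, the aggregation itself is simple bookkeeping; the hard part is obtaining the counts. As a minimal Python sketch, assuming one has headline figures for each source (here the figures reported above for the five papers, with None marking the publishers’ copies for which no statistics were available), the following sums downloads per paper and per source and records which sources are missing. The short paper keys and the aggregate function are illustrative, and the sketch treats Opus downloads and UKOLN Web site views as directly comparable, which is itself an assumption.

```python
# Headline figures reported for the five papers (short keys are illustrative).
# None marks a source whose usage statistics were not available.
STATS = {
    "Library 2.0": {"opus": 1516, "ukoln": 190, "publisher": None},
    "Web Adaptability": {"opus": 491, "ukoln": 0, "publisher": None},
    "Holistic Approach (ALT-C 2005)": {"opus": 409, "ukoln": 4021, "publisher": None},
    "Holistic Approach (CJLT 2004)": {"opus": 404, "ukoln": 498, "publisher": None},
    "Empowering Users": {"opus": 356, "ukoln": 0, "publisher": None},
}


def aggregate(stats):
    """Sum download counts per paper and per source.

    Sources with a count of None are recorded in `missing` rather than
    being treated as zero, so gaps in the data stay visible.
    """
    per_paper, per_source, missing = {}, {}, set()
    for paper, sources in stats.items():
        per_paper[paper] = 0
        for source, count in sources.items():
            if count is None:
                missing.add(source)
                continue
            per_paper[paper] += count
            per_source[source] = per_source.get(source, 0) + count
    return per_paper, per_source, missing


per_paper, per_source, missing = aggregate(STATS)
print(per_source, missing)
```

Run on these figures, aggregate gives the totals of 3,176 views from Opus and 4,709 from the UKOLN Web site, while the missing set makes explicit that the publisher figures are absent rather than zero.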

One might then conclude that the recommendation should be that research papers should only be hosted in a single location. But is this a realistic approach?  I have always been keen on maximising access to my papers. Initially this was done by hosting the papers on the UKOLN Web site, before the University of Bath provided an institutional repository.  Although the papers are now hosted on the repository, and this is now the preferred location, I am reluctant to delete the original copy since this may cause long-established links to the paper to break and thus cause access problems for users following such links. Similarly I would be reluctant to stop co-authors hosting a copy of the paper on their own repository. Indeed, since I seek to make use of Creative Commons licences to encourage reuse where possible it would seem to go against the grain to try to control such reuse in order simply to enhance metrics.

This, it seems to me, is the crux of the matter.  If the aim of research papers is to have an impact and open access can enhance this goal, then surely we need to accept the fragmentation of resources, including research publications. Looking at the metrics for the papers listed above it does seem that where a paper is available from multiple locations this enhances the numbers of downloads and subsequent citations, although I would welcome a more rigorous analysis.

However, such speculations are based on a very small sample and very subjective opinions. In addition the analysis of the usage statistics for the UKOLN Web site seems surprising, with figures displayed primarily for the HTML versions of papers and not the MS Word and PDF versions. This may be due to the usage statistics package not displaying findings for resources for which there have only been a small number of downloads.   However, if this is the case, it seems to suggest the advantages of providing a research paper in HTML format as well as MS Word and PDF.

But how typical are these findings, I wonder?  And what do people think about the tensions between maximising access to papers by setting them free and being able to better understand their usage by providing papers in a more managed environment?

Posted in Evidence | 3 Comments »

Google Street View Arrives on University Campuses

Posted by Brian Kelly on 9 November 2011

Google Street View

A few days ago Martin Hamilton, Head of Internet Services at Loughborough University, tweeted:

Google StreetView of @lborouniversity now available – http://goo.gl/vcpds /cc @LufbraPresident

and followed this with:

My favourite StreetView image of Loughborough campus http://goo.gl/Ex7gf (yes, you can cross the bridge :-)

I read this tweet on my Android phone and followed the link – and was impressed. Whilst many of us will probably have explored Google Street View on a desktop PC, I suspect I’m not alone in never having used it on a mobile phone. I found that rather than having to move the mouse to orientate myself I simply pointed the phone in the direction I was interested in.

Using Google Street View on an Android Phone

On the same day that Martin Hamilton was tweeting the news, Mike Nolan, Head of Web Services at Edge Hill University, published a post, Street View Live!, which announced the same news for Edge Hill University. As illustrated, Mike’s blog post included an embedded live Street View (although note that I can’t embed the Street View in this WordPress.com blog).

As Mike described in a post he published a year ago, in 2010 Edge Hill University was “visited by the Google Street View trike to take imagery of the Ormskirk campus. Unlike roads which are photographed using a car, private property like university campuses and Disneyland Paris are photographed using a trike allowing them to get along footpaths”.

Since I used to work at Loughborough University I decided to explore the Street View for the Loughborough University campus. I found that the experience on my HTC Sensation Android phone was far superior to that on my iPod Touch. I’ll therefore describe my experiences of using my Android phone, and make the observation that when comparing experiences with others we will need to understand that different devices provide different functionality.

As Martin said in his tweet, you can cross the footbridge. I therefore turned around so that I was facing the bridge and dragged the icon so that I walked across the bridge into the car park. I then turned back so that I could see the bridge I had walked across. I have to admit it seems slightly strange doing this while in somebody’s house! But having walked across the bridge I wanted to explore areas of the campus I was familiar with.

I recognised various buildings but I became distracted from visiting the Haslegrave Building, where I worked for six years. Instead I was intrigued by the people I saw and wondered if I would recognise anyone.

Google do blur faces in Street View images, but I suspect that it would sometimes be possible to recognise people from a side view. In addition, as can be seen, it is probably possible to recognise people from their clothing and appearance even if the face is blurred.

Discussion

How many other universities have Google Street Views available for their campuses, I wonder? I also wonder how institutions will be addressing the privacy implications. The Google Maps Street Views Privacy page states that “Street View contains imagery from public roads, which is no different from what you might see driving or walking down the street. Imagery of this kind is available in a wide variety of formats for cities all around the world” and goes on to add that “Google will partner with an organisation such as Disneyland Paris to schedule imagery collection of their property“. I imagine that such partnership arrangements will also cover the digitisation of University campuses.

Google go on to describe how they “have developed cutting-edge face and licence plate blurring technology that is applied to all Street View images. This means that if one of our images contains an identifiable face (for example, that of a passer-by on the pavement) or an identifiable licence plate, our technology will blur it automatically, meaning that the individual or the vehicle cannot be identified.” Google also “provide easily accessible tools allowing users to request further blurring of any image that features the user, their family, their car or their home. In addition to the automatic blurring of faces and licence plates, we will blur the entire car, house or person when a user makes this request for additional blurring.”

Rather than seeking permission before publishing such images, Google are relying on a technical solution for blurring images and on allowing users to opt out. This has some parallels with the “seek forgiveness, not permission” meme from a few years ago, which encouraged early adopters to deploy social media services such as blogs even if they hadn’t received official sanction.

In previous discussions about privacy issues and social media services we, in the higher education sector, have been responding to issues which are relevant to society in general. This has previously been the case for Google Street View, but now that Google is partnering with organisations such as universities in order to be given permission to take photographs on private property, the situation is different.

I personally welcome developments such as Google Street View and am pleased to see that it is becoming available across university campuses. As we know from the Wikipedia entry on Google Street View privacy concerns, there have been examples of a man leaving a sex shop, a man vomiting and another man being arrested. But I feel that such concerns can be addressed by policy decisions (such as not taking photos late at night when students might be leaving the Union bar) and management of the content, including automated blurring of content and the provision of a “Complain about this image” facility. It also seems to me that it would be useful to seek to engage students in this process, as part of an institution’s digital literacy work.

But I am aware that these are the views of a white middle class and technically literate member of the higher education sector. We will have people on campus from a diverse range of backgrounds and cultural norms. How should we widen the debate on use of tools such as Google Street Views across our campuses?



Posted in Mobile | 3 Comments »

Signals From Sheffield

Posted by Brian Kelly on 7 November 2011

What are IT Service departments doing these days? Frustrated users sometimes regard IT Services as seemingly being responsible for creating barriers to the use of IT, with comedy sketches such as “Computer Says No” from Little Britain and Channel 4’s The IT Crowd illustrating that such views are commonplace. A few months ago, as described on the Communities and Government Web site, Local Government Minister Grant Shapps and Decentralisation Minister Greg Clark “called on a new generation of councillors to shake up their town halls in the interests of the people they serve and help banish the ‘computer says no’ culture that exists in some councils“.

Do University IT Service departments also need shaking up? A few days ago I came across a post on Social Media in CiCS on Chris Sexton’s From a Distance blog in which she described use of social media within CiCS, the Corporate Information and Computing Services department. Chris, the CiCS director, explained:

Blogging is something some individual members of the department do. Some, like me, use commercial products like Blogger or WordPress and have them hosted off-site, some use our in-house blogging software, uSpace, based on a Jive product. Some blog regularly, some less often. What we haven’t had before is a departmental blog, so we’ve changed what used to be a static news page on our web pages into a blog. So much better – it’s easy to update, we can include pictures, links and videos, and, more importantly we can collect feedback in the comments field.

and went on to add that the department has “been using Twitter in the department for a few years” and has “finally taken the plunge and set up a [Facebook] page“.

In addition to running an IT Services department for a large Russell Group University Chris has been a prolific blogger since she set up the blog in October 2007, having posted 62 posts in the first three months of the blog, 208 posts in 2008, 183 in 2009, 162 in 2010 and 147 to date this year. It does seem to me that Chris’s blog will provide a good insight into IT departments in a large University so I hope that the content of the blog, which is hosted on Google’s Blogspot service, will be preserved. But it also seems to me that the sector would benefit if such openness and transparency were to be the norm across not only IT Service departments but also other service departments including the Library. So whilst Chris’s recent post demonstrates a commitment to use of social media to support the user community at Sheffield University, and a willingness to exploit both in-house and cloud services, perhaps the most important signal being sent from Sheffield University is the willingness to be open and invite comments and feedback on development plans. I’d be interested in hearing if there are other IT Service departments which have taken a similar approach.

Posted in Web2.0 | 3 Comments »

How People Find This Blog, Five Years On

Posted by Brian Kelly on 1 November 2011

Summary of Blog Usage

Today sees the fifth anniversary of the launch of the UK Web Focus blog which took place on 1 November 2006. A year after the launch I provided a review of The First Year Of The UK Web Focus Blog and on 1 November 2009 reviewed The Third Anniversary of the UK Web Focus Blog. A year later I published a post on the Fourth Anniversary of this Blog – Feedback Invited.

In those reflective posts I asked in 2008 whether “on reflecting on the various feedback I’ve received, it seems to me that I’ll need to give some thought to perhaps creating a new blog” – in the end, although I contributed to several project and event-focussed blogs, I published posts primarily on this blog. In 2009 I commented that “with over 600 posts published on the UK Web Focus blog, I can’t recall all of the things I have written about!“. The following year I described how “the blog [is] my open notebook [used] to keep a record of activities I had been involved in and my observations and thoughts on developments“.

This year I’ll again provide a snapshot of the statistics for the blog.  There have been 988 posts published and 4,610 comments (which, I should add, includes referrer links). There have been 377,300+ views, with an average of 205 views per day over the five years. The busiest day was 14 January 2011 when there were 1,420 views following the publication of a post on Institutional Use of Twitter by Russell Group Universities.

The numbers of daily views peaked in 2009 with an average of 247 views per day. Last year there were 243 views per day and so far in 2011 there have been 230 views per day.  This slight decrease reflects the number of posts published, with 263 posts published in 2009, 200 in 2010 and 137 to date in 2011. In terms of the average numbers of views per post in 2009 there were 342 views per post,  443 in 2010 and 508 to date in 2011.
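The per-post figures can be roughly reconstructed from the daily averages: views per post is approximately the average daily views multiplied by the number of days in the period, divided by the number of posts published. The Python sketch below assumes this is how the figures relate (the exact method is not stated) and takes the 2011 period as 1 January to 31 October; the views_per_post function is hypothetical.

```python
from datetime import date


def views_per_post(avg_daily_views, start, end, posts):
    """Approximate views per post: average daily views x days in period / posts.

    `end` is exclusive, so 1 Jan 2009 to 1 Jan 2010 counts as 365 days.
    """
    days = (end - start).days
    return avg_daily_views * days / posts


# (average daily views, period start, period end (exclusive), posts published)
PERIODS = {
    2009: (247, date(2009, 1, 1), date(2010, 1, 1), 263),
    2010: (243, date(2010, 1, 1), date(2011, 1, 1), 200),
    2011: (230, date(2011, 1, 1), date(2011, 11, 1), 137),  # to date
}

for year, args in PERIODS.items():
    print(year, round(views_per_post(*args)))
```

This gives roughly 343, 443 and 510 views per post, close to the reported 342, 443 and 508; the small differences presumably reflect rounding of the daily averages or a slightly different cut-off date for 2011.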

Analysis of Blog Referrer Traffic

In addition to these usage statistics I’d also like to analyse the Web sites which drove traffic to this blog. As can be seen from the accompanying image showing details of the referrer traffic (captured over a week ago), the Twitter Web site was the most significant driver of traffic (having provided 8,029 views up to today – 28 October 2011), sending more than twice as much traffic as Google Reader, which was in second place with 3,792 views.

After the UKOLN Web site (2,672 views) there were then some further Web-based RSS readers (Netvibes and Bloglines, with 2,356 and 2,300 views), followed by the Google search page (2,058 views), then an individual’s blog provided by Stephen Downes which delivered 1,292 views, followed by Facebook with 1,112 views and another aggregated collection of visits generated by Google search which delivered 1,006 views.  These were the only services which have delivered over 1,000 views. In total the top nine referring Web sites delivered 24,617 views.

This is, however, a very small proportion of the 376,700+ total number of views (~6.5%).  How else have people arrived at the blog if not by the Twitter Web site, Google, RSS readers and other popular blogs and Web sites?  The answer could be that there is a long tail of referring Web sites.

Unfortunately WordPress does not provide a total for referrer statistics. However, after copying the data into Excel I find that there are 499 referring Web sites which deliver a total of 45,804 visits, with the last seven entries each delivering five visits.  I am assuming that WordPress either displays a maximum of 500 entries or has a cut-off of five visits.  But based on the statistics which are available, it seems that Web site referrers deliver only ~12.2% of the traffic.  In order to understand how the missing ~88% of the traffic arrived at the blog I’ve looked at the traffic for a particular post.
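The percentages in this section can be reproduced from the stated totals. In the Python sketch below, the Twitter Web site’s contribution is derived as the remainder of the stated top-nine total (24,617) once the other eight referrers are subtracted, and the total number of views is taken as 376,700, the lower bound quoted above, so the resulting shares are slight overestimates. The variable names are illustrative.

```python
TOTAL_VIEWS = 376_700          # quoted as "376,700+", so shares are upper bounds
TOP_NINE_TOTAL = 24_617        # views from the nine busiest referring sites
ALL_REFERRERS_TOTAL = 45_804   # views from all 499 referring sites listed

# The eight top referrers other than the Twitter Web site.
OTHER_TOP_REFERRERS = {
    "Google Reader": 3_792,
    "UKOLN Web site": 2_672,
    "Netvibes": 2_356,
    "Bloglines": 2_300,
    "Google search page": 2_058,
    "Stephen Downes's blog": 1_292,
    "Facebook": 1_112,
    "Google search (aggregated)": 1_006,
}


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


twitter_views = TOP_NINE_TOTAL - sum(OTHER_TOP_REFERRERS.values())
top_nine_pct = share(TOP_NINE_TOTAL, TOTAL_VIEWS)
referrer_pct = share(ALL_REFERRERS_TOTAL, TOTAL_VIEWS)
unexplained_pct = 100 - referrer_pct

print(f"Twitter Web site: {twitter_views} views")
print(f"Top nine referrers: {top_nine_pct:.1f}% of views")
print(f"All listed referrers: {referrer_pct:.1f}% of views")
print(f"Not explained by referrers: {unexplained_pct:.1f}% of views")
```

This gives 8,029 views for the Twitter Web site, about 6.5% of views from the top nine referrers, about 12.2% from all listed referrers and therefore roughly 88% of views not accounted for by Web site referrers.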

Looking at the statistics for the recent post on Are University Web Sites in Decline?, it seems that there were 297 views of the blog on the day the post was published, but only 102 referrals from Web sites. Looking at the bit.ly statistics for the post, it seems that there were 36 clicks on this link on the day of publication and 32 on the following day, with only 11% of those clicks coming from the Twitter Web site. However, over 50% of the views are still unaccounted for. Some of these will probably be from email subscribers to the blog; there are 95 subscribers who use the Feedburner email service, with 32 subscribers viewing the post on 20 October. And the remainder?  I suspect they’ll be other Twitter users who have followed a URL provided by link-shortening services other than bit.ly.
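A similar back-of-the-envelope check applies to this post. The Python sketch below assumes that Web site referrals, bit.ly clicks and email subscriber views on the day of publication are separate, non-overlapping sources, and that all the email subscriber views were among the 297 blog views; neither assumption is certain (some bit.ly clicks from the Twitter Web site may also appear as referrals). The unaccounted_share function is hypothetical.

```python
def unaccounted_share(total_views, *accounted):
    """Percentage of total_views not explained by the listed sources.

    Assumes the sources do not overlap, so their counts can simply be summed.
    """
    return 100 * (total_views - sum(accounted)) / total_views


DAY_VIEWS = 297       # views on the day the post was published
WEB_REFERRALS = 102   # referrals from Web sites that day
BITLY_CLICKS = 36     # clicks on the bit.ly link that day
EMAIL_VIEWS = 32      # Feedburner email subscribers viewing the post

before_email = unaccounted_share(DAY_VIEWS, WEB_REFERRALS, BITLY_CLICKS)
after_email = unaccounted_share(DAY_VIEWS, WEB_REFERRALS, BITLY_CLICKS, EMAIL_VIEWS)
print(f"{before_email:.1f}% unaccounted for before email, {after_email:.1f}% after")
```

On these assumptions about 53.5% of the day’s views are unexplained before counting email subscribers and about 42.8% afterwards; that remaining share is what other link-shortening services would need to account for.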

Using another example, as described above the busiest day for this blog was 14 January 2011, when there were 1,420 views following the publication of a post on Institutional Use of Twitter by Russell Group Universities. Looking at the bit.ly statistics for this post, we can see that there were 798 clicks on the shortened link on the day it was published on Twitter.  The WordPress statistics for the post show that there were 1,088 views of the post on the blog on the day of publication.

Conclusions

The conclusion I have reached: most people now view posts on this blog following alerts they have come across on Twitter rather than via a Google search or by subscribing to the blog’s RSS feed.  Or, to put it more succinctly, social search is beating Google and RSS.

Is this really the case or have I misinterpreted the data? And if the data is accurate for my blog is this trend being replicated across other blogs?

I’d be very interested to hear from other blog authors on how traffic is arriving at their blog. Tony Hirst has kindly provided a screenshot of referring traffic to his OUseful blog for the past year which shows that the Twitter Web site is also, by a significant margin, the most popular referrer site. Any other bloggers have findings they are willing to share?

Posted in Blog, Evidence | 14 Comments »