Citation Analysis Services
I recently wrote a post entitled “Will the Real Scott Wilson Please Stand Up, Please Stand Up” in which I described my initial experiences with the Microsoft Academic Search service. I have to admit that I was impressed by the user interface and how, for example, it depicted links with my co-authors.
Revisiting Microsoft Academic Search
The main limitation with the Microsoft Academic Search service was, I felt, the accuracy of the data and the need to get author buy-in in order that authors could claim their own papers and remove papers incorrectly attributed to them. The information it has about me, for example, suggests that I have published 56 papers, including one dating back to 1979. In fact it should know about 30 of my papers, the earliest of which was published in 1994.
Several weeks ago I edited my publications list to remove papers written by other Brian Kellys. These edits have been accepted and when I sign in I get confirmation of the 38 papers I have confirmed authorship of and the 18 which have been removed from the list. However the wiki-style approach to editing the content means that edits have to be confirmed and this does not appear to have happened. I therefore appear to be claiming more publications than is the case and, possibly, the citation statistics (G-Index=11 and H-Index=6) for my papers may be inaccurately calculated.
Google Scholar Citations
Whenever I come across a new service which appears to provide value I am also interested in seeing if there are alternative offerings. In part this is to ensure that I don’t find myself being locked into a single vendor. But in addition it can also help to see how other providers address the same area. As the Microsoft Academic Search service is based on harvesting metadata about papers hosted on institutional repositories, publishers’ Web sites and similar resources, we should expect to see similar competing services. I was therefore pleased when I received an email last week which announced that the Google Scholar Citations service, which I had signed up to during the beta testing, had been opened as a public service.
A post was published on the Google Scholar blog on Wednesday 16 November 2011 entitled “Google Scholar Citations Open To All” which described how:
You can quickly identify which articles are yours, by selecting one or more groups of articles that are computed statistically. Then, we collect citations to your articles, graph them over time, and compute your citation metrics – the widely used h-index; the i-10 index, which is simply the number of articles with at least ten citations; and, of course, the total number of citations to your articles. Each metric is computed over all citations and also over citations in articles published in the last five years.
My Google Scholar Citations page is illustrated below. In comparison with my Microsoft Academic Search page this page appears somewhat limited in its functionality. It also has much less social connectivity, with links to only six of my co-authors who have registered for the service.
In addition to differences in the user interface and the social connections, Google Scholar Citations also has differences in the papers it has analysed and the corresponding citation indices, giving an H-index of 11 (in comparison with Microsoft Academic Search’s H-index of 6). Google Citations also provides an i10-index score of 12 whereas Microsoft Academic Search provides a G-Index score of 11.
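To make the three indices mentioned above concrete, here is a minimal Python sketch of how each can be computed from a list of per-paper citation counts. The function names and the sample counts are my own assumptions for illustration; they are not taken from either service, and the g-index follows the commonly used definition (the largest g such that the top g papers together have at least g² citations), which may differ in detail from what Microsoft Academic Search implements.

```python
from typing import Iterable


def h_index(citations: Iterable[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations: Iterable[int]) -> int:
    """Number of papers with at least ten citations."""
    return sum(1 for count in citations if count >= 10)


def g_index(citations: Iterable[int]) -> int:
    """Largest g such that the top g papers together have at least g*g citations."""
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts, invented purely for illustration.
sample = [28, 15, 12, 10, 9, 4, 3, 1, 0, 0]
print(h_index(sample), i10_index(sample), g_index(sample))  # 5 4 9
```

Since each index summarises the same citation counts in a different way, a G-Index from one service and an i10-index from another are not directly comparable; only the two H-index figures measure the same thing.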
Google Scholar Citations’ analysis of the papers indexed by Google Scholar seems to be based on a more accurate representation of my papers, possibly because I verified my papers some time ago. Google Scholar also includes a number of popular articles I wrote which haven’t been deposited in the University of Bath repository and therefore don’t seem to have been indexed by Microsoft Academic Search, such as the Ariadne article on “An accessibility analysis of UK university entry points” for which there have been 28 citations. But in addition a paper on “Using networked technologies to support conferences” delivered at the EUNIS 2005 conference, which has been deposited in the University of Bath repository, has been indexed by Google Scholar but not by Microsoft Academic Search.
Whilst investigating Google Citations I came across a tweet from Les Carr, who provided a link to his Google Citations page, which is illustrated below. This brought to my attention the 2006 paper on “Earlier web usage statistics as predictors of later citation impact”, which will be worth reading in light of Social Web developments since it was published.
In order to make some further comparisons between the coverage and citation analyses of Google Citations and Microsoft Academic Search I’ve summarised details for Les Carr together with the co-authors of my papers who have registered with Google Scholar Citations in the following table.
|G-Index (MAS)||I10-Index (GC)||H-Index (MAS)||H-Index (GC)|
It should be noted that:
- The Microsoft Academic Search entry for Jane Seale has her affiliation listed as the University of Southampton. She is now based at the University of Plymouth so her citation statistics may be split across two entries.
- There are two Microsoft Academic Search entries for Lorcan Dempsey: entry 1 and entry 2.
- There are two Microsoft Academic Search entries for Alastair Dunning: entry 1 and entry 2.
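The effect of an author's papers being split across two entries can be sketched in a few lines of Python. The citation counts below are hypothetical and do not describe any of the authors named above; the point is only that an H-index computed separately for each entry can be lower than the H-index for the merged list of papers.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # In a descending list the test holds for a prefix only, so counting
    # the papers that pass it gives the h-index.
    return sum(1 for rank, count in enumerate(ranked, start=1) if count >= rank)


# Hypothetical citation counts for one author split across two entries.
entry_1 = [12, 7, 3]
entry_2 = [9, 5, 4, 1]

print(h_index(entry_1))            # 3
print(h_index(entry_2))            # 3
print(h_index(entry_1 + entry_2))  # 4
```

In this made-up case neither entry on its own shows the H-index of 4 which the combined list of papers would give.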
I’m pleased that Google have provided an alternative to Microsoft for providing details of citations for research publications (there are similar services, of course, but I thought it would be worth focusing this post on a newly released service and providing comparisons with a service I described recently).
Microsoft Academic Search seems to have taken an approach of indexing as many research papers as it can find, associating the papers with authors and institutions. The Microsoft Academic Search entry point currently states that it provides access to “6,684,802 publications and 18,831,151 authors, 5,472 updated last week”. Papers are automatically assigned to organisations, with the details for the University of Bath providing the following information: Publications: 29,331; Citation Count: 131,732; H-Index: 96 and 1,638 authors. In addition papers may also be assigned to departments with the details for Bath/UKOLN providing the following information: Publications: 262; Citation Count: 932; H-Index: 15 and 245 authors.
The problem with such automated processing is that the data can be flawed. In contrast, Google Scholar Citations requires users to opt in before their papers are assigned to their Google account. This means, for example, that Google Scholar Citations currently has details for only 18 authors from the University of Bath.
It seems to me that rather than the functionality of the services I’ve described, the main challenge will be getting buy-in from the authors whose papers have been indexed. They will be a significant user community for such services and may also have responsibility for cleaning up the data.
Some questions which came to mind when I was looking at these services:
- What is being indexed? The Microsoft Academic Search service seems to have indexed primarily my peer-reviewed papers which I have deposited in the University institutional repository and from publishers’ databases. The Google Scholar Citation service, in contrast, seems to have also included papers from the UKOLN Web site which I wouldn’t have classed as ‘papers’. I have removed papers which don’t fit in with my view of what should be included, but I appreciate that such definitions are likely to be very subjective.
- Motivation to manage one’s content. What is the motivation to manage one’s content? Since the automated harvesting and assignment of papers is liable to lead to errors, there will be a need for the data to be cleansed. But what are the motivating factors for authors to do this?
- Barriers to the management of one’s content. Although authors may have motivating factors, such as ensuring that popular services provide an accurate view of their research publications, there may also be barriers to updating one’s data. This might include the user interfaces provided by the services, the turnaround time for changes to be approved and the requirements for a Windows Live ID (in the case of Microsoft Academic Search) or a Google ID (in the case of Google Scholar Citations).
I recently came across a tweet from Guus van Brekkel (@digcmd) who described:
How Google Scholar Citations passes the competition left and right at WoW! Wouter on the Web bit.ly/uw8ppc
The tweet introduced me to the WoW!ter blog, written by Wouter Gerritsma, subject librarian and bibliometrician at Wageningen UR Library. In the post Wouter gave his thoughts on the service:
Google Scholar Citations really excels at finding publications you completely forgot about.
and went on to make comparisons with other alternatives:
Google Scholar easily beats ResearcherID since it updates automatically and Scopus ID because you can make your list with citations publically available. To make your publication list openly available is really recommended to all scientists, it helps your personal branding.
although he admitted that:
there are disadvantages to Google Scholar as well. The most serious at this moment all kind of ghost citations.
Google Scholar is only about five years old. Give them another five years and they will have changed the market for abstracting and indexing database totally. If only 20 percent of all scientists make their publication lists correct (also editing of the references which can be done to improve the mistakes Google has made) even without making them publically available, Google sits on a treasure trove of high quality metadata. Really interesting to see how this story will develop.
Perhaps the risk of failing to engage with the service and update the information which Google has will turn out to be the motivating factor for updating the content. I’ve updated my content and started to email my co-authors so that they are listed. Have you updated your papers? And if not, I’d be interested to know the reasons why not.