Research data and the “h-index”.

The blog post by Rich Apodaca entitled “The Horrifying Future of Scientific Communication” is very thought-provoking and well worth reading. He takes us through disruptive innovation and how it might impact how scientists communicate their knowledge. One solution floated for us to ponder is that “supporting Information, combined with data mining tools, could eliminate most of the need for manuscripts in the first place”. I am going to juxtapose that suggestion with something else I recently discovered.

Someone encouraged me to take a look at Google Scholar. It is one of those resources that, amongst other features, computes an individual’s h-index and i10-index (the former, having gone through its purple patch, is now apparently at the end of the road, at least for chemists). One reason perhaps why proper curation of research data is not high on most chemists’ list of priorities is that it does not contribute to one’s h-index, and hence to one’s prospects of a successful research career. Thus “supporting information (data)” is one of those things, like styling the citations in a research article, that most people probably prepare through gritted teeth (a rather annoying ritual without which a research article cannot be published).

So when I inspected my own Google Scholar profile (you can do the same here) I was rather surprised to find, appended to all the regular research articles, a long list of data citations (sic!). Because I have placed much of my own data into a digital repository, this has opened it up to Google (where don’t they get to nowadays?) for listing (if not actually mining). These data citations do not themselves (currently?) contribute to e.g. the h-index, since these entries are not yet attracting citations from others. And that of course is because doing so is not yet an accepted part of the ritual of preparing a scientific article.
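For readers unfamiliar with the metric: the h-index is the largest number h such that the author has h publications with at least h citations each. A minimal sketch of that definition, using a purely hypothetical list of citation counts for illustration:

```python
def h_index(citations):
    """Return the h-index: the largest h such that h papers
    each have at least h citations."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank  # this paper still satisfies the threshold
        else:
            break  # papers are sorted, so no later one can qualify
    return h

# Hypothetical citation counts, for illustration only.
print(h_index([25, 8, 5, 3, 3, 1, 0]))  # → 3
```

By the same definition, data citations would count toward the metric only if datasets were cited like articles, which is precisely the ritual not yet in place.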

Most scientists must now be pondering what the future holds in terms of how they can bring themselves to the attention of others (in a good way) and hence progress their careers. So I will take Rich’s suggestion one step further. Those scientists who create new data in a process called research should firstly curate this data properly (via e.g. a digital repository) and then expect to promote their activity by garnering citations not only for the published narratives (= articles) but also for the associated published data. Their success as researchers would be (in part) judged by both. Who knows, as well as famous published narratives, perhaps we will also rank famous published datasets!


I do the same for the data I use to support many of the posts for this blog.

Henry Rzepa

Henry Rzepa is Emeritus Professor of Computational Chemistry at Imperial College London.

