Henry Rzepa's Blog

Research data and the “h-index”.

The blog post by Rich Apodaca entitled “The Horrifying Future of Scientific Communication” is very thought-provoking and well worth reading. He takes us through disruptive innovation, and how it might change the way scientists communicate their knowledge. One solution floated for us to ponder is that “supporting Information, combined with data mining tools, could eliminate most of the need for manuscripts in the first place”. I am going to juxtapose that suggestion with something else I recently discovered.

Someone encouraged me to take a look at Google Scholar. It is one of those resources that, amongst other features, computes an individual’s h-index and i10-index (the former, having gone through its purple patch, is now apparently at the end of the road, at least for chemists). One reason, perhaps, why proper curation of research data is not high on most chemists’ list of priorities is that it contributes neither to one’s h-index nor, more particularly, to one’s prospects of a successful research career. Thus “supporting information (data)” is one of those things, like styling the citations in a research article, that most people probably prepare through gritted teeth (a rather annoying ritual without which a research article cannot be published). So when I inspected my own Google Scholar profile (you can do the same here) I was rather surprised to find, appended to all the regular research articles, a long list of data citations (sic!). Because I have placed much of my own data into a digital repository,^‡ this has opened it up to Google (where don’t they get to nowadays?) for listing (if not actually mining). Of themselves, these data entries do not (currently?) contribute to e.g. the h-index, since they are not yet attracting citations from others. And that, of course, is because citing data is not yet an accepted part of the ritual of preparing a scientific article.
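To make the arithmetic concrete, here is a minimal sketch of how the two indices respond to uncited data entries. It uses the standard definitions (the h-index is the largest h such that h items each have at least h citations; the i10-index counts items with at least 10 citations). The citation counts are invented purely for illustration and are not taken from any real profile, and the function names are my own, not anything Google Scholar exposes.

```python
def h_index(citations):
    """Largest h such that h items have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations):
    """Number of items with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Hypothetical citation counts, for illustration only.
articles = [48, 33, 21, 12, 9, 7, 3]
uncited_datasets = [0, 0, 0, 0]      # listed on the profile, but never cited
cited_datasets = [15, 8, 7, 0]       # what might happen if data were cited

print(h_index(articles), i10_index(articles))
print(h_index(articles + uncited_datasets), i10_index(articles + uncited_datasets))
print(h_index(articles + cited_datasets), i10_index(articles + cited_datasets))
```

With these made-up numbers the uncited datasets leave both indices unchanged, whereas datasets that do attract citations can raise them, which is the whole point of the argument.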

Most scientists must now be pondering what the future holds in terms of how they can bring themselves to the attention of others (in a good way) and hence progress their careers. So I will take Rich’s suggestion one step further. Those scientists who create new data in a process called research should firstly curate this data properly (via e.g. a digital repository) and then expect to promote their activity by garnering citations not only for the published narratives (= articles) but also for the associated published data. Their success as a researcher would be (in part) judged by both. Who knows, as well as famous published narratives, perhaps we will also rank famous published datasets!

^‡I do the same for the data I use to support many of the posts for this blog.

Author

Henry Rzepa

Henry Rzepa is Emeritus Professor of Computational Chemistry at Imperial College London.

Tags: data mining, data mining tools, Google, opendata, researcher

This entry was posted on Monday, June 24th, 2013 at 1:41 pm and is filed under Chemical IT.

2 Responses to “Research data and the “h-index”.”

150,000,000 DFT calculations on 2,300,000 compounds! « Henry Rzepa says:

July 7, 2013 at 8:18 am

[…] which emerged out of the WWMM experiments) as well as Figshare[4]. The first and the third assign unique handles (i.e. a doi) to the data; chempound does not (and neither does […]

A two-publisher model for the scientific article: narrative+shared data. « Henry Rzepa says:

September 15, 2013 at 6:05 pm

[…] have noted previously how e.g. Google Scholar identifies data citations along with article citations in constructing an […]
