Data Citation – a snapshot of the chemical landscape.

The recent release of the DataCite Data Citation Corpus, which has the stated aim of providing “a trusted central aggregate of all data citations to further our understanding of data usage and advance meaningful data metrics”, made me want to investigate the current state of data citation in chemistry. Chemistry is known to be a “data rich” science (as most of the physical sciences are), and here on this very blog I try to cite, whenever possible, the source(s) of the data that I use when discussing a topic. Such citations are not necessarily the same as citing a journal source via e.g. its DOI, although of course one is very likely to find data associated with most articles nowadays, albeit almost entirely via an associated supporting information document. However, the latter is often presented in a relatively unstructured (PDF) form, which does not adhere to what are called the “FAIR” guidelines of being findable, accessible, interoperable and reusable. Directly citing data is a way of improving its FAIR characteristics. So what insights does the Data Citation Corpus reveal?

  1. This overview shows that by far the most common mechanism for citing data is via its Accession Number, used predominantly by the Life Sciences (an example of the latter is linked here[1]), with the DOI (digital object identifier) being less common.
  2. Tunnelling down to citation counts in the chemical sciences by publisher, an odd picture emerges, with just a handful of citations.
  3. The more general physical sciences do not fare much better.
  4. Let's try a different approach, filtering by repository. Here are the statistics for the Cambridge Crystallographic Data Centre, for which data citations were recorded in large numbers a few years back, but appear to have dropped off in the last few years. Given that the number of entries there continues to grow almost exponentially, we begin to suspect that the data citations there are not being properly recognised as such by the citation corpus.
  5. Let's try another repository, Zenodo, where citations are again dropping, with totals of about 500 a year for the most recent years.
  6. OK, one more go, the RSC chemistry publisher.
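
The filtering in the steps above amounts to grouping citation records by repository (or publisher) and counting them per year. A minimal sketch of that tally follows. The records are invented placeholders, not figures from the corpus, and the field names `repository` and `year` are assumptions rather than the corpus's actual schema.

```python
from collections import Counter

# Invented placeholder records, NOT real corpus data. The field names
# "repository" and "year" are assumptions about how a record might look.
records = [
    {"repository": "Zenodo", "year": 2022},
    {"repository": "Zenodo", "year": 2023},
    {"repository": "Zenodo", "year": 2023},
    {"repository": "CCDC", "year": 2021},
]


def citations_per_year(records, repository):
    """Count citation records per year for a single repository."""
    counts = Counter(
        r["year"] for r in records if r["repository"] == repository
    )
    # Sort by year so that a rise or fall over time is easy to read off.
    return dict(sorted(counts.items()))


zenodo_counts = citations_per_year(records, "Zenodo")
print(zenodo_counts)
```

Comparing such yearly counts against the known growth in a repository's deposited entries is one way of spotting the kind of shortfall suspected here.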

I am not sure what to make of this; in areas of the chemical sciences where you would expect very high levels of data citation, the citations do not appear to exist. I think that, for some reason, the DataCite citation corpus is not yet capturing them.[2] But when things do start operating as expected, I think we will have a very valuable resource, which should firmly put data (whether FAIR or not) on the map.

References

  1. D. Batista, A. Gonzalez-Beltran, S. Sansone, and P. Rocca-Serra, "Machine actionable metadata models", Scientific Data, vol. 9, 2022. http://dx.doi.org/10.1038/s41597-022-01707-6
  2. R. Page, "Problems with the DataCite Data Citation Corpus", 2024. http://dx.doi.org/10.59350/t80g1-xys37
Henry Rzepa

Henry Rzepa is Emeritus Professor of Computational Chemistry at Imperial College London.
