Metametadata: data about data about (chemical) data.

Scientists are familiar with the term data, at least in a scientific or chemical context, but metadata (the prefix meta meaning "after" or "beyond") is slightly more subtle: it is used to mean data about data. The challenge lies in clarifying where the boundary between data and its metadata lies, and in specifying and controlling the vocabulary used for the metadata descriptions. Items in a chemical metadata dictionary might include, for example, subject classifications such as Organic Molecular Chemistry or identifiers such as InChIKey. But what could metametadata be? Here I briefly show some examples by way of illustration.

Let me start by defining a data repository as a store of both data and the metadata describing it. The metadata is to be exposed in a standard manner which allows it to be aggregated by other agencies. Nowadays, it is becoming common to identify such a data object, together with its metadata, using a persistent identifier such as a DOI. But to decide whether any particular repository and the data objects it contains are generally useful to you, you need information about the metadata itself. Technically, this is defined using a schema[1] describing the metadata (which might, for example, identify any dictionaries used); hence metametadata. Now you need to store the metametadata, and so I introduce the concept of a registry which does this. Each metametadata object is itself assigned a DOI, and here I list these DOIs for a personal selection of chemically oriented examples, all taken from re3data.org, the largest registry of research data repositories. You can search for your own entry at their site: http://service.re3data.org/search.

| Data repository | Repository metametadata DOI |
|---|---|
| Figshare | 10.17616/R3PK5R[2] |
| Zenodo | 10.17616/R3QP53[3] |
| Cambridge Structural Database | 10.17616/R36011[4] |
| Crystallography Open Database | 10.17616/R37S31[5] |
| Oxford University Research Archive | 10.17616/R3Q056[6] |
| Open Notebook Science | 10.17616/R3859D[7] |
| UsefulChem | 10.17616/R3Z89N[8] |
| Chemotion | 10.17616/R34P5T[9] |
| ChemSpider | 10.17616/R38P4P[10] |
| Chemical Database Service | 10.17616/R36P42[11] |
| Imperial College HPC data repository | 10.17616/R3K64N[12],[13] |
| Imperial College SPECTRa repository[14] | 10.17616/R30316[15] |
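For anyone wanting to process these entries by program rather than by eye, the table can be written out as a small lookup. This is only a sketch: the DOIs are copied from the table above and the repository names follow the reference list, but the function names `doi_url` and `has_prefix` are my own illustrative choices. The resolver address `https://doi.org/` is the current form of the `http://dx.doi.org/` address used in the references.

```python
# Repository name -> registry (metametadata) DOI, copied from the table above.
REGISTRY_DOIS = {
    "Figshare": "10.17616/R3PK5R",
    "Zenodo": "10.17616/R3QP53",
    "Cambridge Structural Database": "10.17616/R36011",
    "Crystallography Open Database": "10.17616/R37S31",
    "Oxford University Research Archive": "10.17616/R3Q056",
    "Open Notebook Science": "10.17616/R3859D",
    "UsefulChem": "10.17616/R3Z89N",
    "Chemotion": "10.17616/R34P5T",
    "ChemSpider": "10.17616/R38P4P",
    "Chemical Database Service": "10.17616/R36P42",
    "Imperial College HPC data repository": "10.17616/R3K64N",
    "Imperial College SPECTRa repository": "10.17616/R30316",
}

# Every DOI in the table shares this prefix.
RE3DATA_PREFIX = "10.17616"


def doi_url(doi, resolver="https://doi.org/"):
    """Turn a bare DOI into a resolvable URL (hypothetical helper)."""
    return resolver + doi


def has_prefix(doi, prefix=RE3DATA_PREFIX):
    """Check that the part of the DOI before the first '/' is the given prefix."""
    return doi.split("/", 1)[0] == prefix


if __name__ == "__main__":
    for name, doi in REGISTRY_DOIS.items():
        print(f"{name}: {doi_url(doi)}")
```

Keeping the bare DOI rather than a full URL in the lookup means the resolver can be swapped without touching the data.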

Not all of the repositories listed in the table above assign formal DOIs to their data collections, meaning that the metadata for their entries cannot be aggregated in a searchable manner using e.g. search.datacite.org/ui (or search.datacite.org/api for the machine version). Currently, the metametadata does not fully carry this information, an aspect which I gather will be rectified in a future revision of the re3data schema.[1]

Importantly, both metadata and (repository) metametadata can be searched using APIs (application programming interfaces), ensuring that the entire flow of meta information can be subjected to automated software analysis rather than just visual inspection by a human. This should allow a rich and open infrastructure for handling research objects or data to be built up using hierarchical metadata. The examples above indeed show that the chemical space is already the largest component of the Natural Sciences space.
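To give a flavour of what such an automated search might look like, here is a minimal sketch in Python. It assumes the DataCite REST API at `https://api.datacite.org/dois`, which is not the `search.datacite.org` address mentioned above, together with its `query`, `prefix` and `page[size]` parameters and its JSON:API response layout. It also assumes that the registry DOIs are indexed there. The function names are purely illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint: the DataCite REST API (not the address named in the post).
DATACITE_API = "https://api.datacite.org/dois"


def build_search_url(query, prefix=None, page_size=5):
    """Build a search URL; `prefix` optionally restricts results to one DOI prefix."""
    params = {"query": query, "page[size]": page_size}
    if prefix:
        params["prefix"] = prefix
    return DATACITE_API + "?" + urlencode(params)


def extract_dois(payload):
    """Pull (doi, first title) pairs out of a parsed JSON:API response."""
    results = []
    for item in payload.get("data", []):
        attrs = item.get("attributes", {})
        titles = attrs.get("titles") or [{}]
        results.append((attrs.get("doi"), titles[0].get("title")))
    return results


def search(query, prefix=None):
    """Run the search over the network and return the extracted pairs."""
    request = Request(
        build_search_url(query, prefix),
        headers={"Accept": "application/vnd.api+json"},
    )
    with urlopen(request, timeout=30) as response:
        return extract_dois(json.load(response))


if __name__ == "__main__":
    # Look for chemistry-related records among the registry DOIs.
    for doi, title in search("chemistry", prefix="10.17616"):
        print(doi, title)
```

Separating the URL construction and the response parsing from the network call means both can be checked without a connection.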

Although the edifice is still largely in its infancy, I think we can already start to see an open alternative emerging to "Googling" for data, or to the even older bespoke (i.e. non-open) services offered by commercial human-based abstractors of chemical metadata.


The DOI assigned to each registry entry is information about the metametadata, and hence it is metametametadata, or m3data. Sorry! The citations at the foot of this post are generated entirely automatically (by a WordPress plugin called Kcite) from the m3data associated with each entry, i.e. the DOI listed. Were the persistent identifier for an entry ever to be changed, this would propagate automatically to the citation, unlike the static entries in the table.
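How Kcite does this internally is not something I describe here, but the general idea of turning a DOI into a citation can be sketched using DOI content negotiation: one asks the resolver for a formatted bibliography entry instead of the landing page. The sketch below assumes the `text/x-bibliography` media type with a `style` parameter, as supported by the DOI registration agencies. The function names are hypothetical.

```python
from urllib.request import Request, urlopen


def citation_request(doi, style="apa"):
    """Build a request asking the DOI resolver for a formatted citation."""
    return Request(
        "https://doi.org/" + doi,
        headers={"Accept": f"text/x-bibliography; style={style}"},
    )


def fetch_citation(doi, style="apa"):
    """Resolve the DOI over the network and return the citation as plain text."""
    with urlopen(citation_request(doi, style), timeout=30) as response:
        return response.read().decode("utf-8").strip()


if __name__ == "__main__":
    # The registry DOI for the Figshare entry in the table.
    print(fetch_citation("10.17616/R3PK5R"))
```

Because the citation is requested afresh from the identifier each time, any change to the registered record is picked up without editing the text.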

 

References

  1. Rücknagel, Jessika; Vierkant, Paul; Ulrich, Robert; Kloska, Gabriele; Schnepf, Edeltraud; Fichtmüller, David; Reuter, Evelyn; Semrau, Angelika; Kindling, Maxi; Pampel, H.; Witt, Michael; Fritze, Florian; van de Sandt, Stephanie; Klump, Jens; Goebelbecker, Hans-Jürgen; Skarupianski, Michael; Bertelmann, Roland; Schirmbacher, Peter; Scholze, Frank; Kramer, Claudia; Fuchs, Claudio; Spier, Shaked; and Kirchhoff, Agnes, "Metadata Schema for the Description of Research Data Repositories", 2015. http://dx.doi.org/10.2312/re3.008
  2. re3data.org, "figshare", 2012. http://dx.doi.org/10.17616/R3PK5R
  3. re3data.org, "Zenodo", 2013. http://dx.doi.org/10.17616/R3QP53
  4. re3data.org, "The Cambridge Structural Database", 2013. http://dx.doi.org/10.17616/R36011
  5. re3data.org, "Crystallography Open Database", 2013. http://dx.doi.org/10.17616/R37S31
  6. re3data.org, "Oxford University Research Archive", 2015. http://dx.doi.org/10.17616/R3Q056
  7. re3data.org, "ONSchallenge", 2013. http://dx.doi.org/10.17616/R3859D
  8. re3data.org, "UsefulChem", 2014. http://dx.doi.org/10.17616/R3Z89N
  9. re3data.org, "chemotion", 2013. http://dx.doi.org/10.17616/R34P5T
  10. re3data.org, "ChemSpider", 2013. http://dx.doi.org/10.17616/R38P4P
  11. re3data.org, "Chemical Database Service", 2012. http://dx.doi.org/10.17616/R36P42
  12. re3data.org, "Imperial College High Performance Computing Service Data Repository", 2016. http://dx.doi.org/10.17616/R3K64N
  13. Henry Rzepa, "Imperial College High Performance Computing Service Data Repository Metadata Schema", 2016. http://dx.doi.org/10.14469/hpc/382
  14. J. Downing, P. Murray-Rust, A.P. Tonge, P. Morgan, H.S. Rzepa, F. Cotterill, N. Day, and M.J. Harvey, "SPECTRa: The Deposition and Validation of Primary Chemistry Research Data in Digital Repositories", Journal of Chemical Information and Modeling, vol. 48, pp. 1571-1581, 2008. http://dx.doi.org/10.1021/ci7004737
  15. re3data.org, "SPECTRa Project", 2013. http://dx.doi.org/10.17616/R30316
Henry Rzepa

Henry Rzepa is Emeritus Professor of Computational Chemistry at Imperial College London.
