FAIR data ⇌ Raw data.

FAIR data is increasingly accepted as a description of what research data should aspire to; Findable, Accessible, Inter-operable and Re-usable, with Context added by rich metadata (and also that it should be Open). But there are two sides to data, one of which is the raw data emerging from say an instrument or software simulations and the other in which some kind of model is applied to produce semi- or even fully processed/interpreted data. Here I illustrate a new example of how both kinds of data can be made to co-exist.

I will start with a recent publication[cite]10.1002/anie.201407165[/cite] with the title Crystallographic Snapshot of an Arrested Intermediate in the Biomimetic Activation of CO2The nature of this intermediate caught the eye of another research group, who responded with their own critique[cite]10.1002/anie.201411654[/cite] along with the comment “However, since we have no access to the original crystallographic data …” They might have been referring to the semi-processed data (containing the so-called hkl structure factors) but they may also have been alluding to the raw image data captured directly from the diffractometer cameras. That traditionally has not been available via the CSD (Cambridge structural database), but would be required for a complete re-analysis of the crystal structure. Now the first example of how both FAIR (processed) data and raw data can co-exist has appeared.

The latest version of the CSD database shows an entry resulting from the following publication[cite]10.1021/acsomega.7b00482[/cite] and the deposited data has its own DOI there (10.5517/ccdc.csd.cc1n9ppb). That entry in turn has a DOI pointer to the Raw data (10.14469/hpc/2300) held in a different location and the pointer is reciprocated (⇌) with the latter pointing back to the former. Both datasets point to the original article, thus completing a holy triangle.

There is more. The Raw dataset (10.14469/hpc/2300) declares it is a member of a superset, called Crystal structure data for Synthesis and Reactions of Benzannulated Spiroaminals; Tetrahydrospirobiquinolines (10.14469/hpc/2297where you can find information about six other related structures. That collection is in turn a member of a superset called Synthesis and Reactions of Benzannulated Spiroaminals; Tetrahydrospirobiquinolines (10.14469/hpc/2099where DOIs to other types of data associated with this project can be found, such as Computational data (10.14469/hpc/2098) and NMR data (10.14469/hpc/2294). Although a human can with some determination follow these associations up, down and across, the system is designed to also be followed by automated algorithms that could traverse this web quickly and efficiently.

So you can now see that a crystal structure held in the CSD could be the starting point for a journey of FAIR data discovery, in manner that has not hitherto been possible. How quickly the CSD will become populated by links to Raw (and other) data remains to be seen. I have not yet discovered any mechanism for specifying a CSD query which stipulates that Raw data must be available, but no doubt this will come.

To end, back to the Biomimetic Activation of CO2 referred to at the start. With no access to the original data, recourse was made to computational modelling.[cite]10.1002/anie.201411654[/cite] Which where  I came in, since I wanted access to the original (computational) data. Sadly it did not appear to be available with the article,[cite]10.1002/anie.201411654[/cite] in much the same manner as the original complaint. Perhaps, when FAIR data becomes fully accepted as part of how science is done nowadays, such complaints will become ever rarer!

In fact the original authors did respond[cite]10.1002/anie.201504197[/cite] with an acknowledgement that their original conclusions were not correct.

Almost. The article [cite]10.1021/acsomega.7b00482[/cite] cites DOI: 10.14469/hpc/2099 (Ref 28), but it does not cite DOI: 10.5517/ccdc.csd.cc1n9ppb because the latter had not been minted yet at the time the final proofs were corrected, and there is no mechanism to add it at a later stage.

Tags: , , , , , , , ,

Leave a Reply