
“For chemists, the AI revolution has yet to happen”.

Thursday, May 25th, 2023

This editorial from Nature[1] is a timely reminder of the importance of data, and not just any data, but “accurate and accessible training data”. Accessible is, of course, one of the attributes of FAIR (Findable, Accessible, Interoperable and Reusable). The editorial also states that “data need to be recorded in agreed and consistent formats, which they are not at present”. That is covered by the I and R of FAIR, often applied in conjunction with metadata recording the Media type that the data is held in (see DOI for examples of the use of Media types in chemical computation and chemical NMR). Again, the editorial notes: “The best possible training sets would also include data on negative outcomes”. This relates to the separation of the two publication processes, namely the article (the story behind the data) and the data itself as a first-class scientific object. Thus when we publish FAIR data in association with articles, the data archive will often contain data that is not used in the article itself (perhaps because it led to a negative outcome), but is nevertheless part of the FAIR data collection for that topic. Even if the data does not lead to journal publication, publishing it in a data repository means it will not be lost. Somebody (or AI software) may still find it useful.



  1. "For chemists, the AI revolution has yet to happen", Nature, vol. 617, p. 438, 2023.