“For chemists, the AI revolution has yet to happen”.

Thursday, May 25th, 2023

This editorial from Nature[1] is a timely reminder of the importance of data: not just any data, but “accurate and accessible training data”. Accessible is of course one of the attributes of FAIR (Findable, Accessible, Interoperable and Re-usable). The editorial also states that “data need to be recorded in agreed and consistent formats, which they are not at present”. That is covered by the I and R of FAIR, often applied in conjunction with metadata recording the Media type in which the data is held (see DOI for examples of the use of Media types in chemical computation and chemical NMR). Again, “The best possible training sets would also include data on negative outcomes”. This relates to the separation of the two publication processes, namely the article (the story behind the data) and the data itself as a first-class scientific object. Thus when we publish FAIR data in association with articles, the data archive will often contain data that is not used in the article (perhaps because it led to a negative outcome), but is nevertheless part of the FAIR data collection for that topic. Even if the data does not lead to journal publication, publishing it in a data repository means it will not be lost. Somebody (or some AI software) may still find it useful.



  1. "For chemists, the AI revolution has yet to happen", Nature, vol. 617, p. 438, 2023.

Tunable aromaticity? An unrecognized new aromatic molecule?

Sunday, May 21st, 2023

In 2010, I showed a chemical problem I used to set during university entrance interviews. It was all about pattern recognition and how one can develop a hypothesis based on it. In that instance, it involved recognising that a cyclic molecule which appeared to have the cyclohexatriene benzene-aromatic pattern 1 was in fact a trimer of carbon dioxide. Perhaps small amounts of this aromatic molecule exist in fizzy drinks? Analysing these patterns occupied about 10-20 minutes of an interview, and although you might think I was posing a difficult challenge, many students successfully rose to it! Now I revisit the problem, this time with a slightly better reality check, using a related molecule 2 (cyanuric acid).