I noted previously that in 1963/65 Corey reported the total synthesis of the sesquiterpene dihydrocostunolide. Compound 16, known as Eudesma-1,3-dien-6,13-olide, was represented as shown below in black; the hydrogen shown in red was implicit in Corey's representation, as was its stereochemistry. As of this instant, this compound is just one of 64,688,893 molecules recorded by Chemical Abstracts. How can we, in 2011, validate this particular entry and resolve the stereochemical ambiguity? Here I discuss one approach (a vision, if you like, of the semantic web).
The following facts are asserted about 16:
- Its connection table, namely which atoms are connected to each other by at least a single bond.
- The (presumed) absolute stereochemistry at four stereogenic centres, leaving the 5th (in red) either unknown or implicit. I say presumed because, when it is not known which of the two possible enantiomers a scalemic sample contains, just one is often drawn, in essence as a guess.
- The 1H NMR chemical shifts of 13 of the 20 hydrogen atoms in the molecule (the solvent is not reported, and may implicitly be chloroform).
- [α]D +375° (no solvent reported)
- m.p. 69.5-70.5° (note, by the way, that the units represented by the symbol ° are quite different for these two facts! A scientist can, of course, easily recognise the implicit difference.)
- λmax (methanol) 265 mµ, ε 4800 (note again the ambiguity in the units: 265 mµ is nowadays written 265 nm, and the molar extinction coefficient ε is assumed to be expressed in L mol⁻¹ cm⁻¹).
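To make the point about implicit units concrete, here is a minimal Python sketch of how the assertions listed above might be stored in machine-readable form, with every unit spelled out. The `Quantity` class, the unit labels and the conversion table are hypothetical illustrations, not part of any real chemical data standard. The Celsius melting point and the conventional specific-rotation units are assumptions, since the original report states neither.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A value carrying an explicit unit label (a stand-in for a proper unit ontology)."""
    value: float
    unit: str


# The millimicron (written mµ) is an obsolete name for the nanometre: 1 mµ = 1 nm.
TO_NM = {"millimicron": 1.0, "nm": 1.0}


def wavelength_in_nm(q: Quantity) -> Quantity:
    """Normalise a wavelength to nanometres; raises KeyError for an unknown unit."""
    return Quantity(q.value * TO_NM[q.unit], "nm")


# The assertions about 16 with their implicit units made explicit.
# ASSUMPTIONS: the m.p. is in degrees Celsius, and [alpha]D carries the conventional
# units of deg mL g^-1 dm^-1; neither is stated in the original report.
compound_16 = {
    "specific_rotation_D": Quantity(375.0, "deg mL g^-1 dm^-1"),
    "melting_point_min": Quantity(69.5, "degC"),
    "melting_point_max": Quantity(70.5, "degC"),
    "lambda_max_methanol": Quantity(265.0, "millimicron"),
    "epsilon_max": Quantity(4800.0, "L mol^-1 cm^-1"),
}

print(wavelength_in_nm(compound_16["lambda_max_methanol"]))
# Quantity(value=265.0, unit='nm')
```

Once the two "degree" symbols carry different unit labels, a machine no longer has to guess which meaning is intended.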
Modern quantum mechanical modelling can now provide, almost routinely:
- From a given connection table, an accurate prediction of the 3D coordinates of all the atoms for, in this case, either of the stereoisomers involving the hydrogen shown in red.
- The 1H NMR shifts relative to TMS, to an accuracy of better than 0.5 ppm (often very much better).
- λmax (methanol) and an approximate estimate of ε.
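Why is ε only an approximate estimate? A quantum calculation yields an oscillator strength f, which is an integrated intensity, rather than ε itself; a peak height only emerges once a band shape and width are chosen. The sketch below assumes a Gaussian band in wavenumber space, with σ taken as the half-width at 1/e of the maximum (a convention commonly used for simulated spectra), and uses the standard relation f = 4.319×10⁻⁹ ∫ε dν̃. Whether the linewidth quoted in this post corresponds to this σ or to some other width measure is an assumption, and the oscillator strength of 0.1 in the loop is purely hypothetical.

```python
import math

# For a Gaussian band in wavenumber space, f = 4.319e-9 * integral(eps d nu)
# fixes the peak height: eps_max = EPS_PREFACTOR * f / sigma.
EPS_PREFACTOR = 1.3062974e8  # L mol^-1 cm^-2; equals 1 / (4.319e-9 * sqrt(pi))


def wavenumber_cm1(wavelength_nm: float) -> float:
    """Convert a wavelength in nm to a wavenumber in cm^-1."""
    return 1.0e7 / wavelength_nm


def epsilon(nu: float, nu0: float, f: float, sigma: float) -> float:
    """Molar absorptivity (L mol^-1 cm^-1) at wavenumber nu for one excitation.

    nu0   : excitation energy in cm^-1
    f     : calculated oscillator strength (dimensionless)
    sigma : ASSUMED band half-width at 1/e of the maximum, in cm^-1
    """
    return EPS_PREFACTOR * f / sigma * math.exp(-(((nu - nu0) / sigma) ** 2))


def implied_oscillator_strength(eps_max: float, sigma: float) -> float:
    """Oscillator strength that would give a peak height eps_max for width sigma."""
    return eps_max * sigma / EPS_PREFACTOR


nu0 = wavenumber_cm1(265.0)  # about 37,736 cm^-1
for sigma in (1800.0, 3600.0, 7200.0):
    # Same hypothetical f, different widths: the peak height scales as 1/sigma.
    print(sigma, round(epsilon(nu0, nu0, 0.1, sigma)))
```

Because the peak height scales as 1/σ for a fixed oscillator strength, any calculated ε is only as good as the assumed linewidth.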
For 16, these calculations give:
- 1H NMR shifts, calculated with a single-point 6-311++G(d,p) wavefunction at the ωB97XD/6-311G(d,p)-optimised geometry. I have linked the DOI assigned to this calculation to this post so that others can verify the calculation itself. The results (in ppm) are δ 1.02 [0.98, 3H, s], 1.17 [1.15, 3H, d], 2.11 [1.95, 3H, s], 3.85 [3.79, 1H, dd], 5.75, 6.13, 6.30 [5.2-6.0, vinyl], with the reported experimental values in square brackets.
- Spin-spin couplings, calculated using the NMR(spinspin,mixed) model implemented in Gaussian (specified in the online documentation of the NMR keyword). For δ 3.79, two couplings of 10 Hz are reported; the calculation predicts 9.77 and 9.53 Hz (for assignments, click on the image above to get a 3D model).
- [α]D +391° (calculated for chloroform)
- λmax 265 nm (calculated for methanol; ε ~4800 for a linewidth of 3600 cm⁻¹).
- Strictly speaking, all of the above should be repeated for the other possible stereoisomer, and the results for the two analysed together statistically.
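The comparison of calculated and reported 1H shifts can be made quantitative in a few lines of Python. This sketch uses only the values quoted in this post; the vinyl signals, reported only as a range, are scored by how far each calculated value lies outside 5.2-6.0 ppm. The function `candidate_probabilities` is a hypothetical, simplified way of carrying out the statistical comparison of two stereoisomers: it assumes equal prior probabilities and independent Gaussian errors with an assumed width of 0.2 ppm, and is an illustration rather than an established scoring method.

```python
import math

# Resolved 1H signals of 16 as (calculated, reported) shifts in ppm.
RESOLVED = [(1.02, 0.98), (1.17, 1.15), (2.11, 1.95), (3.85, 3.79)]
VINYL_CALC = [5.75, 6.13, 6.30]
VINYL_RANGE = (5.2, 6.0)  # reported only as a range


def abs_errors(pairs):
    """Absolute calculated-minus-reported deviations."""
    return [abs(calc - exp) for calc, exp in pairs]


def distance_outside(value, lo, hi):
    """0 if value lies within [lo, hi], otherwise the distance to the nearer end."""
    return max(lo - value, 0.0, value - hi)


def candidate_probabilities(calc_by_candidate, experimental, sigma=0.2):
    """Posterior probability of each candidate structure.

    ASSUMPTIONS: equal priors, independent Gaussian errors of width sigma (ppm),
    and calculated shifts listed in the same order as the experimental ones.
    """
    log_like = {}
    for name, calc in calc_by_candidate.items():
        if len(calc) != len(experimental):
            raise ValueError(f"{name}: shift lists differ in length")
        log_like[name] = sum(-0.5 * ((c - e) / sigma) ** 2
                             for c, e in zip(calc, experimental))
    top = max(log_like.values())  # subtract the maximum for numerical stability
    weights = {name: math.exp(ll - top) for name, ll in log_like.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


errors = abs_errors(RESOLVED)
vinyl_excess = [distance_outside(v, *VINYL_RANGE) for v in VINYL_CALC]
print(f"largest |error| = {max(errors):.2f} ppm, mean = {sum(errors) / len(errors):.2f} ppm")
print("vinyl distance outside reported range:", [round(x, 2) for x in vinyl_excess])
```

Run as is, it reports a largest deviation of 0.16 ppm and a mean of 0.07 ppm for the four resolved signals, with the calculated vinyl shifts lying at most 0.3 ppm outside the reported range. No shifts for the other epimer at the red hydrogen are given in this post, so `candidate_probabilities` is shown without a worked comparison.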
Beyond this, the calculations can also:
- provide estimated chemical shifts and coupling constants for ALL the protons in the molecule, not just the 13 reported by Corey, and for all the carbons (no 13C spectrum was reported). Advances in spectrometer sensitivity and resolution mean that if these spectra were ever (re)measured, the additional protons could probably be identified easily, and both homo- and heteronuclear spin-spin couplings measured.
- predict the electronic circular dichroism spectrum of 16 (not previously measured), and in particular a positive Cotton effect (Δε ~+20) for the λmax 265 nm absorption. This would allow the absolute configuration of this scalemic molecule to be validated independently. A prediction of the vibrational circular dichroism spectrum could be added if need be.
- What we cannot easily do is predict the melting point (or indeed the crystal packing), although no doubt this will become more reliable in the future.
But think how many millions of such molecules have been discovered, and how the majority of them have probably not been subjected to such rigorous scrutiny. It is entirely possible that much of the chemical literature is sprinkled with errors in assignment (and that many more entries have unresolved ambiguities, such as the stereochemistry of the hydrogen shown in red at the top of this post). However, for the first time in the history of chemistry, we can now (almost routinely) use quantum modelling to provide independent validation of the chemical literature, as illustrated above. Of course, such validation is not absolute, merely probable to some degree (we might agree that the example above shows a very high probability that the structure shown is in fact correct). More importantly, computational validation has the potential for automation. One might strive for an infrastructure in which much of the validation is performed automatically by tireless machines operating 24/7, flagging probable errors only when they discover them. This is the vision of the chemical semantic web!
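As a toy illustration of the kind of automated check envisaged here, this Python sketch compares reported and calculated values for 16 and flags any property whose disagreement exceeds a tolerance. The `Check` record and `flag_probable_errors` function are hypothetical. The 0.5 ppm tolerance echoes the accuracy quoted earlier for 1H shifts; the other tolerances are arbitrary assumptions, and the [α]D comparison ignores the solvent mismatch (no solvent reported versus a calculation for chloroform).

```python
from dataclasses import dataclass


@dataclass
class Check:
    prop: str
    reported: float
    calculated: float
    tolerance: float              # ASSUMED acceptable |reported - calculated|
    sign_must_match: bool = False  # e.g. for optical rotation


def flag_probable_errors(checks):
    """Return the names of properties whose reported and calculated values disagree."""
    flagged = []
    for c in checks:
        # A zero value counts as non-positive in this simple sign test.
        sign_bad = c.sign_must_match and (c.reported > 0) != (c.calculated > 0)
        if sign_bad or abs(c.reported - c.calculated) > c.tolerance:
            flagged.append(c.prop)
    return flagged


checks_16 = [
    Check("1H shift (ppm), 1H dd", 3.79, 3.85, 0.5),
    Check("coupling (Hz) at 3.79 ppm", 10.0, 9.77, 1.0),
    Check("lambda_max (nm)", 265.0, 265.0, 10.0),
    Check("[alpha]D (deg)", 375.0, 391.0, 100.0, sign_must_match=True),
]
print(flag_probable_errors(checks_16))  # []
```

A real system would need agreed vocabularies for properties, units and conditions before such checks could run unattended across millions of entries.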