Reproducibility in science: calculated kinetic isotope effects for the Baeyer-Villiger reaction.

Recollect this earlier post on the topic of the Baeyer-Villiger reaction. In 1999 natural abundance kinetic isotope effects were reported[cite]10.1021/ja992016z[/cite] and I set out to calculate the values predicted for a particular model constructed using Quantum mechanics. This comparison of measurement and calculation is nowadays a standard verification of both experiment and theory. When the two disagree either the computational model is wrong or incomplete, or the remoter possibility that there is something not understood about the experiment.

In this case, as you can see above, the measured ¹³C KIE at the carbonyl carbon was in the range 1.045-1.051, whereas the theory (ωB97XD/Def2-TZVP/SCRF=DCM) predicted a significantly smaller value of 1.023 for the first kinetic step, the formation of the tetrahedral intermediate. This was the step suggested by Singleton as rate limiting in the forward direction (there was also a larger disagreement on the ²H KIE at C3/C5, measured as 0.97 and calculated as 0.997 but this might simply be a typographical error).

Now we have to find an explanation, and it was contingent upon myself to show that the theory was properly executed. At this stage, Dan Singleton offered some suggestions on this blog. This one related to how I had calculated the KIE, with Dan suggesting that the method I had used might be too inaccurate to draw any conclusions from. It was up to me to reproduce my results using the method suggested by Dan as the more accurate. And this is where this story really starts.

I had used the Free energy method, which involves calculating the free energies of reactant and product using the built-in thermochemical component of the Gaussian code. Dan’s concern was that such free energies are only reported to an accuracy of 0.000001 Hartree. Since the isotope effect is calculated from differences in these free energies, was this degree of precision high enough to ensure reliable calculated KIE? Better to solve the equations directly and these involve the so-called Bigeleisen-Mayer terms derived from the classical partition functions. I recollected we had used this method back in 1975[cite]10.1021/ja00486a013[/cite], but I assumed the computer code used was long since lost by myself. Instead Quiver was suggested (first written around 1988 by Keith Laidig at Yale) and I added the comment that the modern version of Quiver is called THERMISTP[cite]10.1021/ja072375r[/cite]. All that needed to be done was to acquire a copy of an appropriate code and re-run the KIE calculation using it.

And now comes the true purpose of this post, which is reproducing a calculation that might have been done ~20 years ago using computer code. Issues such as:

Does the code still exist in either Source form or Executable form?
If it does, will it still run correctly on a modern computer?
Is it documented sufficiently to allow someone not immersed in the code to run it. After ~20 years, it might not be possible to talk to the original coders for explanations.

So this is what happened when I investigated.

Requests for the THERMISTP code of its authors, by email, have not yet brought a response. They may still, but one has to recognise that they may not. For example (unless I missed it), the article in which THERMISTP is cited[cite]10.1021/ja072375r[/cite] does not document how the code described in that article might be obtained (but the authors’ emails are provided).
I then followed up Dan’s suggestion to use Quiver. A copy was tracked down at http://www.chem.ucla.edu/~mccarren/houkGroup.html and a tar archive was available. The program comes as Quiver itself and a pre-program called qcrunch is used to prepare the data for Quiver.
It comes with pretty good (and brief!) documentation.
Fortran source code for Quiver (quiver.f) is available which is good, since one can compile it for one’s favourite machine (in my case a Mac running OS X). One can also read the comments to discover information beyond that provided by the documentation.
qcrunch however is only provided as an executable, and that means running it on Linux. One cannot read the comments!
When I ran qcrunch, it complained that an auxiliary program was not available to it. It wanted to run something associated with Gaussian 94, sadly long since gone from our systems. With no source code available I would have to fool qcrunch into thinking that Gaussian09 was really Gaussian94. It involves symbolic links and such and needs a little Unix expertise.
That done, I gave it its first input data, which is the normal mode analysis of the reactant involved in the reaction above, in the form of a formatted checkpoint file (a .fchk file). It seemed happy, and so I proceeded to the next step
which was to run quiver. This needed compilation. Here we encounter interesting issues. Most Fortran code can have “dependencies” which relate to the compiler used by its developer. The compiler however is rarely stated in the documentation, and I decided to use GFortran, which I obtained from http://hpc.sourceforge.net and installed on my Mac. And using this compiler, the compilation failed with about 3 syntax errors. You have to know enough Fortran to correct them all and obtain your executable.
Now I could run quiver to get the Bigeleisen-Mayer functions. It appeared to happily run, but produced rubbish (the anticipated frequencies had silly values of -10,000 cm^-1 etc).
Now you go back to the code, and discover that the dimension statement (yes, Fortran required you to specify the size of your arrays in the code) is limited to 40 atoms! The Baeyer-Villiger system has 48. Easily fixed. Still rubbish out!
So now you go back to the start and create data for a tiny molecule, water in this case, and again qcrunch and quiver are run. Now sensible results, in the shape of normal mode frequencies of the anticipated values.
Conclusion: That qcrunch is probably also dimensioned to 40 atoms, but I now have no way of correcting this because I do not (yet?) have the source code for it. It might indeed be lost. If it is, it will have to be re-written from scratch.
Time elapsed thus far: ~2 weeks of intermittent work on the problem.
It was at this point I had one of those happy moments of accidental discovery. I updated the OS on my Mac to 10.10.4 (this was released yesterday). One of the new features was TRIM support for solid state third party disks (SSDs). I installed it and decided to test an external SSD connected to my machine. It happened to have lots of old stuff on it. In an idle moment, I decided to search its file base for the program we had written back in 1975 and which I had convinced myself was long-lost. Hey presto, there it was (KINISOT.f). I even found example input and output files. These take the form of frequencies for reactant (normal isotopes), reactant (new isotope), transition state (normal isotope) and transition state (new isotope). Not as elegant as quiver, but entirely fit for my purpose here.
The results:
1. ¹³C KIE for the carbonyl carbon using the free energy method: 1.023 (see diagram above)
2. ¹³C KIE using the Bigeleisen-Mayer partition function ratios: 1.0226
3. ²H KIE for the axial α protons using the free energy method: 0.928
4. ²H KIE for the axial α protons using the Bigeleisen-Mayer partition function ratios: 0.92831
The agreement enables me to conclude that the free energy method does not in fact suffer from significant round-off errors and other inaccuracies induced by the subtraction of two very similar energies.

What I have I learnt from this experience?

That reproducing a calculation using old computer codes (in this case ~18 years) can be a difficult and complex procedure, relying on being able to contact people and ask for copies of the code, coping with possibly non-existent documentation, and with no access to the original coder who may be the only person able to explain it.
That 20 years of continuous improvement in the computer industry means that a problem that would require impossibly enormous amounts of computer time then might be easily feasible now. The limit of 40 atoms in 1997 for quiver and qcrunch must have seemed future proof then, but alas it did not prove so.
That if the source code for a program dimensioned too small is lost, one may have no option but to re-write it from scratch. It would indeed be a good test of reproducibility if the answers are unchanged!
Keep good archives of your old work. You never know when they might come in useful!

Lessons for the future?

Document your code! Put yourself in the position of someone in 20 years time trying to make sense of it! And record the compiler used (along with flags), and also the OS.
Archive it in Zenodo or Github, and include that documentation.[cite]10.5281/zenodo.19272[/cite] You might not be contactable by email in 20 years time!
include test inputs and outputs (OK, that is mostly done nowadays).
I intend to do the above for my program. But I will use the immediate excuse often used by others for not archiving their codes: it is not yet documented sufficiently! But surely this would only take an hour or so? Watch this space. And please forgive the coding, it was done in 1975, when I was very inexperienced (another oft-used excuse!).

What about the Baeyer-Villiger isotope effects? Well, the above merely establishes that the free energy method (which requires no extra codes) has sufficient accuracy for computing KIEs. An explanation for the difference between the reported ¹³C KIE at the carbonyl carbon and its calculation still needs identifying. I state at the outset that (heavy atom) tunnelling corrections are NOT yet applied, and I might try the small-curvature tunneling model suggested by Dan, since he and Borden[cite]10.1021/ja1055593[/cite] have shown it can indeed be important. It would be exciting if the hydrogen isotope effects can be reproduced without tunnelling but that the carbon KIEs do require it.

I have received much help in the above saga from Jordi Bures Amat here, Erik Plata in Donna Blackmond’s group and information about another useful system for calculating KIE, the ISOEFF program.[cite]10.1023/A:1019173509273[/cite]

Tags: Dan Singleton, Donna Blackmond, energy method, Erik Plata, free energy method, Jordi Bures Amat, Keith Laidig, Kinetic isotope effect, Physical organic chemistry, Yale

This entry was posted on Wednesday, July 1st, 2015 at 5:01 pm and is filed under reaction mechanism. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

4 Responses to “Reproducibility in science: calculated kinetic isotope effects for the Baeyer-Villiger reaction.”

Henry Rzepa says:

July 12, 2015 at 11:15 am

A software developer who is also an ardent member of the open source community popped in to see me on Friday and we talked about issues of code reproducibility. One camp veers towards complete transparency, achieved by depositing source code and full example inputs and outputs for the calculation in e.g. Github or Zenodo. But he surprised me by saying he often encountered considerable opposition from an opposing camp to the practice of providing complete inputs and outputs from a calculation.

This opposing argument goes (in part) along the lines that modern software has become far too easy for almost anyone to use, a use which can be uninhibited by lack of knowledge of the theory behind the program and particularly its correct application. As a consequence, there are now appearing in the literature many mis-applications of theory caused by the ease of use of “black-box software”. Such ease of use would be exacerbated by making available the input and output file sets, which can then be edited to a new problem without any understanding of the resulting effects.

The counterbalance to this argument is the absolute necessity of showing that any given program is behaving as intended. For example, some 980 test input and output examples come with the standard Gaussian program install to ensure that the code as running on the user’s machine is producing correct results. Reverse engineering the syntax of the inputs and outputs can be both rewarding and enlightening. As is learning and interpreting the often arcane error codes that (only) some programmers so lovingly insert to warn the user of errors or suspicious outputs.

Another argument I have encountered is the provision of supporting information where selective editing of output especially, but also input is followed by (heavy) reformatting into a PDF document. This is thought to provide sufficient (?) information to a determined re-user, but is also designed to ensure that they have sufficiently (high) motivation to not only want to go through the process but as a side-effect to learn the theory behind the code and how to apply it correctly.

In general I favour the full disclosure approach, although I can sympathise with both camps. If e.g. a referee has access to all the inputs and outputs from which the arguments in an article are constructed, they can quickly see if perhaps theory has been mis-applied. Referees rarely have the time or the inclination to reassemble data found in 100+ page PDF files, or even to submit the same inputs themselves to check if they too can reproduce the claimed outputs. The same might be true for the readers of articles that succeed in such peer review and enter into the journal. And it indeed in the interests of everyone, especially the original researcher, to learn quickly the boundaries of how the theory should be applied (or not).

In the case of this blog for example, I have learnt that one must test isotope effects for heavy atom tunnelling, even if it is only to exclude their contribution. At least the mechanism of a blog allows such feedback. The feedback is less effective I think when it comes to traditional journals, especially if the referee selected by the journal is unfamiliar with the computational detail.

Reply
Chris Chang says:

July 16, 2015 at 5:10 pm

Thanks Henry, for the post and the follow-up.

It sounds like the opposing argument to full disclosure is supporting a version of “security through obscurity,” which I wouldn’t support at all. I’m sure that theoretical approximations are misapplied, but the answer can’t be to hide the ability to do so–by that argument, we mustn’t let people drive because some might cause accidents, they should build their own roads with a proper knowledge of civil engineering and construction! I’d think the answer to misapplications of theory is (1) strong (and I would add, open) reviews of manuscripts, as you intimated, as well as (2) a cultural realization in the modeling community that there is too much to know, too much emphasis on publication over scholarship, and no shame in correcting errors to get to the truth.

Reply
Henry Rzepa says:

July 16, 2015 at 9:10 pm

An example of how a community (which in this case is around 3 people) can iterate to a better understanding of how to apply theory can be seen in the comment section of an adjacent blog post.

As for the security by obscurity argument, I have heard colleagues reminisce about the good old days when most of the relevant data for a particular piece of research would only be found in the printed and bound PhD dissertation, itself to be found in the depths of the university library, and quite often accessible only by a personal visit. Only the very very most determined researchers would avail themselves of the thesis (and the data it contained).

I am indeed hugely grateful to my own university for (in fact without asking my permission, which would of course have been granted) digitising my own PhD thesis and placed it online. No longer is it obscure. Now anyone can see what I got up to in my PhD!

Reply
Henry Rzepa says:

January 4, 2016 at 8:14 am

I received this useful information, and post it here:

I saw your blog post about the difficulties associated with
calculating KIEs. I'm working on a project that uses that kind of
calculation, and I've placed QUIVER and associated scripts to use it on github. I tried to make it as user friendly as possible and there is documentation. https://github.com/ekwan/quiver I hope you find it useful.

Eugene Kwan.

Reply

Henry Rzepa's Blog