Data nightmares: B40 and counting its π-electrons

Whilst clusters of carbon atoms are well-known, my eye was caught by a recent article describing the detection of a cluster of boron atoms, B40 to be specific.[cite]10.1038/nchem.1999[/cite] My interest was in how the σ and π-electrons were partitioned. In C40, one can reliably predict that each carbon would contribute precisely one π-electron. But boron, being more electropositive, does not always play like that. Having one electron fewer per atom, one might imagine that a fullerene-like boron cluster would have no π-electrons. But the element has a propensity[cite]10.1039/B911817A[/cite] to promote its σ-electrons into the π-manifold, leaving a σ-hole. So how many π-electrons does B40 have?

These sorts of clusters are difficult to build using regular structure editors, and so coordinates are essential. The starting point for a set of coordinates with which to compute a wavefunction was the supporting information. On the relevant page the coordinates are certainly there (that is not always the case), but you have to know a few tricks to make them usable.

  1. Open Adobe Reader, select the coordinates and copy
  2. Paste into any application which recognises text. I used an old stalwart on the Mac, BBEdit. It is reliable!
  3. But no, it produces a row of skull-and-crossbones characters (the authors of the program clearly have a sense of humour).
  4. Thinking that BBEdit might have let me down (for the first time), I tried Word. A little less humour, but the same result.
  5. There are lots of web sites out there that claim to convert PDF files directly to Word files. Again, no luck; the coordinates are now entirely missing!
  6. Right, time for the big guns. Adobe Acrobat XI converts .PDF to .DOC, and (if you jump through a lot of hoops to register etc) they even give you a 30-day trial. Well, at least it gives numbers. But notice that the line breaks are missing, and all the numbers flow from one line to another.
  7. Another copy/paste from Word to BBEdit, and now I have all the numbers, and adding 40 line breaks is all that is needed (there is sometimes some skill in knowing where to add them, by the way). The time taken from step 1 to step 7 was about 90 minutes (including a necessary cup of tea to recover from steps 1-5, and the realisation that the time was not wasted, since I could blog the experience!).
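
Step 7 could in principle be scripted rather than done by hand. What follows is only a minimal sketch, and it rests on an assumption about the layout: that each atom is recorded as an element symbol followed by three decimal Cartesian coordinates. The function name `restore_line_breaks` and the sample numbers are hypothetical; they are not the published B40 coordinates, and a real supporting-information file may well use a different layout.

```python
import re

# Assumption: each atom is "symbol x y z", and the broken export has merged
# all the atoms into one long run of text with no line breaks.
ATOM = re.compile(
    r"([A-Z][a-z]?)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)"
)

def restore_line_breaks(flowed, expected_atoms=None):
    """Return one 'symbol x y z' line per atom found in the flowed text."""
    atoms = ATOM.findall(flowed)
    # Refuse to continue if the atom count is not the one we expect.
    if expected_atoms is not None and len(atoms) != expected_atoms:
        raise ValueError(f"expected {expected_atoms} atoms, found {len(atoms)}")
    # The coordinates stay as strings, so no digit is altered by rounding.
    return "\n".join(" ".join(atom) for atom in atoms)

# Made-up numbers for illustration only.
sample = "B 0.000000 1.234567 -2.345678 B 1.111111 -0.222222 3.333333"
print(restore_line_breaks(sample, expected_atoms=2))
```

Passing `expected_atoms=40` for B40 would at least catch a lost or duplicated atom, although it cannot detect a digit that was corrupted upstream.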

Well, I am sure you know what is coming next; my usual rant about how little most chemists truly value data and particularly its integrity and its semantics. And how little almost all journals understand data. Notice that the original article was published in Nature Chemistry. Note also a new journal from that stable, Scientific Data. The journal clearly thinks there is mileage in receiving scholarly articles about scientific data, and what they call data descriptors (they even got me to write a data descriptor a year or so back). It's a shame then that the same publisher allowed the decimation of the core data related to an article about B40.

They have a widely read blog, perhaps they can comment?

One more point to make about data: a phrase has recently been coined: deposition with recognition. Here, I show how my own data has been recognised:

There are various other ways as well, and perhaps I will leave this to another post. To return to the chemistry (where we should have been at the start), I ran the calculation (B3LYP+D3/TZVP) and published the newly enhanced data, citing it in the usual way.[cite]10.6084/m9.figshare.1111454[/cite],[cite]10.14469/ch/24884[/cite] To answer my question, for the D2d geometry, B40 has 24 π-electrons (there is some ambiguity; it could be 26). On average, each boron retains only ~0.65 s-electrons, balanced by ~2.35 p-electrons. The most stable π-pair is shown below. At the centre of the ring is a strongly diatropic ring current (NICS = -42 ppm)[cite]10.6084/m9.figshare.1111518[/cite] suggesting aromaticity (26 electrons = 4n+2).
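
The electron bookkeeping here can be checked with a few lines of arithmetic. This is a sketch only: it takes the two candidate π counts and the average s and p populations quoted in this post, assumes three valence electrons per boron, and applies the simple 4n+2 test. The helper name `huckel_class` is hypothetical, and the σ count is nothing more than the remainder after subtracting the π-electrons.

```python
def huckel_class(n_electrons):
    """Classify an electron count as 4n+2, 4n, or odd."""
    if n_electrons % 4 == 2:
        return "4n+2"
    if n_electrons % 4 == 0:
        return "4n"
    return "odd"

n_atoms = 40
valence_total = n_atoms * 3      # boron has three valence electrons
per_atom = 0.65 + 2.35           # average s and p populations quoted above

print("valence electrons:", valence_total)
print("s + p per atom:", round(per_atom, 2))
for n_pi in (24, 26):            # the two candidate pi-electron counts
    print(n_pi, "pi:", huckel_class(n_pi),
          "| remaining sigma:", valence_total - n_pi,
          "| pi per atom:", n_pi / n_atoms)
```

Only the count of 26 satisfies 4n+2 (n = 6); 24 is a 4n count, which is why the ambiguity between the two matters for the aromaticity argument.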


I conclude by pondering whether the properties of any such boron cluster may in time prove to be directly related to the number of σ-to-π promotions.

Sadly, line breaks in lists of atom coordinates date back to an era of about 50 years ago when text files were first treated differently from binary files. Three different “standards” emerged for specifying a line break (DOS, Mac and Unix) in a text file and much confusion has there been ever since when moving these text files across operating systems. The modern way of doing it is to make line breaks redundant by instead marking up the file. The standard chemical markup, invented in 1996, and formally published in 1999[cite]10.1021/ci990052b[/cite], is CML. You will find such CML coordinates in the deposited data from this calculation.[cite]10.6084/m9.figshare.1111454[/cite] You will not have any problems with line breaks!
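
To illustrate the point about markup, here is a minimal sketch that turns plain coordinate lines into a CML-style `atomArray`, whichever of the three line-break conventions the text arrived with. The input layout (`symbol x y z` per line) and the function name `xyz_lines_to_cml` are assumptions made for the example, the numbers are invented, and the output is a bare-bones fragment, not the full CML file found in the deposited data.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"

def xyz_lines_to_cml(text):
    """Build a CML <molecule> string from 'symbol x y z' lines."""
    molecule = ET.Element("molecule", {"xmlns": CML_NS})
    atom_array = ET.SubElement(molecule, "atomArray")
    # splitlines() accepts \n, \r\n and \r, so the source platform is irrelevant.
    rows = [line.split() for line in text.splitlines() if line.strip()]
    for index, (symbol, x, y, z) in enumerate(rows, start=1):
        # Each atom becomes one element; the coordinates are kept as strings.
        ET.SubElement(atom_array, "atom", {
            "id": f"a{index}", "elementType": symbol,
            "x3": x, "y3": y, "z3": z,
        })
    return ET.tostring(molecule, encoding="unicode")

# Invented coordinates with DOS-style line endings.
print(xyz_lines_to_cml("B 0.0 1.0 2.0\r\nB 3.0 4.0 5.0\r\n"))
```

Once each atom is an element with named attributes, a consumer no longer needs to guess where one atom ends and the next begins.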

Publication assigns a DataCite DOI. This takes about 48 hours to propagate to CrossRef, which is here used by the KCite WordPress plugin to retrieve the metadata and compose a citation. If KCite queries CrossRef before the metadata has propagated, it does not generate a citation. If you are reading this and see no citation, please revisit after 48 hours have elapsed.

The diatropicity is inverted to paratropicity (NICS = +28 ppm) when two electrons are removed to create the dication.[cite]10.6084/m9.figshare.1111534[/cite] This inversion is normally a good test of aromaticity/antiaromaticity.


2 Responses to “Data nightmares: B40 and counting its π-electrons”

  1. Qadir Timerghazin says:

    In situations like that, the fastest way is to simply make a screen grab (or save the file as an image, e.g. png) and then OCR it in Acrobat. Although it does sound rather stupid, it always works! You get the text almost exactly as it appears in the source—some sort of reverse WYSIWYG…

  2. Henry Rzepa says:

    Personally, I would never rely on data generated by OCR. It only takes a single digit to be incorrectly recognised for the data to be essentially worthless.

    As a solution, it might be a very temporary expedient, but I doubt anyone should be forced to rely upon it.

    The problem really lies elsewhere, and it should be fixed elsewhere.
