Blogbooks, e-books and future proofing chemical diagrams.

Most of the chemical structure diagrams in this blog originate from Chemdraw, which seems to have been around since the dawn of personal computers! I have tended to use this program to produce JPG bitmaps for the blog, writing them out at 4x magnification, so that they can be scaled down for display whilst retaining some measure of higher resolution if needed for other purposes. These other purposes might be e.g. the production of e-books (using Calibre), the interesting Blog(e)book format offered as a service by Feedfabrik, or display on mobile tablets where the touch-zoom metaphor to magnify works particularly well. But bitmap images are not really well future-proofed for such new uses. Here I explore one solution to this issue.

I have previously mentioned scalable vector graphics (SVG) as an alternative, and fortunately producing such graphics has become routine.[3] The diagram above[2] is indeed SVG (and if you cannot see it, then try a modern SVG-capable browser[1]). It was produced thus:

  1. Drawn in Chemdraw
  2. Exported as Encapsulated PostScript (EPS)
  3. Imported into Scribus, an Open Source desktop publishing program (where it can be annotated/edited if need be)
    • Scribus also needs Ghostscript installed to handle the EPS
  4. Exported from Scribus to SVG.

Notice how the diagram above automatically scales to fill the width of the page. If you click on it, you get the diagram on its own. If you zoom the browser window, it should scale perfectly. I note that these SVG diagrams work well in e-books or blogbooks.

There seem to be many other (open) programs out there which support SVG, so the above combination is not necessarily the only one, or indeed the best.

There is one other aspect which might be mentioned. The old GIF or JPG bitmap formats do have good meta-data support, such as EXIF, GPS or XMP. These invisible data have often been used to embed a molecular connection table into a GIF or JPG file, such that the original molecular data can be reconstituted from the image file. Unfortunately, there are no real standards for doing this, and so round-tripping the data is probably a closed process within a specific software environment. However, because SVG is an XML format, it can be readily made to carry such information in a fully inter-operable manner. For example, one could easily embed a CML description of the molecule into its own container (namespace) in the SVG file. For the purposes of rendering an on-screen image, this extra information is of course ignored.
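
The idea of carrying CML inside SVG can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the SVG and the two-atom C=O fragment are toy inputs invented for the example, the helper names `embed_cml` and `extract_cml` are hypothetical, and placing the molecule inside an SVG `metadata` element is one reasonable choice rather than an established convention. The CML namespace is the one used by CML v2 files.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
CML_NS = "http://www.xml-cml.org/schema/cml2/core"  # CML v2 core namespace

# Serialise SVG elements without a prefix and CML elements with a "cml:" prefix.
ET.register_namespace("", SVG_NS)
ET.register_namespace("cml", CML_NS)


def embed_cml(svg_text: str, cml_text: str) -> str:
    """Return the SVG with the CML molecule placed inside a <metadata> element."""
    svg_root = ET.fromstring(svg_text)
    molecule = ET.fromstring(cml_text)
    metadata = ET.SubElement(svg_root, f"{{{SVG_NS}}}metadata")
    metadata.append(molecule)
    return ET.tostring(svg_root, encoding="unicode")


def extract_cml(svg_text: str) -> str:
    """Recover the embedded CML molecule from the SVG as a string."""
    svg_root = ET.fromstring(svg_text)
    molecule = svg_root.find(f".//{{{CML_NS}}}molecule")
    if molecule is None:
        raise ValueError("no CML molecule found in SVG")
    return ET.tostring(molecule, encoding="unicode")


# Toy inputs (assumptions for illustration only).
svg = (f'<svg xmlns="{SVG_NS}" width="100" height="50">'
       '<line x1="10" y1="25" x2="90" y2="25" stroke="black"/></svg>')
cml = (f'<molecule xmlns="{CML_NS}" id="m1"><atomArray>'
       '<atom id="a1" elementType="C"/><atom id="a2" elementType="O"/>'
       '</atomArray><bondArray><bond atomRefs2="a1 a2" order="2"/>'
       '</bondArray></molecule>')

combined = embed_cml(svg, cml)
recovered = ET.fromstring(extract_cml(combined))
atoms = [a.get("elementType") for a in recovered.iter(f"{{{CML_NS}}}atom")]
print(atoms)
```

Because the molecule lives in its own namespace, an SVG renderer should simply skip it, while any namespace-aware XML parser can pull it back out.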

1 I notice that Internet Explorer 9 (both 32- and 64-bit versions) will display (and save) the above diagram if you click on it, but it cannot (yet) be inlined into the post, although the documentation implies it should.
2 The version below is the conventional JPG form (click on it to see the original 4x version).

Diagram displayed using JPG.

3 Historical note. Peter Murray-Rust and I have been promoting SVG for use in chemistry for 11+ years now. For one ancient page, see here. The syntax has decayed somewhat, but some of the diagrams still work!


13 Responses to “Blogbooks, e-books and future proofing chemical diagrams.”

  1. Michael Banck says:

    Hrm, maybe you could’ve published the original ChemDraw JPG for side-by-side examination. The carbonyl group looks kinda odd to me (too long double bond, too small oxygen “O”), but maybe I am just making it up.

  2. Henry Rzepa says:

    As per request, see above. No re-touching! I am reminded of the story about Napier’s logarithms, whereby he inserted small deliberate (and inconsequential) errors into his log tables to catch out people who might have been inclined to re-publish them. There are rumours that chemical structure drawing programs also have tiny features (mountweazels) which were used to detect anyone who was publishing using the program, but without a valid license. But these are just stories, and probably untrue.

  3. Hi Henry,

    You may already know this, but ChemDoodle can output directly to SVG and several other graphical standards. In addition, ChemDoodle can create HTML5 ChemDoodle Web Components for dynamic chemical content; see Interactive chemistry ebooks: interactive figures

    Many publishers and formats are now embracing HTML5, such as Amazon (Amazon introduces new HTML5-based eBook format), and we should start seeing WebGL on the iPad and other mobile devices soon, so it is definitely worth the effort to invest in these new technologies.

    In the next update, ChemDoodle SVG output will contain chemical metadata, so that round-tripping that information will be possible. Even if other providers will not be supporting technologies like SVG, we certainly will be.

  4. Henry Rzepa says:

    Greetings Kevin.

    Yes, regarding Amazon, I did blog this and also ChemDoodle.

    As for the round tripping, do please use CML. As I see it, you currently express data for ChemDoodle via line escaped Molfiles (the need to insert \n at the end of each line of a mol file only illustrates how old that format actually is!)

  5. Henry Rzepa says:

    By the way, there is a very impressive demo of Jmol on an Android tablet, produced by Bob Hanson. I do not have such a beast nearby. Does anyone know if it supports WebGL (and indeed SVG)?

  6. Actually, the ChemDoodle Web Components library reads many formats, including CML. We use MOL files for the default instructions because they are typically smaller in size. The use of the \n would be necessary for CML input as well, because the Javascript string cannot contain new lines. Of course, we are also open to new suggestions for handling CML.

    ChemDoodle desktop reads and writes CML (v1 and v2).

    I think the important note though, is that if you want to see these technologies adopted and used, then you have to support software that supports them, especially if the software you have used for decades will not.

  7. Henry Rzepa says:

    Actually, removing the line breaks from a CML file should NOT break it (unlike a Molfile). Shall we try the experiment?

    <?xml version="1.0"?> <molecule id="output-0" xmlns="http://www.xml-cml.org/schema/cml2/core"> <atomArray> <atom id="a1" elementType="C" x3="2.154551" y3="-1.223953" z3="-0.265703"/> <atom id="a2" elementType="C" x3="0.753566" y3="-1.801514" z3="0.090120"/> <atom id="a3" elementType="C" x3="-0.461726" y3="-0.957541" z3="-0.321185"/> <atom id="a4" elementType="C" x3="-1.283312" y3="-0.272995" z3="0.765778"/> <atom id="a5" elementType="C" x3="-1.767113" y3="1.162610" z3="0.428225"/> <atom id="a6" elementType="C" x3="1.629275" y3="1.017421" z3="-0.107122"/> <atom id="a7" elementType="C" x3="1.731901" y3="0.774764" z3="-1.600684"/> <atom id="a8" elementType="C" x3="2.178057" y3="-0.705069" z3="-1.718314"/> <atom id="a9" elementType="C" x3="2.572371" y3="0.031705" z3="0.565004"/> <atom id="a10" elementType="C" x3="2.396019" y3="-0.135033" z3="2.071607"/> <atom id="a11" elementType="C" x3="4.051670" y3="0.370229" z3="0.303652"/> <atom id="a12" elementType="C" x3="-2.935168" y3="1.180731" z3="-0.564237"/> <atom id="a13" elementType="C" x3="-4.087352" y3="0.276615" z3="-0.129020"/> <atom id="a14" elementType="C" x3="-3.599710" y3="-1.153707" z3="0.097511"/> <atom id="a15" elementType="C" x3="-2.478915" y3="-1.181702" z3="1.135010"/> <atom id="a16" elementType="C" x3="0.673154" y3="1.686118" z3="0.534633"/> <atom id="a17" elementType="C" x3="-0.633134" y3="2.115072" z3="-0.054920"/> <atom id="a18" elementType="O" x3="-0.806406" y3="-0.924126" z3="-1.483649"/> <atom id="a19" elementType="H" x3="2.877636" y3="-2.031950" z3="-0.111274"/> <atom id="a20" elementType="H" x3="0.653305" y3="-2.739587" z3="-0.463536"/> <atom id="a21" elementType="H" x3="0.703530" y3="-2.042144" z3="1.154892"/> <atom id="a22" elementType="H" x3="-0.651644" y3="-0.211668" z3="1.653975"/> <atom id="a23" elementType="H" x3="-2.144001" y3="1.553104" z3="1.384663"/> <atom id="a24" elementType="H" x3="0.791189" y3="0.949396" z3="-2.119395"/> <atom id="a25" elementType="H" x3="2.471708" y3="1.454182" 
z3="-2.035288"/> <atom id="a26" elementType="H" x3="1.502960" y3="-1.273083" z3="-2.357217"/> <atom id="a27" elementType="H" x3="3.181857" y3="-0.789508" z3="-2.138493"/> <atom id="a28" elementType="H" x3="2.705979" y3="0.772307" z3="2.598003"/> <atom id="a29" elementType="H" x3="1.366113" y3="-0.346907" z3="2.360735"/> <atom id="a30" elementType="H" x3="3.022580" y3="-0.956352" z3="2.432722"/> <atom id="a31" elementType="H" x3="4.327757" y3="1.271964" z3="0.856829"/> <atom id="a32" elementType="H" x3="4.698052" y3="-0.446018" z3="0.641682"/> <atom id="a33" elementType="H" x3="4.266754" y3="0.553256" z3="-0.749830"/> <atom id="a34" elementType="H" x3="-3.285191" y3="2.212516" z3="-0.680307"/> <atom id="a35" elementType="H" x3="-2.566915" y3="0.849002" z3="-1.539191"/> <atom id="a36" elementType="H" x3="-4.527481" y3="0.661850" z3="0.800665"/> <atom id="a37" elementType="H" x3="-4.879342" y3="0.292671" z3="-0.883291"/> <atom id="a38" elementType="H" x3="-4.423211" y3="-1.791052" z3="0.433751"/> <atom id="a39" elementType="H" x3="-3.236690" y3="-1.566727" z3="-0.849022"/> <atom id="a40" elementType="H" x3="-2.874651" y3="-0.831938" z3="2.096430"/> <atom id="a41" elementType="H" x3="-2.128304" y3="-2.206799" z3="1.299003"/> <atom id="a42" elementType="H" x3="0.722108" y3="1.726402" z3="1.621251"/> <atom id="a43" elementType="H" x3="-0.897656" y3="3.139787" z3="0.224670"/> <atom id="a44" elementType="H" x3="-0.590004" y3="2.087801" z3="-1.145366"/> </atomArray> <bondArray> <bond atomRefs2="a26 a8" order="1"/> <bond atomRefs2="a27 a8" order="1"/> <bond atomRefs2="a24 a7" order="1"/> <bond atomRefs2="a25 a7" order="1"/> <bond atomRefs2="a8 a7" order="1"/> <bond atomRefs2="a8 a1" order="1"/> <bond atomRefs2="a7 a6" order="1"/> <bond atomRefs2="a35 a12" order="1"/> <bond atomRefs2="a18 a3" order="2"/> <bond atomRefs2="a44 a17" order="1"/> <bond atomRefs2="a37 a13" order="1"/> <bond atomRefs2="a39 a14" order="1"/> <bond atomRefs2="a33 a11" order="1"/> <bond atomRefs2="a34 
a12" order="1"/> <bond atomRefs2="a12 a13" order="1"/> <bond atomRefs2="a12 a5" order="1"/> <bond atomRefs2="a20 a2" order="1"/> <bond atomRefs2="a3 a2" order="1"/> <bond atomRefs2="a3 a4" order="1"/> <bond atomRefs2="a1 a19" order="1"/> <bond atomRefs2="a1 a2" order="1"/> <bond atomRefs2="a1 a9" order="1"/> <bond atomRefs2="a13 a14" order="1"/> <bond atomRefs2="a13 a36" order="1"/> <bond atomRefs2="a6 a16" order="1"/> <bond atomRefs2="a6 a9" order="1"/> <bond atomRefs2="a17 a43" order="1"/> <bond atomRefs2="a17 a5" order="1"/> <bond atomRefs2="a17 a16" order="1"/> <bond atomRefs2="a2 a21" order="1"/> <bond atomRefs2="a14 a38" order="1"/> <bond atomRefs2="a14 a15" order="1"/> <bond atomRefs2="a11 a9" order="1"/> <bond atomRefs2="a11 a32" order="1"/> <bond atomRefs2="a11 a31" order="1"/> <bond atomRefs2="a5 a4" order="1"/> <bond atomRefs2="a5 a23" order="1"/> <bond atomRefs2="a16 a42" order="1"/> <bond atomRefs2="a9 a10" order="1"/> <bond atomRefs2="a4 a15" order="1"/> <bond atomRefs2="a4 a22" order="1"/> <bond atomRefs2="a15 a41" order="1"/> <bond atomRefs2="a15 a40" order="1"/> <bond atomRefs2="a10 a29" order="1"/> <bond atomRefs2="a10 a30" order="1"/> <bond atomRefs2="a10 a28" order="1"/> </bondArray> </molecule>
    

    The above is all on one line!
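
    The same experiment can be sketched in Python. This is a minimal illustration, assuming a hypothetical two-atom fragment rather than the molecule above, with a made-up helper called `summary`: it parses a multi-line CML document and a one-line copy of it, and checks that both yield the same atoms and bonds.

```python
import xml.etree.ElementTree as ET

CML_NS = "{http://www.xml-cml.org/schema/cml2/core}"

# A small multi-line CML document (toy fragment, for illustration only).
pretty = """<?xml version="1.0"?>
<molecule id="m1" xmlns="http://www.xml-cml.org/schema/cml2/core">
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="O"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="2"/>
  </bondArray>
</molecule>"""

# Collapse the document onto a single line, as in the experiment above.
one_line = " ".join(line.strip() for line in pretty.splitlines())


def summary(cml_text):
    """Return the atoms and bonds found in a CML string."""
    root = ET.fromstring(cml_text)
    atoms = [(a.get("id"), a.get("elementType")) for a in root.iter(CML_NS + "atom")]
    bonds = [(b.get("atomRefs2"), b.get("order")) for b in root.iter(CML_NS + "bond")]
    return atoms, bonds


same = summary(pretty) == summary(one_line)
print(same)
```

    The whitespace between elements carries no chemical meaning in CML, so both forms give the same result; a Molfile, whose records are defined line by line, has no such freedom.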

  8. Henry Rzepa says:

    Regarding adoption of new technologies, by far the greatest use of programs such as Chemdraw here is for including high quality printable diagrams into Microsoft Word and Powerpoint. Whilst M$’s IE9 browser might support e.g. HTML5/SVG, I think Office 2011/2012 probably does not.

    Do enlighten me Kevin on how Chemdoodle might be used in such an environment? If it can, lots of people here might be convincible (but in fact most chemists are ultra conservative when it comes to this sort of thing).

  9. First, Henry, I have followed your work since I first became interested in computers in chemistry, and have always been and always will be a huge fan. I take your standards seriously, and we follow them in all of our products (from CML to MIME types).

    You are correct about XML not requiring return lines. I was under the false impression that processing instructions (such as the declaration) must be followed by return lines. So definitely, CML is a very good choice for those that do not want to deal with such syntax issues with the ChemDoodle Web Components.

    About Microsoft Word and Powerpoint, the ChemDoodle Web Components would not be such a great tool, but ChemDoodle desktop certainly is. The ChemDoodle Web Components is a Javascript library for creating HTML5 content for chemistry and other sciences. ChemDoodle desktop is an end user software application for creating chemical graphics and other chemical publishing functions. ChemDoodle is compatible with Microsoft Office, Apple iWork and OpenOffice (and other variants), as well as other third party products like Adobe software. You can paste scalable vector graphics directly into these software suites on Windows, Mac OS X and Linux. On Mac OS X, we also support round-trip editing with most of these applications, including Microsoft Office 2011 and iWork Keynote. We will also be supporting round-trip editing on Windows soon via OLE.

    If you do not mind me stealing a bit of your time, I would be very happy to speak with you some more about your work and the future of this technology.

  10. Henry Rzepa says:

    My previous example of ChemDoodle used a Molfile with the linefeeds replaced by \n, and with the hard wraps removed to enable it to be specified in a Javascript array. We generate CML from a High-performance computing facility linked to a DSpace digital repository, and here the conversion to CML is carried out automatically. It would be trivial to write this file without hard wraps, and thence to use it directly for e.g. ChemDoodle. Would the current version support this directly? One can also insert CML into SVG, and have only the CML namespace read by Chemdoodle (although the array would probably now contain e.g. more than 10,000 characters). It would be fantastic Kevin if you could come up with a demo, which could be routinely used on e.g. this blog.

    The other operation which I think is needed is the ability to extract e.g. the CML from the ChemDoodle sandbox and re-use it elsewhere. One aspect of e.g. the iOS model on iPad is that the data is sandboxed into the app, and it can be quite cumbersome to extract it from there and inject a copy into another sandbox. I gather Android is less restrictive (and perhaps also less secure) in this aspect?

    So is a demo of ChemDoodle possible which uses a data array containing CML, and some mechanism for extracting just that data out of the array and depositing it onto a clipboard (or even a file) or other sandbox? Is that breaking all the rules?

  11. Sure, let me put together a demo with CML and the ChemDoodle Web Components. I will put it up when I get a chance tonight.

    There are two ways to deal with CML, since it is XML. The first is by Javascript string, and we would then need a Javascript XML parser. To get around this, we use a webservice to parse the CML as will be shown in the demo. XHR2 is used so that anyone can call this directly from any website. The second way is via the HTML DOM, if you wanted to embed the CML in an SVG element or somewhere else in the DOM. Currently, the ChemDoodle Web Components doesn’t do this, but maybe in the future.

    Any clipboard functions, including copy and paste, are forbidden in Javascript, so a simple way to extract the data is not possible. However, using the XHR2 service, we can save the data to a file format of your choice, and then have the user download it. This should also work on Android, but probably not iOS. I will show this in the demo as well.

