Five things you did not know about (fork) handles.

OK, you have to be British to understand the pun in the title, which refers to a famous comedy sketch about four candles. Back to science, and to my mention in the previous post of some crystal data now having a DOI. I thought it might be fun to replicate the contents of one of my ACS slides here.

Firstly, a DOI is one implementation of a more generic (and quite old) concept known as a Handle, which is one form of persistent digital identifier. Article DOIs have been in common use for at least ten years now, and even new chemistry students know about them!‡ So a DOI points to an article in a journal? Not quite, as it happens; in fact a DOI can lead to a whole lot more. Let me explain by showing you five examples:

1. doi.org/10042/26065 resolves to a landing page. Crucially, this is NOT the article itself, which may remain obstinately behind a paywall to which you have no access.
2. doi.org/10042/26065?locatt=filename:input.gjf resolves directly to a file named input.gjf, if one is present off the landing page, allowing a machine action to retrieve it.
3. doi.org/10042/26065?locatt=mimetype:chemical/x-gaussian-input resolves to the first file matching that MIME type, if one is present off the landing page, allowing a machine action to retrieve it.
4. doi.org/10042/26065?locatt=id:1 resolves to the first file matching ID=1, if one is present off the landing page, allowing a machine action to retrieve it.
5. doi.org/api/10042/26065 returns the full JSON-encoded handle record for processing in JavaScript, so that a machine has access to all the information it might need to perform a machine action.
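
The five URL patterns above are regular enough for a script to build them. Here is a minimal Python sketch, assuming the https://doi.org proxy prefix (the URLs above are written without a scheme) and passing the locatt value through with its ':' and '/' left unescaped, exactly as in the examples. The function names are hypothetical.

```python
from urllib.parse import quote

# Assumption: the public proxy is addressed as https://doi.org
PROXY = "https://doi.org"


def landing_page_url(handle: str) -> str:
    """Item 1: the bare handle resolves to its landing page."""
    return f"{PROXY}/{handle}"


def locatt_url(handle: str, key: str, value: str) -> str:
    """Items 2-4: ask the proxy for one specific location via ?locatt=key:value.

    ':' and '/' are kept literal (as in the examples); other unsafe characters,
    e.g. spaces in a filename, are percent-encoded.
    """
    return f"{PROXY}/{handle}?locatt={quote(f'{key}:{value}', safe=':/')}"


def api_record_url(handle: str) -> str:
    """Item 5: the JSON-encoded handle record, using the path given above."""
    return f"{PROXY}/api/{handle}"


if __name__ == "__main__":
    handle = "10042/26065"
    print(landing_page_url(handle))
    print(locatt_url(handle, "filename", "input.gjf"))
    print(locatt_url(handle, "mimetype", "chemical/x-gaussian-input"))
    print(locatt_url(handle, "id", "1"))
    print(api_record_url(handle))
```

Building the URLs is the easy part; items 2-5 only behave as described against servers set up for them, so elsewhere they should not be expected to work.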

Now, items 2-5 are not generally available; they work only on our servers. We have placed them there to show how item 6 of the Amsterdam Manifesto could be made to work. There are other ways, of course, but you can see them in action here[1] (the article is open access, so you should not run into a paywall from the landing page).
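
How might a server honour those locatt queries? One way, sketched below in Python, assumes the handle record holds a 10320/loc value: an XML list of location elements whose attributes (here id, filename and mimetype) describe each file. The sample record, its example.org hrefs and the fall-back-to-the-first-location rule are all assumptions made for illustration, not a description of how the servers mentioned above actually do it.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical 10320/loc value; the hrefs are placeholders, not real files.
SAMPLE_LOC = """<locations>
  <location id="0" href="https://example.org/landing" />
  <location id="1" href="https://example.org/input.gjf"
            filename="input.gjf" mimetype="chemical/x-gaussian-input" />
  <location id="2" href="https://example.org/readme.txt"
            filename="readme.txt" mimetype="text/plain" />
</locations>"""


def choose_location(loc_xml: str, locatt: Optional[str] = None) -> Optional[str]:
    """Return the href of the first <location> whose attribute matches locatt.

    locatt has the form "key:value", e.g. "mimetype:chemical/x-gaussian-input".
    Assumption: with no locatt, or no match, fall back to the first location.
    """
    locations = ET.fromstring(loc_xml).findall("location")
    if not locations:
        return None
    if locatt:
        # Split on the first ':' only, so a value may itself contain ':'
        key, _, value = locatt.partition(":")
        for loc in locations:
            if loc.get(key) == value:
                return loc.get("href")
    return locations[0].get("href")


if __name__ == "__main__":
    for query in (None, "filename:input.gjf",
                  "mimetype:chemical/x-gaussian-input", "id:1"):
        print(query, "->", choose_location(SAMPLE_LOC, query))
```

Because selection is "first match wins", the order of the location elements matters whenever several files share a MIME type, which is why item 3 speaks of the first matching file.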

‡Postscript. A few days ago, I asked my group of first-year undergraduate students how they might go about tracking down a journal article from its authors, the journal name and the page numbers. The most common reply was “Google it”. Next came “go to the library and find it on the shelves”. One replied “from its DOI” (that student had done an internship in a pharma company before joining us). I used to teach a chemical information course here[2] between 1996 and 2010, in which this sort of stuff was a staple. That course is no longer taught. Hence the aforementioned replies!

Author

Henry Rzepa

Henry Rzepa is Emeritus Professor of Computational Chemistry at Imperial College London.


References

A. Armstrong, R.A. Boto, P. Dingwall, J. Contreras-García, M.J. Harvey, N.J. Mason, and H.S. Rzepa, "The Houk–List transition states for organocatalytic mechanisms revisited", Chem. Sci., vol. 5, pp. 2057-2071, 2014. https://doi.org/10.1039/c3sc53416b

Tags: ACS, DOI, Google, Handle, JSON, LOCATT

This entry was posted on Tuesday, March 18th, 2014 at 4:23 pm and is filed under Chemical IT.

5 Responses to “Five things you did not know about (fork) handles.”

Nicholas Mason says:

March 20, 2014 at 8:49 pm

Regarding ‡PS: an extremely useful tool for tracking down journal articles from references is Oleksandr Zhurakovskyi’s Chemistry Reference Resolver web app.

Well worth bookmarking!

http://chemsearch.kovsky.net

Henry Rzepa says:

March 21, 2014 at 12:45 pm

Of course, specifying the literature by “its authors, the journal name and the page numbers” sounds semantically simple, but it is a syntactical nightmare. There have never been totally unambiguous rules for how to specify an author (initials first or last?), how to abbreviate a journal, whether a page range is mandatory or not, etc. Google has specialised in fuzzy search, where these syntactical ambiguities are factored in.

This is why the Handle system (and its specific implementation, the DOI) is so important. But yes, few people remember a DOI; humans cope with the fuzzy journal notation rather better. Still, the citation I gave my students, “A. Lapworth, J. Chem. Soc. Trans, 1903, 995-1005”, does not resolve using the above tool. I guess it's too old.

Alex Zhurakovskyi says:

March 30, 2014 at 2:31 am

Dear Nicholas, thanks a lot for your kind words!

Dear Prof. Rzepa, your reference is resolved now. Of course, one omits the author name(s) when searching (e.g. “J. Chem. Soc. Trans, 1903, 995-1005” or “jcst 83 995”, or any other supported format). See http://chemsearch.kovsky.net/help.php for info. Most of the time, several abbreviations for a given journal are supported: http://chemsearch.kovsky.net/supported_journals.php

I did not know of the existence of J. Chem. Soc. Trans., and so had not added it until now. As you rightly pointed out, old publications are sometimes tangled (how many Berichte are out there?)
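
The first step of such a resolver, turning a free-text citation into structured fields, can be sketched in a few lines of Python. The alias table, the two accepted formats and the canonical journal spelling below are illustrative assumptions only; this is not how chemsearch.kovsky.net is actually implemented.

```python
import re
from typing import Optional

# Hypothetical alias table: several spellings map to one canonical journal.
JOURNAL_ALIASES = {
    "j chem soc trans": "J. Chem. Soc., Trans.",
    "jcst": "J. Chem. Soc., Trans.",
}

# "Journal, year, first-last", e.g. "J. Chem. Soc. Trans, 1903, 995-1005"
YEAR_FORM = re.compile(
    r"^(?P<journal>.+?),\s*(?P<year>\d{4}),\s*(?P<first>\d+)(?:\s*[-–]\s*\d+)?$")
# "abbreviation volume page", e.g. "jcst 83 995"
VOLUME_FORM = re.compile(r"^(?P<journal>\S+)\s+(?P<volume>\d+)\s+(?P<first>\d+)$")


def normalise_journal(text: str) -> Optional[str]:
    """Lower-case, turn punctuation into spaces, collapse spaces, look up alias."""
    key = " ".join(re.sub(r"[^a-z ]", " ", text.lower()).split())
    return JOURNAL_ALIASES.get(key)


def parse_citation(text: str) -> Optional[dict]:
    """Return journal/year/volume/first_page, or None if nothing matches."""
    text = text.strip()
    for pattern in (YEAR_FORM, VOLUME_FORM):
        m = pattern.match(text)
        if m:
            journal = normalise_journal(m.group("journal"))
            if journal is None:
                return None
            fields = m.groupdict()
            return {
                "journal": journal,
                "year": int(fields["year"]) if fields.get("year") else None,
                "volume": int(fields["volume"]) if fields.get("volume") else None,
                "first_page": int(fields["first"]),
            }
    return None


if __name__ == "__main__":
    print(parse_citation("J. Chem. Soc. Trans, 1903, 995-1005"))
    print(parse_citation("jcst 83 995"))
```

The two forms carry different fields (a year in one, a volume in the other), so a real resolver still needs journal-specific knowledge before it can treat them as the same article.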

Henry Rzepa says:

March 30, 2014 at 2:46 pm

I find an interesting symmetry between the two processes:

Molecule name ↔ Molecule structure ↔ Molecule (InChI) ID

and

Article name ↔ Researcher name (ID) ↔ Article ID.

In both cases, the name is often fuzzy/incomplete and probably not canonical, unlike the digital ID. In both cases, the metadata is only partially formalised.

For this blog, I actually go from article DOI to Article name using a plugin called Kcite, which in turn uses the CrossRef API to query against the DOI for all the metadata associated with it. That API can be found at http://help.crossref.org/. It is liberating not to have to spend so much time looking up the conventional citation metadata (but annoying that the formatting of that data is beyond my control).
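
For readers who want to try the DOI-to-metadata direction themselves, here is a minimal Python sketch, independent of Kcite, that fetches a work's record from CrossRef's public REST API (https://api.crossref.org/works/{DOI}) and formats it in the style of the reference list on this page. The formatting style and function names are my own assumptions; the sample record is filled in by hand from reference [1] rather than fetched, so the example runs offline.

```python
import json
import urllib.request
from urllib.parse import quote


def fetch_crossref_message(doi: str) -> dict:
    """Fetch the 'message' part of CrossRef's JSON record for a DOI (needs network)."""
    url = "https://api.crossref.org/works/" + quote(doi, safe="/")
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)["message"]


def initials(given: str) -> str:
    """'Henry S.' -> 'H.S.'; 'R.A.' stays 'R.A.'"""
    return "".join(p[0] + "." for p in given.replace(".", " ").split())


def format_citation(message: dict) -> str:
    """Format a CrossRef 'message' as: authors, "title", journal, vol., pp., year. URL"""
    names = [f"{initials(a.get('given', ''))} {a['family']}".strip()
             for a in message.get("author", [])]
    if len(names) > 2:
        authors = ", ".join(names[:-1]) + ", and " + names[-1]
    else:
        authors = " and ".join(names)
    # Prefer the abbreviated journal title when the record has one
    journal = (message.get("short-container-title")
               or message.get("container-title") or [""])[0]
    year = message["issued"]["date-parts"][0][0]
    return (f'{authors}, "{message["title"][0]}", {journal}, '
            f'vol. {message["volume"]}, pp. {message["page"]}, {year}. '
            f'https://doi.org/{message["DOI"]}')


# Hand-filled from reference [1] in the shape of a CrossRef 'message'
# (not fetched; only the fields used above are included).
SAMPLE_MESSAGE = {
    "DOI": "10.1039/c3sc53416b",
    "title": ["The Houk–List transition states for organocatalytic mechanisms revisited"],
    "short-container-title": ["Chem. Sci."],
    "volume": "5",
    "page": "2057-2071",
    "issued": {"date-parts": [[2014]]},
    "author": [
        {"given": "A.", "family": "Armstrong"},
        {"given": "R.A.", "family": "Boto"},
        {"given": "P.", "family": "Dingwall"},
        {"given": "J.", "family": "Contreras-García"},
        {"given": "M.J.", "family": "Harvey"},
        {"given": "N.J.", "family": "Mason"},
        {"given": "H.S.", "family": "Rzepa"},
    ],
}

if __name__ == "__main__":
    print(format_citation(SAMPLE_MESSAGE))
```

Which fields a record carries varies (some lack a page range or an abbreviated journal title, for instance), so real code would need to guard more of these lookups than this sketch does.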

But it is also highly useful to go in the opposite direction, from metadata to ID, which is what http://chemsearch.kovsky.net does.

The above also alludes to the need to disambiguate a researcher (ID). This is an interesting challenge, partially addressed by, for example, ORCID (my ORCID iD is 0000-0002-8635-8390), although it is not in the least obvious how this will be done retrospectively, e.g. for Arthur Lapworth.
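
One small help with disambiguation is that an ORCID iD carries its own check character, computed with the ISO 7064 MOD 11-2 algorithm described in ORCID's documentation, so a mistyped identifier can be caught before it is ever looked up. A short Python sketch (the function names are my own):

```python
def orcid_check_char(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if, hyphens ignored, there are 15 digits plus a correct check character."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdecimal():
        return False
    return orcid_check_char(compact[:15]) == compact[15].upper()


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-8635-8390"))
```

The check character only guards against typing errors; it says nothing about who the identifier belongs to, which remains the harder, retrospective problem.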

ana lobo says:

March 30, 2014 at 7:04 pm

Liked it very much.

Henry Rzepa's Blog