A Digital chemical repository – is it being used?

In this previous blog post I wrote about one way in which we have enhanced the journal article. Associated with that enhancement, and also sprinkled liberally throughout this blog, are links to a Digital Repository (if you want to read all about it, see DOI: 10.1021/ci7004737). It is a fairly specific repository for chemistry, with about 5000 entries. These are mostly the results of quantum mechanical calculations on molecules (together with a much smaller number of spectra, crystal structure and general document depositions). Today, with some help (thanks Matt!), I decided to take a look at how much use the repository was receiving.

  1. The first entry in the log dates from 2008-02-05.
  2. The repository now receives about 1200 accesses via handle resolutions each day, which come from
  3. ~150 unique client IPs, and
  4. reach ~900 unique handles daily.
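
For anyone wanting to pull similar figures from their own server, here is a minimal sketch in Python of the kind of per-day counting involved. It is not the script we actually used. The log layout (timestamp, client IP and handle separated by whitespace) is an assumption, as are the function name `daily_usage` and the made-up handles and IP addresses in the sample; a real handle-server log will need its own parsing.

```python
from collections import defaultdict

# ASSUMPTION: each log line looks like
#   "2010-02-10T09:15:02 192.0.2.7 12345/abc-1"
# (ISO timestamp, client IP, handle). This layout is hypothetical;
# the post does not describe the real log format.


def daily_usage(lines):
    """Return {day: {"accesses", "unique_ips", "unique_handles"}} counts."""
    accesses = defaultdict(int)
    ips = defaultdict(set)
    handles = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        timestamp, ip, handle = parts
        day = timestamp[:10]  # the "YYYY-MM-DD" part of the timestamp
        accesses[day] += 1
        ips[day].add(ip)
        handles[day].add(handle)
    return {
        day: {
            "accesses": accesses[day],
            "unique_ips": len(ips[day]),
            "unique_handles": len(handles[day]),
        }
        for day in sorted(accesses)
    }


# Made-up sample data, for illustration only.
sample = [
    "2010-02-10T09:15:02 192.0.2.7 12345/abc-1",
    "2010-02-10T09:16:40 192.0.2.7 12345/abc-2",
    "2010-02-10T11:03:11 198.51.100.4 12345/abc-1",
    "2010-02-11T00:00:05 192.0.2.7 12345/abc-1",
]

for day, counts in daily_usage(sample).items():
    print(day, counts)
```

Sets are used so that repeat visits from one IP, or repeat resolutions of one handle, are counted only once per day, which is what separates the "unique" figures from the raw access count.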

Whilst most of the hits are coming from web spiders by auto-discovery, a fair number (perhaps ~300) of the 5000 entries have also been linked to via journal articles, and of course this blog, and some hits may be presumed to be the result of non-random ping-backs. A breakdown of a typical day (2010-02-10) when 839 unique handles were accessed shows access by, amongst others, five universities, Google/Yahoo, several other information corporations and Microsoft. I had no idea Microsoft was interested in calculations on molecules! You saw that here first!!

Other anecdotal feedback regarding the repository: I often use it to exchange calculations with collaborators, sending them the handle instead of a vast checkpoint or log file. Some collaborators, it has to be said, are baffled by the interface presented to them (which was designed in large measure by DSpace, not by us).

It is early days in many ways, and being pretty much the only standards-compliant digital repository operating in chemistry in this manner means that awareness is still low. If anyone reading this blog knows of significant others, please comment.

  1. Tobias Kind says:

    Hi Henry,
    interesting access statistics, but I usually don’t trust those URLs
    with port numbers, for fear they are like the GeoCities of the internets:
    soon to be extinct. (I have to disclose we offer similar bad URLs).


    spectradspace.lib.ic.ac.uk:8443 uses an invalid security certificate.
    The certificate is only valid for spectradspace.lib.imperial.ac.uk
    (Error code: ssl_error_bad_cert_domain)

    Now to your question, is it being used??
    I knew the concept and I cited the paper in our publication
    “How large is the metabolome? A critical analysis of data exchange practices in chemistry”

    The funny part is, I did not know that the website
    exists and that it is actually functional, because it could be found nowhere.
    If you do a Google search for the URL it returns:
    Results 1 – 7 of 7 for “spectradspace.lib.imperial.ac.uk:8443”. (0.23 seconds)

    I agree having a HANDLE or DOI number for each experimental set or spectrum as an eternal
    and citable entity is really cool.

    I *love* the idea of having access to Gaussian log files, spectra results etc,
    because it preserves our earth, I am not kidding, by preventing the 1000th
    recalculation of the lowest energy conformer of methane with MP2, B3LYP and others.
    It simply saves CPU cycles.

    So I guess it's not only about making a nice case, but really about advertisement and
    promotion and making the hopefully new URL known to many researchers (if that's intended).
    Maybe a cloud-like approach can be built with such a setup, because Amazon offers free
    storage for open data sets. Well what happens when Amazon does not exist anymore?

    How about spectra.imperial.ac.uk or spectra.ic.ac.uk?


    PS: not sure about the funny bold and normal font settings in my post
    I didn’t do it…

  2. Henry Rzepa says:

    Hi Tobias,

    If one tries a Google search with the string site:spectradspace.lib.imperial.ac.uk the first 50 or so hits are all to entries in the repository. Google uses these field qualifiers internally, but they are rather poorly documented (a few minutes' search did not reveal the complete list; another, for example, is author:name).

    Last time I checked, there were several thousand DSpace servers out there (but not many in chemistry!). Quite why they are so difficult to find is a mystery.

  3. Henry Rzepa says:

    It proved a challenge to find information on how to control a (Google) search. Thus this page is a useful summary of the syntax, but it's a trial to find it. There is also this site which is rather more generic and this page provides the best summary of the syntax, including discussion of stemming, Boolean operators, etc. One that was of interest to me was the field filetype: which might allow searches for files of specific content, i.e. filetype:cml.

  4. Henry Rzepa says:

    Picking up on another phrase you used, Tobias: “having a HANDLE or DOI number for each experimental set or spectrum as an eternal and citable entity is really cool”.

    This leads to the question of how eternal any individual instance of a DSpace server actually is. Our server was, and still is, real tin. We will however be virtualizing it over the summer so that it does not depend on a specific server box for its function. But taking the eternity metaphor one step further, I have just tried to archive an individual entry from our digital repository into WebCite. This provided the following (hopefully a bit more eternal) link: http://www.webcitation.org/5pUnKxCOM. The individual files in the repository all work! I don't know if it would be acceptable to automatically archive each entry in our DSpace using WebCite though. But this does look promising!

  5. […] blogged about this two years ago and thought a brief update might be in order now. To support the discussions here, I often perform […]

