Personal web pages on digital repositories.

The university sector in the UK has quality inspections of its research outputs conducted every seven years, going by the name of REF or Research Excellence Framework. The next one is due around 2020, and already preparations are under way! Here I describe how I have interpreted one of its strictures; that all UK funded research outputs (i.e. research publications in international journals) must be made available in open unrestricted form within three months of the article being accepted for publication, or they will not be eligible for consideration in 2020.

At the outset, I should say that one infrastructure to help researchers adhere to the guidelines is being implemented in the form of the Symplectic system. This allows a researcher to upload the final accepted version of a manuscript. At Imperial College, a digital repository called Spiral serves this purpose and also acts as the front end for collecting informative metadata to enhance discoverability. The final accepted version is then converted by the publisher into a version-of-record. This contains styling unique to the publisher and the content is subjected to further scrutiny by the authors as proof corrections. In an ideal world, these latter changes should also be faithfully propagated back to the final accepted version, as would all the supporting information associated with the article. Since most authors do not exactly enjoy the delights of proof corrections, this final reconciliation of the two versions may not always be assiduously undertaken.

I became concerned about the existence of two versions of any given scientific report and that the task of ensuring total fidelity in the content of both versions may negatively impact on the author’s time. Much better if the publisher could grant permission for the author to archive the version-of-record into a digital repository.

Some experiments were needed, and I decided to start them in reverse, by archiving my oldest publications. Since Symplectic now provides a system to do this, I began by using it. Symplectic identifies each publisher’s policies for archival, of which the most liberal are known as ROMEO GREEN. To quote from the definition, this colour allows the author to “archive pre-print and post-print or publisher’s version/PDF“. In an afternoon I had processed most of my ROMEO green articles. You know how it is sometimes, you do not read the fine print! And so the library soon informed me that archival of ROMEO GREEN was in fact only permitted on the author’s “personal web page”. Spiral, as an institutional repository, does not apparently constitute a personal web page for me and so none of my Symplectic submissions could be accepted for archival there.

Time to rethink the experiment. Firstly, I very much wanted the reprints to be held by a proper digital repository rather than a conventional web page. Why? I wanted my reprints to adhere as much as possible to FAIR: findable, accessible, interoperable and re-usable. Well, at least the first two of those (the last two relate more to data). A repository is designed to hold metadata in a formal and standards-based manner and metadata helps achieve FAIR. So I asked the Royal Society of Chemistry (as a ROMEO GREEN publisher) whether a personal web page hosted on a digital repository would qualify. I was soon informed  that I had proposed a neat solution here, and they couldn’t see an issue.

Now, all I had to do is find a repository where I could create such a personal web page. The chemistry department at Imperial College has for ten years hosted a DSpace repository called SPECTRa[1] which already has the functionality for individuals to create personal collections. I had also picked up on the increasing attention being given to Zenodo, like the World-Wide Web itself an offshoot of CERN (of large Hadron Collider fame) and born from the need for researchers to more permanently archive the outputs of their researches. These outputs include software, videos, images, presentations, posters, publications and (most obviously for CERN) datasets. I thought I would include them in my experiment as well. There results are summarised below.

DSpace-SPECTRa Zenodo
Community Henry Rzepa personal web page
reprint collection
Rzepa personal computational
chemistry data and reprint page
Collection Royal Society of Chemistry reprints
Publication 10042/195577 10.5281/zenodo.18758[2]
Thesis 10044/1/20860[3] 10.5281/zenodo.18777[4]
Dataset 10.14469/ch/191342[5] 10.5281/zenodo.18632[6]
Harvesting OAI-ORE OAI-PMH

The last line of this table includes a link to another design feature of a repository, facilitating the ability to harvest the content. The ContentMine project (“The right to read is the right to mine!“) has shown how such harvesting of facts from the literature can be automated on a vast scale, and (IMHO) represents an example of those disruptive innovations that have the power to change the world forever. It also enshrines the idea that scientific facts funded by the public purse should be capable of being openly liberated from their containers. A harvestable repository seems an ideal container for achieving this.

My experiment is part of what might be seen as the increasingly subtle interplay between:

  • scientific authors, whose creative endeavour research is and without whom scientific publishers would not exist
  • publishers who create a business model from the content freely given them by authors but also (especially if a commercial publisher) need to be accountable to their shareholders.
  • the funding councils, many of whom now wish the outcomes of the research they fund to be openly available to all
  • the local libraries/administrators who have to adhere to/enforce all the rules contractually handed down to them by publishers whose direct customers they are, but who also need to serve their community of readers and authors.
  • researchers who would rather do research than fret about the above, and who would rather spend limited resources doing that research rather than diverting an increasing amount of their attention into the above system.
  • readers, who need unimpeded access to the research endeavours of others, but often have little influence on the policies and actions of all the other stakeholders, since they are  NOT considered customers (of the publishers).
  • etc. etc.

My experiment was in part designed to explore these rules, their interpretations and their boundaries. For the time being at least I seem to have found an arrangement that allows me to distribute versions-of-record of my own work, thanks to a generous and far-sighted learned society publisher. Watch this space!


Acknowledgments

This post has been cross-posted in PDF format at Authorea.

References

  1. J. Downing, P. Murray-Rust, A.P. Tonge, P. Morgan, H.S. Rzepa, F. Cotterill, N. Day, and M.J. Harvey, "SPECTRa: The Deposition and Validation of Primary Chemistry Research Data in Digital Repositories", Journal of Chemical Information and Modeling, vol. 48, pp. 1571-1581, 2008. http://dx.doi.org/10.1021/ci7004737
  2. Rzepa, Henry S.., and Challis, Brian C.., "The mechanism of diazo-coupling to indoles and the effect of steric hindrance on the rate limiting step", 1975. http://dx.doi.org/10.5281/zenodo.18758
  3. H.S. Rzepa, "Hydrogen transfer reactions of indoles", 1974. http://doi.org/10044/1/20860
  4. Rzepa, Henry S.., "Hydrogen transfer reactions of Indoles", 1974. http://dx.doi.org/10.5281/zenodo.18777
  5. Henry S Rzepa., "C 25 H 34 Cl 1 N 3 O 1", 2015. http://dx.doi.org/10.14469/ch/191342
  6. Rzepa, Henry S.., Lobo, Ana, S.., Andrade, Marta S.., Silva, Vanessa S.., and Lourenco, Ana M.., "Chiroptical properties of streptorubin B – the synergy between theory and experiment.", 2015. http://dx.doi.org/10.5281/zenodo.18632
Henry Rzepa

Henry Rzepa is Emeritus Professor of Computational Chemistry at Imperial College London.

Recent Posts

Detecting anomeric effects in tetrahedral boron bearing four oxygen substituents.

In an earlier post, I discussed a phenomenon known as the "anomeric effect" exhibited by…

2 days ago

Internet Archeology: reviving a 2001 article published in the Internet Journal of Chemistry.

In the mid to late 1990s as the Web developed, it was becoming more obvious…

1 month ago

Detecting anomeric effects in tetrahedral carbon bearing four oxygen substituents.

I have written a few times about the so-called "anomeric effect", which relates to stereoelectronic…

2 months ago

Data Citation – a snapshot of the chemical landscape.

The recent release of the DataCite Data Citation corpus, which has the stated aim of…

2 months ago

Mechanistic templates computed for the Grubbs alkene-metathesis reaction.

Following on from my template exploration of the Wilkinson hydrogenation catalyst, I now repeat this…

2 months ago

3D Molecular model visualisation: 3 Million atoms +

In the late 1980s, as I recollected here the equipment needed for real time molecular…

3 months ago