Some examples of open access publications citing managed research data (RDM).

In May 2015, the EPSRC funding council in the UK required researchers publishing the outcomes of funded work to include an OA (open access) version of the narrative and to cite, with a DOI (digital object identifier), the managed research data used to support the research. I was discussing these aspects with a senior manager (research outcomes) at the EPSRC and he asked me to provide some examples from my area of chemistry; here are some.

The basics are covered by three broad actions:

  1. The researcher should adopt a research data management plan. This can be quite brief, but it is important that it be updated as the strategies evolve with time and that it is consistent within the group (ideally the department).

    • It would include a general policy for the research group to access, and if appropriate share, a common, private storage area for the so-called "active data" (data still being analysed and processed). It could for example take the form of cloud storage, using commercial providers such as Box, Dropbox or GitHub. The data is accessible only to those who have been granted access.
    • It could be a software organizer in which cloud storage is implicit. For quantum calculations, we use a locally developed system which serves a storage function and which has one other, even more important, attribute: it functions as a generator and collector of metadata associated with the datasets being generated.[1],[2],[3]
  2. The narrative describing the research is then published as an OA article, conjointly with …
  3. … the datasets being published to a data repository, and assigned a DOI.
  4. There are some delicate aspects to synchronising actions 2 and 3, so that the article cites the data and the data cites the article. I will not detail the mechanisms for achieving this here.

What follows here are 11 examples of OA articles in which managed data is cited in the manner described at the start. You may notice a diversity of styles and procedures. At the time most of these examples were being worked upon, there were few examples or indeed guidelines, and so these really constitute an exploration of various ways in which it can be done.



Article   Short DOI   Data is cited as:
[4] 9qg [5]
  • Additional file 1. Interactivity box 1. Data-based object illustrating various aspects of the interaction at the heart of Z-DNA.
  • A footnote in the preceding object: The original complete data set is also available at via a digital repository.
[6] 9qf [7] Interactive Table S1, using dataDOIs referencing a data repository, as eg and 11 other examples.
[8] 9p3 [7]
  • Full details of all calculations are available via the individual digital repository entries associated with Interactivity Boxes 1 and 2 (Web enhanced objects) available with this article (doi 10.6084/m9.figshare.797484, shortdoi: rns) or directly by the following doi resolvers: TS1, 10042/to-13699 and ~40 further entries
  • supplemental data: 10.6084/m9.figshare.777773, shortdoi: rnf.
[9] 9p2 [10] Ref 20 and 21 as an Interactivity box, datadoi: 10.6084/m9.figshare.785756, shortdoi: n6q and further references to individual datasets are available in this object.
[11] 9p4 [12]
  • Each data table or data Figure is assigned a doi in the Figshare repository (see footnotes), all retrievable as e.g. shortdoi: qd8.
  • Each figure or table contains further data citations (~20 per table).
[13] 9p5 [14]
  • footnotes to individual tables (Table 5, Table 7, Table 9)
  • and in the section Associated content at the end of the article, citing Interactive Tables 1, which themselves cite further datadois.
[1] vf4   This article discusses the technology behind five examples of articles which themselves contain citations to data.
[15] 9p6 [16] Refs 17 (doi: 10.6084/m9.figshare.988346, shortdoi: tb3) and 18 (doi: 10.6084/m9.figshare.1293562, shortdoi: znk)
[2] 73z [17] References 27 (10.6084/m9.figshare.1266197, shortdoi: xn3) and 28 (10.6084/m9.figshare.1342036, shortdoi: 2zb).
[18] 9p9 [19] Ref 15, in the form: An interactive table corresponding to the data for these calculations and the experimental details can be retrieved from doi:10.6084/m9.figshare.1181739, shortdoi: vz9. NCI surfaces were created using the resource doi:10.6084/m9.figshare.811862, shortdoi: n5b.
[3] 73x [20] Refs 36 (doi:10.6084/m9.figshare.1342036, shortdoi: 2zb) and ref 50 (10.6084/m9.figshare.1330063, shortdoi:6cq).

I hope this table adds to the open collection of pointers linking open access research articles to associated managed data. One really requires this association to be achieved using metadata and perhaps something along these lines might emerge quite soon from the fruits of the current collaborations between CrossRef and DataCite. Ideally, one should be able to pose search queries along the lines of identifying all research data associated with an article, and indeed vice versa.
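The kind of two-way query described above can be illustrated with a minimal sketch. It assumes the article-to-data links are already known; here they are typed in by hand from three rows of the table, whereas in practice they would have to come from registered metadata. The names `links`, `data_for_article` and `articles_for_data` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict

# Article short DOI -> data DOIs cited by that article.
# Copied by hand from three rows of the table; a real service would
# harvest these links from metadata rather than hard-code them.
links = {
    "9p6": ["10.6084/m9.figshare.988346", "10.6084/m9.figshare.1293562"],
    "73z": ["10.6084/m9.figshare.1266197", "10.6084/m9.figshare.1342036"],
    "73x": ["10.6084/m9.figshare.1342036", "10.6084/m9.figshare.1330063"],
}

# Build the reverse index: data DOI -> articles that cite it.
reverse = defaultdict(list)
for article, datasets in links.items():
    for data_doi in datasets:
        reverse[data_doi].append(article)


def data_for_article(article):
    """Return all data DOIs associated with one article."""
    return list(links.get(article, []))


def articles_for_data(data_doi):
    """Return all articles that cite one dataset (the 'vice versa' query)."""
    return list(reverse.get(data_doi, []))


if __name__ == "__main__":
    print(data_for_article("73z"))
    print(articles_for_data("10.6084/m9.figshare.1342036"))
```

In this small sample the dataset 10.6084/m9.figshare.1342036 is cited by two of the articles, so the reverse lookup returns both short DOIs, which is exactly the sort of answer one would hope to obtain from a metadata-driven search.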

When the scientific journal arose some 350 years ago, the format and presentation of the narrative evolved only relatively slowly. That evolution has accelerated somewhat in the online era, largely because of the author guidelines imposed by the publishers. I suspect most authors were happy to allow the publishers to take control of this aspect. There may now be a similar expectation that the publishers specify how authors' data is managed and presented. I would argue, however, that it is the authors themselves who know the attributes of their data best. The 11 examples above show one way in which data publication has evolved, in this instance largely determined by the authors themselves. We should strive to allow authors to retain this measure of creativity in the future, as RDM and its integration into journals matures and develops.

Interactive tables here were created as convenient collections of dataset DOIs, and have been presented in conjunction with visualisation software such as Jmol or JSmol. These tables can themselves be published in a repository and assigned a DOI. Most of the examples we prepared were published in the Figshare repository (the DOIs for some of which are shown in the last column of the table above). Special actions had to be taken at the Figshare end to allow the tables to be incorporated into the landing page presentation corresponding to the DOI. In December 2015, the site was refactored and this functionality is currently disabled, but should be restored in the near future.
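For readers who want to follow the identifiers quoted in the table, here is a minimal sketch of turning them into resolver links. It assumes that a full DOI resolves at `https://doi.org/<doi>` and that a short DOI such as `vz9` resolves in the form `https://doi.org/10/vz9`; the function name `resolver_url` is hypothetical. Other handle-style identifiers in the table (for example 10042/to-13699) are deliberately not handled.

```python
def resolver_url(identifier: str) -> str:
    """Build a doi.org link for a full DOI or a short DOI (assumed URL forms)."""
    ident = identifier.strip()
    # Drop an optional "doi:" prefix, as used in some of the citations.
    if ident.lower().startswith("doi:"):
        ident = ident[4:].strip()
    # Full DOIs start with the "10." directory indicator.
    if ident.startswith("10."):
        return "https://doi.org/" + ident
    # Anything else containing a slash is some other kind of handle;
    # this sketch does not attempt to resolve those.
    if "/" in ident or not ident:
        raise ValueError(f"not a DOI or short DOI: {identifier!r}")
    # Otherwise treat it as a short DOI, assumed resolvable as 10/<code>.
    return "https://doi.org/10/" + ident


if __name__ == "__main__":
    print(resolver_url("doi:10.6084/m9.figshare.1181739"))
    print(resolver_url("vz9"))
```

Both calls in the example refer to the same interactive table quoted in the table above, once by its full DOI and once by its short DOI.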

 If anyone reading this post is aware of interesting chemistry examples illustrating formal data citation of managed research data using e.g. a DOI in published articles, do please let me know and if appropriate I will add them to the table above.



  1. M.J. Harvey, N.J. Mason, and H.S. Rzepa, "Digital Data Repositories in Chemistry and Their Integration with Journals and Electronic Notebooks", Journal of Chemical Information and Modeling, vol. 54, pp. 2627-2635, 2014.
  2. M.J. Harvey, N.J. Mason, A. McLean, and H.S. Rzepa, "Standards-based metadata procedures for retrieving data for display or mining utilizing persistent (data-DOI) identifiers", Journal of Cheminformatics, vol. 7, 2015.
  3. M.J. Harvey, N.J. Mason, A. McLean, P. Murray-Rust, H.S. Rzepa, and J.J.P. Stewart, "Standards-based curation of a decade-old digital repository dataset of molecular information", Journal of Cheminformatics, vol. 7, 2015.
  4. H.S. Rzepa, "Chemical datuments as scientific enablers", Journal of Cheminformatics, vol. 5, 2013.
  5. H.S. Rzepa, "C 19 H 28 N 9 O 10 P 1", 2012.
  6. M.J. Gomes, L.F. Pinto, P.M. Glória, H.S. Rzepa, S. Prabhakar, and A.M. Lobo, "N-heteroatom substitution effect in 3-aza-cope rearrangements", Chemistry Central Journal, vol. 7, pp. 94, 2013.
  7. H.S. Rzepa, "C 11 H 16 N 1 O 5 -1", 2011.
  8. F.L. Cherblanc, Y. Lo, W.A. Herrebout, P. Bultinck, H.S. Rzepa, and M.J. Fuchter, "Mechanistic and Chiroptical Studies on the Desulfurization of Epidithiodioxopiperazines Reveal Universal Retention of Configuration at the Bridgehead Carbon Atoms", The Journal of Organic Chemistry, vol. 78, pp. 11646-11655, 2013.
  9. D. Christopher Braddock, J. Clarke, and H.S. Rzepa, "Epoxidation of bromoallenes connects red algae metabolites by an intersecting bromoallene oxide – Favorskii manifold", Chemical Communications, vol. 49, pp. 11176, 2013.
  10. H.S. Rzepa, "C 6 H 9 Br 1 O 2", 2013.
  11. A. Armstrong, R.A. Boto, P. Dingwall, J. Contreras-García, M.J. Harvey, N.J. Mason, and H.S. Rzepa, "The Houk–List transition states for organocatalytic mechanisms revisited", Chem. Sci., vol. 5, pp. 2057-2071, 2014.
  12. N. Mason, "C 18 H 23 N 1 O 3", 2013.
  13. S. Lal, H.S. Rzepa, and S. Díez-González, "Catalytic and Computational Studies of N-Heterocyclic Carbene or Phosphine-Containing Copper(I) Complexes for the Synthesis of 5-Iodo-1,2,3-Triazoles", ACS Catalysis, vol. 4, pp. 2274-2287, 2014.
  14. H.S. Rzepa, "C 15 H 12 I 1 N 3", 2011.
  15. K.K. Hii, H.S. Rzepa, and E.H. Smith, "Asymmetric Epoxidation: A Twinned Laboratory and Molecular Modeling Experiment for Upper-Level Organic Chemistry Students", Journal of Chemical Education, vol. 92, pp. 1385-1389, 2015.
  16. H.S. Rzepa, "C 21 H 32 O 1 S 2", 2015.
  17. H.S. Rzepa, N. Mason, A. McLean, and M. Harvey, "Interoperability for Data Repositories. Machine Methods for Retrieving Data for Display or Mining Utilising Persistent (data-DOI) Identifiers", 2014.
  18. T. Lanyon-Hogg, M. Ritzefeld, N. Masumoto, A.I. Magee, H.S. Rzepa, and E.W. Tate, "Modulation of Amide Bond Rotamers in 5-Acyl-6,7-dihydrothieno[3,2-c]pyridines", The Journal of Organic Chemistry, vol. 80, pp. 4370-4377, 2015.
  19. H.S. Rzepa, "C 15 H 15 N 1 O 1 S 1", 2014.
  20. H.S. Rzepa, M.J. Harvey, N.J. Mason, A. McLean, P. Murray-Rust, and J.J.P. Stewart, "Standards-based curation of a decade-old digital repository dataset of molecular information", 2015.
