Data discoverability

I have written earlier about the Amsterdam Manifesto. That arose out of a conference on the theme of “beyond the PDF”, with one simple question at its heart: what can be done to liberate data from containers it was not designed to be in? The latest meeting on this topic will happen in January 2015 as FORCE2015.

The format is suitably modern, starting with a Hackathon, followed by two days of talks, posters and demos. We will be presenting both a talk and a demo. In the spirit of emancipated data, we have placed the latter into a container that is most certainly not a PDF. That demo has been archived, assigned a DOI[1] and, for good measure, transcluded into this post in its entirety. We hope this demonstrates that such “containers” can be usefully moved around to where they might be needed. I should say that the core of this demo is not just the data, but the metadata associated with it. Metadata renders that data discoverable (mineable) and its usage measurable.^†
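As a rough illustration of what "discoverable" can mean for a machine, here is a minimal Python sketch that asks the DOI resolver for metadata rather than for the human-readable landing page, using HTTP content negotiation. It is not the code behind the demo. The function names `build_metadata_request` and `fetch_metadata` are invented for this example, and the choice of the CSL JSON media type is an assumption about what a client might want back; it also assumes the registration agency for this DOI honours that media type.

```python
import json
import urllib.request

# The data-DOI cited as [1] in this post.
DOI = "10.6084/m9.figshare.1266197"

# Assumed media type: CSL JSON, one of the formats offered through
# DOI content negotiation. Other formats could be requested instead.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_metadata_request(doi, media_type=CSL_JSON):
    """Prepare (but do not send) a content-negotiation request for a DOI.

    Hypothetical helper for illustration only.
    """
    url = "https://doi.org/" + doi
    # The Accept header tells the resolver we want metadata, not the web page.
    return urllib.request.Request(url, headers={"Accept": media_type})


def fetch_metadata(doi, media_type=CSL_JSON):
    """Send the request and parse the JSON reply; needs network access."""
    request = build_metadata_request(doi, media_type)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With network access, calling `fetch_metadata(DOI)` should return a dictionary of metadata describing the deposited item, which a mining script could then inspect. Building the request separately from sending it keeps the negotiation step easy to see and to check offline.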

I hope to report here on anything interesting happening at the FORCE2015 event.

^‡The format of this blog is a tiny bit too narrow for the demo to fit comfortably. Go see it here[1] and “enlarge” the view for a better experience.

^†Full details of this are in preparation.

Author

Henry Rzepa

Henry Rzepa is Emeritus Professor of Computational Chemistry at Imperial College London.

References

H.S. Rzepa, N. Mason, A. Mclean, and M. Harvey, "Interoperability for Data Repositories. Machine Methods for Retrieving Data for Display or Mining Utilising Persistent (data-DOI) Identifiers", 2014. https://doi.org/10.6084/m9.figshare.1266197

This entry was posted on Wednesday, December 17th, 2014 at 10:17 am and is filed under Chemical IT.

4 Responses to “Data discoverability”

Louis Maddox says:

December 17, 2014 at 6:33 pm

Love this. It’s a shame the journal didn’t manage to incorporate the dynamic tables in their “rich” HTML version of the publication; outsourcing them to FigShare kind of breaks continuity of reading. Did you ask them about this when you published, or was it a choice to separate them from the article?

I know Elsevier have been making movements towards modernising their articles but don’t know how open to innovation editors are in general.

Nice use of content negotiation anyway. FYI, the embedded figshare iframe fits on your blog’s theme nice and neatly if you give the parent <p> element a negative margin:

<p style="margin-left: -115px;">

Might cause problems for viewers on mobile though.

Darren says:

December 21, 2014 at 8:20 pm

We’ve all heard of the h-index (http://en.wikipedia.org/wiki/H-index) but is there an equivalent or similar index for the reproducibility of a scientific paper?

Henry Rzepa says:

December 21, 2014 at 9:28 pm

Darren,

An h-index is really an attempt at measuring impact rather than reproducibility (that impact is assumed positive, but it might also be negative!).

An h-index is taken from citable works, and data is just as citable as narrative (i.e. the conventional article or paper). So yes, data can potentially be a contributor to an h-index. An interesting question is when citable data will first contribute to an individual’s h-index. No such citation has yet made it to my own h-index, but I happen to believe that it will only be a matter of time.

A more subtle point is whether a data citation should be given equal weight to a narrative citation. The community will no doubt come up with an answer to that one.

Henry Rzepa says:

December 22, 2014 at 10:28 am

Louis:

Thanks for the tip about moving the inset left! Much better!

I first started talking to publishers about this sort of enhancement around 2005. Basically, most of them do not accept author-provided markup (i.e. HTML), because their production systems cannot in general cope with markup that might not conform to their production schemas and workflows. And by and large they are not prepared to develop schemas that might accommodate authors’ materials.

However, the ACS, in 2005, DID provide a container for such content, which they called a WEO (Web-enhanced-object). This sits outside of the production schema, and hence can be fully controlled by the author. I have published many such WEOs over the last nine years. There are issues with these objects, and their longevity is unknown. Nowadays, I tend to use the WEO most simply to provide links to a repository, and also to use constructs such as that included in this post.

You might imagine that experimenting with the boundaries of what a journal article can be is not for everyone (indeed, very few authors have followed us down this path). I am often rather depressed that the publishers rigidly control the innovation of the journal, so that whatever they decide is what counts as innovation. Few authors are allowed to participate; we are simply told what we must do to conform to what the publisher wishes, and this can so often stifle innovation.


Henry Rzepa's Blog