Exploiting the power of persistent identifiers (PIDs) for locating all kinds of research object.

The folks at DataCite have announced a new research object discovery service which aims to give users a “comprehensive overview of connections between entities in the research landscape”. The portal https://commons.datacite.org acts as the entry point for three basic types of persistent identifiers (PIDs):

  1. Research works, using the DOI (digital object identifier) as a PID. “Works”, or research objects, include both research articles and research data, and are searched by prefixing the search query with https://commons.datacite.org/doi.org?query=
  2. People, using the ORCID as a PID via the prefix https://commons.datacite.org/orcid.org?query=
  3. Organisations, using ROR as a PID via the prefix https://commons.datacite.org/ror.org?query=
  4. To construct a search which combines any two, or all three, of the above categories, the search prefix is simply https://commons.datacite.org/?query=
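
A minimal sketch of how a query might be attached to one of the four prefixes above. The prefixes are those listed; the dictionary keys and the function name `commons_url` are hypothetical labels chosen for illustration, and encoding spaces as `+` is an assumption inferred from the example queries later in this post rather than something taken from the DataCite documentation.

```python
from urllib.parse import quote_plus

COMMONS = "https://commons.datacite.org"

# Prefixes as listed above; the keys are illustrative names only.
PREFIXES = {
    "works": f"{COMMONS}/doi.org?query=",
    "people": f"{COMMONS}/orcid.org?query=",
    "organisations": f"{COMMONS}/ror.org?query=",
    "all": f"{COMMONS}/?query=",
}


def commons_url(query: str, kind: str = "all") -> str:
    """Append an encoded search query to the chosen prefix.

    Spaces become '+'; the characters : * / ( ) are left unencoded so the
    resulting URL stays readable (an assumption, not a documented rule).
    """
    return PREFIXES[kind] + quote_plus(query, safe=":*/()")


if __name__ == "__main__":
    print(commons_url("titles.title:*amidation*"))
    print(commons_url("amidation", kind="works"))
```

Switching `kind` changes only the prefix, so the same query string can be tried against works, people, organisations, or all three together.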

To use this very modern type of discovery portal, one currently has to be familiar with how to construct a valid search query to be appended to any of the above prefixes. This is now well documented at https://support.datacite.org/docs/datacite-commons, although it still requires some work and patience to construct a precise search query. This in turn requires knowledge of the so-called “metadata schema”, on which the indexing is based.

This sort of activity is best illustrated using examples. As it happens I have already collected a decent set at https://doi.org/drrm, nicely illustrating that a search query, or a collection of search queries, can itself be considered a valid research object! That collection used the prefix https://search.datacite.org/works?query= which might usefully be considered as now obsoleted by https://commons.datacite.org/?query=. You can take any of the original queries and try them out with the new prefix. I will show just three:

  1. https://commons.datacite.org/?query=titles.title:*amidation* The original search gives 170 hits, since it is based largely on DOIs for datasets only. The new version of the search yields 1016 hits, since it includes authors and organisations as well. Of these, 846 hits come from the CrossRef registration agency (mostly journals) and the rest from DataCite (mostly data).
  2. https://commons.datacite.org/?query=creators.affiliation.affiliationIdentifier:"https://ror.org/041kmwe10"+AND+amidation restricts the search to a specific institution and illustrates how the prefix selected can control the outcome of the search.

  3. https://commons.datacite.org/?query=media.media_type:chemical/x-mnpub*+AND+(subjects.subjectScheme:inchikey+AND+subjects.subject:*BHYQUOWHUMNGMD-UHFFFAOYSA-N*)+AND+(subjects.subjectScheme:NMR_Nucleus+AND+subjects.subject:11B)+AND+(subjects.subjectScheme:NMR_Solvent+AND+subjects.subject:CDCl3) is at the other end of the spectrum for specificity, constraining the search to some very specific chemical properties, the nature of which should be reasonably obvious from the syntax of the query. This specificity is why it continues to give just one hit.
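
The most specific query above is easier to read, and to vary, when assembled from its individual clauses. The following is only a sketch: the field names and values are copied from that query, while the helper `subject_clause` and the encoding choices (spaces as `+`, with `: * / ( )` left as they are) are my own assumptions made to reproduce the URL as written.

```python
from urllib.parse import quote_plus

PREFIX = "https://commons.datacite.org/?query="


def subject_clause(scheme: str, value: str) -> str:
    """Pair a subject scheme with a subject value (hypothetical helper)."""
    return f"(subjects.subjectScheme:{scheme} AND subjects.subject:{value})"


# Each clause narrows the search further; all values come from the query above.
clauses = [
    "media.media_type:chemical/x-mnpub*",
    subject_clause("inchikey", "*BHYQUOWHUMNGMD-UHFFFAOYSA-N*"),
    subject_clause("NMR_Nucleus", "11B"),
    subject_clause("NMR_Solvent", "CDCl3"),
]

query = " AND ".join(clauses)
url = PREFIX + quote_plus(query, safe=":*/()")

if __name__ == "__main__":
    print(url)
```

Dropping or swapping a clause (a different nucleus or solvent, say) broadens or redirects the search without the whole URL having to be retyped.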

The evolution of these search facilities gives an interesting pointer to what the future might hold. New registration agencies can easily be added to the above lists to cover other kinds of research object, for example instruments and their properties. One can combine these diverse properties into a single search, thus revealing scientific information or connections that may not be apparent from historical (chemical) abstracting agencies such as CAS or Reaxys. Importantly, all the metadata on which the indexing is based is fully open and not proprietary and, currently at least, searches such as the above are free at point of use (unlike the chemical registration agencies noted, for which commercial licenses have to be purchased by organisations). The concept of searching for relationships across different types of PID is summarised by the term “PID Graph”. This in turn can reveal other properties of the objects, such as usage statistics and citations.

It is good to see this evolution of new ways of finding scientific information, and I rather think that we have only just begun to see the potential of this approach; there is much more to come. Exciting times ahead, I fancy!

To be continued. This post has a PID: 10.14469/hpc/7366.
