I have had some interesting discussions recently regarding metadata. What emerges is that it can be quite a broadly defined concept and it is clear that a variety of answers might be obtained when asking the simple question “what is it useful for?” Here I set out some of my answers to that question.
Some context needs to be applied before answering such a question (context is perhaps a synonym for metadata!).
These are broad-grained provenance, if you like.
We are now moving into fine-grained metadata, and perhaps even crossing the boundary into data itself: the parameters for either software or instruments can be large and complex, and are often so heavily mixed into the data that extricating them can be a challenge.
Before introducing examples based on metadata with the focus on discoverability, I want to distinguish between locally packaged metadata and separated metadata (Qu. 2 above). The examples below relate purely to the latter, which has been created as a separate entity by registration with an agency such as DataCite. Such registration also addresses Qu. 3 above about trust. This external agency adds trust by recording the identity of the person (or a process or workflow initiated by a person) registering the metadata together with the registration date (the Datestamp) and also monitors any changes to the metadata (which is allowed) by keeping its version history. Interestingly, there seems to be no mechanism to record any processes or workflows used to create metadata so as to learn how the metadata itself was assembled. Nor have I seen much discussion of this aspect; one for the future I fancy.
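To make the registration record more concrete, here is a minimal sketch of the kind of metadata an agency such as DataCite holds for a registered dataset. The field names follow the DataCite Metadata Schema, but the creator, title, publisher and dates shown are purely illustrative, not the actual record behind the DOI mentioned in this post.

```python
# Illustrative DataCite-style metadata record (values are hypothetical).
record = {
    "identifier": "10.14469/hpc/5920",          # the registered DOI
    "creators": [{"name": "Example, Author"}],   # who registered it (hypothetical)
    "titles": [{"title": "Search queries for FAIR data discovery"}],
    "publisher": "Example Repository",
    "publicationYear": 2019,
    "dates": [{"date": "2019-07-16", "dateType": "Issued"}],  # the Datestamp
    "version": "1",  # DataCite keeps a version history as the metadata changes
}

def registered_datestamp(rec):
    """Return the 'Issued' date recorded at registration, if any."""
    for d in rec.get("dates", []):
        if d.get("dateType") == "Issued":
            return d["date"]
    return None
```

Note that nothing in such a record describes *how* the metadata itself was assembled, which is exactly the gap identified above.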
I now introduce some examples of discoverability. The descriptions are quite short and are meant to be read in conjunction with a “reverse-engineering” of the (somewhat) human-readable search queries. These queries are also deposited as “data” at DOI: 10.14469/hpc/5920.
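As a sketch of how such a search can be assembled programmatically, the snippet below builds a query URL against the public DataCite REST API (`https://api.datacite.org/dois`, with its `query` parameter). The endpoint is real; the particular query string shown is a hypothetical example, not one of the deposited queries.

```python
from urllib.parse import urlencode

BASE = "https://api.datacite.org/dois"

def build_search(query, page_size=25):
    """Return a DataCite DOI-search URL for a metadata query string."""
    params = {"query": query, "page[size]": page_size}
    return BASE + "?" + urlencode(params)

# Hypothetical example: DOIs whose creator matches "Smith", published in 2019.
url = build_search('creators.name:"Smith" AND publicationYear:2019')
```

Fetching that URL (e.g. with any HTTP client) returns a JSON list of matching DOI records, which is essentially what the deposited queries do.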
The examples above reveal a not entirely human-friendly syntax; with each of them, some effort at “de-bugging” was needed to make them work. I gather from the PIDForum that a friendlier GUI for this is on their radar. As I develop or discover more examples of such searches, I will add them to the list at DOI: 10.14469/hpc/5920. Meanwhile, if you want to use any of the above as a template for your own searches, do please explore.
Derek Lowe's blog yesterday:
https://blogs.sciencemag.org/pipeline/archives/2019/07/15/machine-mining-the-literature
This highlights the potential of machine processing of properly curated information in natural language (journal abstracts in this case) to provide useful inputs to research. If metadata could routinely stitch the text and the underlying data together, computers would suddenly become much more useful.
ContentMine has been doing this for a little while. Natural-language extraction (in chemistry) is at best around 95% accurate, and a fair bit more has to be done to render the results more reliable.
I agree that good metadata combined with natural (trained) language searching has lots of potential. Interestingly, whereas the introduction of Google revolutionised how humans search for information, newer generations of search engine such as Elasticsearch are leading the way for embedding into AI engines. I note that the metadata for FAIR data is indexed by DataCite using Elasticsearch. So we may well expect some revolutionary tools based on natural language in combination with Elastic-indexed metadata to emerge in the next few years.
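For a flavour of what sits behind such an index, here is a sketch of an Elasticsearch query-DSL request body. The `bool`/`must`/`match`/`term` structure is standard Elasticsearch DSL; the field names and values are hypothetical, chosen only to echo the kind of metadata fields discussed above.

```python
import json

# Hypothetical Elasticsearch query: title matches a phrase AND year is exact.
query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"titles.title": "anomeric effect"}},
                {"term": {"publicationYear": 2019}},
            ]
        }
    }
}

# The request body sent to an Elasticsearch _search endpoint is just JSON.
body = json.dumps(query)
```

An AI engine that can translate a natural-language question into a structure like this is precisely the combination speculated on above.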