A sea-change in science citation? The Wikipedia Science conference.

The first conference devoted to scientific uses of Wikipedia has just finished; there was lots of fascinating stuff, but here I concentrate on one report that I thought was especially interesting. To introduce it, I first need to introduce WikiData. This is one of the newest parts of the WikiMedia ecosystem. The basic concept is really simple.

  1. It is a repository for data objects; 14,757,419 of them as I write this to be precise. These are called items, and each has an ID, prefaced with the letter Q. An example might be mauveine, which is Q421898.
  2. Any item can have one or more properties which can only be selected from a controlled list, of which there are around 3000. An example here might be an individual’s ORCID identifier, which is P496. ORCID is also an item, Q51044.
  3. As with Wikipedia itself, WikiData also has a set of community rules which contributors have to follow. One of the rules in Wikipedia (the COI or conflict of interest rule) discourages an interested party from making a significant contribution to any page. Items in Wikidata have a looser COI code; facts are basically facts rather than opinions, but their provenance still matters of course.
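
The item and property model in the list above can be sketched in a few lines of Python. This is only an illustration of the idea, not how WikiData itself is implemented: the `Item` class, the `ALLOWED_PROPERTIES` set, the `Q0` identifier and the ORCID value are all invented for the example. Only Q421898, Q51044, P496 and P356 (the DOI property) are real identifiers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Item:
    """A toy item: a Q identifier, a label, and its property statements."""
    qid: str
    label: str
    # Maps a property ID (e.g. "P496") to the values stated for it.
    statements: Dict[str, List[str]] = field(default_factory=dict)

    def add_statement(self, pid: str, value: str, allowed: Set[str]) -> None:
        # Properties must come from a controlled list; reject anything else.
        if pid not in allowed:
            raise ValueError(f"{pid} is not in the controlled property list")
        self.statements.setdefault(pid, []).append(value)


# A tiny stand-in for the controlled list of around 3000 properties.
ALLOWED_PROPERTIES = {"P496", "P356"}

mauveine = Item("Q421898", "mauveine")
orcid = Item("Q51044", "ORCID")

# Hypothetical person item; "Q0" and the ORCID value are placeholders.
researcher = Item("Q0", "hypothetical researcher")
researcher.add_statement("P496", "0000-0000-0000-0000", ALLOWED_PROPERTIES)
```

Passing the controlled list in explicitly keeps the sketch self-contained; a real system would of course look the property up in a shared registry.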

With the basic structure set out, I will now describe what I heard today.

  1. An item can be a citation (to the published scientific literature), of which around 76 million exist, although nothing like that number are in WikiData at present.
  2. One of the properties of a citation item is its DOI or digital object identifier (P356), which would nowadays be regarded as in effect mandatory. Citations can have other properties, populated from CrossRef or DataCite, such as the metadata associated with the DOI itself: the journal, the authors, the date, etc. Citations from DataCite can in fact have far richer metadata than usual; if you follow this link you can see an example of such data properties.
  3. But here is the new stuff. Citations as items can have more subtle properties. Thus a citation could be invoked with a property: A disagrees with B, where A and B are both items (or perhaps properties).

You can see from this that allowing a citation to have such properties can potentially revolutionise the way a scientific article can be constructed. When a citation is invoked, the context the authors wish that citation to have can be added. Contrast this with the context-free way in which articles currently cite other articles. And as with anything in WikiData, instances can be counted, the context in which instances occur can be identified and statistics accumulated.
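
To make the counting idea concrete, here is a minimal Python sketch, assuming each citation is stored as a triple of citing item, context and cited item. The relation labels ("disagrees with", "agrees with") and the identifiers `Q_A` to `Q_D` are hypothetical placeholders, not real WikiData properties or items.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class CitationStatement(NamedTuple):
    citing: str    # the item doing the citing
    relation: str  # the context the authors attach (hypothetical labels)
    cited: str     # the item being cited


# Invented statements, purely for illustration.
statements = [
    CitationStatement("Q_A", "disagrees with", "Q_B"),
    CitationStatement("Q_C", "agrees with", "Q_B"),
    CitationStatement("Q_D", "disagrees with", "Q_B"),
]


def relation_counts(stmts: Iterable[CitationStatement], cited: str) -> Counter:
    """Count how often each context is attached to citations of one item."""
    return Counter(s.relation for s in stmts if s.cited == cited)


counts = relation_counts(statements, "Q_B")
```

With the context recorded alongside the citation, a question such as "how many articles cite B in order to disagree with it?" becomes a simple lookup on `counts`.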

The way it might work is not so much that any interested reader (a human) would browse through WikiData. Instead it is something that a machine (software) might invoke. In Wikipedia for example, one can transclude or subsume into the article an item from Wikidata. This could be a citation, which you could transclude with one or more associated properties. In chemistry at the moment, the most prominent objects that are constructed from such Wikidata transclusions are ChemBoxes, or tables of properties of molecules as items (Q52426). This is done dynamically at the time of reading the Wikipedia article and so you can imagine that such transclusions can respond as the values of properties are updated/corrected/extended. Unfortunately I do not (yet) know of a good example of all of this which can be linked to here. If any do come to light, I will try to remember to add them here.
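
In the absence of a real example, the dynamic behaviour described above can at least be mimicked with a toy. The Python sketch below assumes an invented placeholder syntax, `{{item|property}}`, and an in-memory `store`; neither is the actual Wikipedia template mechanism, and the item name and DOI values are made up.

```python
import re

# Toy data store: item ID -> property ID -> value (all values invented).
store = {"Q_CITATION": {"P356": "10.0000/example"}}

# Invented placeholder syntax: {{item|property}}.
PLACEHOLDER = re.compile(r"\{\{(Q\w+)\|(P\d+)\}\}")


def render(text: str, data: dict) -> str:
    """Replace each placeholder with the value held in the store right now."""
    def lookup(match: re.Match) -> str:
        qid, pid = match.group(1), match.group(2)
        return data[qid][pid]
    return PLACEHOLDER.sub(lookup, text)


page = "DOI: {{Q_CITATION|P356}}"

before = render(page, store)

# Correcting the value in the store changes what the next reader sees,
# without anyone editing the page text itself.
store["Q_CITATION"]["P356"] = "10.0000/corrected"
after = render(page, store)
```

The point of the sketch is only that the page holds a reference rather than a copy, so a correction made once in the store appears everywhere the item is transcluded.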

As often happens, the concepts above are not entirely new; many were already present in a variation of the Wiki called the Semantic MediaWiki and experiments in chemistry were tried as early as 2007.‡ But WikiData is far easier to use and in symbiosis with a conventional Wiki it might just start to fly now.

The implications of all of this for the way in which a scientific article might work are deep from many different perspectives. I do wonder whether all this data-rich context in which a scientific article or narrative might be couched will be welcomed by either publishers or indeed authors. Perhaps the emotions that humans have but which machines do not will in fact dominate. But it does appear to have the potential for a sea-change in how scientists exchange information.

The number of itemised molecules recently reached 100 million, and there are a few thousand (>? <?) well defined properties that can be associated with molecules. So the whole of known molecular chemistry is actually not that different in scale from the current Wikidata.

Semantic wiki as a model for an intelligent chemistry journal, Rzepa, Henry S. Abstracts of Papers, 233rd ACS National Meeting, Chicago, IL, United States, March 25-29, 2007, CINF-053. Abstract and talk.

  1. ana lobo says:

    Many thanks. Enjoyed it very much and will be waiting for more…

