Open Access journal publishing debates - the elephant in the room?

Open Access journal publishing debates – the elephant in the room?

For perhaps ten years now, the future of scientific publishing has been hotly debated. The traditional models are often thought to be badly broken, although convergence to a consensus of what a better model should be is not apparently close. But to my mind, much of this debate seems to miss one important point, how to publish data.

Thus, at one extreme is COAlition S, a model which promotes the key principle that “after 1 January 2020 scientific publications on the results from research funded by public grants provided by national and European research councils and funding bodies, must be published in compliant Open Access Journals or on compliant Open Access Platforms.” This includes ten principles, one of which “The ‘hybrid’ model of publishing is not compliant with the above principles” has revealed some strong dissent, as seen at forbetterscience.com/2018/09/11/response-to-plan-s-from-academic-researchers-unethical-too-risky I should explain that hybrid journals are those where the business model includes both institutional closed-access to the journal via a subscription charge paid by the library, coupled with the option for individual authors to purchase an Open Access release of an article so that it sits outside the subscription. The dissenters argue that non-OA and hybrid journals include many traditional ones, which especially in chemistry are regarded as those with the best impact factors and very much as the journals to publish in to maximise both the readership, hence the impact of the research and thus researcher’s career prospects. Thus many (not all) of the American Chemical Society (ACS) and Royal Society of Chemistry (RSC) journals currently fall into this category, as well as commercial publishers of journals such as Nature, Nature Chemistry,Science, Angew. Chemie, etc.

So the debate is whether funded top ranking research in chemistry should in future always appear in non-hybrid OA journals (where the cost of publication is borne by article processing charges, or APCs) or in traditional subscription journals where the costs are borne by those institutions that can afford the subscription charges, but of course also limit the access. A measure of how important and topical the debate is that there is even now a movie devoted to the topic which makes the point of how profitable commercial scientific publishing now is and hence how much resource is being diverted into these profit margins at the expense of funding basic science.

None of these debates however really takes a close look at the nature of the modern research paper. In chemistry at least, the evolution of such articles in the last 20 years (~ corresponding to the online era) has meant that whilst the size of the average article has remained static at around 10 “pages” (in quotes because of course the “page” is one of those legacy concepts related to print), another much newer component known as “Supporting information” or SI^♥ has ballooned to absurd sizes. It can reach 1000 pages[1] and there are rumours of even larger SIs. The content of SI is of course mostly data. The size is often because the data is present in visual form (think spectra). As visual information, it is not easily “inter-operable” or “accessible”. Nor is it “findable” until commercial abstracting agencies chose to index it. Searches of such indexed data are most certainly “closed” (again depending on institutional purchases of access) and not “open access”. You may recognise these attributes as those of FAIR (Findable, accessible, inter-operable and re-usable). So even if an article in chemistry is published in pure OA form, in order to get FAIR access to the data associated with the article, you will probably have to go to a non-OA resource run by a commercial organisation for profit. Thus a 10 page article might itself be OA, but the full potential of its 1000+ page data (an elephant if ever there was one) ends up being very much not OA.

You might argue that the 1000+ pages of data does not require the services of an abstracting agency to be useful. Surely a human can get all the information they want from inspecting a visual spectrum? Here I raise the future prospects of AI (artificial intelligence). The ~1000 page SI I noted above[1] includes e.g NMR spectra for around 70 compounds (I tried to count them all visually, but could not be certain I found them all). A machine, trained to identify spectra from associated metadata (a feature of FAIR), could extract vastly more information than a human could from FAIR raw data^‡ (a spectrum is already processed data, with implied information/data loss) in a given time. And for many articles, not just one. Thus FAIR data is very much targeted not only at humans but at the AI-trained machines of the future.

So I again repeat my assertion that focussing on whether an article is OA or not and whether publishing in hybrid journals is to be allowed or not by funders is missing that 100-fold bigger elephant in the room. For me, a publishing model that is fit for the future should include as a top priority a declaration of whether the data associated with it is FAIR. Thus in the Plan-S ten principles, FAIR is not mentioned at all. Only when FAIR-enabled data becomes part of the debates can we truly say that the article and its data are on its way to being properly open access.

^‡The FAIR concept did not originally differentiate between processed data (i.e. spectra) and the underlying primary or raw data on which the processed data is based. Our own implementation of FAIR data includes both types of data; raw for machine reprocessing if required, and processed data for human interpretation. Along with a rich set of metadata, itself often created using carefully designed workflows conducted by machines.

^♥The proportion of articles relating to chemistry which do not include some form of SI is probably low. These would include articles which simply provide a new model or interpretation of previously published data, reporting no new data of their own. A famous historical example is Michael Dewar’s re-interpretation of the structure of stipitatic acid[2] which founded the new area of non-benzenoid aromaticity.

Author

Henry Rzepa

Henry Rzepa is Emeritus Professor of Computational Chemistry at Imperial College London.

View all posts

References

J.M. Lopchuk, K. Fjelbye, Y. Kawamata, L.R. Malins, C. Pan, R. Gianatassio, J. Wang, L. Prieto, J. Bradow, T.A. Brandt, M.R. Collins, J. Elleraas, J. Ewanicki, W. Farrell, O.O. Fadeyi, G.M. Gallego, J.J. Mousseau, R. Oliver, N.W. Sach, J.K. Smith, J.E. Spangler, H. Zhu, J. Zhu, and P.S. Baran, "Strain-Release Heteroatom Functionalization: Development, Scope, and Stereospecificity", Journal of the American Chemical Society, vol. 139, pp. 3209-3226, 2017. https://doi.org/10.1021/jacs.6b13229
M.J.S. DEWAR, "Structure of Stipitatic Acid", Nature, vol. 155, pp. 50-51, 1945. https://doi.org/10.1038/155050b0

Tags: Academia, Academic publishing, American Chemical Society, Angewandte Chemie, article processing charge, article processing charges, artificial intelligence, Cognition, Company: RSC, Electronic publishing, G factor, Hybrid open access journal, Knowledge, Michael Dewar, Nature, online era, Open access, Predatory publishing, Publishing, researcher, Royal Society of Chemistry, Scholarly communication, Science, Technology/Internet

This entry was posted on Sunday, November 4th, 2018 at 12:22 pm and is filed under Chemical IT. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

2 Responses to “Open Access journal publishing debates – the elephant in the room?”

Henry Rzepa says:

November 5, 2018 at 11:30 am

My comments above were somewhat focussed on chemistry journals, but it is always a good idea to look at what other disciplines are doing. In Tim Gowers’ blog a mathematician), he mentions some recently launched new journals. One of them, Acta Mathematica is merely newly launched, and is described by him as one of a tiny handful of the very top journals in mathematics. And Last year it became fully open access without charging author fees. So for a really good paper it is a great option.

Tim has played a part in launching new journals himself, and the minimal costs are under-written by sources other than by the authors themselves via APCs. It would also be fair to say that mathematics is rather less dependent on data in the sense that it is used in e.g. chemistry. What this shows is that new models of science publishing can both succeed and maintain high reputation.

Reply
Henry Rzepa says:

November 18, 2018 at 6:24 am

I noted in my blog that in terms of size at least, the standard article in chemistry can be dwarfed by its supporting information (which in conventional form is mostly unstructured and distinctly unFAIR data). Now the focus changes to the (also unFAIR?) bibliography at the end of the article. This too can constitute a significant component of the total (often 100+ citations can be included). The open citations initiative (in my blogroll here) is increasingly succeeding in liberating this part of the article away into open (FAIR) form. Which means the citations too can be removed from any paywall that protects the article itself. Some major chemistry publishers are resisting such decoupling, such as the American Chemical Society (see https://www.change.org/p/asking-the-american-chemical-society-to-join-the-initiative-for-open-citations) but one wonders for how long.

Scholarly scientific communication in the form of a discrete “article” which publishers can protect as part of publishing model is rapidly evolving into new more modular and less easily protectable forms. A very interesting space to watch.

Reply

Henry Rzepa's Blog