Research Practices and Tools: Preprints and other tools of open research: contribution to the Open Access roundtable at GFP 2019

In the context of the conference GFP 2019 on polymer chemistry, I am taking part in a roundtable on Open Access. Chemists are coming quite late to the Open Access debate. The preprint archive Chemrxiv is young, not widely used, and not independent from publishers. The traditional subscription-based publishing system, and the standard bibliometric indicators, dominate communication and evaluation. And when chemists are dragged into the debate by discipline-agnostic initiatives such as Plan S, their positions tend to be conservative.

Inevitably, chemists are being affected by Open Access and other evolutions of the research system, whether or not these evolutions seem beneficial to them. It would be useful for chemists to know more about preprints and other tools of scientific communication, beyond the traditional journals: not only to comply with Open Access mandates, but also to make their own choices among the existing innovations and best practices.

Introducing myself

Researcher in theoretical physics, employed by CNRS, working at CEA Saclay.
Contributor to open and collaborative platforms such as GitHub, Wikipedia, StackExchange.
Member of the editorial board of the WikiJournal of Science, a journal that publishes Wikipedia-style articles.
A blog called “Research practices and tools” for discussing open science, and publishing the reviewer reports that I write.

Administrative archives vs disciplinary archives

There are two broad categories of preprint archives, which I will call administrative and disciplinary. (Administrative archives might also be called institutional archives, or local archives, since they cover an institution, country, or region.) Here is a summary table of their features, with some more details below:

The largest and oldest disciplinary archive is Arxiv. It was started in 1991 by theoretical physicists as a way to make the distribution of preprints more efficient. (Before that, they sent preprints by email, and before email, by ordinary mail.) Arxiv is where many physicists find and read their colleagues’ articles, long before they appear in journals. In fields where all papers are on Arxiv, keeping up with the literature requires no more effort than checking the list of new preprints every day.

It was initially thought that Arxiv might entirely replace scientific journals, but this did not happen, even though journals’ role diminished greatly. Arxiv preprints can be submitted to journals, and in many journals the submission procedure involves little more than entering an Arxiv preprint number.

Administrative archives are not built by researchers, but by librarians or administrators. Their purposes are to keep track of researchers’ output, to allow researchers to fulfill Open Access mandates, and to let them disseminate documents that are unsuitable for disciplinary archives. In particular, administrative archives accept submissions from all disciplines. Research articles are typically deposited after they are published in journals: a researcher may deposit an article after the journal’s embargo has expired, or when her employer needs an up-to-date list of her publications. For example, CNRS no longer asks me for my list of publications: it takes the list automatically from HAL, the French open archive, and ignores any publication that is missing from HAL.

Both types of archives allow the sharing of various types of documents beyond research articles. A small percentage of Arxiv preprints are commentaries (sometimes demolitions) of other articles, or replies by the authors of a criticized article. Each archive has its own rules about which types are acceptable. Archives also allow documents to be updated, while keeping track of older versions.

Three disciplinary archives

Disciplinary preprint servers share a number of basic features:

Free to read, free to publish. (Costs are about $10 per article, covered by institutional subsidies.)
Permanent, irrevocable archiving.
Indexed by Google Scholar and others.
Basic screening, no formal peer review.
Establish priority.

A comparison of three important disciplinary archives:

A few details on this table:

Arxiv covers a number of disciplines, including some biology and some finance, but mainly it is physics, mathematics and computer science.
Chemrxiv is an initiative of the American Chemical Society and its counterparts in other countries. The ACS is also a major publisher, with a very aggressive stance towards unauthorized sharing of scientific articles. Therefore, we may expect Chemrxiv to be less indifferent than Arxiv or Biorxiv to the prosperity of legacy journals. It is unclear whether a researcher-controlled chemistry archive could still emerge.
Biorxiv is the only one of these archives that allows people to write comments about preprints. However, the rate of comments is low. Twitter seems to be a more popular venue for publicly discussing preprints.

Life of a research article

So how exactly do we use Arxiv? In disciplines where all papers are on Arxiv, here is what typically happens to an article:

The first Arxiv version may be the version that most readers read, so it had better be good.
Feedback and improvements are not limited to the journal’s peer review process: they can also come before the paper is submitted to a journal, and after it is published. Informal peer review via exchanges with readers takes on renewed importance. For an example in chemistry, see the comments to this blog post by Henry Rzepa, including references to the relevant Chemrxiv preprints.
Post-publication versions are used for correcting mistakes, small and large. Errata are rarely sent to the journal. And there is little need to fight reviewers if they ask for changes that I disagree with: I can always have my preferred version on Arxiv.
It is possible to submit an unlimited number of versions to Arxiv. Typical papers only have a few versions. But there are also living reviews that are regularly updated.
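Versioning on Arxiv is visible in the identifiers themselves: all versions of a preprint share one base identifier, a suffix such as v2 selects a particular version, and the bare identifier points to the latest version. The following is a minimal Python sketch of how such identifiers decompose, assuming the new-style scheme (YYMM.NNNN from 2007 to 2014, YYMM.NNNNN from 2015 on). The function names are hypothetical, old-style identifiers such as hep-th/9901001 are deliberately not handled, and the identifier 1901.01234v2 is invented for illustration.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical helper, assuming the new-style scheme: YYMM.NNNN (2007-2014) or
# YYMM.NNNNN (2015 onwards), optionally followed by a version suffix vN (N >= 1).
# Old-style identifiers such as hep-th/9901001 are not handled.
NEW_STYLE_ID = re.compile(r"^(\d{2})(\d{2})\.(\d{4,5})(?:v([1-9]\d*))?$")


class ArxivId(NamedTuple):
    year: int               # two-digit year, e.g. 19 for 2019
    month: int              # 1 to 12
    number: str             # sequence number within the month, leading zeros kept
    version: Optional[int]  # None when no suffix is given (i.e. the latest version)


def parse_arxiv_id(identifier: str) -> ArxivId:
    """Split a new-style identifier into its parts; raise ValueError otherwise."""
    match = NEW_STYLE_ID.match(identifier.strip())
    if match is None:
        raise ValueError(f"not a new-style arXiv identifier: {identifier!r}")
    yy, mm, number, version = match.groups()
    if not 1 <= int(mm) <= 12:
        raise ValueError(f"invalid month in arXiv identifier: {identifier!r}")
    return ArxivId(int(yy), int(mm), number, int(version) if version else None)


def base_id(identifier: str) -> str:
    """Identifier without its version suffix, shared by all versions of a preprint."""
    parsed = parse_arxiv_id(identifier)
    return f"{parsed.year:02d}{parsed.month:02d}.{parsed.number}"


if __name__ == "__main__":
    # 1901.01234v2 is an invented identifier, used only for illustration.
    print(parse_arxiv_id("1901.01234v2"))  # ArxivId(year=19, month=1, number='01234', version=2)
    print(base_id("1901.01234v2"))         # 1901.01234
```

Keeping the version separate from the base identifier mirrors the way a journal submission or a citation can refer either to one specific version or, by default, to whatever the latest version is.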

Arxiv saturation and endgame

The Arxiv submission statistics show a marked growth in some subfields, and a stationary situation in others.

Submissions in most of high-energy physics (blue) and condensed-matter physics (green) stopped growing years ago, because all articles in these fields are already on Arxiv. This has many consequences for the work of researchers. In particular, publication in journals is now optional. A famous example: the mathematician Grigori Perelman was offered the 2006 Fields Medal for work that appeared in Arxiv preprints and was never submitted to journals. Of course, this work was thoroughly peer-reviewed, but the peer review was not organized through journals.

This shows that we could easily do research without journals. Nevertheless, journals are alive and well, mainly due to their continued influence on academic careers. Estimating the quality of a work from the journal it appears in remains common practice, although it is widely denounced.

Beyond open access: open peer review

After open access, the next frontier of scientific publishing may be open peer review. Open peer review may mean different things:

publishing reviewer reports, typically for accepted articles, as this is harder for rejected articles,
accepting spontaneous reviews in addition to invited reviews,
publishing reviewer names, unless the reviewers decline to do so.

Two recently created families of OA journals that practice open peer review:

PeerJ Chemistry (started 2018): 5 journals covering all of chemistry.
SciPost Chemistry (coming soon): will be free to read and to publish, funded by subsidies.

PeerJ and SciPost started as successful journals in biology and physics respectively, and are now expanding to chemistry.

Beyond research articles

Research articles can carry only certain types of information. Scientific communication involves other media, some of which are well-suited to open, collaborative works:

According to this blog post by chemist Henry Rzepa, sharing data effectively may be more important than having openly accessible articles. To be useful, data should be FAIR: Findable, Accessible, Interoperable and Reusable. See this other post for an example of publishing data independently of the article.
Code can be written collaboratively at GitHub, GitLab, etc. These collaborative code repositories use the version control system Git, originally created for developing the Linux kernel.
StackExchange is a family of question-and-answer websites, with a sophisticated voting and reputation system that is very effective at promoting good content and good contributors. (In comparison, the impact factor and the H-index are crude.)
Wikipedia has many readers (about 600 per day for the article on Neoprene) but too few writers among academics. Is it always more important to write a research article for a few colleagues than a Wikipedia article for thousands of readers?
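To see why the H-index counts as crude, recall its definition: the largest number h such that h of a researcher's papers have each been cited at least h times. Here is a minimal Python sketch of that definition; the function name and the citation counts are invented for illustration. It shows how much information the single number discards: a record with three papers cited 25 times each and a record of three papers cited 3 times each both get an H-index of 3.

```python
from typing import Sequence


def h_index(citations: Sequence[int]) -> int:
    """Largest h such that at least h papers have at least h citations each."""
    h = 0
    # Rank papers from most to least cited; the H-index is the last rank r
    # at which the r-th paper still has at least r citations.
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


if __name__ == "__main__":
    # Invented citation records, for illustration only.
    three_highly_cited = [25, 25, 25, 3, 0]
    three_rarely_cited = [3, 3, 3]
    print(h_index(three_highly_cited), h_index(three_rarely_cited))  # 3 3
```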


Wednesday, 27 November 2019
