Tuesday, 20 March 2018

The open secrets of life with arXiv

If you only think of arXiv as a tool for making articles openly accessible, consider this: in 2007, a study showed that papers appearing near the top of arXiv’s daily listing of new papers eventually get cited more than papers further down the list – about twice as often. And there is a daily scramble to submit papers as soon as possible after the 14:00 EDT deadline, in order to appear as high as possible on the listing. The effect is not as perverse as it seems, as there is no strong causal relation between appearing near the top and getting more citations. (More likely, better papers are higher in the listing because their authors want to advertise them.)

The consequences of arXiv’s systematic use in some communities are actually so deep that a speciation event has occurred among researchers, and a new species of arXivers has appeared. Here I will try to explain how arXivers live, in order to help non-arXivers understand arXivers, and have an idea of what could happen to them if the currently proliferating clones of arXiv gained widespread use.

Let me define an arXiver as a researcher who puts all her articles on arXiv, and whose colleagues do the same. It may seem that there are no arXivers, since in any given discipline (even physics) only a minority of articles are on arXiv. However, speciation did not happen discipline by discipline: rather, there are a number of subfields where close to 100% of articles are on arXiv, and whose researchers are therefore arXivers. Historically, the first such subfield was theoretical high-energy physics.

An arXiver’s working day almost invariably starts with checking the new papers on arXiv. Since all relevant new papers are there, this has become the only method for arXivers to keep up to date with the literature. When they write papers, arXivers send them to arXiv first, and later (if at all) to a peer-reviewed journal. The consequences are profound and manifold:
  1. A paper on arXiv counts as a claim of priority: once your results are on arXiv, you can no longer get scooped.
  2. A paper that is not on arXiv will never get read or cited. So there is a tipping point in the adoption of arXiv in a research community, after which using arXiv becomes mandatory – unless you do not want to be read. (An arXiver may write a junk paper to inflate his publication list, and publish it in an obscure journal.)
  3. A paper that catches no attention the day it appears on arXiv may never get read. Catching attention does not necessarily mean being read immediately: it may mean being printed or saved or otherwise marked for future reading. And not all readers will be caught when the paper appears: some readers may attract further readers by citing or recommending the paper. But catching some attention on the day of appearance is vital, unless one counts on second-order effects such as advertising the paper via talks, or having one or two journal referees read it.
  4. These were pretty straightforward deductions, right? Now we reach a first nontrivial consequence: submitting to arXiv is more demanding than submitting to a journal, and texts submitted by arXivers to arXiv are of higher quality than texts submitted by non-arXivers to journals. An arXiver indeed stakes her paper’s future, and part of her own reputation, on the first arXiv version. (Of course, there are always cases where people initially submit rough drafts, whether to arXiv or to journals.)
  5. ArXivers do not follow what journals publish. However, they may end up reading journal versions of papers if those versions are put on arXiv, or via citation lists that link to journals rather than to arXiv.
  6. ArXivers do not always stop improving a paper after it is published in a journal. Moreover, they may disagree with some of the changes that are requested by the journal, and keep their preferred version on arXiv. So the arXiv version can be better than the journal version. This happens in particular when the journal in question has pointless formatting constraints and length limitations.
  7. ArXivers do publish in journals, but mostly for the sake of their careers, rather than for the peer review process. They are forced to do it by established bureaucratic and bibliometric procedures, even though the resulting improvements will often benefit few readers. Some established researchers can afford to publish some of their papers on arXiv only, but having junior coauthors still forces them to send some other papers to journals. Nevertheless, it is now possible to earn a Fields medal based on arXiv papers.
  8. The quality of peer review in journals that are frequented by arXivers has declined and is often poor. OK, there are no data that I know of for backing this claim, and even without arXiv, the proliferation of papers makes peer review decline. Still, this is an inevitable consequence of points 4, 5 and 7. Official peer review by journals is increasingly irrelevant to arXivers, but peer review is still practiced unofficially in the form of readers’ feedback, or mentions in later papers. And some papers are occasionally debated on arXiv.
  9. The disappearance of journals would hardly affect arXivers’ research work. A fortiori, arXivers do not need access to journals. (SCOAP3 was a waste of public money even before Sci-Hub made journal subscriptions universally irrelevant.) Journals are important to arXivers only insofar as they are administratively mandated for purposes such as managing careers.
The last point contradicts the dogma that organized peer review is necessary for scientific research. In effect, arXivers have been running a vast experiment for decades, which has consisted of marginalizing the substance of the journals’ work. This may not have been widely noticed, because their journals have continued running, even though they have been taken less and less seriously by readers, authors and reviewers. Of course, the dogma of the necessity of peer review is itself relatively recent, and did not hold sway a few decades ago.

To conclude, although arXiv has very basic functionalities that have not changed much since 1991, its actual role and influence go far beyond providing easy access to papers. ArXiv plays such a fundamental role in arXivers’ scientific lives because all papers that arXivers read appear there first. There should be a name for this type of open access, which combines no costs to authors and readers, complete coverage of new articles, and priority over other ways of distributing articles.

A word of caution: Many researchers are transitional forms between the arXivers that I described, and the traditional scientists for whom journals are essential. For example, mathematicians are heavy users of arXiv, but journals still play an important role for them, and their peer review process can be extremely rigorous and valuable.