## Tuesday, 29 May 2018

Out of the 2 million academic articles that are published each year, many are not read by anyone but their authors, and most have no more than a handful of readers. For someone who writes in order to spread ideas, and does not publish just to avoid perishing, this can be quite discouraging. Of course, not everything one has to say is of interest to many people. Still, part of the problem could come from the academic article as a venue, and one may wonder whether other writing venues (such as a blog) could reach more readers.

In this post I will list various venues for scientific writing, and try to do an order-of-magnitude comparison between them. I will not simply estimate how many readers are reached: a tweet may reach many people, but it is read in seconds and can be quickly forgotten. Rather, I will try to estimate the ratio between the time spent by all readers on a text, and the time spent writing it.

Like any quantitative metric of this type, this “time ratio” comes with potential problems with its estimation and interpretation. I will deal with some of these problems in the next paragraph, and argue that the time ratio is meaningful for comparing venues at the order-of-magnitude level. Readers who already accept that can skip this paragraph.

#### The fine print

Here are the admittedly important caveats with the time ratio.
1. The ratio only deals with the writing itself, whereas much work is needed before having something interesting to write. The purpose of the exercise is to compare the efficiency of various kinds of writing.
2. What about oral communication, videos, etc? That would be an interesting generalization.
3. You favour unclear texts that make readers sweat longer. At the level of venues rather than individual texts, the effect of clarity should average out.
4. Reading time is a poor proxy for the influence of ideas. I want to measure the efficiency of writing venues, not the influence of ideas.
5. The time ratio cannot be defined or measured precisely. Good! Then there is less potential for abuse than with the h-index or impact factor.

#### The formulas

We can estimate the time ratio at the level of individual texts, or at the level of individual people. The time ratio for a text is

$$R_\text{text} = N_\text{readers} \times \frac{T_\text{reading}}{T_\text{writing}}$$

The time ratio for a person is the time spent as a reader in a venue, divided by the time spent as a writer:

$$R_\text{person} = \frac{T_\text{reader}}{T_\text{writer}}$$

For a given venue, these two definitions of the time ratio should agree on average. I will use the text-level definition as the primary computing tool, and the person-level definition as a sanity check. I will present the numbers for a given venue as follows:

The length of the white rectangle is proportional to the logarithm of $R_\text{text}$. If the ratio is less than one, this logarithm is negative, and the blue rectangle for $T_\text{writing}$ actually spills to the left of the red rectangle. All these numbers are estimated up to factors of 3. Times are given in hours. So here are the results:

#### Where do these numbers come from?

• Article: A typical scientific article. I estimate that an article is written in $T_\text{writing} = 100$ hours and read in $T_\text{reading} = 1$ hour. These numbers can vary a lot between articles, but what matters is their ratio. By reader I mean someone who goes further than the abstract. The number of readers per paper, $N_\text{readers} = 30$, can be estimated in two ways: as how many papers a scientist reads for each paper he writes (not forgetting that having coauthors means writing fractions of papers), or as the number of citations per paper. (Papers are often cited without being read, or read without being cited; I assume that these effects cancel out.) The resulting time ratio $R = 0.3$ suggests that the average scientist spends more (but not much more) time writing than reading articles, which sounds right.
• Report: A confidential reviewer report for a journal. The time $T_\text{writing} = 3$ hours is for the writing per se, not the full time spent on the reviewed article. The reading time $T_\text{reading} = 0.3$ hours is quite short because the report is targeted communication, whose readers are in an ideal position to quickly absorb the information. Therefore, the ratio $R = 0.3$ may somewhat underestimate the efficiency of reports.
• Email: An email to other scientists that asks for technical explanations, or answers such a request. I assume that such emails are typically addressed to a few people (collaborators, authors of an article, etc), hence $N_\text{readers} = 3$. The result $R = 0.3$ suggests that we spend more time writing than reading emails, which sounds right. Again, this is targeted communication.
• Wikipedia: A medium-size Wikipedia article on a technical and relatively obscure subject that was not previously well-covered. For example, this article that I wrote. Writing an acceptable Wikipedia article takes quite some time, of the order of $T_\text{writing} = 30$ hours. And it will typically be read quickly, let’s say in $T_\text{reading} = 0.1$ hours. But the number of readers is large: pageview statistics suggest a few dozen readers per day, so possibly $N_\text{readers} = 10000$ readers over a lifetime of one year or more before the text undergoes large changes. There are large uncertainties in all these numbers, but the result $R = 30$ is consistent with the idea that most scientists spend some time reading Wikipedia, and very little time (if any) writing in it.
• Blog: A blog post on a technical subject, in a personal blog by a typical researcher. In my experience, this is quite comparable to a reviewer report: after reading an interesting article in some detail I can blog about it, write a report if invited to do so, or both. The blog post will have many more readers than the confidential report: my blog’s counter suggests $N_\text{readers} = 30$ unique readers for such texts. I expect that the average reading time $T_\text{reading} = 0.1$ hours is quite short. Nevertheless, the ratio $R = 1$ is better than for a report.
• Answer: A substantial question or answer on a StackExchange-like website. $N_\text{readers} = 30$ is estimated as a reasonable multiple of the typical number of upvotes, which is 3 to 10. An answer is read quickly, and also written rather quickly, assuming that we choose to answer a question only if we can do so quite easily. So $T_\text{writing} = 0.3$ hours is better than for an email, because with emails we have less freedom to choose to answer or not. The ratio $R = 3$ suggests that we spend more time reading answers than writing them.
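
The arithmetic behind these estimates can be rechecked with a short Python sketch. It is only an illustration, not how the figures were produced: the names `QUOTED`, `time_ratio`, `fill_missing` and `agrees_within_factor` are hypothetical. Where the text above does not state one of the three inputs (the number of readers of a report, the writing time of a blog post, the reading time of an answer), the sketch backs it out from the quoted ratio, so those values are implied rather than quoted. Email is left out because two of its inputs are unstated.

```python
import math

# Values quoted in the list above (times in hours).
# None marks a quantity that the text does not state explicitly.
QUOTED = {
    "Article":   {"t_writing": 100,  "t_reading": 1,    "n_readers": 30,    "ratio": 0.3},
    "Report":    {"t_writing": 3,    "t_reading": 0.3,  "n_readers": None,  "ratio": 0.3},
    "Wikipedia": {"t_writing": 30,   "t_reading": 0.1,  "n_readers": 10000, "ratio": 30},
    "Blog":      {"t_writing": None, "t_reading": 0.1,  "n_readers": 30,    "ratio": 1},
    "Answer":    {"t_writing": 0.3,  "t_reading": None, "n_readers": 30,    "ratio": 3},
}


def time_ratio(n_readers, t_reading, t_writing):
    """Text-level time ratio: R = N_readers * T_reading / T_writing."""
    return n_readers * t_reading / t_writing


def fill_missing(venue):
    """Return a copy where the single unstated input is backed out from R.

    Assumption: the quoted ratio is taken at face value, so the result is
    an implied value, not a number given in the text.
    """
    v = dict(venue)
    if v["n_readers"] is None:
        v["n_readers"] = v["ratio"] * v["t_writing"] / v["t_reading"]
    elif v["t_writing"] is None:
        v["t_writing"] = v["n_readers"] * v["t_reading"] / v["ratio"]
    elif v["t_reading"] is None:
        v["t_reading"] = v["ratio"] * v["t_writing"] / v["n_readers"]
    return v


def agrees_within_factor(a, b, factor=3.0):
    """True if a and b differ by at most the given multiplicative factor."""
    return max(a, b) / min(a, b) <= factor


for name, venue in QUOTED.items():
    v = fill_missing(venue)
    r = time_ratio(v["n_readers"], v["t_reading"], v["t_writing"])
    ok = agrees_within_factor(r, v["ratio"])
    # log10(R) is the quantity that sets the bar length in the figure.
    print(f"{name:<10} R = {r:8.2f}  log10(R) = {math.log10(r):+.2f}  "
          f"within factor 3 of quoted R: {ok}")
```

For Wikipedia, the three quoted inputs give a ratio slightly above the quoted $R = 30$, which is well within the stated factor-of-3 uncertainty.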

#### Interpretation: traditional and emerging venues

Articles, reports and emails have comparable time ratios. I would call them traditional venues, because they are routinely used by most working scientists. These venues compete with one another for the attention of scientists as readers and writers, and this competition may have reached an equilibrium state, which may explain why the ratios are comparable.

Wikipedia, blogs and answers could be called emerging venues: writing in these venues is not part of the average scientist’s normal workflow, and does not formally count in her career. Reading Wikipedia and answers is however probably becoming the norm, and this imbalance between writing and reading may help explain the large time ratios of these venues.

Let me try to elaborate by making a pairwise comparison between traditional and emerging venues: