Citation needed? Wikipedia and the COVID-19 pandemic

Omer Benjakob, Rona Aviram, Jonathan Sobel

Preprint posted on 17 May 2021

Article now published in GigaScience

Building bridges: Wikipedia citations, the pandemic and public engagement.

Selected by Emma Wilson, Jonny Coates


Wikipedia was launched in 2001 and today has a monthly readership of approximately 495 million people [1]. With over 155,000 medical-related articles viewed more than 4.88 billion times in 2013 alone, Wikipedia is one of the most viewed medical resources on the globe [2]. To maintain high-quality material, Wikipedia has strict editorial guidelines [4], and medical professionals make up 50% of those editing medical-related articles [3].

In 2020, the world became gripped by a global pandemic caused by the SARS-CoV-2 virus. In response to the pandemic, science experienced a cultural shift in how articles were shared and disseminated [5]. There was also an “infodemic” of disinformation [6], with the scientific literature being hijacked by many different groups, including conspiracy groups and right-wing politicians [5]. However, to date nobody has investigated how this phenomenon has impacted Wikipedia articles on the pandemic.

To look into this, the authors of this preprint investigated the role of popular media and academic sources used as citations on Wikipedia articles related to the COVID-19 pandemic.

Key findings

Wikipedia sources are highly selective

The authors examined 1695 COVID-19 related articles and found that citations in Wikipedia came from a variety of sources, including scientific articles and popular journalism. The majority of the scientific articles that were cited were published in Nature, Science, The Lancet and The New England Journal of Medicine (Figure 1 in the preprint). Surprisingly, only 0.42% of all academic papers on COVID-19 were cited on Wikipedia. In addition, cited papers tended to have a higher Altmetric score, meaning that they were generally more widely shared. Encouragingly, almost a third of papers cited on Wikipedia were open access, although few were preprints. The preference for traditionally highly regarded journals over preprints stems from the underlying editorial requirements for health articles on Wikipedia.

Over 80% of references used in COVID-19 articles were not academic, and instead came from news media or websites. The highly selective nature of citations was also observed with non-academic sources, with more respected news organisations, including the BBC and Reuters, being cited more often. Moreover, the World Health Organization accounted for a significant amount of cited content.

Technical articles had a higher “scientific score”

To investigate the role of scientific articles compared to popular media, the authors created a scientific score by calculating the ratio of academic to non-academic references for each Wikipedia article. Those Wikipedia articles that had high scientific scores (closest to 1) were mostly highly scientific topics such as “cytokine”, “Macrophage-1 antigen” and “Tetrandrine”. In contrast, those with the lowest scientific score (closest to 0) were mostly those articles focussed on social aspects such as “COVID-19 pandemic in North America”, “Boris Johnson” and “Impact of the COVID-19 pandemic on the arts and cultural heritage”.
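As a rough illustration of the scientific score, the sketch below assumes it is the share of academic references among all references in an article, which would give the 0 to 1 range described above. This is an assumption rather than the authors' confirmed formula, and the function name, the handling of articles with no references and the example counts are all hypothetical.

```python
def scientific_score(academic_refs: int, non_academic_refs: int) -> float:
    """Share of academic references among all references of one article.

    Assumption: the score is academic / (academic + non-academic), so it
    lies between 0 (no academic sources) and 1 (only academic sources).
    """
    if academic_refs < 0 or non_academic_refs < 0:
        raise ValueError("reference counts cannot be negative")
    total = academic_refs + non_academic_refs
    if total == 0:
        # Assumption: an article with no references is scored 0.
        return 0.0
    return academic_refs / total


# Hypothetical reference counts, for illustration only.
technical_article = scientific_score(academic_refs=45, non_academic_refs=5)
social_article = scientific_score(academic_refs=2, non_academic_refs=98)
print(technical_article, social_article)
```

Under this reading, a heavily sourced technical article lands near 1, while an article built mostly on news coverage lands near 0.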

The authors next examined how the coverage of COVID-19 developed over time. They looked at 231 articles and mapped them to their respective dates of creation, from 2001, when Wikipedia was created, to May 2020. From the beginning of the pandemic the total number of Wikipedia articles referencing COVID-19 doubled, with those created during 2020 having a lower scientific score (0.14) than those created pre-2020 (0.48), i.e. articles on general coronaviruses or behaviour that were applicable to COVID-19. The authors reasoned that keeping articles up to date with COVID-19 developments came at a cost to their scientific score.

Networks of COVID-19 Knowledge

Next, the authors investigated how Wikipedia articles connected together based on their shared academic sources. They found that six prominent topics emerged which shared multiple citations with other Wikipedia pages. These six topics were termed nodes and included ‘Coronavirus’, ‘Coronavirus disease 2019’, ‘COVID-19 drug development’ and ‘COVID-19 pandemic’. Two of these nodes were locked for editing by the public to try to prevent the spread of misinformation.
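The idea of linking articles through shared academic sources can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes each article's academic references are available as a set of identifiers, and the function names, article selection and placeholder identifiers are hypothetical.

```python
from itertools import combinations


def shared_source_edges(article_sources):
    """Return {(article_a, article_b): number_of_shared_sources}.

    Two articles are linked only if they cite at least one common source.
    """
    edges = {}
    for a, b in combinations(sorted(article_sources), 2):
        shared = article_sources[a] & article_sources[b]
        if shared:
            edges[(a, b)] = len(shared)
    return edges


def connection_counts(edges):
    """Count how many other articles each article is linked to."""
    counts = {}
    for a, b in edges:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts


# Placeholder identifiers, for illustration only (not real citation data).
toy_sources = {
    "Coronavirus": {"source-A", "source-B"},
    "COVID-19 pandemic": {"source-B", "source-C"},
    "Boris Johnson": {"source-D"},
}
toy_edges = shared_source_edges(toy_sources)
toy_counts = connection_counts(toy_edges)
print(toy_edges, toy_counts)
```

Articles with the highest connection counts in such a network would correspond to the prominent topics described above.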

Why we chose this preprint

This preprint covers an interesting topic and may be able to help us generate a tool to bring complicated scientific topics to a general audience. It may also be a starting point for how social media outlets, such as Facebook and Twitter, can help to prevent misinformation. The COVID-19 pandemic has highlighted that public engagement and outreach are essential to help prevent the spread of misinformation, and Wikipedia could be a tool to do this.

Tags: citations, corona virus, covid-19, meta science, pandemic, wikipedia

Posted on: 5 July 2021



Authors' response

Omer Benjakob, Rona Aviram and Jonathan Sobel shared

This preprint was covered in the second episode of the “Preprints in Motion” podcast, where all three authors joined us for a detailed discussion. You can listen on Apple, Spotify and Google.
