
Have AI-Generated Texts from LLM Infiltrated the Realm of Scientific Writing? A Large-Scale Analysis of Preprint Platforms

Huzi Cheng, Bin Sheng, Aaron Lee, Varun Chaudary, Atanas G. Atanasov, Nan Liu, Yue Qiu, Tien Yin Wong, Yih-Chung Tham, Yingfeng Zheng

Posted on: 13 July 2024 , updated on: 16 July 2024

Preprint posted on 30 March 2024

AI: Rise of the Machine(-generated texts)

Selected by Amy Manson, Jennifer Ann Black, Maitri Manjunath

This preLight was created as part of the 2024 SciCommConnect writing sprint session, moderated by Jennifer Ann Black.

Prompt: I need a summary of a paper discussing the use of AI in scientific writing (OpenArt – AI generated image).

 

Introduction

Artificial intelligence (AI) is not quite ready to take over the world just yet, but it is taking the science communication world by storm. This preprint reveals the impact that generative AI tools such as ChatGPT have had on the scientific writing community since the rise of machine-generated texts.

While the concept of AI as a serious reality, not just a science-fiction dream, has been around since the time of Alan Turing in the 1950s, the technology recently took a leap forward in capability and public awareness with the introduction of OpenAI’s ChatGPT in 2022. Generative AI produces text, code, images, or other data in response to user prompts, using predictions and probabilities. Recent improvements in large language models (LLMs), which learn statistical relationships between words from vast libraries of text, have greatly improved the language-generation capabilities of these tools.

Some of the potential benefits of using generative AI in content creation include increased efficiency of both writing and background research; assistance with generating ideas and hypotheses; fast summarising of data; and improved accessibility, particularly for non-native English speakers (1). However, there are also concerns about the impact that generative AI might have on the authenticity, originality, accuracy, and quality of written content (1), concerns that are especially acute for academic integrity. Striking the right balance between AI assistance and human contribution in scientific research is crucial, making ethical considerations and transparency in AI usage highly relevant topics.

As generative AI is a relatively new technology, this is the first study to analyse its impact on scientific writing. The authors asked whether the use of AI to generate text varied across scientific specialities, regions of the world, types of scientific article, and even within articles, using a dataset of over 45,000 preprint manuscripts submitted over the last two years and a recently developed AI-detection tool, Binoculars (2).

 

From Figure 1 of the preprint, made available by a CC-BY 4.0 license. Figure shows the pipeline used by the authors to study AI usage in scientific writing.

 

Key Findings

1. Binoculars scores detect machine generated texts within scientific manuscripts

Since the release of ChatGPT in 2022, detecting AI-generated text has been challenging because of how realistic the output is. Here, the authors used a recently developed tool called Binoculars (2) that distinguishes human-written from AI-generated text using a scoring system: a higher Binoculars score indicates that the text is more likely to have been written by humans. The authors showed that the mean and minimum Binoculars values were higher before ChatGPT’s release, while variance increased afterward, suggesting increased use of ChatGPT in manuscript writing post-release. The analysis also revealed that minimum Binoculars scores are strong indicators of AI usage trends, with the increased variance driven largely by these minimum values. The study concludes that the divergence in content quality and authenticity became more apparent after the release of ChatGPT, as indicated by the changing Binoculars scores.
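The scoring idea behind Binoculars can be illustrated with a toy sketch. The function below is an assumption-laden illustration, not the authors' or the Binoculars developers' implementation: it takes hypothetical next-token probability distributions from an "observer" and a "performer" model and returns the ratio of the observer's log-perplexity on the text to the observer/performer cross-perplexity, the quantity Hans et al. (2) use as the score.

```python
import numpy as np

def binoculars_score(obs_probs, perf_probs, token_ids):
    """Toy Binoculars-style score (after Hans et al., 2024).

    obs_probs, perf_probs: arrays of shape (T, V) holding next-token
    probability distributions from an "observer" and a "performer"
    model at each of T positions over a vocabulary of size V.
    token_ids: the T tokens actually observed in the text.

    Returns the observer's log-perplexity on the text divided by the
    observer/performer cross-perplexity. Lower scores suggest
    machine-generated text; higher scores suggest human-written text.
    """
    obs_probs = np.asarray(obs_probs)
    perf_probs = np.asarray(perf_probs)
    token_ids = np.asarray(token_ids)
    T = len(token_ids)
    # Log-perplexity: average surprise of the observer at the tokens
    # that actually appear in the text.
    log_ppl = -np.mean(np.log(obs_probs[np.arange(T), token_ids]))
    # Cross-perplexity: the observer's expected surprise under the
    # performer's predicted distribution at each position.
    x_ppl = -np.mean(np.sum(perf_probs * np.log(obs_probs), axis=1))
    return log_ppl / x_ppl
```

Text that closely follows a model's most probable continuations yields a low score (machine-like), while surprising, idiosyncratic word choices push the score up (human-like), which is why the paper reads low minimum scores as a signal of AI usage.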

2. AI usage has been found across a diverse range of sciences

The minimum Binoculars scores for domains such as the biological sciences, computer science, and engineering dropped significantly after the release of ChatGPT, indicating increased use of machine-generated content.

3. Usage of AI is linked to author language and demography

The native language of the authors may influence the use of ChatGPT. The study found that most countries showed a decrease in minimum Binoculars values after ChatGPT’s release, particularly China, Italy, and India, suggesting higher reliance on AI. When countries were instead classified by the language spoken, those with English as an official language had higher mean and minimum Binoculars values. This is in line with findings that LLM detectors more frequently flag texts by non-native English speakers as AI-generated (3).

4. AI usage appears to be context dependent

Binoculars scores varied by content type: literature reviews scored low, while data presentation and phenomenon descriptions scored high. Comparing scores for each content type before and after ChatGPT’s release, the authors found that, while most content types showed a decrease, literature reviews remained stable. Notably, hypothesis formulation, conclusion summarisation, phenomenon description, and future-work suggestions showed the largest drops.

5. AI-generated content is more highly cited than non-AI content

The authors next investigated the effect of AI on content quality by analysing citation counts as a proxy for a paper’s impact. Prior to the introduction of ChatGPT, the authors found no significant correlation between mean Binoculars scores and citation counts. After its release, however, a significant negative correlation emerged, suggesting that increased use of AI correlates with higher citation counts. To ensure this was not simply due to the natural accumulation of citations over time, the authors conducted a fine-grained analysis over 30-day intervals, which confirmed the trend.
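The 30-day interval check can be sketched as follows. This is a minimal illustration under assumed variable names and grouping, not the authors' actual analysis code: papers are binned into 30-day posting cohorts so that score–citation correlations are only computed among papers of roughly the same age, removing the confound of older papers simply having had more time to accrue citations.

```python
import numpy as np

def cohort_correlations(post_days, scores, citations, window_days=30):
    """Pearson correlation between Binoculars scores and citation
    counts, computed separately within each 30-day posting cohort.

    post_days: posting date of each paper, as days since some origin.
    scores, citations: per-paper mean Binoculars score and citations.
    Returns {cohort_index: correlation} for cohorts with >2 papers.
    """
    post_days = np.asarray(post_days)
    scores = np.asarray(scores, dtype=float)
    citations = np.asarray(citations, dtype=float)
    # Assign each paper to a 30-day cohort relative to the earliest post.
    cohorts = (post_days - post_days.min()) // window_days
    results = {}
    for c in np.unique(cohorts):
        mask = cohorts == c
        if mask.sum() > 2:  # need a few points for a meaningful r
            r = np.corrcoef(scores[mask], citations[mask])[0, 1]
            results[int(c)] = r
    return results
```

A consistently negative correlation within every cohort, as the preprint reports post-ChatGPT, cannot be explained by citation age alone, since all papers in a cohort have had the same time to gather citations.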

 

Paper Summary made using ChatGPT

This study examines the increasing prevalence of AI in scientific writing by analyzing over 45,000 manuscripts across three preprint platforms over the past two years, particularly focusing on trends after the release of ChatGPT in late 2022. Utilizing Binoculars scores to detect AI-generated text, findings indicate a significant decline in average scores after November 30, 2022, aligning with heightened interest in “ChatGPT” on Google Trends and signaling widespread adoption of AI in scientific manuscripts. Disciplinary and geographical analyses reveal disparities, with higher AI usage observed in fields like computer science and engineering, and in non-English-speaking countries, supported by Ordinary Least Squares regression. The impact of AI varies by content type; manuscripts presenting novel findings show greater declines in Binoculars scores than literature reviews. A noteworthy shift in correlations between Binoculars scores and citation numbers post-ChatGPT suggests AI-generated content is increasingly cited, contrasting with pre-ChatGPT trends.

Despite methodological limitations regarding AI detection and platform coverage, this study represents a pioneering quantitative analysis of AI’s influence on contemporary scientific writing. It underscores the need for nuanced regulatory frameworks and ethical considerations amidst concerns regarding AI’s role in scholarly communication beyond mere plagiarism detection. The study advocates for comprehensive discussions to guide responsible AI integration, highlighting its potential to enhance global scientific discourse despite challenges in implementation and interpretation.

 

Why we liked this preprint

Amy – The use of AI is a pretty hot topic right now. I thought this paper was clever to use preprints as the basis for testing changes in the use of generative AI in recent years, as it gives a view of the science communication landscape in real-time without any concerns regarding delays and lag involved in the (often lengthy) peer-review and publishing process.

Maitri – I was intrigued by this paper because this is amongst the first papers to systematically analyse the impact of AI in science writing. AI can potentially improve the efficiency of content creation by assisting with writing, summarizing data, and generating ideas. This can be particularly beneficial in scientific research where the volume of literature and data can be overwhelming.

Jenn – AI usage and the impact of AI on science is unclear and likely will be for a while. I liked that the paper begins to address some of the potential impacts of AI on science, specifically focusing on scientific writing. Hopefully more studies like this will help us as scientists decide on how we can use AI positively in our work. AI is a powerful tool and using it correctly will hopefully enhance science rather than detract from it.

 

References

  1. Chan, C.K.Y., Hu, W. Students’ voices on generative AI: perceptions, benefits, and challenges in higher education. Int J Educ Technol High Educ 20, 43 (2023). https://doi.org/10.1186/s41239-023-00411-8
  2. Hans, A., Schwarzschild, A., Cherepanova, V., Kazemi, H., Saha, A., Goldblum, M., Geiping, J., Goldstein, T. Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text. arXiv preprint arXiv:2401.12070. 2024 Jan 22.
  3. Liang, W., Yuksekgonul, M., Mao, Y., Wu, E., Zou, J. GPT detectors are biased against non-native English writers. Patterns 4, 7 (2023). https://doi.org/10.1016/j.patter.2023.100779

 

Questions for the authors

Q1: What are your opinions on the use of AI-generated texts in scientific writing? Would you consider using it yourselves (or have you already)?

Q2: What are the challenges in using the Binoculars scoring method to distinguish human- vs AI-generated hypotheses in research writing?

Q3: How do you resolve issues with non-native speakers being identified as LLM users?

Q4: Do you agree with the use of AI interpretation on the results of a research paper? Do you think AI should guide our scientific hypotheses?

Tags: science communication

doi: https://doi.org/10.1242/prelights.37898

