Experts fail to reliably detect AI-generated histological data
Posted on: 17 May 2024
Preprint posted on 25 January 2024
AI or real? Hartung and colleagues show that even experts can no longer distinguish genuine histological images from AI-generated ones.
Selected by Reinier Prosee
Categories: scientific communication and education
Background
There is increasing awareness of the ‘paper mill’ issue in academic publishing: too often, poor or even completely fabricated research papers make it past peer review and into academic journals. For publishers, this should be a wake-up call to ensure a clearly structured, transparent peer review process. However, even with such a system in place, falsified data and images will become harder to spot, especially given the rise of artificial intelligence (AI).
As the authors of this preprint point out, AI-based methods for generating images are improving by the minute. Worryingly, Jan Hartung and colleagues show that artificially generated histological images already cannot be distinguished from genuine ones by experts. This underlines how much more problematic fake images could become for safeguarding the reproducibility and validity of published scientific work.
Key Findings
To study the ability of people, both experts and non-experts, to distinguish between genuine and artificial histological images, the preprint authors recruited 1021 undergraduates studying at different German universities. They categorised the 816 students who completed the survey as naive (290) or expert (526) based on prior experience with histological images, and showed them both genuine and artificial histological images of mouse kidney tissue samples. Importantly, two sets of artificial images were used: one generated by an AI image generator trained on 3 genuine images (A3) and one trained on 15 (A15). This is a breakdown of the main findings:
- Naive participants classified 54.5% of all images correctly, whereas experts classified 69.3% correctly.
- When the artificial images were separated by the number of images used to train the AI generator, it turned out that A3 images were classified correctly more often than genuine images, whereas the opposite was true for A15 images. This makes sense: more training data yields artificial images that more closely resemble the real thing and are therefore harder to spot.
- A small group of expert participants (1.9%) classified all images correctly. Overall, experts performed better than naive participants, which underscores the value of prior training (in this case, in examining histological images). Nonetheless, even the experts could not reliably spot the AI-generated images among the real ones.
- Interestingly, correctly classified images were associated with faster response times than incorrectly classified ones. In other words: if you know, you know.
Possible Solutions – an important role for publishers?
So how can we address the challenges posed by AI-generated scientific images and ensure the integrity of scientific publications?
The authors propose a few solutions. They mention, for example, that technical ‘forensic’ tools already exist that consistently outperform humans when applied to the same datasets. These could be implemented as part of larger technical standards, such as C2PA (https://c2pa.org). For such standards to work, it is crucial that the original data associated with publications are accessible. Journals could play a role here by introducing policies that require authors to submit all original data prior to publication. Journals could also create and maintain databases of previously published images for cross-referencing, which would help identify image reuse.
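As a rough illustration of how such a cross-reference database might work (a hypothetical sketch, not a tool described in the preprint), a journal could store perceptual hashes of previously published figures and flag new submissions whose hashes fall within a small Hamming distance. The example below assumes the third-party Pillow and imagehash Python packages; the file paths and threshold are invented for illustration.

```python
# Illustrative sketch only: flag possible image reuse by comparing
# perceptual hashes of a new submission against previously published figures.
# Assumes the third-party 'Pillow' and 'imagehash' packages; paths are hypothetical.
from pathlib import Path

from PIL import Image
import imagehash

HAMMING_THRESHOLD = 5  # small distances suggest near-duplicate images


def hash_image(path: Path) -> imagehash.ImageHash:
    """Compute a perceptual hash that is robust to resizing and recompression."""
    return imagehash.phash(Image.open(path))


def find_possible_reuse(submission: Path, published_dir: Path):
    """Return published images whose hash is close to the submitted image."""
    submitted_hash = hash_image(submission)
    matches = []
    for candidate in published_dir.glob("*.png"):
        distance = submitted_hash - hash_image(candidate)  # Hamming distance in bits
        if distance <= HAMMING_THRESHOLD:
            matches.append((candidate.name, distance))
    return matches


if __name__ == "__main__":
    hits = find_possible_reuse(Path("submission/figure2.png"), Path("published_images/"))
    for name, distance in hits:
        print(f"Possible reuse of {name} (hash distance {distance})")
```

Of course, near-duplicate matching of this kind would not catch heavily manipulated or wholly AI-generated images; that is where the forensic tools and provenance standards mentioned above would come in.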
It is clear, therefore, that publishers need to step up and face the challenge of AI-generated images head-on. There is already some evidence that the introduction of screening policies by scientific journals can effectively reduce the number of problematic images in accepted manuscripts. So what are we waiting for…?
Questions for the authors
- Do you think there are specific factors within the training of experts that contribute to their ability to distinguish between genuine and artificial images? If so, could these be expanded upon?
- Could you expand on your classification method for identifying naive vs expert participants? Did you ever consider adding a group of ‘super’ experts (e.g. pathologists)?
- It sounds like there could be an ‘arms race’ between AI-based image manipulation software and the forensic tools used to detect fraud. How can we ensure that forensic tools won’t be outpaced by new image manipulation techniques?
- Could you expand on the role that publishers should play in ensuring the reproducibility and validity of published scientific work/images?
doi: https://doi.org/10.1242/prelights.37447