
Clinically reported covert cerebrovascular disease and risk of neurological disease: a whole-population cohort of 367,988 people using natural language processing

Matthew H Iveson, Mome Mukerjee, Emma M Davidson, Huayu Zhang, Laura Sherlock, Emily L Ball, Grant Mair, Alice Hosking, Heather Whalley, Michael T C Poon, Joanna M Wardlaw, David Kent, Richard Tobin, Claire Grover, Beatrice Alex, William N Whiteley

Posted on: 4 June 2026

Preprint posted on 27 February 2026

AI filtering of brain imaging reports uncovers covert cerebrovascular disease in Scottish NHS cohort

Selected by Rafidah Mumtahinah Chowdhury, Maya Belleville, Marianne Rustom, uMontreal Neuro preLighters

Categories: neuroscience, pathology

Background

As brain imaging has become more widely used, vascular abnormalities are increasingly detected in patients without clear neurological symptoms. Among these, covert cerebrovascular disease (CCD), including white matter changes, lacunes, and silent infarcts, has emerged as a potential marker of underlying small-vessel pathology. A central question in the medical field is whether these findings represent clinically meaningful warning signs of future neurological disease, or whether they simply reflect the cumulative effects of aging and comorbidity.

To date, much of the evidence linking CCD to outcomes such as stroke and dementia has come from research-based neuroimaging cohorts using standardized MRI protocols. While informative, these studies are conducted in selected populations and may not reflect the realities of routine clinical care, where imaging is performed for heterogeneous indications and is often CT-based. In addition, incidental cerebrovascular findings are typically documented in free-text radiology reports rather than captured as structured data, limiting their visibility in large-scale epidemiological studies. This creates an important gap between what is observed in controlled research settings and what is encountered in everyday clinical practice.

In this context, the authors of this preprint aim to determine whether CCD identified through routine radiology reports carries independent prognostic significance for future neurological disease. Using natural language processing applied to a large population-based imaging cohort, the study tests the hypothesis that clinically reported vascular brain lesions are not merely incidental findings but instead mark an increased risk of subsequent stroke and dementia. At the same time, it raises the broader question of how such findings should be interpreted and acted upon in clinical practice. The use of automated text analysis on such a large cohort also hints at the potential of this technology in other biological domains with large clinical or animal cohorts, particularly multi-center studies, where qualitative data could be analyzed efficiently to gain statistical power.

Key Findings

This study analyzed MRI and CT records from over 360,000 patients within the Scottish National Health Service. Natural language processing was used to efficiently identify terms (“ischemic”, “stroke”, “lacune”, etc.) relevant to the four imaging phenotypes studied: white matter hyperintensities (WMH), lacunes, cortical infarcts and cerebral atrophy. Patient files were evaluated at 1-year and 5-year time points after the imaging report, to allow sufficient time for a diagnosis to be recorded.
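To make the report-tagging step more concrete, here is a minimal sketch of keyword matching on free-text reports. It is not the authors' pipeline: the four phenotype names follow the preprint's findings, but the patterns, the crude negation check and the function name `tag_report` are our own assumptions, chosen only for illustration.

```python
import re

# Hypothetical keyword patterns per phenotype (illustrative only; these are
# not the term lists used in the preprint).
PHENOTYPE_PATTERNS = {
    "white_matter_hyperintensity": r"white matter (?:hyperintensit\w+|change\w*)",
    "lacune": r"lacun\w+",
    "cortical_infarct": r"cortical infarct\w*",
    "cerebral_atrophy": r"atroph\w+|volume loss",
}

# Crude negation cue: a negating word earlier in the same sentence.
NEGATION = re.compile(r"\b(?:no|not|without)\b", re.IGNORECASE)


def tag_report(text):
    """Return the set of phenotypes mentioned affirmatively in a report."""
    found = set()
    # Treat full stops, semicolons and newlines as sentence boundaries.
    for sentence in re.split(r"[.;\n]+", text):
        for name, pattern in PHENOTYPE_PATTERNS.items():
            match = re.search(pattern, sentence, re.IGNORECASE)
            # Keep the mention only if no negation cue precedes it.
            if match and not NEGATION.search(sentence[:match.start()]):
                found.add(name)
    return found
```

For a report such as "Moderate cerebral atrophy. Old lacunar infarct in the left thalamus. No cortical infarct.", this sketch tags cerebral atrophy and lacune but skips the negated cortical infarct. Real clinical text needs far more robust handling of negation, uncertainty and synonyms than this.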

People presenting CCD or atrophy phenotypes had higher chances of experiencing stroke, dementia, and Parkinson’s disease.

Stroke risk was significantly higher in cortical infarct (11%), lacune (9%) and cerebral atrophy (6%) phenotypes. In addition, dementia risk was significantly higher in WMH (23%), cortical infarct (22%) and lacune (22%) phenotypes (preprint Table 2).

All CCD phenotypes show higher hazard ratios for stroke.

Adjusted hazard ratios (aHR) and 95% confidence intervals (CI) after a 12-year follow-up period indicated that each phenotype (vs. no phenotype) was associated with a higher risk of any kind of stroke (preprint Figure 3). In particular, cortical infarct phenotypes were most associated with ischaemic (aHR: 1.9) as well as haemorrhagic stroke (aHR: 1.7), lacune phenotype with haemorrhagic stroke (aHR: 1.6), and cerebral atrophy with unspecified stroke (aHR: 1.2) (preprint Figure 4). Combinations of two or more phenotypes correlated with a higher aHR in all strokes, with the highest risks associated with all four phenotypes combined.
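For readers less familiar with hazard ratios, the short sketch below shows how an aHR can be read as a percentage change in hazard relative to the reference group, and how a hazard ratio and its 95% CI are conventionally derived from a Cox model coefficient and its standard error. The helper names are our own and nothing here is taken from the preprint's code.

```python
import math


def hazard_change_percent(ahr):
    """Percentage change in hazard relative to the reference group."""
    return round((ahr - 1.0) * 100.0, 1)


def hazard_ratio_with_ci(beta, se, z=1.96):
    """Hazard ratio and approximate 95% CI from a log-hazard coefficient.

    beta is the model coefficient (log hazard ratio), se its standard error;
    z=1.96 gives the usual normal-approximation 95% interval.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

Read this way, an aHR of 1.9 corresponds to roughly a 90% higher hazard than in people without the phenotype, and an aHR of 1.2 to roughly a 20% higher hazard. Values below 1 indicate a lower hazard, so an aHR of 0.7 would mean about 30% lower.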

Cerebral atrophy, white matter hyperintensities and cortical infarct phenotypes show higher hazard ratios for dementia.

Risk of dementia was significantly higher in cerebral atrophy (aHR: 1.7), WMH (aHR: 1.3) and cortical infarct (aHR: 1.1) phenotypes (preprint Figure 3). In particular, cerebral atrophy was associated with higher risk of all dementia types: Alzheimer’s disease (aHR: 1.9), vascular dementia (aHR: 1.4) and unspecified dementia (aHR: 1.5). Cortical infarct and lacune phenotypes were associated with higher risk of vascular dementia (cortical infarct – aHR: 1.8; lacune – aHR: 1.6), but a lower risk of Alzheimer’s disease (cortical infarct – aHR: 0.7; lacune – aHR: 0.7). WMH phenotype was associated with vascular (aHR: 2.0) and unspecified (aHR: 1.3) dementias (preprint Figure 4). As with stroke, combinations of two or more phenotypes correlated with higher aHRs across all dementias, with the highest risks seen when all four phenotypes were combined.

Cerebral atrophy, white matter hyperintensities and cortical infarct phenotypes show altered hazard ratios for Parkinson’s disease.

For Parkinson’s disease, aHRs were lower for the cortical infarct phenotype (aHR: 0.7) and higher for the cerebral atrophy (aHR: 1.4) and WMH (aHR: 1.1) phenotypes (preprint Figure 3). No cumulative effect of multiple phenotypes was observed for a diagnosis of Parkinson’s disease.

No significant effects were found for epilepsy or colorectal cancer diagnoses.

Why we highlight this preprint

We chose to highlight this observational, retrospective, longitudinal cohort study for three main reasons:

  • While much cerebrovascular research has traditionally focused on overt clinical events such as stroke, increasing attention is now being directed toward covert cerebrovascular disease (CCD), including white matter changes and silent infarcts. This study contributes to this shift by evaluating the long-term clinical relevance of these often-incidental findings in a large, real-world population sample.
  • We were particularly struck by the use of natural language processing (NLP), a fairly novel approach, to extract imaging phenotypes from unstructured radiology reports at scale. This approach enables the study of clinically relevant but otherwise uncoded features, opening new possibilities for leveraging routine healthcare data in epidemiological research.
  • The scale of this study, including over 360,000 individuals from routine clinical care, provides a level of statistical power and ecological validity that is rarely achieved in neuroimaging research, which is often limited to smaller, highly selected cohorts.

Importantly, the study addresses a common clinical dilemma that is personally important to us: how to interpret incidental findings, such as white matter hyperintensities or cerebral atrophy, on imaging. By linking these findings to future neurological outcomes, the work begins to provide valuable context for clinicians encountering these reports in practice. Incidental findings have been valuable throughout the history of science and have led to remarkable discoveries, including penicillin, but they can be notoriously difficult to define statistically in large studies. One of us, currently doing research in epidemiological neuroscience with large-scale metrics, has encountered this difficulty first-hand. This method of capturing incidental findings and converting them into quantitative, interpretable data is worth adding to the arsenal of tools across all sciences.

Questions for the authors

  1. Ischemic events can correlate with, or even cause, epilepsy. Could other brain-based control outcomes that are less related to stroke, such as primary brain tumors, be used in the same analyses, in order to separate stroke-related events from other brain events as far as possible?
  2. While the study adjusts for several key variables, important vascular and lifestyle factors such as smoking, obesity, and blood pressure were not available. Given their strong association with both CCD and outcomes like stroke and dementia, how might this residual confounding influence your hazard ratio estimates? Do you think incorporating these variables in future work could meaningfully change the observed associations, particularly in distinguishing whether CCD is an independent risk factor or a proxy for broader vascular risk?
  3. The study uses Cox proportional hazards models, which assume that hazard ratios remain constant over time. Although this assumption was tested using Schoenfeld residuals, the diagnostics are not shown, making it difficult to assess its validity for CCD variables. Notably, sensitivity analyses suggest that some associations attenuate or reverse after excluding early follow-up, indicating potential time-dependent effects. Could you comment on whether CCD effects varied over time, and whether alternative approaches (e.g., time-varying coefficients or piecewise models) were considered to better capture this?
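To illustrate the kind of piecewise check raised in question 3, here is a minimal sketch that computes a crude event-rate ratio separately for each follow-up window. It is a simple stand-in for a piecewise model, not a Cox model: there is no covariate adjustment, the function name is our own, and the counts are toy numbers invented purely for illustration.

```python
def piecewise_rate_ratios(exposed, unexposed):
    """Crude event-rate ratio (exposed vs. unexposed) per follow-up window.

    Each argument maps a window label to a tuple (events, person_years).
    """
    ratios = {}
    for window, (events_e, py_e) in exposed.items():
        events_u, py_u = unexposed[window]
        # Rate = events per person-year; the ratio compares the two groups.
        ratios[window] = (events_e / py_e) / (events_u / py_u)
    return ratios


# Toy numbers, invented for illustration only (not from the preprint).
exposed = {"0-1 y": (40, 1000.0), "1-5 y": (60, 3000.0)}
unexposed = {"0-1 y": (20, 1000.0), "1-5 y": (50, 3000.0)}
ratios = piecewise_rate_ratios(exposed, unexposed)
```

In this toy example the ratio is larger in the first year than in years one to five, which is the pattern one would expect if an association weakens once early follow-up is excluded. A ratio that differs markedly between windows is a hint that a single, time-constant hazard ratio may not summarize the data well.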

Tags: ai, brain imaging, cohort study, covert cerebrovascular disease, dementia, electronic health records, natural language processing, neurological disease risk, stroke
