Clinically reported covert cerebrovascular disease and risk of neurological disease: a whole-population cohort of 367,988 people using natural language processing
Posted on: 4 June 2026
Preprint posted on 27 February 2026
AI filtering of brain imaging reports uncovers covert cerebrovascular disease in Scottish NHS cohort
Selected by Rafidah Mumtahinah Chowdhury, Maya Belleville, Marianne Rustom, uMontreal Neuro preLighters
Categories: neuroscience, pathology
Background
As brain imaging has become more widely used, vascular abnormalities are increasingly detected in patients without clear neurological symptoms. Among these, covert cerebrovascular disease (CCD), including white matter changes, lacunes, and silent infarcts, has emerged as a potential marker of underlying small-vessel pathology. A central question in the medical field is whether these findings represent clinically meaningful warning signs of future neurological disease, or whether they simply reflect the cumulative effects of aging and comorbidity.
To date, much of the evidence linking CCD to outcomes such as stroke and dementia has come from research-based neuroimaging cohorts using standardized MRI protocols. While informative, these studies are conducted in selected populations and may not reflect the realities of routine clinical care, where imaging is performed for heterogeneous indications and is often CT-based. In addition, incidental cerebrovascular findings are typically documented in free-text radiology reports rather than captured as structured data, limiting their visibility in large-scale epidemiological studies. This creates an important gap between what is observed in controlled research settings and what is encountered in everyday clinical practice.
In this context, the authors of this preprint aim to determine whether CCD identified through routine radiology reports carries independent prognostic significance for future neurological disease. Using natural language processing applied to a large population-based imaging cohort, this study tests the hypothesis that clinically reported vascular brain lesions are not merely incidental findings but instead mark an increased risk of subsequent stroke and dementia. At the same time, it raises the broader question of how such findings should be interpreted and acted upon in clinical practice. The use of an LLM on such a large cohort also illustrates the potential of this technology in other biological domains that rely on large clinical or animal cohorts, particularly multi-center studies, where converting qualitative data into analyzable form can increase statistical power.
Key Findings
This study analyzed MRI and CT records from over 360,000 patients within the Scottish National Health Service. A Large Language Model was used to efficiently identify terms (“ischemic”, “stroke”, “lacune”, etc.) relevant to the four imaging phenotypes studied: white matter hyperintensities (WMH), lacunes, cortical infarcts and cerebral atrophy. Patient records were evaluated at 1 and 5 years after the imaging report, to allow sufficient time for a diagnosis to be made.
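To make the idea of flagging phenotype terms in free-text reports concrete, here is a minimal keyword-matching sketch. It is not the authors' pipeline: the term lists, the `PHENOTYPE_TERMS` name, the `flag_phenotypes` helper and the crude negation check are all illustrative assumptions, and a real system would need far more careful handling of negation, uncertainty and spelling variants.

```python
import re

# Hypothetical term lists, chosen for illustration only (not taken from the preprint).
PHENOTYPE_TERMS = {
    "wmh": [r"white matter hyperintensit(?:y|ies)"],
    "lacune": [r"lacun(?:e|es|ar infarcts?)"],
    "cortical_infarct": [r"cortical infarct(?:s|ion)?"],
    "atrophy": [r"atroph(?:y|ic)", r"volume loss"],
}

# Crude negation cue (an assumption): "no", "not" or "without"
# appearing earlier in the same sentence than the matched term.
NEGATION = re.compile(r"\b(?:no|not|without)\b[^.]*$", re.IGNORECASE)


def flag_phenotypes(report: str) -> set:
    """Return the phenotype labels mentioned, and not negated, in a report."""
    found = set()
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        for phenotype, patterns in PHENOTYPE_TERMS.items():
            for pattern in patterns:
                match = re.search(pattern, sentence, re.IGNORECASE)
                # Keep the hit only if no negation cue precedes it in the sentence.
                if match and not NEGATION.search(sentence[:match.start()]):
                    found.add(phenotype)
    return found


example = (
    "There is generalised cerebral atrophy. "
    "Old lacunar infarct in the left thalamus. "
    "No cortical infarct."
)
print(sorted(flag_phenotypes(example)))
```

In this made-up report, the atrophy and lacune mentions are flagged, while the negated cortical infarct is skipped. Rule-based matching like this is easy to audit but brittle, which is one reason learned language models are attractive at the scale described here.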
People presenting CCD or atrophy phenotypes had higher chances of experiencing stroke, dementia, and Parkinson’s disease.
Stroke risk was significantly higher in cortical infarct (11%), lacune (9%) and cerebral atrophy (6%) phenotypes. In addition, dementia risk was significantly higher in WMH (23%), cortical infarct (22%) and lacune (22%) phenotypes (preprint Table 2).
All CCD phenotypes show higher hazard ratios for stroke.
Adjusted hazard ratios (aHR) with 95% confidence intervals (CI) over a 12-year follow-up period indicated that each phenotype (vs. no phenotype) was associated with a higher risk of any kind of stroke (preprint Figure 3). In particular, the cortical infarct phenotype was most strongly associated with ischaemic (aHR: 1.9) and haemorrhagic stroke (aHR: 1.7), the lacune phenotype with haemorrhagic stroke (aHR: 1.6), and cerebral atrophy with unspecified stroke (aHR: 1.2) (preprint Figure 4). Combinations of two or more phenotypes were associated with a higher aHR for all stroke types, with the highest risk when all four phenotypes were combined.
Cerebral atrophy, white matter hyperintensities and cortical infarct phenotypes show higher hazard ratios for dementia.
The hazard of dementia was significantly higher in the cerebral atrophy (aHR: 1.7), WMH (aHR: 1.3) and cortical infarct (aHR: 1.1) phenotypes (preprint Figure 3). In particular, cerebral atrophy was associated with a higher risk of all dementia types: Alzheimer’s disease (aHR: 1.9), vascular dementia (aHR: 1.4) and unspecified dementia (aHR: 1.5). The cortical infarct and lacune phenotypes were associated with a higher risk of vascular dementia (cortical infarct, aHR: 1.8; lacune, aHR: 1.6) but a lower risk of Alzheimer’s disease (aHR: 0.7 for both). The WMH phenotype was associated with vascular (aHR: 2.0) and unspecified (aHR: 1.3) dementias (preprint Figure 4). As with stroke, combinations of two or more phenotypes were associated with higher aHRs for all dementias, with the highest risk when all four phenotypes were combined.
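As a reading aid for the hazard ratios in this section: an aHR describes the relative change in the instantaneous rate (hazard) of the outcome compared with people without the phenotype, so an aHR of 1.7 corresponds to a hazard about 70% higher and an aHR of 0.7 to one about 30% lower. The short sketch below does this arithmetic for a few of the values quoted above. The `percent_change_in_hazard` helper is our own illustrative name, and the result describes relative hazard, not absolute risk.

```python
def percent_change_in_hazard(ahr: float) -> int:
    """Relative change in hazard versus the reference group, in whole percent."""
    return round((ahr - 1.0) * 100)


# aHR values quoted in the text above (phenotype -> outcome).
reported = {
    "cerebral atrophy -> any dementia": 1.7,
    "cerebral atrophy -> Alzheimer's disease": 1.9,
    "cortical infarct -> vascular dementia": 1.8,
    "cortical infarct -> Alzheimer's disease": 0.7,
}

for label, ahr in reported.items():
    change = percent_change_in_hazard(ahr)
    print(f"{label}: aHR {ahr} is a {change:+d}% change in hazard")
```

Values below 1, such as the 0.7 for Alzheimer’s disease, should be read cautiously in an observational cohort, since they may reflect competing diagnoses rather than protection.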
Cerebral atrophy, white matter hyperintensities and cortical infarct phenotypes show altered hazard ratios for Parkinson’s disease.
For Parkinson’s disease, aHRs were lower for the cortical infarct phenotype (aHR: 0.7) and higher for the cerebral atrophy (aHR: 1.4) and WMH (aHR: 1.1) phenotypes (preprint Figure 3). No cumulative effect of multiple phenotypes was observed for a diagnosis of Parkinson’s disease.
No significant effects were found for epilepsy or colorectal cancer diagnoses.
Why we highlight this preprint
We chose to highlight this observational retrospective longitudinal cohort study for three main reasons:
- While much cerebrovascular research has traditionally focused on overt clinical events such as stroke, increasing attention is now being directed toward covert cerebrovascular disease (CCD), including white matter changes and silent infarcts. This study contributes to this shift by evaluating the long-term clinical relevance of these often-incidental findings in a large, real-world population sample.
- We were particularly struck by the use of natural language processing (NLP), a fairly novel approach, to extract imaging phenotypes from unstructured radiology reports at scale. This approach enables the study of clinically relevant but otherwise uncoded features, opening new possibilities for leveraging routine healthcare data in epidemiological research.
- The scale of this study, including over 360,000 individuals from routine clinical care, provides a level of statistical power and ecological validity that is rarely achieved in neuroimaging research, which is often limited to smaller, highly selected cohorts.
Importantly, the study addresses a common clinical dilemma that is personally important to us: how to interpret incidental imaging findings such as white matter hyperintensities or cerebral atrophy. By linking these findings to future neurological outcomes, the work begins to provide valuable context for clinicians encountering these reports in practice. Incidental findings have been valuable throughout the history of science and have led to major discoveries, including penicillin, but they can be notoriously difficult to define statistically in large studies, as one of us, who currently works in epidemiological neuroscience with large-scale metrics, has experienced. This method of capturing incidental findings and converting them into quantitative, interpretable data is worth adding to the toolkit across the sciences.
Questions for the authors
- Ischemic events can correlate with, or even cause, epilepsy. Could other brain-based control outcomes that are less closely related to stroke, such as primary brain tumors, be used in the same analyses to better separate stroke-related effects from other brain events?
- While the study adjusts for several key variables, important vascular and lifestyle factors such as smoking, obesity, and blood pressure were not available. Given their strong association with both CCD and outcomes like stroke and dementia, how might this residual confounding influence your hazard ratio estimates? Do you think incorporating these variables in future work could meaningfully change the observed associations, particularly in distinguishing whether CCD is an independent risk factor or a proxy for broader vascular risk?
- The study uses Cox proportional hazards models, which assume that hazard ratios are constant over time. Although this assumption was tested using Schoenfeld residuals, the diagnostics are not shown, making it difficult to assess its validity for CCD variables. Notably, sensitivity analyses suggest that some associations attenuate or reverse after excluding early follow-up, indicating potential time-dependent effects. Could you comment on whether CCD effects varied over time, and whether alternative approaches (e.g. time-varying coefficients or piecewise models) were considered to better capture this?
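For readers less familiar with the modelling issue raised in the last question, the standard Cox model and one piecewise alternative can be written as follows. This is textbook notation rather than the authors' specification: $x$ stands for a phenotype indicator, $z$ for the adjustment covariates, and the cut point $\tau$ is a hypothetical choice (for example, the end of the early follow-up window excluded in the sensitivity analyses).

```latex
% Standard Cox proportional hazards model: the hazard ratio exp(beta) does not depend on t
h(t \mid x, z) = h_0(t)\,\exp\!\left(\beta x + \gamma^{\top} z\right)

% Piecewise alternative: separate log hazard ratios before and after a cut point tau
h(t \mid x, z) = h_0(t)\,\exp\!\left(\beta_1 x\,\mathbf{1}[t \le \tau] + \beta_2 x\,\mathbf{1}[t > \tau] + \gamma^{\top} z\right)
```

Under proportional hazards, $\beta_1 = \beta_2$. An association that attenuates or reverses after early follow-up would instead appear as $\beta_2$ being smaller than, or opposite in sign to, $\beta_1$.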