Acorde: unraveling functionally-interpretable networks of isoform co-usage from single cell data
Preprint posted on May 09, 2021 https://www.biorxiv.org/content/10.1101/2021.05.07.441841v1
Arzalluz-Luque et al. present acorde, a computational pipeline that integrates bulk long read and single-cell short read RNA-seq to quantify isoform co-expression and co-usage networks at single-cell resolution.
Bobby Ranjan
There are a number of relevant questions regarding the importance of splicing for cell identity and function that can only be resolved by evaluating isoform expression at the single-cell level.
Integrating alternative splicing (AS) and gene expression changes has led to the discovery of cell subtypes and states that were otherwise not detected, thus hinting at the presence of coherent isoform variation. Further, co-expression relationships between transcript variants from different genes have not yet been investigated.
However, the uncertainty of short read-based isoform quantification and the heavy 3’ end bias of popular scRNA-seq methods have made investigating alternative splicing and isoform expression dynamics a challenging task. Although long read sequencing is able to alleviate these issues, its intrinsic sequencing depth constraints limit the isoform diversity that can be detected.
In this preprint, Arzalluz-Luque et al. present acorde, an end-to-end, data-intensive pipeline that integrates bulk long reads and scRNA-seq to quantify and analyse isoform expression at single-cell resolution. They applied this pipeline to study the mouse primary visual cortex, using published scRNA-seq Smart-Seq2 (Tasic et al.) and bulk ENCODE PacBio long-read (Wyman et al.) data.
1. acorde provides an end-to-end computational pipeline to quantify and analyse isoform expression at single-cell resolution
acorde employs a hybrid strategy in which bulk long reads and single-cell short reads are integrated to estimate isoform expression at the single-cell level. To alleviate the limitations of existing correlation metrics in the single-cell context, Arzalluz-Luque et al. developed a novel strategy to obtain noise-robust correlation estimates in scRNA-seq data, and a semi-automated clustering approach to detect modules of co-expressed isoforms across cell types (together known as the percentile correlation-based clustering approach). The authors additionally re-defined and implemented Differential Isoform Usage (DIU) and co-Differential Isoform Usage (coDIU) analyses in order to leverage the multiple cell types contained in single-cell datasets. Finally, they incorporated a functional annotation step in which several databases and prediction tools were integrated to add isoform-specific functional information.
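To make the idea of a noise-robust, percentile-based correlation more concrete, here is a minimal sketch. It assumes (this detail is not given in the summary above) that each isoform is summarised within each cell type by its expression deciles, and that Pearson correlation is then computed on those summaries rather than on individual cells. The function names and toy values are hypothetical; this is an illustration, not the acorde implementation.

```python
from math import sqrt
from statistics import mean, quantiles


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def percentile_profile(expr, cell_types, n=10):
    """Summarise one isoform by its n-1 cut points (deciles by default)
    within each cell type, concatenated in sorted cell-type order."""
    profile = []
    for ct in sorted(set(cell_types)):
        values = [e for e, c in zip(expr, cell_types) if c == ct]
        profile.extend(quantiles(values, n=n, method="inclusive"))
    return profile


def percentile_correlation(expr_a, expr_b, cell_types, n=10):
    """Pearson correlation between the per-cell-type percentile summaries."""
    return pearson(
        percentile_profile(expr_a, cell_types, n),
        percentile_profile(expr_b, cell_types, n),
    )


# Toy data: both isoforms are high in neurons and low in oligodendrocytes,
# but their dropouts (zeros) fall in different cells.
cell_types = ["neuron"] * 6 + ["oligo"] * 6
iso_a = [9, 0, 8, 10, 0, 9, 1, 0, 0, 1, 0, 0]
iso_b = [0, 8, 9, 0, 10, 9, 0, 1, 0, 0, 1, 0]

cell_level = pearson(iso_a, iso_b)
percentile_level = percentile_correlation(iso_a, iso_b, cell_types)
```

In this toy input the two isoforms share the same distribution within each cell type, so the percentile-level correlation is essentially 1, while the cell-level Pearson correlation is only about 0.16 because the dropouts do not coincide.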
2. Percentile correlation-based clustering outperforms existing correlation and ρ proportionality metrics
The percentile correlation-based clustering proposed as part of acorde was benchmarked against Pearson, Spearman and zero-inflated Kendall correlations, and the ρ proportionality metric. Of the 5 strategies compared, ρ proportionality came closest to the percentile correlation, but failed to control for unclustered transcripts (Figure 2).
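For readers less familiar with the baseline metrics, the sketch below computes three of them on a pair of expression vectors. It is a simplified illustration under stated assumptions: Spearman is computed as Pearson on average ranks, and the ρ proportionality metric is written as 1 - var(log x - log y) / (var(log x) + var(log y)) on plain log values with an optional pseudocount, whereas published implementations typically apply a log-ratio transformation first. The zero-inflated Kendall correlation is omitted. Function names are hypothetical and not taken from the preprint.

```python
from math import log, sqrt
from statistics import mean, variance


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation as Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rho_proportionality(x, y, pseudocount=1.0):
    """Simplified rho: 1 - var(lx - ly) / (var(lx) + var(ly)),
    where lx, ly are log(value + pseudocount). Assumption: no
    log-ratio transformation is applied here."""
    lx = [log(v + pseudocount) for v in x]
    ly = [log(v + pseudocount) for v in y]
    denom = variance(lx) + variance(ly)
    if denom == 0:
        return 0.0
    diff = [a - b for a, b in zip(lx, ly)]
    return 1 - variance(diff) / denom
```

Two perfectly proportional vectors, such as `[1, 2, 4, 8]` and `[2, 4, 8, 16]`, score 1 on all three metrics (with `pseudocount=0.0` for ρ), while sparse single-cell vectors with many tied zeros tend to pull all of them down.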
3. Isoform selection exhibits cell-type-specific variation
To quantify the expression of the long read-defined isoforms at the single-cell level, the authors applied acorde to study the mouse primary visual cortex using published scRNA-seq Smart-Seq2 (Tasic et al.) and bulk ENCODE PacBio long-read (Wyman et al.) data. Interestingly, the number of coDIU genes linking isoform co-expression clusters was dependent on cluster sizes, but showed no direct relationship with the similarities between expression profiles, suggesting that coordinated isoform usage mechanisms may produce strong cell type-level shifts in isoform selection. Indeed, in the Tasic et al. dataset, a high proportion of coDIU interactions were detected for highly expressed isoforms in neural cell types. Although isoform clusters with high neuronal expression were also among the largest in size, it is plausible that co-splicing is at the core of the regulation of neural function in the primary visual cortex.
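One way to picture how coDIU genes "link" isoform clusters is sketched below. It assumes a simplified working definition that is not spelled out in this summary: two genes are coDIU candidates for a pair of clusters when each gene has isoforms assigned to both clusters. Any statistical testing is left out, and the gene, isoform, and cluster names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def codiu_candidates(isoform_cluster, isoform_gene):
    """Return {(cluster_1, cluster_2): [genes]} for every pair of clusters
    shared by at least two genes (candidate coDIU links only)."""
    # Collect, for each gene, the set of clusters its isoforms belong to.
    gene_clusters = defaultdict(set)
    for isoform, cluster in isoform_cluster.items():
        gene_clusters[isoform_gene[isoform]].add(cluster)

    # Register each gene under every pair of clusters it spans.
    links = defaultdict(set)
    for gene, clusters in gene_clusters.items():
        for pair in combinations(sorted(clusters), 2):
            links[pair].add(gene)

    # A link needs at least two genes to count as coordinated usage.
    return {
        pair: sorted(genes) for pair, genes in links.items() if len(genes) >= 2
    }


# Hypothetical toy annotation: GeneA and GeneB each split their isoforms
# between a neuron cluster and an oligodendrocyte cluster; GeneC does not.
isoform_gene = {
    "GeneA.1": "GeneA", "GeneA.2": "GeneA",
    "GeneB.1": "GeneB", "GeneB.2": "GeneB",
    "GeneC.1": "GeneC", "GeneC.2": "GeneC",
}
isoform_cluster = {
    "GeneA.1": "neuron", "GeneA.2": "oligo",
    "GeneB.1": "neuron", "GeneB.2": "oligo",
    "GeneC.1": "neuron", "GeneC.2": "neuron",
}

candidate_links = codiu_candidates(isoform_cluster, isoform_gene)
```

Under this simplification, larger clusters have more opportunities to share genes with other clusters, which may help explain why the number of coDIU genes depends on cluster size.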
4. Isoform co-expression may be post-transcriptionally regulated
Annotating the genes regulated by coDIU revealed a specific enrichment of mitochondrial components, suggesting that coordinated isoform usage may affect oxidation and energy metabolism. Interestingly, coDIU genes also showed additional enrichment for splicing-related terms such as RNA splicing and mRNA splicing via spliceosome, and for the 3’ UTR motif K-box. This result links genes involved in splicing and RNA stability with the coordination of AS, and suggests that co-expression of alternative isoforms is a post-transcriptionally regulated process.
5. coDIU genes demonstrate potential cell-type-specific splicing-mediated functional synergy
The authors then focused on coDIU genes representing 3 clusters of isoforms: oligodendrocyte-specific, neuron-specific and shared isoform expression patterns.
They found that the K-box motif, which has been proposed as a negative post-transcriptional regulator, presented inclusion changes in ~60% of annotated coDIU genes. In addition, the coDIU network included several genes in which 3’ UTR elongation led to neuron-specific co-inclusion of K-box motifs, some of which may be involved in neuron survival and differentiation. This suggests that a 3’ UTR binding-mediated mechanism favouring isoform co-expression may regulate post-transcriptional modifications of neuron survival genes.
The majority of neuron-oligodendrocyte coDIU genes also presented coding region variation affecting protein domains (PFAM) and post-translational modifications (PTMs). Two tubulin isotypes, Tubg2 and Tubb4b, had co-expressed isoforms with neuron-specific and neuron-oligodendrocyte expression, respectively. Both genes presented inclusion changes in an N-terminal GTPase domain and several PTMs with differing functional outcomes, suggesting a cell-type-specific fine-tuning mechanism for modifying tubulin stability and its interactions with other proteins (the “tubulin code”).
Why I chose to highlight this preprint
This preprint tackles an important challenge in the single-cell field: the analysis of isoform variation and co-expression at single-cell resolution. acorde provides a robust solution to quantify isoform variation by mapping onto reference long reads. The percentile correlation-based approach provides a novel solution for tackling the noise in scRNA-seq correlations. This preprint demonstrates the relevance and capabilities of acorde in the analysis of isoform co-usage at single-cell resolution.
Questions for the authors
- Has acorde been tested on 10x Genomics 5’ or 3’ scRNA-seq data? How does its performance compare to that on Smart-Seq2 data?
- What is the false-positive rate for percentile correlations, and does it suffer from spurious correlations as compared to traditional approaches?
- Can acorde be used to perform a case-control analysis to study disease-specific isoform variation?
Posted on: 25th May 2021