
Acorde: unraveling functionally-interpretable networks of isoform co-usage from single cell data

Angeles Arzalluz-Luque, Pedro Salguero, Sonia Tarazona, Ana Conesa

Preprint posted on May 09, 2021 https://www.biorxiv.org/content/10.1101/2021.05.07.441841v1

Arzalluz-Luque et al. present acorde, a computational pipeline that integrates bulk long read and single-cell short read RNA-seq to quantify isoform co-expression and co-usage networks at single-cell resolution.

Selected by Bobby Ranjan

Categories: bioinformatics, genomics

Background

There are a number of relevant questions regarding the importance of splicing for cell identity and function that can only be resolved by evaluating isoform expression at the single-cell level.

Integrating alternative splicing (AS) with gene expression changes has led to the discovery of cell subtypes and states that would otherwise have gone undetected, hinting at the presence of coherent isoform variation across cells. In contrast, co-expression relationships between transcript variants of different genes have not yet been investigated.

However, the uncertainty of short read-based isoform quantification and the heavy 3’ end bias of popular scRNA-seq methods have made investigating alternative splicing and isoform expression dynamics challenging. Although long read sequencing can alleviate these issues, its limited sequencing depth restricts the isoform diversity it can capture.

In this preprint, Arzalluz-Luque et al. present acorde, an end-to-end, data-intensive pipeline that integrates bulk long reads and scRNA-seq to quantify and analyse isoform expression at single-cell resolution. They applied this pipeline to study the mouse primary visual cortex, using published scRNA-seq Smart-Seq2 (Tasic et al.) and bulk ENCODE PacBio long-read (Wyman et al.) data.

Key Findings

Figure 1. acorde workflow.

1. acorde provides an end-to-end computational pipeline to quantify and analyse isoform expression at single-cell resolution

acorde employs a hybrid strategy in which bulk long reads and single-cell short reads are integrated to estimate isoform expression at the single-cell level. To overcome the limitations of existing correlation metrics in the single-cell context, Arzalluz-Luque et al. developed a novel strategy to obtain noise-robust correlation estimates from scRNA-seq data, together with a semi-automated clustering approach to detect modules of isoforms co-expressed across cell types (jointly referred to as the percentile correlation-based clustering approach). The authors also re-defined and implemented Differential Isoform Usage (DIU) and coDifferential Isoform Usage (coDIU) analyses to leverage the multiple cell types contained in single-cell datasets. Finally, they incorporated a functional annotation step that integrates several databases and prediction tools to add isoform-specific functional information (Figure 1).
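The percentile correlation itself is not spelled out in this summary, so the Python sketch below is only one plausible reading, written under stated assumptions: each isoform's per-cell expression within a cell type is summarised by a few percentiles (the probabilities 0.1 to 0.9 are hypothetical defaults), the summaries are concatenated across cell types, and Pearson correlation is computed on those profiles. The function names and data layout are illustrative and are not acorde's API.

```python
# Hypothetical sketch: one plausible "percentile correlation", NOT acorde's code.
# Assumption: summarise each isoform per cell type by expression percentiles,
# concatenate the summaries across cell types, then take Pearson correlation.
from math import sqrt


def percentiles(values, probs):
    """Percentiles by linear interpolation between sorted values."""
    xs = sorted(values)
    n = len(xs)
    out = []
    for p in probs:
        pos = p * (n - 1)  # fractional rank of the p-th percentile
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        out.append(xs[lo] + (xs[hi] - xs[lo]) * (pos - lo))
    return out


def pearson(x, y):
    """Plain Pearson correlation; undefined if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def percentile_profile(expr_by_type, probs):
    """Concatenate per-cell-type percentiles (cell types in sorted order)."""
    profile = []
    for cell_type in sorted(expr_by_type):
        profile.extend(percentiles(expr_by_type[cell_type], probs))
    return profile


def percentile_correlation(iso_a, iso_b, probs=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """iso_a, iso_b: dict cell type -> per-cell expression values.

    Both isoforms are assumed to be measured in the same cell types.
    """
    return pearson(percentile_profile(iso_a, probs), percentile_profile(iso_b, probs))
```

Summarising each cell type by percentiles rather than correlating raw per-cell values means that a single zero in one cell shifts the profile only slightly, which is one way such a measure could be less sensitive to dropout noise. A constant profile would make the Pearson denominator zero, so real data would need filtering first.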

2. Percentile correlation-based clustering outperforms existing correlation and ρ proportionality metrics.

The percentile correlation-based clustering proposed as part of acorde was benchmarked against Pearson, Spearman and zero-inflated Kendall correlations, and the ρ proportionality metric. Of the five strategies compared, ρ proportionality came closest to percentile correlation, but it did not control the proportion of unclustered transcripts as effectively (Figure 2).

Figure 2. Evaluation of percentile correlation-based clustering. From left to right, the metrics shown are: (i) mean proportion of pairwise correlations > 0.8, (ii) percentage of unclustered transcripts, and (iii, iv) the effect of the co-expression metric on clustering, measured as (iii) mean Jaccard Index (JI) and (iv) standard deviation of JI.
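The Figure 2 metrics can be computed roughly as follows. In this hypothetical Python sketch, clusters are sets of transcript IDs, pairwise correlations are a dict keyed by `frozenset` pairs, and clustering stability is the best-match Jaccard Index of each cluster against a second clustering run. The benchmark's actual resampling scheme is not described here, so treat these choices as assumptions.

```python
# Hypothetical sketch of the Figure 2 metrics; data layout and the
# best-match Jaccard scheme are assumptions, not the benchmark protocol.
from itertools import combinations
from statistics import mean, pstdev


def prop_high_corr(cluster, corr, threshold=0.8):
    """Share of within-cluster transcript pairs with correlation > threshold."""
    pairs = list(combinations(sorted(cluster), 2))
    if not pairs:
        return 0.0
    return sum(corr[frozenset(p)] > threshold for p in pairs) / len(pairs)


def mean_prop_high_corr(clusters, corr, threshold=0.8):
    """Metric (i): average of prop_high_corr over all clusters."""
    return mean(prop_high_corr(c, corr, threshold) for c in clusters)


def pct_unclustered(all_ids, clusters):
    """Metric (ii): percentage of transcripts not assigned to any cluster."""
    clustered = set().union(*clusters)
    return 100 * len(set(all_ids) - clustered) / len(all_ids)


def jaccard(a, b):
    """Jaccard Index |A & B| / |A | B| between two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def jaccard_stability(run_1, run_2):
    """Metrics (iii) and (iv): mean and SD of each run-1 cluster's best JI in run 2."""
    scores = [max(jaccard(c1, c2) for c2 in run_2) for c1 in run_1]
    return mean(scores), pstdev(scores)
```

A high mean JI with a low standard deviation would indicate that the co-expression metric yields clusters that are reproduced consistently across runs.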

3. Isoform selection exhibits cell-type-specific variation

The authors applied acorde to the mouse primary visual cortex, quantifying long read-defined isoforms at the single-cell level using published Smart-Seq2 scRNA-seq (Tasic et al.) and bulk ENCODE PacBio long-read (Wyman et al.) data. Interestingly, the number of coDIU genes linking isoform co-expression clusters depended on cluster size but showed no direct relationship with the similarity between the clusters' expression profiles, suggesting that coordinated isoform usage may produce strong cell type-level shifts in isoform selection. Indeed, in the Tasic et al. dataset, a high proportion of coDIU interactions involved isoforms highly expressed in neural cell types. Although isoform clusters with high neuronal expression were among the largest, which may partly account for this, it is plausible that co-splicing lies at the core of neural function regulation in the primary visual cortex.
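To make "coDIU genes linking isoform co-expression clusters" concrete, the Python sketch below counts, for every pair of clusters, the genes whose isoforms are split between them. It assumes (hypothetically) that a gene links two clusters whenever at least one of its isoforms sits in each; a full coDIU analysis would also test whether isoform usage changes across cell types, a step left out here.

```python
# Hypothetical sketch: count genes whose isoforms fall into different
# co-expression clusters. Simplifying assumption: a gene "links" clusters
# A and B if it has at least one isoform in A and another in B. No
# statistical test of isoform usage across cell types is performed.
from collections import Counter, defaultdict
from itertools import combinations


def cluster_links(iso_to_gene, iso_to_cluster):
    """Return Counter mapping (cluster_a, cluster_b) -> number of linking genes."""
    gene_clusters = defaultdict(set)
    for iso, gene in iso_to_gene.items():
        if iso in iso_to_cluster:  # unclustered isoforms are skipped
            gene_clusters[gene].add(iso_to_cluster[iso])
    links = Counter()
    for clusters in gene_clusters.values():
        # every unordered pair of clusters this gene spans gets one count
        for pair in combinations(sorted(clusters), 2):
            links[pair] += 1
    return links
```

Under this definition, larger clusters contain more isoforms and therefore have more opportunities to share genes with other clusters, which is consistent with the size dependence described above.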

4. Isoform co-expression may be post-transcriptionally regulated

Functional annotation of coDIU genes revealed a specific enrichment of mitochondrial components, suggesting that coordinated isoform usage may affect oxidation and energy metabolism. Interestingly, coDIU genes were also enriched in splicing-related terms, such as RNA splicing and mRNA splicing via spliceosome, and in the 3’ UTR motif K-box. This result links genes involved in splicing and RNA stability with the coordination of AS, and suggests that co-expression of alternative isoforms may be a post-transcriptionally regulated process.
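Enrichment results like these typically come from an over-representation test. The exact procedure used in the preprint is not described in this summary, so the following is a generic one-sided hypergeometric test in Python; the function name and any counts passed to it are hypothetical.

```python
# Generic one-sided hypergeometric over-representation test.
# Assumption: this is a standard approach, not necessarily the one used
# in the preprint.
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K carry the term.

    k: term-annotated genes in the query list (e.g. coDIU genes)
    n: size of the query list
    K: term-annotated genes in the background
    N: size of the background
    """
    upper = min(n, K)
    total = comb(N, n)
    # sum the upper tail of the hypergeometric distribution exactly with integers
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

When many terms (e.g. all GO terms) are tested, the resulting p-values would still need multiple-testing correction, for example with the Benjamini-Hochberg procedure.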

5. coDIU genes demonstrate potential cell-type-specific, splicing-mediated functional synergy

The authors then focused on coDIU genes associated with three isoform clusters showing oligodendrocyte-specific, neuron-specific and shared expression patterns.

They found that the K-box motif, which has been proposed as a negative post-transcriptional regulator, showed inclusion changes in ~60% of annotated coDIU genes. In addition, the coDIU network included several genes in which 3’ UTR elongation led to neuron-specific co-inclusion of K-box motifs, some of which may be involved in neuron survival and differentiation. This suggests that a 3’ UTR binding-mediated mechanism favouring isoform co-expression may contribute to the post-transcriptional regulation of neuron survival genes.

Most neuron-oligodendrocyte coDIU genes also showed coding region variation, affecting protein domains (PFAM) and post-translational modifications (PTMs). Two tubulin isotypes, Tubg2 and Tubb4b, had co-expressed isoforms with neuron-specific and neuron-oligodendrocyte expression, respectively. Both genes showed inclusion changes in an N-terminal GTPase domain and in several PTMs with differing functional outcomes, suggesting a cell type-specific mechanism for fine-tuning tubulin stability and its interactions with other proteins (the “tubulin code”).
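Both the K-box result and the PFAM/PTM variation rest on detecting features whose inclusion differs between isoforms of the same gene. A minimal Python sketch, assuming each isoform carries a set of annotation labels (any labels used with it, such as "K-box" or "domain_X", are hypothetical) and that an inclusion change means a feature is present in some isoforms of a gene and absent from others:

```python
# Hypothetical sketch: detect isoform-level inclusion changes of annotated
# features. Assumption: a feature "changes" within a gene when it is present
# in at least one isoform and absent from at least one other.


def variable_features(isoform_features):
    """isoform_features: dict isoform -> set of feature labels, for one gene."""
    sets = [set(f) for f in isoform_features.values()]
    if not sets:
        return set()
    present_somewhere = set().union(*sets)
    present_everywhere = set.intersection(*sets)
    return present_somewhere - present_everywhere


def fraction_with_change(genes, feature):
    """Share of the given genes in which `feature` shows an inclusion change.

    genes: dict gene -> (dict isoform -> set of feature labels)
    """
    if not genes:
        return 0.0
    changed = sum(feature in variable_features(isos) for isos in genes.values())
    return changed / len(genes)
```

Passing only annotated coDIU genes to `fraction_with_change` would yield a proportion of the same kind as the ~60% reported for K-box motifs, although the preprint's own annotation pipeline may define inclusion changes differently.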

 

Why I chose to highlight this preprint

This preprint tackles an important challenge in the single-cell field: the analysis of isoform variation and co-expression at single-cell resolution. acorde provides a robust solution for quantifying isoform variation by mapping short reads onto a long read-defined reference. The percentile correlation-based approach offers a novel way to tackle noise in scRNA-seq correlations. Overall, this preprint demonstrates the relevance and capabilities of acorde for analysing isoform co-usage at single-cell resolution.

Questions for the authors

  1. Has acorde been tested on 10x Genomics 5’ or 3’ scRNA-seq data? How does its performance compare with that on Smart-Seq2 data?
  2. What is the false-positive rate for percentile correlations, and are they more or less prone to spurious correlations than traditional approaches?
  3. Can acorde be used to perform a case-control analysis to study disease-specific isoform variation?

Posted on: 25th May 2021

doi: https://doi.org/10.1242/prelights.29126
