Association analysis of repetitive elements and R-loop formation across species

Chao Zeng, Masahiro Onoguchi, Michiaki Hamada

Preprint posted on November 10, 2020

Article now published in Mobile DNA at

Looped in the repeats: correlating R-loops with repetitive genetic elements.

Selected by Ram


R-loops are non-canonical three-stranded nucleic acid structures that form when an RNA hybridizes with its complementary DNA strand, displacing the other DNA strand. The factors that influence the formation and genome-wide distribution of R-loops include chromatin modifiers and several proteins involved in transcription, splicing, replication, recombination, and DNA repair. Additionally, R-loops tend to form at repetitive elements and skewed sequences (GC-skew and AT-skew). However, it is not clear whether R-loops show any sequence bias in their genome-wide distribution across species. Therefore, the authors of the current preprint examined published datasets to understand the cis-regulatory elements associated with genome-wide R-loop distribution.
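For readers unfamiliar with the term, strand skew is commonly defined as (G - C)/(G + C) for GC-skew and (A - T)/(A + T) for AT-skew, computed on one strand. The following is a minimal sketch of that general definition only; the function names, window size, and step are illustrative assumptions and are not taken from the preprint.

```python
def skew(seq: str, a: str = "G", b: str = "C") -> float:
    """Return (a - b) / (a + b) for one strand; 0.0 if neither base occurs."""
    seq = seq.upper()
    n_a, n_b = seq.count(a), seq.count(b)
    total = n_a + n_b
    return (n_a - n_b) / total if total else 0.0


def windowed_skew(seq, window, step, a="G", b="C"):
    """Skew in consecutive windows of length `window`, advancing by `step`.

    Window and step sizes are arbitrary here (hypothetical values chosen by
    the caller), not parameters reported in the preprint.
    """
    return [
        skew(seq[i:i + window], a, b)
        for i in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # GC-skew of a G-rich stretch followed by a C-rich stretch.
    print(windowed_skew("GGGGCCCC", window=4, step=4))
    # AT-skew uses the same formula with A and T.
    print(skew("AAAT", "A", "T"))
```

A positive GC-skew means the strand carries more G than C in that window; the sign flips for the complementary strand.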

Schematic representation of the datasets used in the current preprint. Self-made using

Key findings

  1. The authors reanalyzed publicly available datasets generated in human cells (U2OS), fly (D. melanogaster embryos, S2 cells), and plants (seedlings of A. thaliana) using different controls. They used R-loop (DRIP-seq) and nascent RNA (GRO-seq) profiles for the study. They observed that R-loops in plants tend to be longer (~998 nucleotides) than those in humans and flies (~414-618 nucleotides). Across the species, R-loops tend to be enriched at gene promoters. Of note, plants harbor about 60% of their total R-loops at promoters (and 0.2% at introns). They also found 70%, 24%, 39%, and 54% overlap between R-loops and transcribed regions in humans, fly embryos, S2 cells, and plants, respectively. However, further analysis showed that fly R-loops lie mostly in intergenic regions (~90%) and are possibly independent of transcription, suggesting the presence of trans R-loops.
  2. The authors report some species-specific differences. They show that human and plant R-loops were marginally enriched at ribosomal DNA and underrepresented at short interspersed nuclear elements (SINEs), but more strongly enriched at retrotransposons and satellite DNA. This contrasts with the fly genome, in which R-loops are underrepresented at satellite DNA (they also noticed some differences in genome-wide R-loop distribution between fly embryos and S2 cells, possibly reflecting developmental stage).
  3. Overall, all the species analyzed showed a positive correlation between repetitive genetic elements and genome-wide R-loop distribution. In human cells, telomeres, centromeres, ribosomal DNA, and retrotransposons are enriched for R-loops. In the fly genome, long interspersed nuclear elements (LINEs), long terminal repeats (LTRs), and low complexity regions are enriched for R-loops. In the plant genome, about half of the repeat families were enriched for R-loops; these include LINEs, LTRs, and low complexity regions.
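To make the overlap percentages in the first finding concrete, here is a minimal sketch of one way to compute the fraction of R-loop peaks that share at least one base with a set of transcribed regions. It is not the authors' pipeline: the function names, the half-open BED-style coordinates, and the toy intervals are all assumptions made for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


def merge_by_chrom(regions):
    """Group (chrom, start, end) half-open intervals by chromosome and merge
    overlapping or touching intervals into a sorted, disjoint list."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        out = [list(ivs[0])]
        for start, end in ivs[1:]:
            if start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = out
    return merged


def fraction_overlapping(peaks, regions):
    """Fraction of peaks sharing at least one base with any region."""
    merged = merge_by_chrom(regions)
    starts = {chrom: [iv[0] for iv in ivs] for chrom, ivs in merged.items()}
    hits = 0
    for chrom, start, end in peaks:
        ivs = merged.get(chrom)
        if not ivs:
            continue
        # Last merged interval that begins before the peak ends.
        idx = bisect_left(starts[chrom], end) - 1
        if idx >= 0 and ivs[idx][1] > start:
            hits += 1
    return hits / len(peaks) if peaks else 0.0


if __name__ == "__main__":
    # Toy coordinates (hypothetical), standing in for DRIP-seq peaks and
    # GRO-seq transcribed regions.
    rloop_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 20)]
    transcribed = [("chr1", 150, 300), ("chr1", 250, 400), ("chr2", 20, 30)]
    print(fraction_overlapping(rloop_peaks, transcribed))
```

In practice, interval tools such as bedtools are the usual choice for this kind of comparison; the pure-Python version above only illustrates what the reported percentage measures.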


Whether as a cause or a consequence, R-loops seem to play a crucial role in developmental pathways, cancer progression, and neurodegenerative diseases. Thus, many researchers are drawn to understanding their precise physiological role. While most work in R-loop biology has looked at trans-acting factors, here the authors investigated the cis-regulatory elements, or sequence determinants, associated with R-loop formation. The authors found strong correlations between R-loops and repetitive DNA sequences, reinforcing earlier studies.

(Note: I have only highlighted the key findings of the preprint without commenting on the methodology. Anyone is free to comment on the methodology if the preprint interests them.)

Acknowledgments: I am thankful to all the authors for their support, especially Chao Zeng for taking the time to comment on the preLight.




Tags: genomics, repetitive elements, rloops

Posted on: 28th December 2020, updated on: 22nd January 2021



Author's response

Chao Zeng (CZ) and others shared

1. Cancer cells may have a different, aberrant pool of R-loops. Why did the authors choose to use cancer cells (U2OS) for their analysis, rather than normal or transformed cells (like HEK293, IMR90, etc.)? Do the authors think analyzing such cells would be a better strategy to complement their data?

CZ and others: We chose U2OS because the public R-loop data satisfying the stringent criteria (i.e., having biological replicates, INPUT, and RNaseH-treated samples) is limited. Our next work considers lowering the requirements to analyze more publicly available data, comparing the R-loop formation differences among various cell types (including normal or transformed cells).

2. Do fewer R-loops in plant introns reflect the average length of introns or the transcription rate of those genes?

CZ and others: Yes, we consider that it may be related to intron size and transcription rate. In addition, more R-loops in plant promoters may also reflect a preference for plant R-loops to form in the transcription initiation regions. Controlled experiments are required to test the above hypotheses.

3. The authors smartly used nascent RNA profiles (GRO-seq) for gene expression analysis. This may reveal any trans R-loops present in the genome, as the authors suggested. But do the authors think that combining this analysis with steady-state RNA profiles (RNA-seq) would help them test their hypothesis?

CZ and others: Although we did not use RNA-seq data in this analysis, it would certainly be interesting to consider both GRO-seq and RNA-seq in future studies.

4. Did the authors find any specific mutational signatures associated with R-loops?

CZ and others: This study focuses on the relationship between repeat elements and R-loop formation, so we have not analyzed mutational signatures. Another project we are working on is to study the impact of R-loop dynamics in human diseases (e.g., cancer, neurodegenerative diseases). Finding mutational signatures in disease-related R-loops is one of the goals we are considering.

5. What do the authors think about using R-loop genome-wide distributions that can reveal changes in R-loops at repetitive sequences (for example, 4)?

CZ and others: The role of repetitive sequences and R-loops in pathogenesis provides us with new perspectives on human diseases. Our study shows a strong correlation between repetitive sequences and R-loops. Hence, we hypothesize that some repetitive sequences may act as regulatory components to modulate R-loop dynamics. Studying the relationship between diseases and R-loops at repetitive sequences will be a promising research direction.

6. The presence of R-loops at intergenic regions in the fly genome could reflect an earlier study (5). It would be interesting to hear the authors’ comments.

CZ and others: The large number of R-loops in intergenic regions suggests that trans R-loops are prevalent in the fly genome. For example, these trans R-loops in enhancers may be involved in enhancer-promoter looping and chromatin dynamics. Notably, many lncRNAs (long non-coding RNAs) whose functions are still unclear may participate in the formation of trans R-loops. To accelerate the study of the formation and regulatory roles of R-loops in intergenic regions, experimental techniques that can precisely detect trans R-loops need to be developed. The proximity ligation technique for preparing RNA-DNA chimeric sequences, as in Sridhar B, 2017 (6), is a promising solution.
