Association analysis of repetitive elements and R-loop formation across species

Chao Zeng, Masahiro Onoguchi, Michiaki Hamada

Preprint posted on 10 November 2020

Article now published in Mobile DNA at

Looped in the repeats: correlating R-loops with repetitive genetic elements.

Selected by Sree Rama Chaitanya


R-loops are non-canonical three-stranded nucleic acid structures that form when an RNA hybridizes with its complementary DNA strand, displacing the other DNA strand. The factors that influence the formation and genome-wide distribution of R-loops include proteins involved in transcription, splicing, replication, recombination, and DNA repair, as well as chromatin modifiers. Additionally, R-loops tend to form at repetitive elements and skewed sequences (GC-skew and AT-skew). However, it is not clear whether R-loops show any sequence bias in their genome-wide distribution across different species. Therefore, the authors of the current preprint looked into published datasets to understand the cis-regulatory elements associated with genome-wide R-loop distribution.
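For readers unfamiliar with the term, GC-skew is commonly computed on one strand as (G - C)/(G + C), and AT-skew analogously as (A - T)/(A + T). The following is a minimal Python sketch of that standard formula. The function names, window sizes, and example sequences are made up for illustration and are not taken from the preprint.

```python
def skew(seq: str, a: str = "G", b: str = "C") -> float:
    """Return (count(a) - count(b)) / (count(a) + count(b)) for one strand.

    With the defaults this is GC-skew; pass a="A", b="T" for AT-skew.
    Returns 0.0 when neither base occurs (an illustrative convention).
    """
    seq = seq.upper()
    n_a, n_b = seq.count(a), seq.count(b)
    total = n_a + n_b
    return 0.0 if total == 0 else (n_a - n_b) / total


def sliding_skew(seq: str, window: int = 4, step: int = 2,
                 a: str = "G", b: str = "C") -> list:
    """Skew in consecutive windows along the sequence (hypothetical helper)."""
    return [skew(seq[i:i + window], a, b)
            for i in range(0, len(seq) - window + 1, step)]


if __name__ == "__main__":
    # Toy sequence: G-rich first half, C-rich second half.
    print(sliding_skew("GGGGCCCC", window=4, step=4))
```

A positive value means the first base (G or A) is in excess on the strand being read, and a negative value means the second base is in excess.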

Schematic representation of the datasets used in the current preprint. Self-made using

Key findings

  1. The authors reanalyzed publicly available datasets generated in human cells (U2OS), fly (D. melanogaster embryos and S2 cells), and plants (A. thaliana seedlings) using different controls. They used R-loop profiles (DRIP-seq) and nascent RNA profiles (GRO-seq) for the study. R-loops in plants tend to be longer (~998 nucleotides) than those in humans and flies (~414-618 nucleotides). Across species, R-loops tend to be enriched at gene promoters. Of note, plants harbor about 60% of their total R-loops at promoters (and 0.2% in introns). The authors also found 70%, 24%, 39%, and 54% overlap between R-loops and transcribed regions in humans, fly embryos, S2 cells, and plants, respectively. On further analysis, however, fly R-loops were found mostly in intergenic regions (~90%) and possibly form independently of transcription, suggesting the presence of trans R-loops.
  2. The authors report some species-specific differences. Human and plant R-loops were marginally enriched at ribosomal DNA and underrepresented at short interspersed nuclear elements (SINEs), but more strongly enriched at retrotransposons and satellite DNA. This contrasts with the fly genome, in which R-loops are underrepresented at satellite DNA. (The authors also noticed some differences in genome-wide R-loop distribution between fly embryos and S2 cells, possibly reflecting the developmental stages.)
  3. Overall, all the species analyzed showed a positive correlation between repetitive genetic elements and the genome-wide distribution of R-loops. In human cells, telomeres, centromeres, ribosomal DNA, and retrotransposons are enriched for R-loops. In the fly genome, long interspersed nuclear elements (LINEs), long terminal repeats (LTRs), and low complexity regions are enriched for R-loops. In the plant genome, about half of the repeat families were enriched in R-loops, including LINEs, LTRs, and low complexity regions.
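To make the overlap percentages in the findings above concrete, here is a minimal Python sketch of how one might compute the fraction of R-loop peaks that overlap transcribed regions. It is an illustration only: the function names, the "any overlap of at least one base" criterion, the half-open coordinate convention, and the toy coordinates are all assumptions, and the preprint may have used different tools and criteria.

```python
from bisect import bisect_right
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def fraction_overlapping(peaks, regions):
    """Fraction of peaks sharing at least one base with any region.

    Both inputs are lists of (chrom, start, end) with half-open
    coordinates [start, end), as in BED files (an assumed convention).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    merged = {c: merge_intervals(iv) for c, iv in by_chrom.items()}
    starts = {c: [s for s, _ in iv] for c, iv in merged.items()}

    hits = 0
    for chrom, start, end in peaks:
        if chrom not in merged:
            continue
        # Index of the last merged region that starts before the peak ends.
        i = bisect_right(starts[chrom], end - 1) - 1
        if i >= 0 and merged[chrom][i][1] > start:
            hits += 1
    return hits / len(peaks) if peaks else 0.0


if __name__ == "__main__":
    # Made-up coordinates, not data from the preprint.
    rloop_peaks = [("chr1", 100, 200), ("chr1", 500, 600),
                   ("chr2", 50, 80), ("chr2", 300, 400)]
    transcribed = [("chr1", 150, 300), ("chr1", 250, 400), ("chr2", 80, 120)]
    print(fraction_overlapping(rloop_peaks, transcribed))
```

The same function could be pointed at repeat annotations instead of transcribed regions, although testing for enrichment would additionally require a background expectation (for example, from shuffled peaks), which this sketch does not attempt.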


Whether as a cause or a consequence, R-loops seem to play a crucial role in developmental pathways, cancer progression, and neurodegenerative diseases. Thus, many researchers are drawn to understand their precise physiological roles. While most work in R-loop biology has looked at trans-acting factors, here the authors investigated cis-regulatory elements, or sequence determinants, associated with R-loop formation. They found strong correlations between R-loops and repetitive DNA sequences, reinforcing earlier studies.

(Note: I only highlighted the key findings of the preprint without commenting on the methodology. Anyone is free to comment on the methodology, in case the preprint excites you.)

Acknowledgments: I am thankful to all the authors for their support, especially Chao Zeng for taking the time to comment on the preLight.




Tags: genomics, repetitive elements, rloops

Posted on: 28 December 2020, updated on: 22 January 2021



Author's response

Chao Zeng (CZ) and others shared

1. Cancer cells may have a different, aberrant pool of R-loops. Why did the authors choose cancer cells (U2OS) for their analysis rather than normal or transformed cells (like HEK293, IMR90, etc.)? Do the authors think using such cells would be a better strategy to complement their data?

CZ and others: We chose U2OS because the public R-loop data satisfying the stringent criteria (i.e., having biological replicates, INPUT, and RNaseH-treated samples) is limited. Our next work considers lowering the requirements to analyze more publicly available data, comparing the R-loop formation differences among various cell types (including normal or transformed cells).

2. Do fewer R-loops in plant introns reflect the average length of introns or the transcription rate of those genes?

CZ and others: Yes, we consider that it may be related to intron size and transcription rate. In addition, more R-loops in plant promoters may also reflect a preference for plant R-loops to form in the transcription initiation regions. Controlled experiments are required to be conducted to test the above hypotheses.

3. The authors smartly used nascent RNA profiles (GRO-seq) for gene expression analysis. This may reveal any trans R-loops present in the genome, as the authors suggested. But do the authors think that combining this analysis with steady-state RNA profiles (RNA-seq) would help them test their hypothesis?

CZ and others: Although we did not use RNA-seq data in this analysis, it would certainly be interesting to consider both GRO-seq and RNA-seq in future studies.

4. Did the authors find any specific mutational signatures associated with R-loops?

CZ and others: This study focuses on the relationship between repeat elements and R-loop formation, so we have not analyzed mutational signatures. Another project we are working on is to study the impact of R-loop dynamics in human diseases (e.g., cancer, neurodegenerative diseases). Finding mutational signatures in disease-related R-loops is one of the goals we are considering.

5. What do the authors think about using genome-wide R-loop distributions to reveal changes in R-loops at repetitive sequences (for example, 4)?

CZ and others: The role of repetitive sequences and R-loops in pathogenesis provides us with new perspectives on human diseases. Our study shows a strong correlation between repetitive sequences and R-loops. Hence, we hypothesize that some repetitive sequences may act as regulatory components to modulate R-loop dynamics. Studying the relationship between diseases and R-loops at repetitive sequences will be a promising research direction.

6. The presence of R-loops at intergenic regions in the fly genome could reflect an earlier study (5). It would be interesting to hear the authors’ comments.

CZ and others: A large number of R-loops in intergenic regions suggests that trans R-loops are prevalent in the fly genome. For example, these trans R-loops in enhancers may be involved in enhancer-promoter looping and chromatin dynamics. Notably, many lncRNAs (long non-coding RNAs) whose functions are still unclear may participate in the formation of trans R-loops. To accelerate the study of the formation and regulatory roles of R-loops in intergenic regions, experimental techniques that can precisely detect trans R-loops will need to be developed. The proximity ligation technique for preparing RNA-DNA chimeric sequences, as in Sridhar B, 2017 (6), is a promising solution.
