Evolution of mouse circadian enhancers from transposable elements
Preprint posted on November 10, 2020 https://www.biorxiv.org/content/10.1101/2020.11.09.375469v1
Most, if not all, genomes are inhabited by transposable elements (TEs) or their remnants. TEs are genetic elements that can mobilize from one genomic locus into another via “cut-and-paste” or “copy-and-paste” mechanisms. TE mobility can bring about undesirable consequences for the host organism, for example genome instability, or deregulation of endogenous gene expression when insertions occur at or near endogenous genes. However, many other studies support important roles of TEs in the evolution of genomes and gene expression. In these cases, TEs are considered to undergo a “domestication” process whereby they acquire a function beneficial for their host.
One prominent example of TE domestication in animals is the repurposing of specific TE families into cis regulatory regions, like enhancers, that are bound by transcription factors and regulate endogenous gene expression. Some TEs have their own “ready-made” cis regulatory sequences, which are recognized and bound by endogenous transcription factors, allowing for transcription by RNA Polymerase II. It is possible to envision an alternative model whereby TEs lacking “ready-made” motifs recognized by endogenous transcription factors acquire cis regulatory potential. Regulatory activity can be acquired de novo, by mutation of an ancestral pre-motif to a motif more efficiently bound by transcription factors. Examples of the latter model are lacking. In a new preprint, Judd and colleagues show how the RSINE1 TE family was repurposed as enhancers of circadian gene expression after mutation of ancestral pre-motifs.
To understand whether TEs act as cis regulators in the mammalian circadian gene regulatory network, the authors focused on the mouse liver. The mouse liver circadian gene regulatory network is deeply conserved and well characterized, with many publicly available datasets. First, analysis of ChIP-seq datasets for the six core circadian regulators (CRs; namely CLOCK, BMAL1, CRY1, CRY2, PER1 and PER2) showed that 8%-14% of the ChIP-seq peaks overlap with TEs or other DNA repeats (not counting low complexity and simple repeats). A temporal analysis of CR binding showed that repeat-derived peaks display an oscillatory profile similar to that of CR peaks not derived from repeats.
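The overlap figure above comes down to an interval-intersection count. The snippet below is a minimal sketch of that idea, not the authors' pipeline: the function name, the half-open coordinate convention, the "at least 1 bp" overlap rule and the toy coordinates are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end) with half-open coordinates; an assumed convention.
Interval = Tuple[str, int, int]


def fraction_overlapping(peaks: List[Interval], repeats: List[Interval]) -> float:
    """Return the fraction of peaks that overlap any repeat by at least 1 bp."""
    # Group repeats by chromosome so each peak is only compared to local repeats.
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in repeats:
        by_chrom.setdefault(chrom, []).append((start, end))

    hits = 0
    for chrom, start, end in peaks:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(r_start < end and start < r_end
               for r_start, r_end in by_chrom.get(chrom, [])):
            hits += 1
    return hits / len(peaks) if peaks else 0.0


# Toy data, invented for illustration only.
peaks = [("chr1", 100, 200), ("chr1", 500, 600),
         ("chr2", 50, 150), ("chr2", 900, 1000)]
repeats = [("chr1", 180, 260), ("chr2", 300, 400)]
print(fraction_overlapping(peaks, repeats))  # 0.25
```

On real data this linear scan would be slow; a dedicated interval tool would be the practical choice, but the quantity being reported is the same.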
As there are many types of TEs, the authors then interrogated which TEs are enriched in the repeat-derived peaks. The repeat-derived CR binding sites were enriched with TEs of a particular family: RSINE1. SINEs are short interspersed nuclear elements typically derived from RNA Pol III transcripts. The CR-bound RSINE1 elements displayed patterns of DNase hypersensitivity, H3K27 acetylation and RNA Pol II occupancy typical of active enhancers.
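One common way to ask whether a TE family is over-represented among bound repeats is a hypergeometric test against the genome-wide annotation. The sketch below assumes that framing; the statistical method actually used in the preprint is not described here, and the function names and counts are hypothetical.

```python
from math import comb


def fold_enrichment(k: int, n: int, K: int, M: int) -> float:
    """Observed family share among bound TEs divided by its genome-wide share.

    k: bound TEs belonging to the family
    n: all bound TEs
    K: family copies in the genome
    M: all annotated TE copies in the genome
    """
    return (k / n) / (K / M)


def hypergeom_upper_tail(k: int, n: int, K: int, M: int) -> float:
    """P(X >= k) when n TEs are drawn without replacement from M, K of them in the family."""
    total = comb(M, n)
    favourable = sum(comb(K, i) * comb(M - K, n - i)
                     for i in range(k, min(n, K) + 1))
    return favourable / total


# Hypothetical toy counts, not taken from the preprint.
print(fold_enrichment(3, 4, 5, 20))       # 3.0
print(hypergeom_upper_tail(3, 4, 5, 20))  # 155/4845, roughly 0.032
```

This treats every TE copy as equally likely to be bound under the null, which ignores differences in copy length and mappability; it is meant only to make the notion of "enriched" concrete.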
RSINE1s have E-box motifs, each of which differs by 1-2 nucleotides from the optimal motifs bound by CRs. Two of these E-box motifs are present in RSINE1s but not in closely related SINEs. These two E-box motifs are in tandem, creating an optimal binding site for critical CRs. Interestingly, the mutations required in these E-boxes to produce an optimal CR-binding site are C-to-T mutations, which often arise through deamination of methylated DNA (more on that below). CR-bound RSINE1s were more likely to carry the optimal CR-binding motif than RSINE1 elements not bound by CRs. Phylogenetic analysis of RSINE1s suggests that the ancestral RSINE1 sequence had a proto-motif rather than the optimal motif, meaning that RSINE1s had to acquire CR binding by mutation and did not carry a “ready-made” CR-binding motif.
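The logic of a proto-motif maturing by CpG deamination can be made concrete with a small sketch. It uses the canonical E-box CACGTG as the target; the proto-motif `CACGCG` is a hypothetical example chosen for illustration and is not claimed to be the RSINE1 sequence reported in the preprint. The function names are likewise assumptions.

```python
CANONICAL_EBOX = "CACGTG"  # canonical E-box hexamer


def mismatches(site: str, motif: str = CANONICAL_EBOX) -> int:
    """Count positions at which site differs from motif (same length required)."""
    if len(site) != len(motif):
        raise ValueError("site and motif must have the same length")
    return sum(a != b for a, b in zip(site.upper(), motif.upper()))


def cpg_deamination_products(site: str) -> set:
    """Sequences reachable by one deamination event at a CpG dinucleotide.

    Deamination of the methylated C on the top strand reads as CG -> TG;
    the same event on the bottom strand reads as CG -> CA on the top strand.
    """
    site = site.upper()
    products = set()
    for i in range(len(site) - 1):
        if site[i:i + 2] == "CG":
            products.add(site[:i] + "TG" + site[i + 2:])
            products.add(site[:i] + "CA" + site[i + 2:])
    return products


proto = "CACGCG"  # hypothetical proto-motif, one mismatch from the E-box
print(mismatches(proto))                                   # 1
print(CANONICAL_EBOX in cpg_deamination_products(proto))   # True
```

In this toy case a single deamination at the second CpG converts the proto-motif into the canonical E-box, which is the kind of one-step path the paragraph above describes.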
RSINE1 co-option is context- and lineage-dependent. Indeed, RSINE1s that insert close to pre-existing CR-binding sites are more likely to evolve a perfect CR-binding motif and to be co-opted as circadian enhancers. Luciferase assays with RSINE1 elements supported both enhancer activity and context dependency, as the genomic flanking regions attenuated RSINE1 enhancer activity. Importantly, the evolved RSINE1 consensus matching the preferred CR-binding motifs strongly enhanced luciferase expression. Lastly, the authors demonstrated that RSINE1-derived binding sites tend to be mouse-specific, whereas non-TE-derived CR-binding sites are generally more deeply conserved.
What I like about this preprint
It is currently well established that TEs are often domesticated and provide cis regulation to host genes. However, the process of domestication has remained a bit of a black box. This work brilliantly dissects how a specific TE family became domesticated in the circadian gene regulatory circuit of the mouse liver. The authors put forward a model for this (Figure 1). RSINE1s bearing motifs related to CR-binding sites spread across the genome. Pre-existing CR-binding sites provide favorable genomic regions in which RSINE1 insertions can mature into circadian enhancers by evolving better CR-binding sites from their proto-motifs. I found it fascinating that the single substitutions required to turn two of the consensus RSINE1 E-boxes into a CR-bound consensus E-box were C-to-T substitutions in a CpG context. These mutations can arise through deamination of methylated DNA. As TEs are often methylated by host defense pathways in the germline, this observation points to a likely evolutionary path.
Figure 1. Model of RSINE1 integration as circadian enhancers. Figure 6B in the preprint, made available under a CC-BY 4.0 International license. CRs, circadian regulators; NRs, nuclear receptors; TF, transcription factor; Ac, acetylation.
All in all, it’s great work, but do not take it from me. Go on and read the preprint, and check out the first author’s nice Twitter thread on his work (https://twitter.com/judd_julius/status/1326218161209929729).
Questions to the authors
- Are RSINE1 elements still mobile? Do you think spurious RSINE1 insertions at specific loci, and the consequent gene deregulation, could contribute to liver disease?
- Circadian regulation occurs in several tissues, and you looked at BMAL1 ChIP-seq in liver, heart and kidney but saw that RSINE1 elements overlapped with BMAL1 peaks mostly in liver. Why do you think RSINE1 elements integrate into the circadian gene regulatory network only in the liver? As the circadian regulators are similarly expressed in other tissues, would this be in line with liver chromatin being more permissive for RSINE1 co-option?
- Do you expect other lineage-specific TEs (perhaps other SINE families) to be entwined in the circadian gene regulatory network in humans?
Want to know more?
Regulatory activities of transposable elements: from conflicts to benefits, Chuong et al., 2017.
A field guide to eukaryotic transposable elements, Wells & Feschotte, 2020.
Transcriptional architecture of the mammalian circadian clock, Takahashi, 2017. https://www.nature.com/articles/nrg.2016.150/
Posted on: 15th December 2020, updated on: 16th December 2020