Mutation bias shapes gene evolution in Arabidopsis thaliana

J. Grey Monroe, Thanvi Srikant, Pablo Carbonell-Bejerano, Moises Exposito-Alonso, Mao-Lun Weng, Matthew T. Rutter, Charles B. Fenster, Detlef Weigel

Preprint posted on June 18, 2020

Not so random: De novo mutations in Arabidopsis are biased due to cytogenetic features and have shaped gene evolution.

Selected by Facundo Romani


Mutations in DNA are one of the main drivers of genome evolution in all organisms. They include transitions and transversions (single nucleotide polymorphisms, SNPs) as well as insertions and deletions (indels), which can alter regulatory regions and coding sequences and so affect the fitness of the organism. Studies of wild populations have shown that mutation rates are influenced by DNA sequence and epigenetic features. However, the patterns observed in wild populations are also shaped by strong selection. The lack of studies analysing large catalogues of de novo mutations in plants that are not subject to strong selection limits our knowledge of whether this bias is independent of selection. Grey Monroe and colleagues reanalysed a collection of spontaneous mutations in A. thaliana and associated them with cytogenetic features (GC content, DNA methylation, histone marks, chromatin accessibility (ATAC-seq) and gene expression) to build a regression model of mutation rate, whose predictions they compared with natural variation.


Figure 2 from the preprint. (A) Schematic representation of the regression model. (B-C) Contribution of different cytogenetic features to the model. (D-E) Gene-level distribution of predicted mutation rates compared with polymorphism in wild populations.


Major findings

The model weighed the contribution of each cytogenetic feature to mutation rate. Regions with high GC content had the lowest mutation rate, whereas chromatin accessibility showed the opposite trend. Histone modifications associated with active gene expression (such as H3K4me1, H3K27ac and H3K36me3) were also associated with lower mutation rates, whereas H3K9me1 and cytosine methylation were associated with high mutation rates. These correlations are consistent with work in mammals and yeast and suggest that the bias could be explained by the target preferences of the DNA mismatch repair machinery. In addition, the mutation rates predicted by the model show a gene-level distribution similar to that of polymorphisms in wild populations, with peaks at transcription start sites (TSS) and transcription termination sites (TTS). This suggests that the mutation bias observed in natural populations is a consequence of de novo mutation bias and not necessarily a product of selection.
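To make the idea of weighing features concrete, here is a minimal sketch that fits a linear model of mutation rate on three features per genomic window. It is only an illustration: ordinary least squares stands in for whatever model the authors actually used, and the window values, feature order and coefficients are invented toy numbers chosen to mirror the direction of the trends described above.

```python
# Illustrative sketch only. The data, feature names and coefficients below are
# assumptions made up for this example; they are not taken from the preprint.

def fit_ols(rows, targets):
    """Fit y = b0 + b1*x1 + ... by solving the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("features are collinear; cannot fit")
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical windows: (GC content, cytosine methylation, chromatin accessibility)
windows = [
    (0.30, 0.10, 0.80),
    (0.45, 0.60, 0.20),
    (0.50, 0.05, 0.30),
    (0.35, 0.80, 0.70),
    (0.40, 0.30, 0.50),
    (0.55, 0.90, 0.90),
]

# Toy mutation rates built from known weights, so the fit can be checked:
# GC lowers the rate; methylation and accessibility raise it.
true_weights = (2.0, -1.5, 1.0, 0.5)  # intercept, GC, methylation, accessibility
rates = [
    true_weights[0] + sum(w * v for w, v in zip(true_weights[1:], win))
    for win in windows
]

intercept, w_gc, w_meth, w_access = fit_ols(windows, rates)
print(f"GC: {w_gc:+.2f}  methylation: {w_meth:+.2f}  accessibility: {w_access:+.2f}")
```

The sign of each fitted weight indicates whether a feature is associated with a higher or lower mutation rate in this toy data. Real mutation counts are sparse, so a count-based model would likely be a better fit in practice.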

The authors also analysed mutation bias in each gene feature (promoters, UTRs, exons, etc.). This is particularly interesting for coding regions, where mutations can have major impacts on fitness. They found that mutation frequency is correlated with functional constraint (synonymous vs non-synonymous mutations, gene expression level, etc.). Moreover, genes annotated with core biological function ontologies tend to have lower mutation rates.


Future directions

The preprint questions many concepts generally accepted in the classic theories of evolution. It is also clear and concise regarding the problems that motivate the work and the answers that the authors provide with the existing data.

This preprint is a provocative piece with many important novel findings associated with features that are frequently overlooked. The findings will have a broad impact on the evolutionary biology community, not only on plant biologists. The release of the preprint sparked a lively discussion on social media between the authors and readers. In a very innovative initiative, the authors also opened a Google Docs file to receive feedback from the community. Many interesting questions remain open, particularly regarding genetic and epigenetic features that were not specifically covered in the regression model and downstream analysis, such as transposable and repetitive elements or nucleosome positioning. There could also be important differences between SNPs and indels that are missed when both types of event are combined and analysed as mutations as a whole (Lujan et al., 2014). The preprint will certainly open pathways to future work assessing the impact of mutation bias on evolutionary events. Recently, Boukas et al. (2020) released another preprint addressing similar questions in humans, focused on promoter methylation and CpG islands.



Lujan, S. A., Clausen, A. R., Clark, A. B., MacAlpine, H. K., MacAlpine, D. M., Malc, E. P., Mieczkowski, P. A., Burkholder, A. B., Fargo, D. C., Gordenin, D. A., & Kunkel, T. A. (2014). Heterogeneous polymerase fidelity and mismatch repair bias genome variation and composition. Genome Research, 24(11), 1751–1764.

Boukas, L., Bjornsson, H. T., & Hansen, K. D. (2020). Purifying selection acts on germline methylation to modify the CpG mutation rate at promoters. bioRxiv 2020.07.04.187880.



Tags: arabidopsis, epigenetics, evolution

Posted on: 6th July 2020, updated on: 14th July 2020



Author's response

Grey Monroe shared

(FR) How was the experience of opening the preprint for comments in a Google Doc? How did the feedback from the community help you shape the future version of the work?
Thank you for the thoughtful write up and questions!

This has been an exciting project to work on and when deciding how to proceed with publication of our findings we felt it would be important to expand the scale of peer review given that some of the results might be viewed by some as unorthodox. The preprint “phase” of publication, which has become so popular in the life sciences, provides an opportunity to seek input from the community before a paper even makes its way to an editor and formal peer review. In addition to the standard mechanisms of feedback after posting a preprint such as direct contacts, Twitter discussions, and comments on bioRxiv, we decided to explore another option to make community peer review even simpler (and possibly even anonymous) – an open Google Doc of the manuscript in “comment-only mode”, where anyone could provide in-line comments.

We were at first unsure how well this experimental approach to peer review would go. No doubt there were some concerns that an open Google Doc might attract “trolls” or anyone using the anonymity of the internet to act in bad faith – but we have seen nothing of the sort. The comments we have received have all been constructive and many were very helpful. In addition to providing a direct outlet to facilitate community feedback, by posting the manuscript for all to comment, taking this open approach seems to have served as a message to the community that we are eager to benefit from feedback and we have been contacted by a number of researchers directly with constructive comments.

This feedback will considerably improve the future version of this work. One simple but valuable contribution from several researchers was pointing us in the direction of relevant references that we had overlooked. With a vast and ever-growing literature, crowdsourcing literature review is incredibly powerful and allows for gaps to be filled toward a more complete picture of the work. For example, we had missed a remarkable paper from 2004 that found functional bias in mutation hot and cold spots in humans (Chuang and Li 2004). In particular, genes involved in RNA processing tended to be associated with mutational “cold spots” in the genome, which we also found in our study.

Another direction we will examine based on community feedback is exploring the difference between mutation rates of single nucleotide variants, insertions, and deletions. A cursory comparison has revealed that single nucleotide mutations are more likely to be predicted by cytosine methylation than insertions and deletions, which is consistent with cytosine deamination being an important source of single nucleotide mutations but not indels.
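The kind of cursory comparison described here could be sketched as follows: classify each variant as a single nucleotide variant, insertion or deletion from its reference and alternate alleles, then tabulate how often each class overlaps a methylated cytosine. The records and the `overlaps_methylation` flag are hypothetical toy values, and this simple tabulation is an assumption about how one might approach it, not the authors' analysis.

```python
from collections import Counter


def classify_variant(ref, alt):
    """Classify a variant by comparing reference and alternate allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"  # e.g. a multi-nucleotide substitution


# Hypothetical records: (ref allele, alt allele, overlaps_methylation)
variants = [
    ("C", "T", True),
    ("G", "A", True),
    ("A", "G", False),
    ("C", "T", True),
    ("A", "AT", False),
    ("GTT", "G", False),
    ("T", "TAA", True),
    ("CA", "C", False),
]

totals = Counter()
methylated = Counter()
for ref, alt, overlaps_methylation in variants:
    kind = classify_variant(ref, alt)
    totals[kind] += 1
    if overlaps_methylation:
        methylated[kind] += 1

# Fraction of each mutation class found at methylated sites (toy data)
fraction_methylated = {kind: methylated[kind] / totals[kind] for kind in totals}
for kind, fraction in fraction_methylated.items():
    print(f"{kind}: {fraction:.2f} at methylated sites (n={totals[kind]})")
```

With real data, the fractions would need to be compared against the genome-wide background of methylated sites before drawing any conclusion.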

Finally, we will use ideas based on feedback from the community to better articulate and improve our attempts to control for false positive calls (e.g., sequencing errors). Because DNA sequencing and mapping are imperfect, striking the right balance between filtering out false positives and keeping real variants is key for a project like this. We are now exploring new analyses to test how robust the results are here to such filtering to ensure that they are not an artifact of bias in the distribution of false positive calls that made it through our original filtering steps. One new step in our pipeline we are exploring is explicitly removing variants detected in an unexpectedly high number of independent lines as these may be more likely to be erroneous.
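As a rough illustration of the last filtering idea, the sketch below drops any variant that is called in more than a chosen number of independent lines. The record layout, the line identifiers and the `max_lines` threshold are all assumptions made for this example; the response does not say what threshold or data format the authors use.

```python
from collections import defaultdict


def drop_recurrent_variants(calls, max_lines=1):
    """Keep only calls whose variant appears in at most `max_lines` independent lines.

    Each call is a tuple: (line_id, chrom, pos, ref, alt).
    """
    lines_per_variant = defaultdict(set)
    for line_id, chrom, pos, ref, alt in calls:
        lines_per_variant[(chrom, pos, ref, alt)].add(line_id)
    return [c for c in calls if len(lines_per_variant[c[1:]]) <= max_lines]


# Hypothetical calls from four mutation accumulation lines
calls = [
    ("line01", "Chr1", 1000, "C", "T"),
    ("line02", "Chr1", 1000, "C", "T"),
    ("line03", "Chr1", 1000, "C", "T"),  # same variant in three lines: suspicious
    ("line01", "Chr2", 5000, "G", "A"),
    ("line04", "Chr3", 750, "A", "AT"),
]

kept = drop_recurrent_variants(calls, max_lines=2)
for call in kept:
    print(call)
```

The reasoning is that a true spontaneous mutation is unlikely to arise independently at the same position in many lines, so recurrence hints at a mapping or sequencing artifact. The right threshold depends on the number of lines and their shared ancestry.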

Overall, reaching out to the community and asking for input has been an incredibly valuable and positive experience. Not only is a more thorough and open peer review process good for the scientific literature as a whole, but when faced with surprising results like we were here, it brings peace of mind to know that the work has been rigorously examined by more than just a handful of reviewers and colleagues. We are extremely grateful to everyone for their time to read and respond to our recent preprint. We feel lucky to be part of such a generous community.

Chuang, J. H., & Li, H. (2004). Functional bias and spatial organization of genes in mutational hot and cold regions in the human genome. PLoS Biology, 2(2), e29.
