RefPlantNLR: a comprehensive collection of experimentally validated plant NLRs

Jiorgos Kourelis, Sophien Kamoun

Preprint posted on 9 July 2020

RefPlantNLR: With all NLRs under one roof, appreciate the diversity and address the gaps

Selected by Hiral Shah

The Backstory

Nucleotide-binding leucine-rich repeat (NLR) proteins are a large family of intracellular receptors involved in pathogen recognition, functioning as gatekeepers of plant immunity. Activation of NLRs by pathogen effectors, either directly or indirectly, induces a plant immune response known as effector-triggered immunity (ETI), which restricts pathogen proliferation. Although some “singleton” NLRs carry out both pathogen sensing and immune signaling, many NLRs act either as dedicated sensors or as helpers in downstream signaling, functioning in gene clusters and networks.

Because they shape disease resistance, NLRs are an important target in crop breeding, and they are thought to be engaged in a co-evolutionary arms race with pathogen effectors, varying rapidly in sequence and copy number across species. Classically, plant NLRs have a tripartite domain architecture: an N-terminal domain, a central NB-ARC domain (involved in nucleotide binding and oligomerization, hence NOD) and a C-terminal LRR domain. However, recent studies have uncovered plant NLRs with many additional domains. For instance, the rice protein Pb1 has a NOD domain that differs from the canonical NB-ARC but retains the overall NLR architecture. The variable N-terminal domain forms the basis for classifying NLRs into four subclades: CC-NLR, TIR-NLR, CCR-NLR and the recently added CCG10-NLR clade.

The study assembles an extensive reference dataset of 415 experimentally validated NLRs from 31 plant genera and 4 NLR clades. The authors manually screened the literature for genes associated with disease resistance or susceptibility, effector-triggered immune responses or their regulation and downstream signaling, necrosis, and allelic series of NLRs, and then annotated each entry to confirm the presence of an NB-ARC domain and to identify any additional domains. It is the first dataset of the OpenPlantNLR community. The study also provides a more compact set of 235 proteins after removing redundancy.


Key Findings

The dataset captures a wide range of information, including amino acid and coding sequences, plant source, pathogen, effectors, associated helper components and domain architecture, and comprises 407 unique NLRs and 347 distinct NB-ARC domains. NLRs with identical sequences, such as RPP7, highlight the importance of context-dependent regulation in different plant backgrounds.

The study describes the distribution of validated NLRs across plant genera, revealing a skew towards well-studied plants: a substantial proportion of plant diversity is not represented, and there are no members from non-flowering plants (Fig. 1A). The laboratory workhorse Arabidopsis, the economically important cereals (rice, wheat and barley) and the Solanaceae account for three-quarters of the NLRs in this set, a fraction that changes little in the 235-protein dataset. The fact that Arabidopsis is the only taxon with members from all NLR clades, and that the cereals show a bias towards CC-NLRs, highlights the current gaps but also potential directions for NLR biology.

Figure 1. (A) Number of experimentally validated NLRs per plant genus (from Fig. 1A in the preprint). (B) Domain diversity of NLRs (from Fig. 3C in the preprint). Taken from Kourelis and Kamoun, 2020, provided under CC-BY 4.0.


The over-representation is also seen across NLR clades. CC-NLRs and TIR-NLRs are the most common domain combinations, making up almost 80% of the validated NLRs. The remaining 20% form a unique and interesting set, covering novel and non-canonical domain combinations, duplications and arrangements at the N- and/or C-terminus (Fig. 1B). For those interested, the preprint discusses many of these examples in detail. The diversity also extends to NLR protein lengths, which vary between clades. NB-ARC domains show a tighter length distribution, apart from a few extremely short and long ones that stretch the boundaries of NLR domain diversity.

Domain gains are a recurrent feature of NLR evolution, which makes predicting plant NLRs tricky. Although several NLR extractors identify canonical NLR features, this comprehensive dataset of functionally validated proteins, with all its diversity, could prove an important benchmarking resource for future NLR annotation tools.
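To make the benchmarking idea concrete, here is a minimal sketch of how the output of an NLR annotation tool could be scored against a reference set such as RefPlantNLR. It assumes each reference NLR is identified by a simple ID string and tagged with its clade; the IDs, the toy data and the `recall_by_clade` helper are hypothetical illustrations, not part of RefPlantNLR or of any existing NLR extractor.

```python
from collections import defaultdict


def recall_by_clade(reference, predicted_ids):
    """Fraction of reference NLRs recovered by a predictor, per clade.

    reference: dict mapping a (hypothetical) NLR ID to its clade label
    predicted_ids: set of IDs that an annotation tool flagged as NLRs
    """
    totals = defaultdict(int)  # reference NLRs per clade
    found = defaultdict(int)   # reference NLRs per clade that the tool recovered
    for nlr_id, clade in reference.items():
        totals[clade] += 1
        if nlr_id in predicted_ids:
            found[clade] += 1
    return {clade: found[clade] / totals[clade] for clade in totals}


# Toy example with made-up IDs; these are not real RefPlantNLR entries.
reference = {
    "nlr_a": "CC-NLR",
    "nlr_b": "CC-NLR",
    "nlr_c": "TIR-NLR",
    "nlr_d": "CCR-NLR",
}
predicted = {"nlr_a", "nlr_c"}

print(recall_by_clade(reference, predicted))
```

Reporting recall per clade rather than as a single number would make biases visible, for example a tool that recovers canonical CC-NLRs but misses proteins with non-canonical domain arrangements. Only recall is computed here, since a list of validated NLRs cannot by itself tell which extra predictions are false positives.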


Why I like it

The study provides a phylogenetic framework of experimentally validated NLRs that encourages us to appreciate their structural diversity and directs the field towards the potential of under-studied plant groups and NLR clades. The authors are crowdsourcing suggestions for a more comprehensive analysis, which will be incorporated into the next version to be submitted to a journal.

Tags: annotation, dataset, disease, plant, resistance, resource

Posted on: 21 July 2020



Author's response

Sophien Kamoun shared


1. What are the most interesting aspects of NLRs? Could you tell us about your decision to study them and obtaining community feedback on this preprint?

NLRs are extraordinarily diverse and tend to evolve rapidly. They are the most rapidly evolving plant genes and as such are worth studying. They also complement our work on pathogen effectors and are useful when deployed in agriculture.

2. Is it possible to use the number of predicted NLRs for each plant group as a baseline for the data in Fig. 1? Are certain NLR clades better studied in certain plants or do they represent the overall distribution of NLRs in a particular plant family? For instance, do cereals show an abundance of CC-NLRs or are they the more frequently investigated clade in this plant group?

That’s an excellent suggestion and it’s worthy of a figure in the revision. We do wish first to benchmark current methods for NLR predictions. One outcome should be a more robust NLRome per species and at that stage it would be worth it to perform the analysis you propose.

3. How do you think benchmarking with this dataset would alter the predicted NLRome, its expansion, diversification or fine-tuning? Are there any new clues from the diverse architectures for engineering novel NLRs?

We don’t know yet. But anecdotal evidence indicates that NLR prediction software has limitations and biases. We need to understand this better using RefPlantNLR.

4. Is there information about the regulation and oligomerisation states of many NLRs?

This remains too limited at present. We know this for only a few NLRs. But it would be great to add this information in the future.

5. The list shows viral, bacterial, fungal and oomycete pathogens. Do the pathogens also show an over-representation of well-studied organisms?

Most certainly. And this is another excellent suggestion for the next version. I’d say this is worthy of a figure similar to Figure 1.
