RefPlantNLR: a comprehensive collection of experimentally validated plant NLRs

Jiorgos Kourelis, Sophien Kamoun

Preprint posted on July 09, 2020

RefPlantNLR: With all NLRs under one roof, appreciate the diversity and address the gaps

Selected by Hiral Shah

The Backstory

Nucleotide-binding leucine-rich repeat (NLR) proteins form a large family of intracellular receptors involved in pathogen recognition, functioning as gatekeepers of plant immunity. The activation of NLRs by pathogen effectors, directly or indirectly, induces plant immune responses, referred to as effector-triggered immunity (ETI), that prevent the proliferation of the pathogen. Though some “singleton” NLRs achieve pathogen sensing as well as immune signalling, many NLRs are either dedicated sensors or helpers in downstream signalling, functioning through gene clusters and networks.

With their impact on disease resistance, NLRs are an important factor in crop breeding and are believed to be engaged in a co-evolutionary arms race with pathogen effectors through rapid variation in sequence and copy number across species. Classically, plant NLRs are known to have a tripartite domain architecture with an N-terminal domain, a central NB-ARC domain (a nucleotide-binding and oligomerization domain, NOD) and a C-terminal LRR domain. However, recent studies have uncovered plant NLRs with many different domains. For instance, the rice protein Pb1 has a NOD domain different from the canonical NB-ARC, but maintains the overall NLR structure. The variable N-terminal domain forms the basis of the classification of NLRs into four sub-clades: CC-NLR, TIR-NLR, CCR-NLR and the recent addition, the CCG10-NLR clade.

The study puts together an extensive reference dataset of 415 experimentally validated NLRs across 31 plant genera and 4 NLR clades. The authors manually screened the literature for genes associated with disease resistance or susceptibility, effector-triggered immune responses or their regulation and downstream signalling, necrosis, and allelic series of NLRs, and then annotated the candidates to ensure the presence of an NB-ARC domain along with additional domains. It is the first dataset of the OpenPlantNLR community. The study also provides a more compact set of 235 proteins after factoring in redundancies.


Key Findings

The dataset incorporates a wide range of information, such as amino acid and coding sequences, plant source, pathogen, effectors, associated helper components and domain structure, uncovering 407 unique NLRs and 347 distinct NB-ARC domains. NLRs with identical sequences, such as RPP7, highlight the importance of context-dependent regulation in different plant backgrounds.
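To make the gap between the number of entries and the number of unique NLRs concrete, here is a minimal sketch of how entries with identical amino acid sequences could be grouped. It is an illustration only, not the authors' pipeline: the function name `group_identical`, the entry names and the toy sequences are all hypothetical, and exact string identity is assumed as the criterion for "unique".

```python
from collections import defaultdict


def group_identical(records):
    """Group entry names by exact amino acid sequence.

    `records` is an iterable of (name, sequence) pairs. Sequences are
    upper-cased and a trailing stop symbol '*' is removed before comparison.
    """
    groups = defaultdict(list)
    for name, seq in records:
        groups[seq.upper().rstrip("*")].append(name)
    return dict(groups)


# Toy, made-up records (hypothetical names, not real NLR sequences).
records = [
    ("entry_A", "MKTLLV"),
    ("entry_B", "MKTLLV"),   # same protein reported in a second background
    ("entry_C", "MASGGR"),
]

groups = group_identical(records)
unique_count = len(groups)
# Sequences that appear under more than one entry name.
shared = {seq: names for seq, names in groups.items() if len(names) > 1}
```

In this toy case three entries collapse to two unique sequences, which is the same kind of difference as that between the full entry list and the unique NLR count.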

The study describes the plant-wise distribution of validated NLRs, drawing our attention to a skew towards well-studied plants, with a substantial proportion of plant diversity not accounted for and no members at all from non-flowering plants (Fig 1A). The plant laboratory workhorse Arabidopsis, the economically important cereals rice, wheat and barley, and the Solanaceae account for three-fourths of the NLRs in this set, a fraction that does not change much even in the 235-protein dataset. The fact that Arabidopsis is the only taxon with members from all NLR clades, and that the cereals show a bias towards CC-NLRs, highlights the current gaps but also potential areas for the way forward in NLR biology.

Figure 1. (A) Number of experimentally validated NLRs per plant genus (from Fig 1A in the preprint). (B) Domain diversity of NLRs (from Fig 3C in the preprint). Taken from Kourelis and Kamoun, 2020, provided under CC-BY 4.0.


The over-representation is also seen with respect to NLR clades. CC-NLR and TIR-NLR are the most common domain combinations, making up almost 80% of the validated NLRs. The remaining 20% form a unique and interesting set, covering novel and non-canonical domain combinations, duplications and arrangements at the N- and/or C-terminal domains (Fig 1B). For all those interested, the preprint has many interesting examples and details. The diversity is also seen in NLR protein lengths, which vary between clades. NB-ARC domains show a tighter distribution, barring a few extremely short and long ones that stretch the boundaries of NLR domain diversity.

Domain gains are a recurrent feature of NLR evolution, making prediction of plant NLRs tricky. Though there are several NLR extractors that identify canonical NLR characteristics, this comprehensive dataset of functionally validated proteins, with all its diversity, could prove an important benchmarking resource for future NLR annotation tools.
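As a rough illustration of what such benchmarking could look like, the sketch below scores a set of predictions against a reference set of validated identifiers. It is a minimal example under stated assumptions, not a method from the preprint: the function `benchmark_recall` and the identifiers are hypothetical, and matching is assumed to be by exact identifier.

```python
def benchmark_recall(reference_ids, predicted_ids):
    """Return (recall, missed) for a predictor against a reference set.

    recall = fraction of reference NLRs that the predictor recovered.
    missed = sorted list of reference identifiers that were not predicted.
    """
    reference = set(reference_ids)
    predicted = set(predicted_ids)
    found = reference & predicted
    missed = reference - predicted
    recall = len(found) / len(reference) if reference else 0.0
    return recall, sorted(missed)


# Toy, made-up identifiers (hypothetical, for illustration only).
reference = ["nlr1", "nlr2", "nlr3", "nlr4"]
predicted = ["nlr1", "nlr3", "other_protein"]

recall, missed = benchmark_recall(reference, predicted)
```

A reference set made up only of validated NLRs measures sensitivity (how many known NLRs a tool recovers). Predictions outside the reference, like `other_protein` here, cannot be judged as false positives without a separate set of known non-NLRs. Listing the missed entries is useful in its own right, since it shows which unusual architectures a tool fails to recognise.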


Why I like it

The study provides a phylogenetic framework of experimentally validated NLRs, encourages us to appreciate their structural diversity and directs the field towards the potential of under-studied plant groups and NLR clades. The authors are crowdsourcing suggestions to improve the study; a more comprehensive analysis will be incorporated in the subsequent version to be submitted to a journal.

Posted on: 21st July 2020


Author's response

Sophien Kamoun shared


1. What are the most interesting aspects of NLRs? Could you tell us about your decision to study them and to seek community feedback on this preprint?

NLRs are extraordinarily diverse and tend to evolve rapidly. They are the most rapidly evolving plant genes and as such are worth studying. They also complement our work on pathogen effectors and are useful when deployed in agriculture.

2. Is it possible to use the number of predicted NLRs for each plant group as a baseline for the data in Fig. 1? Are certain NLR clades better studied in certain plants or do they represent the overall distribution of NLRs in a particular plant family? For instance, do cereals show an abundance of CC-NLRs or are they the more frequently investigated clade in this plant group?

That’s an excellent suggestion and it’s worthy of a figure in the revision. We do wish first to benchmark current methods for NLR predictions. One outcome should be a more robust NLRome per species and at that stage it would be worth it to perform the analysis you propose.

3. How do you think benchmarking with this dataset would alter the predicted NLRome: expansion, diversification or fine-tuning? Are there any new clues from the diverse architectures for engineering novel NLRs?

We don’t know yet. But anecdotal evidence indicates that NLR prediction software has limitations and biases. We need to understand this better using RefPlantNLR.

4. Is there information about the regulation and oligomerisation states of many NLRs?

This remains too limited at this time. We know this for only a few NLRs. But it would be great to add this information in the future.

5. The list shows viral, bacterial, fungal and oomycete pathogens. Do the pathogens also show an over-representation of well-studied organisms?

Most certainly. And this is another excellent suggestion for the next version. I’d say this is worthy of a figure similar to Figure 1.

