Nanopore-based genome assembly and the evolutionary genomics of basmati rice

Jae Young Choi, Zoe N. Lye, Simon C. Groen, Xiaoguang Dai, Priyesh Rughani, Sophie Zaaijer, Eoghan D. Harrington, Sissel Juul, Michael D. Purugganan

Preprint posted on August 13, 2019

What makes basmati rice different from other varieties? How did it evolve? The sequencing of two high-quality basmati rice genomes provides clues to its evolutionary history.

Selected by Edi Sudianto


Asian rice (Oryza sativa L.) is one of the most widely consumed crops, feeding about half of the world’s population. Two subspecies of rice are formally recognized: the short-grain subspecies japonica and the long-grain subspecies indica (reviewed in [1]). There have been contentious debates on how many domestication events shaped the current Asian rice varieties [2]. In addition to the two subspecies, other widely recognized varieties are aus and basmati (aromatic) rice. Basmati rice stands out from other rice varieties because it is highly valued among South Asian populations for its fragrant, long, and slender grains.

Here, the authors provide high-quality, chromosome-scale genome assemblies of two basmati rice landraces (Basmati 334 and Sufid) using the long-read Nanopore sequencing platform. Compared with the more commonly used Illumina short reads, Nanopore reads make it possible to assemble a more contiguous genome. The two basmati rice genomes provide genetic information that was not readily available before. With these genomes, the authors present a comparative genomics study to disentangle the complex history of rice domestication and evolution.
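To make "more contiguous" concrete, here is a minimal sketch of N50, a summary statistic commonly used to compare assembly contiguity. The choice of N50 and the contig lengths below are illustrative assumptions; they are not values or methods taken from the preprint.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly size.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Hypothetical toy assemblies of the same total size (32 units),
# NOT real basmati data: one fragmented, one contiguous.
fragmented = [2, 2, 2, 3, 3, 4, 8, 8]
contiguous = [4, 8, 20]

print(n50(fragmented))  # 8
print(n50(contiguous))  # 20
```

Both toy assemblies cover the same total length, but the one built from fewer, longer pieces has the higher N50, which is the sense in which long reads tend to yield more contiguous assemblies.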


Key findings

Expansion of copia-like retrotransposons in the basmati rice genomes

The two basmati rice genomes contain more repetitive DNA than the japonica rice genome. Among these repetitive sequences, retrotransposons constitute the highest proportion (~52%) in both genomes. In particular, the authors discovered that the two largest retrotransposon families, gypsy and copia, vary among the four rice varieties (indica, japonica, aus, and basmati). Some retrotransposons are specific to domesticated varieties and could not be found in wild rice (single asterisk in Figure 1). In addition, several gypsy-like retrotransposons are specific to indica, aus, and basmati (double asterisk in Figure 1), while some copia-like repeats are specific to basmati varieties (triple asterisk in Figure 1).

Figure 1. Phylogeny of the two most abundant retrotransposon families, gypsy and copia, based on the rve gene among four rice cultivar types and two wild rice (Adapted from Figure 4C of the preprint).


Basmati rice has had extensive gene flow from aus rice

The origin of basmati rice is not yet fully understood. Earlier studies proposed that basmati rice is a hybrid between japonica and aus rice. In this preprint, the authors found that the two basmati genomes are more closely related to japonica than to indica or aus (Figure 2). However, further analyses indicated that gene flow has also shaped the evolution of rice varieties. Japonica rice shows admixture with O. rufipogon, while basmati rice has exchanged genes with aus rice (Figure 2).
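One common way to test for gene flow of this kind is the ABBA-BABA test (Patterson's D statistic). The sketch below is only an illustration of that general technique; this summary does not state which test the authors used, and the population assignment (P1 = japonica, P2 = basmati, P3 = aus, plus an outgroup) and the toy sites are assumptions.

```python
def d_statistic(sites):
    """Patterson's D from biallelic sites.

    Each site is a sequence of four alleles in the order
    (P1, P2, P3, outgroup), for the assumed tree (((P1, P2), P3), outgroup).
    The outgroup allele is treated as ancestral ("A"), the other as derived ("B").
    """
    abba = baba = 0
    for p1, p2, p3, out in sites:
        if p3 == out:
            continue  # P3 must carry the derived allele
        if p1 == out and p2 == p3:
            abba += 1  # pattern A B B A: P2 shares the derived allele with P3
        elif p2 == out and p1 == p3:
            baba += 1  # pattern B A B A: P1 shares the derived allele with P3
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba)


# Hypothetical toy alignment columns, NOT real rice data.
# Order of alleles: japonica (P1), basmati (P2), aus (P3), outgroup.
toy_sites = ["AGGA", "AGGA", "AGGA", "GAGA", "AAGA", "GGGA"]

print(d_statistic(toy_sites))  # 0.5
```

Under this assumed setup, a D value near zero is what incomplete lineage sorting alone would produce, while a clearly positive D indicates excess allele sharing between P2 and P3, which would be consistent with gene flow between basmati and aus. In practice, significance is assessed with many sites and a resampling procedure, which is omitted here.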

Figure 2. (Left) Maximum likelihood tree based on four-fold degenerate sites among the rice varieties. (Right) Model of gene flow events among domesticated Asian rice. cA, aus; cB, basmati; I, indica; J, japonica. (Adapted from Figure 5A and F of the preprint).
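For readers unfamiliar with four-fold degenerate sites: these are third codon positions where any of the four bases encodes the same amino acid, so substitutions there are synonymous. The sketch below locates such sites in an in-frame coding sequence using the standard genetic code. The function names and the example sequence are hypothetical, and this is not the authors' pipeline.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_fourfold_degenerate(codon):
    """True if every base at the third position yields the same amino acid."""
    codon = codon.upper()
    if len(codon) != 3 or any(base not in BASES for base in codon):
        return False  # skip incomplete or ambiguous codons
    return len({CODON_TABLE[codon[:2] + base] for base in BASES}) == 1


def fourfold_sites(cds):
    """Return 0-based positions of four-fold degenerate sites in an in-frame CDS."""
    return [
        start + 2
        for start in range(0, len(cds) - 2, 3)
        if is_fourfold_degenerate(cds[start:start + 3])
    ]


# Hypothetical coding sequence: ATG GCT CTT TGG GGA TAA
print(fourfold_sites("ATGGCTCTTTGGGGATAA"))  # [5, 8, 14]
```

Here the third positions of GCT (Ala), CTT (Leu), and GGA (Gly) qualify, whereas those of ATG, TGG, and the stop codon do not.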


Population genomics points to three distinct genetic groups among basmati rice

With high-quality basmati rice genomes available, the authors were able to perform a population genomics study to understand the diversity of this rice type. Basmati rice was shown to segregate into three distinct genetic groups based on locality: (1) Bhutan/Nepal, (2) India/Bangladesh/Myanmar, and (3) Iran/Pakistan (Figure 3). Group 2 (India/Bangladesh/Myanmar) is genetically more distinct than the other two groups, likely due to continuous gene flow from aus varieties, which are traditionally grown in these regions.

Figure 3. (Top) Principal component analysis (PCA) plot of the 78 basmati rice varieties based on the population genomic dataset. Dashed lines denote the genetic group segregation. (Bottom) The geographic locations of the basmati rice varieties. (Adapted from Figure 7A and C of the preprint).


Why I like this preprint

I chose this preprint because it provides a good example of applying Nanopore long reads to generate high-quality plant genome assemblies. Plants are notorious for their complicated, repeat-rich genomes. Long-read sequencing, from either PacBio or Nanopore, has been anticipated to tackle these problems. In this preprint, the authors were able to reconstruct high-quality genome assemblies using Nanopore technology. These new genomes can then be used to address long-standing evolutionary questions, such as the origin and population genomics of basmati rice examined in this study.


Future directions and questions

  1. As mentioned in the conclusion, the two genomes provide additional genomic resources that can be used for further crop improvements. What kind of agronomic traits can we take from basmati rice?
  2. The two basmati rice genomes are highly syntenic to the Nipponbare genome, except for an inversion in the pericentromeric region of chromosome 6. Are there any genes known to be located in this region? Does this inversion have any harmful effects on basmati rice?



  1. Sweeney M, McCouch S. (2007). The complex history of the domestication of rice. Ann. Bot. 100: 951–957.
  2. Vaughan DA, Lu BR, Tomooka N. (2008). Was Asian rice (Oryza sativa) domesticated more than once? Rice 1: 16–24.

Tags: basmati, comparative genomics, evolution, gene flow, population genomics, rice

Posted on: 19th September 2019



  • Author's response

    Jae Young Choi shared

    Thanks for taking an interest in our preprint; I really appreciate the great summary you’ve provided. Just to add to your summary and say a bit more about what I found interesting in our results: I’m excited about the future possibility of using long-read sequencing technologies (such as nanopore sequencing) to generate highly contiguous genomes for several individuals. This means that we will now have within-population variation based on de novo genome assemblies, and to me this is exciting because we will have access to the structural variations that were often difficult to access with previous short-read sequencing (i.e. Illumina) data. Recent research by other groups is showing that structural variations can involve complex rearrangements, often involving repeat sequences, which are more difficult to sequence into with shorter reads. In our preprint we did some analysis on this and detected large and small structural variations (including the repeat sequence variation you mention in your summary) in the two individuals we sequenced. I’m excited at the possibility of using long-read sequencing technology to sequence populations and gain a deeper understanding of population-wide variation based on long-read data.


    In response to your questions:


    1. Basmati rice varieties (including Basmati 334, one of the two we sequenced) are known to be tolerant of biotic and abiotic stresses. Some of the genes involved in this tolerance are known to be polymorphic (at the level of presence/absence variation and at the level of single-nucleotide variation), and the next step could be to use our genomic resource to find novel variations that confer stress tolerance. Basmati rice also has agronomic traits that are of interest to breeders. For instance, several basmati rice varieties are known to be more elongated than many other rice varieties. My personal hope is that the genome reference generated from our study can act as a springboard for plant genomics and breeding researchers to study both the single-nucleotide and structural variations that shaped the evolution and domestication of basmati rice.


    2. It’s an interesting question. I haven’t had a chance to look at the exact genes within the inversion, but given that it involves the pericentromeric region there might not be many genes in the end. The inversion is pretty interesting because indica rice also has one, although we think it’s a different inversion from the one in basmati rice. While it’s tempting to argue there may be a functional consequence (whether deleterious or advantageous), it’s also possible that it’s a variant that became fixed through genetic drift, facilitated by the low-recombination environment. On the other hand, because inversions can often suppress recombination, it is interesting to imagine that the inversion was a result of selection preventing certain genetic variations within that region from admixing between different rice subpopulations. Naturally, more study would be necessary to figure this out.
