Nanopore-based genome assembly and the evolutionary genomics of basmati rice

Jae Young Choi, Zoe N. Lye, Simon C. Groen, Xiaoguang Dai, Priyesh Rughani, Sophie Zaaijer, Eoghan D. Harrington, Sissel Juul, Michael D. Purugganan

Preprint posted on 13 August 2019

What makes basmati rice different from other varieties, and how did it evolve? The sequencing of two high-quality basmati rice genomes provides clues to its evolutionary history.

Selected by Edi Sudianto


Asian rice (Oryza sativa L.) is one of the most widely consumed crops, feeding about half of the world’s population. Two subspecies of rice are formally recognized: the short-grain subspecies japonica and the long-grain subspecies indica (see review by [1]). There have been contentious debates on how many domestication events shaped the current Asian rice varieties [2]. In addition to the two subspecies, other widely recognized varieties are aus and basmati (aromatic) rice. Basmati rice stands out among rice varieties because it is highly valued in South Asia for its fragrant, long, and slender grains.

Here, the authors provide high-quality, chromosome-scale genome assemblies of two basmati rice landraces (Basmati 334 and Sufid) using the long-read Nanopore sequencing platform. Unlike the more commonly used Illumina short reads, Nanopore reads make it possible to assemble a more contiguous genome. The two basmati rice genomes provide genetic information that was previously not readily available. With these genomes, the authors present a comparative genomics study to disentangle the complex history of rice domestication and evolution.
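Assembly contiguity is commonly summarised with the N50 statistic: the contig length at which contigs of that length or longer contain at least half of the assembled bases. The sketch below is only an illustration of that definition. The function name `n50` and the contig lengths are made up for this example and are not values from the preprint.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical assemblies of the same total size (2,000 bp), NOT real data.
fragmented = [100, 200, 300, 400, 500, 500]
contiguous = [1000, 700, 300]
print(n50(fragmented), n50(contiguous))  # 500 1000
```

For the same total length, the assembly made of fewer, longer contigs has the higher N50, which is the sense in which long reads are expected to yield "more contiguous" genomes.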


Key findings

Expansion of copia-like retrotransposons in the basmati rice genomes

The two basmati rice genomes have more repetitive DNA sequences than japonica rice. Among these repetitive sequences, retrotransposons constitute the highest proportion (~52%) in both genomes. In particular, the authors discovered that the two largest retrotransposon families, gypsy and copia, vary among four rice varieties (indica, japonica, aus, and basmati). Some retrotransposons are specific to domesticated varieties and could not be found in wild rice (single asterisk in Figure 1). In addition, several gypsy-like retrotransposons are specific to indica, aus, and basmati (double asterisk in Figure 1), while some copia-like repeats are specific to basmati varieties (triple asterisk in Figure 1).

Figure 1. Phylogeny of the two most abundant retrotransposon families, gypsy and copia, based on the rve gene among four rice cultivar types and two wild rice (Adapted from Figure 4C of the preprint).


Basmati rice has had extensive gene flow from aus rice

The origin of basmati rice is not yet fully understood. Earlier studies have proposed that basmati rice is a hybrid between japonica and aus rice. In this preprint, the authors found that the two basmati rice genomes are closer to japonica than to indica or aus (Figure 2). However, further analyses indicated that gene flow also played a role in shaping the evolution of rice varieties. Japonica rice shows evidence of admixture with O. rufipogon, while basmati rice has exchanged genes with aus rice (Figure 2).

Figure 2. (Left) Maximum likelihood tree based on four-fold degenerate sites among the rice varieties. (Right) Model of gene flow events among domesticated Asian rice. cA, aus; cB, basmati; I, indica; J, japonica. (Adapted from Figure 5A and F of the preprint).
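Four-fold degenerate sites are third codon positions where any of the four nucleotides encodes the same amino acid, so changes there are often treated as approximately neutral for tree building. As a minimal sketch, assuming the standard genetic code and an in-frame coding sequence, such sites can be located as follows. The function name `fourfold_sites` and the toy sequence are hypothetical and not taken from the preprint.

```python
# First two bases of the eight codon families that are four-fold degenerate
# in the standard genetic code (Leu, Val, Ser, Pro, Thr, Ala, Arg, Gly).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def fourfold_sites(cds):
    """Return 0-based positions of four-fold degenerate sites in an in-frame CDS."""
    cds = cds.upper()
    sites = []
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for start in range(0, usable, 3):
        codon = cds[start:start + 3]
        # Skip codons with ambiguous bases (e.g. N) at the third position.
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in "ACGT":
            sites.append(start + 2)
    return sites


# Toy sequence: ATG GCT CTG AAA TGA
print(fourfold_sites("ATGGCTCTGAAATGA"))  # [5, 8]
```

In a multi-genome comparison one would typically keep only sites that are four-fold degenerate in every aligned sequence; that filtering step is left out here for brevity.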


Population genomics points to three distinct genetic groups among basmati rice

With high-quality basmati rice genomes available, the authors were able to carry out a population genomics study to understand the diversity of this rice type. Basmati rice was shown to segregate into three distinct genetic groups based on geographic origin: (1) Bhutan/Nepal, (2) India/Bangladesh/Myanmar, and (3) Iran/Pakistan (Figure 3). Group 2 (India/Bangladesh/Myanmar) is genetically more distinct than the other two groups, likely due to continuous gene flow from aus varieties, which are traditionally grown in these regions.

Figure 3. (Top) Principal component analysis (PCA) plot of the 78 basmati rice varieties based on the population genomic dataset. Dashed lines denote the genetic group segregation. (Bottom) The geographic locations of the basmati rice varieties. (Adapted from Figure 7A and C of the preprint).
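To give a feel for how a PCA plot like this is produced, here is a minimal sketch that projects samples onto principal components of a genotype matrix (rows are samples, columns are SNPs coded as 0/1/2 allele counts). It assumes NumPy is available. The function name `genotype_pca` and the toy matrix are invented for illustration; the preprint's actual pipeline, filtering, and software are not described here and may differ.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Return PC scores and explained-variance fractions for a samples x SNPs matrix."""
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)  # centre each SNP column
    # SVD of the centred matrix; rows of vt are the principal axes.
    u, s, _vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


# Toy genotypes for six hypothetical samples in three loose groups (NOT real data).
toy = [
    [0, 0, 0, 2, 2, 0],
    [0, 0, 1, 2, 2, 0],
    [2, 2, 0, 0, 0, 0],
    [2, 2, 0, 0, 1, 0],
    [0, 0, 2, 0, 0, 2],
    [0, 1, 2, 0, 0, 2],
]
scores, explained = genotype_pca(toy)
for sample, (pc1, pc2) in enumerate(scores):
    print(f"sample {sample}: PC1={pc1:.2f} PC2={pc2:.2f}")
```

Samples with similar genotypes land close together in the PC1/PC2 plane, which is how genetic groups become visible as clusters. Real analyses usually also filter and normalise SNPs before this step.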


Why I like this preprint

I chose this preprint because it is a good example of applying Nanopore long reads to generate high-quality plant genome assemblies. Plants are notorious for their complicated, repeat-rich genomes. Long-read sequencing, from either PacBio or Nanopore, has been anticipated to tackle these problems. In this preprint, the authors were able to reconstruct high-quality genome assemblies using Nanopore technology. These new genomes can then be used to address long-standing evolutionary questions, such as the origin and population genomics of basmati rice in this study.


Future directions and questions

  1. As mentioned in the conclusion, the two genomes provide additional genomic resources that can be used for further crop improvement. Which agronomic traits of basmati rice could be put to use?
  2. The two basmati rice genomes are highly syntenic to the Nipponbare genome, except for an inversion in the pericentromeric region on chromosome 6. Are there any genes known to be located in this region? Does this inversion have any harmful effects on basmati rice?



References

  1. Sweeney M, McCouch S. (2007). The complex history of the domestication of rice. Ann. Bot. 100: 951–957.
  2. Vaughan DA, Lu BR, Tomooka N. (2008). Was Asian rice (Oryza sativa) domesticated more than once? Rice 1: 16–24.

Tags: basmati, comparative genomics, evolution, gene flow, population genomics, rice

Posted on: 19 September 2019



Author's response

Jae Young Choi shared

Thanks for taking an interest in our preprint, and I really appreciate the great summary you’ve provided. Just to add to your summary and say a bit more about what I thought was interesting in our results, I’m excited about the future possibility of using long-read sequencing technologies (such as nanopore sequencing) to generate highly contiguous genomes for several individuals. This means that we will now have within-population variation based on de novo genome assemblies, and to me this is exciting because we will have access to the structural variations that were often difficult to access with previous short-read sequencing (i.e. Illumina reads) data. Recent research by other groups is showing that structural variations can involve complex rearrangements, often involving repeat sequences, which are more difficult to sequence into with shorter reads. In our preprint we did some analysis on this and detected large and small structural variations (including the repeat sequence variation you mention in your summary) in the two individuals we sequenced. I’m excited at the possibility of using long-read sequencing technology to sequence populations to get a deeper understanding of population-wide variation.


In response to your questions:


  1. Basmati rice varieties (including the one we’ve sequenced, Basmati 334) are known to be tolerant of biotic and abiotic stresses. Some of the genes involved in the tolerance are known to be polymorphic (at the level of presence/absence variation and at the level of single-nucleotide variation), and the next step could be to use our genomic resource to find novel variations that confer tolerance to stress. Basmati rice also has agronomic traits that are of interest to breeders. For instance, several basmati rice varieties are known to be more elongated than many other rice varieties. My personal hope is that the genome reference generated from our study can act as a springboard for plant genomics and breeding researchers to study both the single-nucleotide and structural variations that shaped the evolution and domestication of basmati rice.


  2. It’s an interesting question. I haven’t had a chance to take a look at the exact genes that are within the inversion, but given that it involves the pericentromeric region there might not be many genes in the end. The inversion is pretty interesting because indica rice has one as well, although we think it’s a different inversion from the one in basmati rice. While it’s tempting to argue there may be a functional consequence (whether deleterious or advantageous), it’s also possible that it’s a variation that was fixed through genetic drift, facilitated by the low-recombination environment. On the other hand, because inversions can often suppress recombination, it is interesting to imagine that the inversion was a result of selection preventing certain genetic variations within that region from admixing between different rice subpopulations. Naturally, more study would be necessary to figure this out.
