Nanopore-based genome assembly and the evolutionary genomics of basmati rice

Jae Young Choi, Zoe N. Lye, Simon C. Groen, Xiaoguang Dai, Priyesh Rughani, Sophie Zaaijer, Eoghan D. Harrington, Sissel Juul, Michael D. Purugganan

Posted on: 19 September 2019

Preprint posted on 13 August 2019

What makes basmati rice different from other varieties? How did it evolve? The sequencing of two high-quality basmati rice genomes provides clues to its evolutionary history.

Selected by Edi Sudianto

Categories: evolutionary biology, genetics, genomics, plant biology

Background

Asian rice (Oryza sativa L.) is one of the most widely consumed crops, feeding about half of the world’s population. Two subspecies of rice are formally recognized: the short-grain subspecies japonica and the long-grain subspecies indica (see review by [1]). There have been contentious debates over how many domestication events shaped the current Asian rice varieties [2]. In addition to the two subspecies, other widely recognized varieties are aus and basmati (aromatic) rice. Basmati rice stands out from other rice varieties because it is highly valued among South Asian populations for its fragrant, long, and slender grains.

Here, the authors provide high-quality, chromosome-scale genome assemblies of two basmati rice landraces (Basmati 334 and Sufid) using the long-read Nanopore sequencing platform. Unlike the more commonly used Illumina short reads, Nanopore reads offer an opportunity to assemble a more contiguous genome. The two basmati rice genomes represent genetic information that was previously not readily available. With these genomes, the authors present a comparative genomics study to disentangle the complex history of rice domestication and evolution.
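
Assembly contiguity is commonly summarized with the N50 statistic: the contig length at which contigs of that length or longer contain at least half of the assembled bases. The sketch below illustrates the idea only; the function name n50 and the contig lengths are made-up toy values, not figures or code from the preprint.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Two hypothetical assemblies of the same total size (1,500 bp).
fragmented = [100, 200, 300, 400, 500]  # many short contigs
contiguous = [700, 800]                 # few long contigs

print(n50(fragmented))  # 400
print(n50(contiguous))  # 800
```

For the same total assembly size, fewer and longer contigs give a higher N50, which is the sense in which long reads are said to improve contiguity.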

Key findings

Expansion of copia-like retrotransposon in the basmati rice genomes

The two basmati rice genomes contain more repetitive DNA than the japonica rice genome. Among these repetitive sequences, retrotransposons constitute the highest proportion (~52%) in both genomes. In particular, the authors discovered that the two largest retrotransposon families, gypsy and copia, vary among four rice varieties (indica, japonica, aus, and basmati). Some retrotransposons are specific to domesticated varieties and could not be found in wild rice (single asterisk in Figure 1). In addition, several gypsy-like retrotransposons are specific to indica, aus, and basmati (double asterisk in Figure 1), while some copia-like repeats are specific to basmati varieties (triple asterisk in Figure 1).

Figure 1. Phylogeny of the two most abundant retrotransposon families, gypsy and copia, based on the rve gene among four rice cultivar types and two wild rice (Adapted from Figure 4C of the preprint).

Basmati rice has had extensive gene flow from aus rice

The origin of the basmati rice variety is not yet fully understood. Earlier studies have proposed that basmati rice is a hybrid between japonica and aus rice. In this preprint, the authors found that the two basmati rice genomes are more closely related to japonica than to indica or aus (Figure 2). However, further analyses indicated that gene flow also played a role in shaping the evolution of rice varieties. Japonica rice is shown to have admixed with O. rufipogon, while basmati-type rice has exchanged genes with aus-type rice (Figure 2).

Figure 2. (Left) Maximum likelihood tree based on four-fold degenerate sites among the rice varieties. (Right) Model of gene flow events among domesticated Asian rice. cA, aus; cB, basmati; I, indica; J, japonica. (Adapted from Figure 5A and F of the preprint).
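
Four-fold degenerate sites are third codon positions at which any of the four nucleotides encodes the same amino acid, so substitutions there are synonymous. The sketch below shows one way such sites could be located in a coding sequence under the standard genetic code. It is an illustration only, not the authors’ pipeline; the names CODON_TABLE, FOURFOLD_PREFIXES and fourfold_sites, and the short example sequence, are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)

# Two-base prefixes whose amino acid is the same for every third base.
FOURFOLD_PREFIXES = {
    a + b
    for a in BASES
    for b in BASES
    if len({CODON_TABLE[a + b + c] for c in BASES}) == 1
}


def fourfold_sites(cds):
    """Return 0-based positions of four-fold degenerate sites in an in-frame CDS."""
    cds = cds.upper()
    sites = []
    for start in range(0, len(cds) - len(cds) % 3, 3):
        if cds[start:start + 2] in FOURFOLD_PREFIXES:
            sites.append(start + 2)  # third position of this codon
    return sites


# Toy CDS: ATG GCT CTG AAA TAA (Met, Ala, Leu, Lys, stop)
print(fourfold_sites("ATGGCTCTGAAATAA"))  # [5, 8]
```

Only the third positions of GCT (alanine) and CTG (leucine) qualify in this toy sequence; the sketch assumes the input is already in frame and uses the standard code.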

Population genomics points to three distinct genetic groups among basmati rice

With high-quality basmati rice genomes available, the authors were able to perform a population genomics study to understand the diversity of this rice type. Basmati rice was shown to segregate into three distinct genetic groups based on locality: (1) Bhutan/Nepal, (2) India/Bangladesh/Myanmar, and (3) Iran/Pakistan (Figure 3). Group 2 (India/Bangladesh/Myanmar) is genetically more distinct than the other two groups, likely due to continuous gene flow from aus varieties, which are traditionally grown in these regions.

Figure 3. (Top) Principal component analysis (PCA) plot of the 78 basmati rice varieties based on the population genomic dataset. Dashed lines denote the genetic group segregation. (Bottom) The geographic locations of the basmati rice varieties. (Adapted from Figure 7A and C of the preprint).

Why I like this preprint

I chose this preprint because it provides a good example of applying Nanopore long reads to generate high-quality assemblies of plant genomes. Plants are notorious for their complicated, repeat-rich genomes. Long reads, from either PacBio or Nanopore, have been expected to tackle these problems. In this preprint, the authors were able to reconstruct high-quality genome assemblies using Nanopore technology. These new genomes can then be used to address long-standing evolutionary questions, such as the origin and population genomics of basmati rice in this study.

Future directions and questions

As mentioned in the conclusion, the two genomes provide additional genomic resources that can be used for further crop improvement. What kind of agronomic traits can we take from basmati rice?
The two basmati rice genomes are highly syntenic to the Nipponbare genome, except for an inversion in the pericentromeric region on chromosome 6. Are there any genes known to be located in this region? Does this inversion have any harmful effects on the basmati rice?

References

[1] Sweeney M, McCouch S. (2007). The complex history of the domestication of rice. Ann. Bot. 100: 951–957.
[2] Vaughan DA, Lu BR, Tomooka N. (2008). Was Asian rice (Oryza sativa) domesticated more than once? Rice 1: 16–24.

Tags: basmati, comparative genomics, evolution, gene flow, population genomics, rice

doi: https://doi.org/10.1242/prelights.13881


Author's response

Jae Young Choi shared

Thanks for taking an interest in our preprint; I really appreciate the great summary you’ve provided. Just to add to your summary and say a bit more about what I found interesting in our results: I’m excited about the future possibility of using long-read sequencing technologies (such as nanopore sequencing) to generate highly contiguous genomes for several individuals. This means that we will now have within-population variation based on de novo genome assemblies, and to me this is exciting because we will have access to the structural variations that were often difficult to access with previous short-read sequencing data (i.e. Illumina reads). Recent research by other groups is showing that structural variations can involve complex rearrangements, often involving repeat sequences, which are more difficult to sequence into with shorter reads. In our preprint we did some analysis on this and detected large and small structural variations (including the repeat sequence variation you mention in your summary) in the two individuals we sequenced. I’m excited at the possibility of using long-read sequencing technology to sequence populations and gain a deeper understanding of population-wide variation.

In response to your questions:

Basmati rice varieties (including one we sequenced, Basmati 334) are known to be tolerant of biotic and abiotic stresses. Some of the genes involved in this tolerance are known to be polymorphic (at the level of presence/absence variation and at the level of single-nucleotide variation), and the next step could be to use our genomic resource to find novel variations that confer stress tolerance. Basmati rice also has agronomic traits that are of interest to breeders. For instance, several basmati rice varieties are known to be more elongated than many other rice varieties. My personal hope is that the genome reference generated from our study can act as a springboard for plant genomics and breeding researchers to study both the single-nucleotide and structural variations that shaped the evolution and domestication of basmati rice.

It’s an interesting question. I haven’t had a chance to look at the exact genes that are within the inversion, but given that it involves the pericentromeric region, there might not be many genes in the end. The inversion is pretty interesting because indica rice also has one, although we think it’s a different inversion from the one in basmati rice. While it’s tempting to argue there may be a functional consequence (whether deleterious or advantageous), it’s also possible that it’s a variant that was fixed through genetic drift, facilitated by the low-recombination environment. On the other hand, because inversions can often suppress recombination, it is interesting to imagine that the inversion was a result of selection preventing certain genetic variations within that region from admixing between different rice subpopulations. Naturally, more study would be necessary to figure this out.
