Generations of children and biologists have marvelled at the seemingly endless variations of colour and pattern on the backs of ladybirds. Among biologists, much of the attention has focused on the question of how such large phenotypic variation is encoded in the genome. Classic genetic experiments in the 1930s suggested that colour variation in ladybirds is encoded at a single locus, but the identity of that locus has remained enigmatic. The existence of a single colour pattern locus is puzzling, given that more than 200 colour patterns have been described, and it raises the question of what kind of mechanism can support the stable existence of so many phenotypes in a single species. A new preprint from the labs of Arnaud Estoup and Benjamin Prud’homme now offers fresh insights and presents strong evidence that variation in the cis-regulatory region of a single gene encoding a transcription factor is responsible.
The authors started out by producing a new genome assembly of the harlequin ladybird Harmonia axyridis. For this they used a MinION sequencer, a device the size of a USB stick that is capable of producing extremely long sequencing reads. They then used conventional short-read sequencing at high depth to assay the genomic variation present in 14 samples, each containing many ladybirds of various colours. Knowing the frequency of the different colour morphs in each sample allowed them to ask which genetic variants are likely to be associated with the different patterns. This analysis strongly suggested the importance of a single locus, in agreement with the genetic experiments done 80 years earlier.
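The logic of associating variants with colour morphs across pooled samples can be illustrated with a minimal sketch. This is not the statistical model used in the preprint: a plain Pearson correlation is swapped in for illustration, and the variant names, the number of pools and every frequency value below are made up. The idea is simply that a variant whose allele frequency tracks the frequency of a colour morph across pools is a candidate for association.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation across pools, so no signal
    return sxy / sqrt(sxx * syy)


def rank_variants(morph_freq, allele_freqs):
    """Rank variants by |r| between pool allele frequency and morph frequency."""
    scores = {name: pearson(freqs, morph_freq) for name, freqs in allele_freqs.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical toy data: frequency of one colour morph in five pools ...
morph_freq = [0.1, 0.3, 0.5, 0.7, 0.9]

# ... and the allele frequency of three hypothetical variants in the same pools.
allele_freqs = {
    "variant_A": [0.12, 0.28, 0.52, 0.69, 0.91],  # tracks the morph
    "variant_B": [0.50, 0.45, 0.55, 0.48, 0.52],  # roughly flat
    "variant_C": [0.85, 0.72, 0.49, 0.33, 0.08],  # tracks the other allele
}

ranking = rank_variants(morph_freq, allele_freqs)
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```

In this toy example the two variants whose frequencies follow the morph (positively or negatively) rank above the flat one. A real pooled analysis would additionally need to account for sequencing depth, pool size and shared population history, none of which this sketch attempts.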
The region identified by the genome-wide association study contains two genes, GATAe and pannier, both of which encode transcription factors. Neither of these genes has so far been implicated in animal colouration, so the authors performed RNAi in ladybirds to directly test for a role in this process. While knocking down GATAe had no effect on colour patterning, animals injected with dsRNA against pannier lacked all dark pigmentation. This strongly suggested that pannier is the long-sought colour pattern gene in ladybirds. But how does genomic variation at this locus lead to all the different colour patterns?
To address this question, the authors first investigated the expression of pannier in ladybirds of different colour morphs. They found a strong correlation between the level of expression and the amount of dark cuticle, with the dark areas expressing high levels of pannier and light areas expressing low levels. These differential expression patterns appear to be encoded in the non-coding sequences surrounding pannier, which are highly diverse in animals of the different colour morphs. Intriguingly, genomic variation at this locus includes at least one very large inversion, which is expected to suppress recombination and could therefore contribute to the stable existence of multiple alleles.
Besides the iconic object of study, what I really love about this preprint is how the authors use modern genomics to shed light on a very old biological puzzle. The price and availability of sequencing have changed dramatically over the last 15 years, making large-scale sequencing projects feasible for individual labs. Furthermore, new sequencing technologies that produce long reads make de novo genome assembly significantly easier and more accurate. Here, Gautier, Yamaguchi et al. take full advantage of these developments to re-address a long-standing question that seemed intractable for many years. In the course of doing so, they show how genes are repurposed during evolution and how extensive variation at a single locus can drive a large degree of phenotypic variation in animals.
This study takes full advantage of the first half of the genomic revolution: our ever-increasing ability to read genomes. What I anticipate next is the harnessing of the other half: our novel ability to rewrite the genetic code. Genome engineering using TALE nucleases has already been described in ladybirds. The CRISPR/Cas system, which in many respects is easier to handle, has been adopted in a large variety of species and should hopefully also work in ladybirds. Genome engineering should allow precise modifications to the cis-regulatory region surrounding pannier and reveal which sequences are responsible for its differential expression and hence the development of the different colour patterns. This should lead to a more detailed picture of how genomic variation at a single locus can give rise to the more than 200 variations of the red and black pattern we all love.