Large-scale analyses of human microbiomes reveal thousands of small, novel genes and their predicted functions

Hila Sberro, Nicholas Greenfield, Georgios Pavlopoulos, Nikos Kyrpides, Ami S Bhatt

Preprint posted on December 13, 2018 https://www.biorxiv.org/content/early/2018/12/13/494179

The power of a computational magnifying lens: peering into the diversity of proteins, large and small, encoded by the microbes in our bodies reveals many novel small ORFs, their putative functions, and hints about their evolution and diversity

Selected by Ganesh Kadamur

Context: The coding regions of genomes, commonly called Open Reading Frames (ORFs), have long been defined using a seemingly arbitrary minimum length cutoff (typically 50aa). Challenging this paradigm, studies over the past decade have shown widespread translation of stable products that fall below this threshold in both prokaryotes and eukaryotes, with functions in quorum sensing, development and calcium signaling, to name just a few. These proteins are encoded not only in intergenic regions but also within annotated ORFs, and the sequences encoding them have been named small ORFs (sORFs). However, these studies have mostly relied on well-studied model organisms, so extensive, large-scale surveys have been notably missing.

Methodology: In this preprint, Sberro et al. tackle the question using large, publicly available datasets and computational approaches. They mine metagenomes from >250 human subjects sampled as part of the Human Microbiome Project (NIH HMP I-II), running them through a battery of analyses and computational pipelines. Using MetaProdigal, a prediction tool optimized to look for sORFs, they identify >2.5 million sORFs. Combining sequence-based and domain analyses, they group these into ~400,000 clusters. To benchmark their methods, they search this set for ~30 well-characterized sORFs from model organisms and, surprisingly, find that almost half of these are absent from the human microbiome. To prune this list of ~400k clusters and increase confidence in their results, the authors use RNAcode, a program that draws on evolutionary and mutational signatures among homologs to narrow the set down to bona fide sORFs. The authors then proceed to functional prediction and categorization of these sORF families, using taxonomic specialization, sequence-based prediction of cellular localization, and comparison with other environmentally sampled metagenomes. In addition, because functionally related prokaryotic genes are commonly clustered together in operons, the genomic neighbourhood of each sORF family was also analyzed for functional annotation. Together, this computational tour de force has unearthed many novel sORF families, indicated putative functions, and generated a vast body of hypotheses that can now be tested experimentally.

Pipeline for identification and prediction of function of sORFs from human microbiome metagenomic data (taken from Sberro et al., bioRxiv, 2018)
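
To make the role of the length cutoff concrete, here is a minimal toy sketch of a small-ORF scan. It is not MetaProdigal and not the authors' method: it simply walks all six reading frames, pairs the first ATG with the next in-frame stop codon, and keeps candidates of at most 50 codons. The function names and the `min_aa` lower bound are assumptions made purely for illustration.

```python
# Toy illustration only: a naive six-frame scan, NOT MetaProdigal.
# find_orfs, reverse_complement and the min_aa default are hypothetical choices.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=5, max_aa=50):
    """Return (strand, start, end, aa_length) for ATG..stop ORFs within the length bounds.

    Coordinates are 0-based on the scanned strand; `end` is exclusive and
    includes the stop codon. aa_length counts codons before the stop.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens a candidate
                elif start is not None and codon in STOP_CODONS:
                    aa_len = (i - start) // 3
                    if min_aa <= aa_len <= max_aa:
                        orfs.append((strand, start, i + 3, aa_len))
                    start = None  # close the candidate and keep scanning
    return orfs


toy = "ATG" + "GCT" * 9 + "TAA"   # 10 codons followed by a stop
print(find_orfs(toy))             # [('+', 0, 33, 10)]
```

With the default `max_aa=50`, a 61-codon ORF is dropped while the 10-codon one is kept, which is the opposite of what a conventional annotation pipeline with a 50aa minimum would do. Real gene callers also score ribosome binding sites, codon usage and alternative start codons, none of which this sketch attempts.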

Key Findings:

  • The human microbiome has >4000 sORF families, of which almost 50% are not detected in other sequenced microbiomes (soil, water, mouse, etc.), highlighting the uniqueness of the microbiota that call our bodies home. In the process, the authors show that ~2400 of the families identified here are present in genomes included in the RefSeq database. However, more than 1000 of these families had remained unannotated because of the arbitrary 50aa length cutoff.
  • Some sORF families are more conserved than others. About 20 families are present in >50 species, whereas ~3000 families are found in 10 or fewer species, suggesting that rapid evolution and specialization are widespread amongst sORFs.
  • A mere 4% of all identified sORF families possess annotated domains, underscoring the breadth of unexplored sequence and structure space amongst sORFs.
  • 13 novel families are highly conserved across microbiomes isolated from different human niches (gut, mouth, skin) and thus likely encode essential housekeeping proteins. Almost half of these families are ribosome-associated proteins; homologs are also present in non-human microbiome species, supporting the prediction that they play a critical role across phyla.
  • No single protein family is present across all human niches sampled, implying niche-specific evolution of protein sequences and families. Note, however, that undersampling of donors per niche might bias this interpretation.
  • About a third of novel sORFs are predicted to generate transmembrane and/or secreted proteins. Analysis of their genomic context suggests roles in quorum sensing, toxin-antitoxin systems and inter-cellular communication.
  • Clues from the genomic neighbourhood identify ~200 families as potential phage defense genes with roles in the CRISPR response, and ~600 families that might mediate horizontal gene transfer events.
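
The genomic-neighbourhood reasoning behind the last two findings can be sketched as a simple "guilt by association" vote. This is an illustrative toy under stated assumptions, not the authors' pipeline: the gene identifiers, annotations, window size and both function names are hypothetical, and each contig is represented as an ordered list of `(gene_id, annotation)` pairs.

```python
from collections import Counter


def neighbourhood_annotations(genes, target_id, window=2):
    """Count annotations of genes within `window` positions of the target gene.

    genes: list of (gene_id, annotation or None) in chromosomal order on one contig.
    """
    ids = [gene_id for gene_id, _ in genes]
    idx = ids.index(target_id)
    lo = max(0, idx - window)
    hi = min(len(genes), idx + window + 1)
    return Counter(
        ann
        for pos, (_, ann) in enumerate(genes[lo:hi], start=lo)
        if pos != idx and ann is not None  # skip the target and unannotated genes
    )


def vote_function(contigs, family_members, window=2):
    """Pool neighbour annotations over every member of one sORF family."""
    votes = Counter()
    for contig in contigs:
        for gene_id, _ in contig:
            if gene_id in family_members:
                votes += neighbourhood_annotations(contig, gene_id, window)
    return votes


# Hypothetical toy data: two contigs, each carrying one member of the same family.
contig_a = [("a1", "transposase"), ("a2", "CRISPR-associated"),
            ("sorf_1", None), ("a3", "CRISPR-associated"), ("a4", None)]
contig_b = [("b1", None), ("sorf_2", None),
            ("b2", "CRISPR-associated"), ("b3", "toxin")]

votes = vote_function([contig_a, contig_b], {"sorf_1", "sorf_2"})
print(votes.most_common(1))  # [('CRISPR-associated', 3)]
```

A family whose members repeatedly sit next to genes of one functional class becomes a candidate for that class. Such a vote only generates a hypothesis; it says nothing about whether the small protein is actually expressed or functional.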

Why I like the work: This work extends previous findings that show the widespread yet unappreciated translation of small proteins, defined as <50aa in length. While earlier work focused on a small number of well-studied model organisms, this study expands our knowledge of sORF families by orders of magnitude, uncovering uncharacterized protein domains (and thus folds) in the process. It has generated a rich resource ripe for future exploration, with the promise of new antibiotics, tools to interrogate a plethora of cellular processes, and possibly innovative ways to design cell-permeable proteins for drug delivery. The section where the authors clearly spell out potential pitfalls of their methods and conclusions is also particularly commendable. Finally, this work is a great showcase of how combining different computational tools and pipelines can yield important insights into novel biology.

Future directions:

  • Mass spectrometry to validate expression of the predicted sORFs at the protein level. This could be complemented by techniques such as ribosome profiling.
  • Re-analysis of the data to identify co-occurring species in which both partners carry genes with predicted quorum sensing roles: for example, a secreted sORF product in one species that acts as a signal transducer, and a signal-receiving receptor, typically not a sORF, in the other. This could shed light on the communication pathways these species use in specific niches to regulate inter-cellular crosstalk.
  • Genetic studies in a wide range of species to test function. A high-throughput platform to study some families could be especially useful, for example sORFs predicted to play roles in quorum sensing. This would be contingent on the ability to culture the species outside the body and on the development of tools for genetic manipulation.
  • Exploring the diversity and specialization of sORF families across individuals from different ethnic backgrounds, as research increasingly shows that microbiomes vary widely with diet, environmental factors, etc. Such analyses could particularly help identify rapidly evolving sequences that are most important for local adaptation.

Tags: function prediction, microbiome, proteins

Posted on: 23rd January 2019, updated on: 24th January 2019
