10:20–10:40 (online)


Title: Mycobacterium tuberculosis genomics and Big Data

Authors: Christophe Sola1,2, Gaëtan Senelle3,4, Muhammed Rabiu Sahal1,2, Kevin La2,6, Typhaine Billard-Pomares7, Julie Marin2,8, Antoine Bridier-Nahmias2, Guislaine Refrégier1,5, Etienne Carbonnelle2,7, Emmanuelle Cambau2,6, Christophe Guyeux3,4

Affiliations: 1Université Paris‐Saclay, 91190, Gif-sur-Yvette, France; 2Université Paris-Cité, IAME, UMR 1137, INSERM, Paris, France; 3Université Bourgogne Franche-Comté (UBFC), Besançon, France; 4FEMTO-ST Institute, UMR 6174 CNRS-Université Bourgogne Franche-Comté (UBFC), France; 5Ecologie Systématique Evolution, Université Paris-Saclay, CNRS, AgroParisTech, UMR ESE, 91405, Orsay, France; 6AP-HP, GHU Nord site Bichat, Service de mycobactériologie spécialisée et de référence, Paris, France; 7Service de microbiologie clinique, Hôpital Avicenne, 93017 Bobigny, France; Université Paris 13, IAME, Inserm, 93017 Bobigny, France; 8Université Paris 13, IAME, UMR 1137, INSERM, Paris, France

Abstract: The Mycobacterium tuberculosis complex (MTBC) has a population structure consisting of nine human and animal lineages. The genomic diversity of clinical isolates within these lineages is a pathogenesis factor that affects virulence, transmissibility, host response and the emergence of antibiotic resistance. Hence it is important to develop improved systems for tracking and understanding the evolution of genomes across space and time. We present results from "TB-Annotator", a new bioinformatics platform for the computational biology of MTBC, which uses a convenience sample drawn from public and private sequence read archives (SRAs). The platform describes the structure of the MTBC population based on 16,000 representative genomes from 63 countries in its penultimate version (80,000 genomes at the time of writing). It analyzes nucleotide variants, the presence or absence of genes, and regions of difference, and it detects the insertion sites of mobile genetic elements. The objective of TB-Annotator is to detect recent epidemiological links, to reconstruct more distant spatio-temporal phylogeographical histories between historically related clones, and to support genome-wide association studies (GWAS). We compare the taxonomic labels described in recent reference studies and build a phylogenetic tree with RAxML; we characterize about 200 sublineages, whose alternative names are compared and merged where possible; we discuss hierarchical typing schemes within lineages and the informativeness of certain SNPs, which we analyze in detail; we also present new phylogeographical clones, for example within L5, L6 and L4.5. We show that this platform allows: (1) local epidemiological monitoring, (2) improved understanding of the global and local history of tuberculosis, as shown in recent examples, and (3) in-depth studies of the genetic selection mechanisms acting on MTBC genomes.