Mapping of high-throughput sequencing (HTS) reads to a single arbitrary reference genome is a frequently used approach in microbial genomics. However, the choice of a reference may represent a source of errors that may affect subsequent analyses such as the detection of single nucleotide polymorphisms (SNPs) and phylogenetic inference. In this work, we evaluated the effect of reference choice on short-read sequence data from five clinically and epidemiologically relevant bacteria (Klebsiella pneumoniae, Legionella pneumophila, Neisseria gonorrhoeae, Pseudomonas aeruginosa and Serratia marcescens). Publicly available whole-genome assemblies encompassing the genomic diversity of these species were selected as reference sequences, and read alignment statistics, SNP calling, recombination rates, dN/dS ratios, and phylogenetic trees were evaluated depending on the mapping reference. The choice of different reference genomes proved to have an impact on almost all the parameters considered in the five species. In addition, these biases had potential epidemiological implications such as including/excluding isolates of particular clades and the estimation of genetic distances. These findings suggest that the single reference approach might introduce systematic errors during mapping that affect subsequent analyses, particularly for data sets with isolates from genetically diverse backgrounds. In any case, exploring the effects of different references on the final conclusions is highly recommended.

Mapping consists in the alignment of reads (i.e., DNA fragments) obtained through high-throughput genome sequencing to a previously assembled reference sequence. It is a common practice in genomic studies to use a single reference for mapping, usually the ‘reference genome’ of a species (a high-quality assembly). However, the selection of an optimal reference is hindered by intrinsic intra-species genetic variability, particularly in bacteria. It is known that genetic differences between the reference genome and the read sequences may produce incorrect alignments during mapping. Eventually, these errors could lead to misidentification of variants and biased reconstruction of phylogenetic trees (which reflect ancestry between different bacterial lineages). To our knowledge, this is the first work to systematically examine the effect of different references for mapping on the inference of tree topology as well as the impact on recombination and natural selection inferences. Furthermore, the novelty of this work relies on a procedure that guarantees that we are evaluating only the effect of the reference. This effect has proved to be pervasive in the five bacterial species that we have studied and, in some cases, alterations in phylogenetic trees could lead to incorrect epidemiological inferences. Hence, the use of different reference genomes may be prescriptive to assess the potential biases of mapping.

Citation: Valiente-Mullor C, Beamud B, Ansari I, Francés-Cuesta C, García-González N, Mejía L, et al. (2021) One is not enough: On the effects of reference genome for the mapping and subsequent analyses of short-reads. PLoS Comput Biol 17(1):

Editor: Kin Fai Au, Ohio State University, UNITED STATES

Received: Ap; Accepted: Janu; Published: January 27, 2021

Copyright: © 2021 Valiente-Mullor et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The authors confirm that all data underlying the findings are fully available without restriction. All relevant data are within the paper and its Supporting information files. Complete pipeline is available on GitHub ( ) to be run as a single script, so that the analyses conducted in this work could be easily reproduced on any dataset.

Funding: This project was partly funded by projects BFU2017-89594R from MICIN (Spanish Government) and PROMETEO2016-0122 (Generalitat Valenciana, Spain). WGS was performed at Servicio de Secuenciación Masiva y Bioinformática de la Fundación para la Investigación Sanitaria y Biomédica de la Comunitat Valenciana (FISABIO) and co-financed by the European Union through the Operational Program of European Regional Development Fund (ERDF) of Valencia Region (Spain) 2014-2020. CV is recipient of contract FPU2018/02579, BB of contract FPU2016/02139 and CF of FPI contract BES-2015-074204 from MICIN (Spanish Government). LM benefits of a fellowship from Fundación Carolina.
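
To make concrete how the choice of mapping reference might change an estimated genetic distance between isolates, here is a toy sketch in Python. It is not the authors' pipeline. The sequences, the `pairwise_snp_distance` helper, and the convention that `-` marks a region absent from a reference (so reads from that region cannot be mapped) are all assumptions made purely for illustration.

```python
def pairwise_snp_distance(isolate_a: str, isolate_b: str, reference: str) -> int:
    """Count sites where two isolates differ, restricted to sites the reference covers.

    Hypothetical simplification: all three sequences are pre-aligned strings of
    equal length, and '-' in the reference marks a region it lacks, so variants
    there would never be called.
    """
    if not (len(isolate_a) == len(isolate_b) == len(reference)):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b, r in zip(isolate_a, isolate_b, reference)
        if r != "-" and a != b
    )


# Two made-up isolates that differ at two sites (index 3 and index 8).
isolate_1 = "ACGTACGTAC"
isolate_2 = "ACGAACGTTC"

# Reference A covers every site; reference B lacks the last three sites.
reference_a = "ACGTACGTAC"
reference_b = "ACGTACG---"

distance_with_a = pairwise_snp_distance(isolate_1, isolate_2, reference_a)
distance_with_b = pairwise_snp_distance(isolate_1, isolate_2, reference_b)

print(distance_with_a, distance_with_b)
```

In this toy case the same pair of isolates appears more closely related when mapped against the reference that lacks part of their genome, which is the kind of shift in genetic distance that could matter when distances are compared against an epidemiological threshold.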