DNA sequence and taxonomic gap analyses to quantify the coverage of aquatic cyanobacteria and eukaryotic microalgae in reference databases: Results of a survey in the Alpine region

Salmaso, Nico and Vasselon, Valentin and Rimet, Frédéric and Vautier, Marine and Elersek, Tina and Boscaini, Adriano and Donati, Claudio and Moretto, Marco and Pindo, Massimo and Riccioni, Giulia and Stefani, Erika and Capelli, Camilla and Lepori, Fabio and Kurmayer, Rainer and Mischke, Ute and Klemenčič, Aleksandra Krivograd and Novak, Katarina and Greco, Claudia and Franzini, Giorgio and Fusato, Giampaolo and Giacomazzi, Federica and Lea, Alessia and Menegon, Silvia and Zampieri, Chiara and Macor, Arianna and Virgilio, Damiano and Zanut, Elisa and Zorza, Raffaella and Buzzi, Fabio and Domaizon, Isabelle (2022) DNA sequence and taxonomic gap analyses to quantify the coverage of aquatic cyanobacteria and eukaryotic microalgae in reference databases: Results of a survey in the Alpine region. Science of The Total Environment, 834. p. 155175. ISSN 00489697

Abstract

The taxonomic identification of organisms based on the amplification of specific genetic markers (metabarcoding) implicitly requires adequate discriminatory information and taxonomic coverage of environmental DNA sequences in taxonomic databases. These requirements were quantitatively examined by comparing the determination of cyanobacteria and microalgae obtained by metabarcoding and light microscopy. We used planktic and biofilm samples collected in 37 lakes and 22 rivers across the Alpine region. We focused on two of the most widely used and best represented genetic markers in the reference databases, namely the 16S rRNA and 18S rRNA genes. A sequence gap analysis using blastn showed that, in the identity range of 99–100%, approximately 30% (plankton) and 60% (biofilm) of the sequences did not find any close counterpart in the reference databases (NCBI GenBank). Similarly, a taxonomic gap analysis showed that approximately 50% of the cyanobacterial and eukaryotic microalgal species identified by light microscopy were not represented in the reference databases. In both cases, the magnitude of the gaps differed between the major taxonomic groups. Even among the species identified under the microscope and represented in the reference databases, 22% and 26% were still absent from the blastn results at identity thresholds of ≥95% and ≥97%, respectively. The main causes were the absence of matching sequences due to amplification and/or sequencing failure and potential misidentification in the microscopy step. Our results quantitatively demonstrated that, in metabarcoding, the main obstacles to the classification of 16S rRNA and 18S rRNA sequences and to the interpretation of high-throughput sequencing biomonitoring data were important gaps in the taxonomic completeness of the reference databases and the short length of reads. The study focused on the Alpine region, but the extent of the gaps could be much greater in other less investigated geographic areas.
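
The two gap measures described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy sequence identifiers, and the species labels below are all hypothetical. The sketch only assumes that blastn results are available in the standard tab-separated tabular layout, where the first column is the query ID and the third is the percent identity. The sequence gap is taken as the share of query sequences with no hit at or above an identity threshold, and the taxonomic gap as the share of microscopy-identified species missing from the reference database.

```python
from typing import Dict, Iterable, Sequence, Set


def best_identity(blast_lines: Iterable[str]) -> Dict[str, float]:
    """Map each query ID to the highest percent identity among its hits.

    Assumes tab-separated blastn tabular output: column 1 is the query ID,
    column 3 is the percent identity.
    """
    best: Dict[str, float] = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, pident = fields[0], float(fields[2])
        if pident > best.get(query_id, 0.0):
            best[query_id] = pident
    return best


def sequence_gap(all_queries: Sequence[str],
                 blast_lines: Iterable[str],
                 min_identity: float = 99.0) -> float:
    """Fraction of queries with no hit at or above min_identity."""
    if not all_queries:
        raise ValueError("all_queries must not be empty")
    best = best_identity(blast_lines)
    # Queries without any hit default to 0.0 and therefore count as unmatched.
    unmatched = [q for q in all_queries if best.get(q, 0.0) < min_identity]
    return len(unmatched) / len(all_queries)


def taxonomic_gap(microscopy_species: Set[str],
                  database_species: Set[str]) -> float:
    """Fraction of microscopy-identified species absent from the database."""
    if not microscopy_species:
        raise ValueError("microscopy_species must not be empty")
    missing = microscopy_species - database_species
    return len(missing) / len(microscopy_species)


# Hypothetical toy data, for illustration only.
toy_hits = [
    "asv1\tref1\t100.0",
    "asv1\tref2\t97.5",
    "asv2\tref3\t96.0",
    "asv3\tref4\t99.2",
]
toy_queries = ["asv1", "asv2", "asv3", "asv4"]  # asv4 has no hit at all

print(sequence_gap(toy_queries, toy_hits, min_identity=99.0))
print(taxonomic_gap({"sp_A", "sp_B", "sp_C", "sp_D"}, {"sp_A", "sp_C", "sp_E"}))
```

Lowering `min_identity` (for example to 97.0 or 95.0) shrinks the sequence gap at the cost of less specific matches, which mirrors the different identity thresholds reported in the abstract.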
