2025
Lataretu, Marie; Krautwurst, Sebastian; Huska, Matthew R; Marquet, Mike; Viehweger, Adrian; Braun, Sascha D; Brandt, Christian; Hölzer, Martin
Targeted decontamination of sequencing data with CLEAN Journal Article
In: NAR Genomics and Bioinformatics, vol. 7, 2025.
Tags: assembly, metagenomics, nanopore, RNA / transcriptomics, software
@article{nokey_81,
title = {Targeted decontamination of sequencing data with CLEAN},
author = {Marie Lataretu and Sebastian Krautwurst and Matthew R Huska and Mike Marquet and Adrian Viehweger and Sascha D Braun and Christian Brandt and Martin Hölzer},
doi = {10.1093/nargab/lqaf105},
year = {2025},
date = {2025-07-04},
urldate = {2025-07-04},
journal = {NAR Genomics and Bioinformatics},
volume = {7},
abstract = {Many biological and medical questions are answered based on the analysis of sequence data. However, we can find contamination, artificial spike-ins, and overrepresented rRNA (ribosomal RNA) sequences in various read collections and assemblies. In particular, spike-ins used as controls, as those known from Illumina or Nanopore data, are often not considered as contaminants and also not appropriately removed during analyses. Additionally, removing human host DNA may be necessary for data protection and ethical considerations to ensure that individuals cannot be identified. We developed CLEAN, a pipeline to remove unwanted sequences from both long- and short-read sequencing techniques. While focusing on Illumina and Nanopore data with their technology-specific control sequences, the pipeline can also be used for host decontamination of metagenomic reads and assemblies, or the removal of rRNA from RNA-Seq data. The results are the purified sequences and sequences identified as contaminated with statistics summarized in a report. The output can be used directly in subsequent analyses, resulting in faster computations and improved results. Although decontamination seems mundane, many contaminants are routinely overlooked, cleaned by steps that are not fully reproducible or difficult to trace. CLEAN facilitates reproducible, platform-independent data analysis in genomics and transcriptomics and is freely available at https://github.com/rki-mf1/clean under a BSD3 license.},
keywords = {assembly, metagenomics, nanopore, RNA / transcriptomics, software},
pubstate = {published},
tppubtype = {article}
}
Jose, Jisna; Teutloff, Erik; Mayer, Teresa; Naseem, Simrat; Barth, Emanuel; Halitschke, Rayko; Marz, Manja; Agler, Matthew T.
Immunity and bacterial recruitment in plant leaves are parallel processes that together shape sensitivity to temperature stress Journal Article
In: bioRxiv, 2025.
Tags: bacteria, evolution, fungi, metagenomics
@article{nokey_76,
title = {Immunity and bacterial recruitment in plant leaves are parallel processes that together shape sensitivity to temperature stress},
author = {Jisna Jose and Erik Teutloff and Teresa Mayer and Simrat Naseem and Emanuel Barth and Rayko Halitschke and Manja Marz and Matthew T. Agler},
doi = {10.1101/2024.06.10.598336},
year = {2025},
date = {2025-04-25},
journal = {bioRxiv},
abstract = {Rising global temperatures necessitate developing resilient crops with better adaptability to changing climates. Under elevated temperatures, plant immunity is downregulated, increasing risk of foliar pathogen attack. Manipulating plant defense hormones is one way to mitigate this detrimental effect. However, it is unclear how plant immunity interacts with plant microbiome assembly and how temperature will thus affect overall plant health and stability. In this study, we compared two Arabidopsis thaliana genotypes that feature divergent strategies for recruitment of commensal bacteria from natural soil. NG2, an A. thaliana ecotype we collected from Jena, Germany, was grown in its native soil and compared to CLLF, a genotype that recruits higher bacterial loads and higher bacterial diversity but without any dysbiotic phenotype. CLLF hyperaccumulates salicylic acid (SA) and jasmonates, has constitutively upregulated innate defenses, and shows increased resistance to necrotrophic fungal and hemi-biotrophic bacterial pathogens, indicating that pathogen immunity and non-pathogen recruitment function in parallel. Some of its leaf bacteria can utilize SA as a carbon source, suggesting that immunity and recruitment may even be linked by chemical hormones. CLLF exhibits high tolerance to heat stress in comparison to the NG2, with SA-associated defense processes remaining active under heat. Synthetic community (SynCom) experiments revealed that when the taxonomic diversity of bacteria available to CLLF is artificially reduced, resilience to heat stress is compromised, leading to dysbiosis. However, this dysbiosis does not occur in CLLF with a full SynCom or in the NG2 with any SynCom.
These findings suggest that the downregulation of defenses in response to heat may contribute to the avoidance of dysbiosis caused by certain leaf bacteria, while full bacteriome taxonomic diversity can help maintain balance.},
keywords = {bacteria, evolution, fungi, metagenomics},
pubstate = {published},
tppubtype = {article}
}
Ornelas-Eusebio, Erika; Vorimore, Fabien; Aaziz, Rachid; Mandola, Maria-Lucia; Rizzo, Francesca; Marchino, Monica; Nogarol, Chiara; Risco-Castillo, Veronica; Zanella, Gina; Schnee, Christiane; Sachse, Konrad; Laroucau, Karine
Trichosporon asahii: A Potential Growth Promoter for C. gallinacea? Implications for Chlamydial Infections and Cell Culture Journal Article
In: Microorganisms, vol. 13, no. 2, 2025.
Tags: bacteria, fungi, metagenomics
@article{nokey_75,
title = {\textit{Trichosporon asahii}: A Potential Growth Promoter for \textit{C. gallinacea}? Implications for Chlamydial Infections and Cell Culture},
author = {Erika Ornelas-Eusebio and Fabien Vorimore and Rachid Aaziz and Maria-Lucia Mandola and Francesca Rizzo and Monica Marchino and Chiara Nogarol and Veronica Risco-Castillo and Gina Zanella and Christiane Schnee and Konrad Sachse and Karine Laroucau},
doi = {10.3390/microorganisms13020288},
year = {2025},
date = {2025-01-27},
urldate = {2025-01-27},
journal = {Microorganisms},
volume = {13},
number = {2},
abstract = {The cultivation of Chlamydia gallinacea, a recently identified species, is challenging due to the lack of an optimized protocol. In this study, several infection protocols were tested, including different cell lines, incubation temperatures, centrifugation methods and culture media. However, none were successful in field samples. The only exception was a chance co-culture with Trichosporon asahii, a microorganism commonly found in the chicken gut. This suggests that current in vitro methods may not be optimized for this species and that host-associated microorganisms may influence the in vivo growth of C. gallinacea, which is typically found in the chicken gut. These findings raise new questions and highlight the need for further investigation of microbial interactions within the host, particularly to understand their role in the proliferation of chlamydial species.},
keywords = {bacteria, fungi, metagenomics},
pubstate = {published},
tppubtype = {article}
}
2024
zu Siederdissen, Christian Höner; Spangenberg, Jannes; Bisdorf, Kevin; Krautwurst, Sebastian; Srivastava, Akash; Marz, Manja; Taubert, Martin
Nanopore sequencing enables novel detection of deuterium incorporation in DNA Journal Article
In: Computational and Structural Biotechnology Journal, vol. 23, 2024.
Tags: bacteria, DNA / genomics, machine learning, metagenomics, nanopore, nucleic acid modifications
@article{nokey_74,
title = {Nanopore sequencing enables novel detection of deuterium incorporation in DNA},
author = {Christian {Höner zu Siederdissen} and Jannes Spangenberg and Kevin Bisdorf and Sebastian Krautwurst and Akash Srivastava and Manja Marz and Martin Taubert},
doi = {10.1016/j.csbj.2024.09.027},
year = {2024},
date = {2024-10-03},
urldate = {2024-10-03},
journal = {Computational and Structural Biotechnology Journal},
volume = {23},
abstract = {Identifying active microbes is crucial to understand their role in ecosystem functions. Metabolic labeling with heavy, non-radioactive isotopes, i.e., stable isotope probing (SIP), can track active microbes by detecting heavy isotope incorporation in biomolecules such as DNA. However, the detection of heavy isotope-labeled nucleotides directly during sequencing has, to date, not been achieved. In this study, Oxford nanopore sequencing was utilized to detect heavy isotopes incorporation in DNA molecules. Two isotopes widely used in SIP experiments were employed to label a bacterial isolate: deuterium (D, as D2O) and carbon-13 (13C, as glucose). We hypothesize that labeled DNA is distinguishable from unlabeled DNA by changes in the nanopore signal. To verify this distinction, we employed a Bayesian classifier trained on signal distributions of short oligonucleotides (k-mers) from labeled and unlabeled sequencing reads. Our results show a clear distinction between D-labeled and unlabeled reads, based on changes in median and median absolute deviation (MAD) of the nanopore signals for different k-mers. In contrast, 13C-labeled DNA cannot be distinguished from unlabeled DNA. For D, the model employed correctly predicted more than 85% of the reads. Even when metabolic labeling was conducted with only 30% D2O, 80% of the obtained reads were correctly classified with a 5% false discovery rate. Our work demonstrates the feasibility of direct detection of deuterium incorporation in DNA molecules during Oxford nanopore sequencing. This finding represents a first step in establishing the combined use of nanopore sequencing and SIP for tracking active organisms in microbial ecology.},
keywords = {bacteria, DNA / genomics, machine learning, metagenomics, nanopore, nucleic acid modifications},
pubstate = {published},
tppubtype = {article}
}
2023
Rangel-Pineros, Guillermo; Almeida, Alexandre; Beracochea, Martin; Sakharova, Ekaterina; Marz, Manja; Muñoz, Alejandro Reyes; Hölzer, Martin; Finn, Robert D.
VIRify: An integrated detection, annotation and taxonomic classification pipeline using virus-specific protein profile hidden Markov models Journal Article
In: PLOS Comput Biol, vol. 19, iss. 8, pp. e1011422, 2023.
Tags: annotation, classification, metagenomics, phylogenetics, software, viruses
@article{nokey,
title = {VIRify: An integrated detection, annotation and taxonomic classification pipeline using virus-specific protein profile hidden Markov models},
author = {Guillermo Rangel-Pineros and Alexandre Almeida and Martin Beracochea and Ekaterina Sakharova and Manja Marz and Alejandro Reyes Muñoz and Martin Hölzer and Robert D. Finn},
doi = {10.1371/journal.pcbi.1011422},
year = {2023},
date = {2023-08-28},
journal = {PLOS Comput Biol},
volume = {19},
issue = {8},
pages = {e1011422},
abstract = {The study of viral communities has revealed the enormous diversity and impact these biological entities have on various ecosystems. These observations have sparked widespread interest in developing computational strategies that support the comprehensive characterisation of viral communities based on sequencing data. Here we introduce VIRify, a new computational pipeline designed to provide a user-friendly and accurate functional and taxonomic characterisation of viral communities. VIRify identifies viral contigs and prophages from metagenomic assemblies and annotates them using a collection of viral profile hidden Markov models (HMMs). These include our manually-curated profile HMMs, which serve as specific taxonomic markers for a wide range of prokaryotic and eukaryotic viral taxa and are thus used to reliably classify viral contigs. We tested VIRify on assemblies from two microbial mock communities, a large metagenomics study, and a collection of publicly available viral genomic sequences from the human gut. The results showed that VIRify could identify sequences from both prokaryotic and eukaryotic viruses, and provided taxonomic classifications from the genus to the family rank with an average accuracy of 86.6%. In addition, VIRify allowed the detection and taxonomic classification of a range of prokaryotic and eukaryotic viruses present in 243 marine metagenomic assemblies. Finally, the use of VIRify led to a large expansion in the number of taxonomically classified human gut viral sequences and the improvement of outdated and shallow taxonomic classifications. Overall, we demonstrate that VIRify is a novel and powerful resource that offers an enhanced capability to detect a broad range of viral contigs and taxonomically classify them.},
keywords = {annotation, classification, metagenomics, phylogenetics, software, viruses},
pubstate = {published},
tppubtype = {article}
}
2021
Chaudhari, Narendrakumar M.; Overholt, Will A.; Figueroa-Gonzalez, Perla Abigail; Taubert, Martin; Bornemann, Till L. V.; Probst, Alexander J.; Hölzer, Martin; Marz, Manja; Küsel, Kirsten
The economical lifestyle of CPR bacteria in groundwater allows little preference for environmental drivers Journal Article
In: Environ Microbiome, vol. 16, no. 1, pp. 24, 2021.
Tags: groundwater, metagenomics
@article{nokey,
title = {The economical lifestyle of CPR bacteria in groundwater allows little preference for environmental drivers},
author = {Narendrakumar M. Chaudhari and Will A. Overholt and Perla Abigail Figueroa-Gonzalez and Martin Taubert and Till L. V. Bornemann and Alexander J. Probst and Martin Hölzer and Manja Marz and Kirsten Küsel},
doi = {10.1186/s40793-021-00395-w},
year = {2021},
date = {2021-12-14},
urldate = {2021-12-14},
journal = {Environ Microbiome},
volume = {16},
number = {1},
pages = {24},
abstract = {Background: The highly diverse Cand. Patescibacteria are predicted to have minimal biosynthetic and metabolic pathways, which hinders understanding of how their populations differentiate in response to environmental drivers or host organisms. Their mechanisms employed to cope with oxidative stress are largely unknown. Here, we utilized genome-resolved metagenomics to investigate the adaptive genome repertoire of Patescibacteria in oxic and anoxic groundwaters, and to infer putative host ranges.
Results: Within six groundwater wells, Cand. Patescibacteria was the most dominant (up to 79%) super-phylum across 32 metagenomes sequenced from DNA retained on 0.2 and 0.1 µm filters after sequential filtration. Of the reconstructed 1275 metagenome-assembled genomes (MAGs), 291 high-quality MAGs were classified as Cand. Patescibacteria. Cand. Paceibacteria and Cand. Microgenomates were enriched exclusively in the 0.1 µm fractions, whereas candidate division ABY1 and Cand. Gracilibacteria were enriched in the 0.2 µm fractions. On average, Patescibacteria enriched in the smaller 0.1 µm filter fractions had 22% smaller genomes, 13.4% lower replication measures, higher proportion of rod-shape determining proteins, and of genomic features suggesting type IV pili mediated cell-cell attachments. Near-surface wells harbored Patescibacteria with higher replication rates than anoxic downstream wells characterized by longer water residence time. Except prevalence of superoxide dismutase genes in Patescibacteria MAGs enriched in oxic groundwaters (83%), no major metabolic or phylogenetic differences were observed. The most abundant Patescibacteria MAG in oxic groundwater encoded a nitrate transporter, nitrite reductase, and F-type ATPase, suggesting an alternative energy conservation mechanism. Patescibacteria consistently co-occurred with one another or with members of phyla Nanoarchaeota, Bacteroidota, Nitrospirota, and Omnitrophota. Among the MAGs enriched in 0.2 µm fractions, only 8% Patescibacteria showed highly significant one-to-one correlation, mostly with Omnitrophota. Motility and transport related genes in certain Patescibacteria were highly similar to genes from other phyla (Omnitrophota, Proteobacteria and Nanoarchaeota).
Conclusion: Other than genes to cope with oxidative stress, we found little genomic evidence for niche adaptation of Patescibacteria to oxic or anoxic groundwaters. Given that we could detect specific host preference only for a few MAGs, we speculate that the majority of Patescibacteria is able to attach multiple hosts just long enough to loot or exchange supplies.},
keywords = {groundwater, metagenomics},
pubstate = {published},
tppubtype = {article}
}
Damme, Renaud Van; Hölzer, Martin; Viehweger, Adrian; Müller, Bettina; Bongcam-Rudloff, Erik; Brandt, Christian
Metagenomics workflow for hybrid assembly, differential coverage binning, metatranscriptomics and pathway analysis (MUFFIN) Journal Article
In: PLOS Comput Biol, vol. 17, no. 2, pp. e1008716, 2021.
Tags: annotation, assembly, classification, DNA / genomics, metagenomics, RNA / transcriptomics, software
@article{VanDamme:21,
title = {Metagenomics workflow for hybrid assembly, differential coverage binning, metatranscriptomics and pathway analysis (MUFFIN)},
author = {Renaud Van Damme and Martin Hölzer and Adrian Viehweger and Bettina Müller and Erik Bongcam-Rudloff and Christian Brandt},
editor = {Mihaela Pertea},
url = {https://github.com/RVanDamme/MUFFIN},
doi = {10.1371/journal.pcbi.1008716},
year = {2021},
date = {2021-02-09},
urldate = {2021-02-09},
journal = {PLOS Comput Biol},
volume = {17},
number = {2},
pages = {e1008716},
publisher = {Public Library of Science (PLoS)},
abstract = {Metagenomics has redefined many areas of microbiology. However, metagenome-assembled genomes (MAGs) are often fragmented, primarily when sequencing was performed with short reads. Recent long-read sequencing technologies promise to improve genome reconstruction. However, the integration of two different sequencing modalities makes downstream analyses complex. We, therefore, developed MUFFIN, a complete metagenomic workflow that uses short and long reads to produce high-quality bins and their annotations. The workflow is written by using Nextflow, a workflow orchestration software, to achieve high reproducibility and fast and straightforward use. This workflow also produces the taxonomic classification and KEGG pathways of the bins and can be further used for quantification and annotation by providing RNA-Seq data (optionally). We tested the workflow using twenty biogas reactor samples and assessed the capacity of MUFFIN to process and output relevant files needed to analyze the microbial community and their function. MUFFIN produces functional pathway predictions and, if provided de novo metatranscript annotations across the metagenomic sample and for each bin. MUFFIN is available on github under GNUv3 licence: https://github.com/RVanDamme/MUFFIN.},
keywords = {annotation, assembly, classification, DNA / genomics, metagenomics, RNA / transcriptomics, software},
pubstate = {published},
tppubtype = {article}
}
Pappas, Nikolaos; Roux, Simon; Hölzer, Martin; Lamkiewicz, Kevin; Mock, Florian; Marz, Manja; Dutilh, Bas E.
Virus Bioinformatics Book Section
In: Reference Module in Life Sciences, vol. 1, pp. 124-132, Elsevier, 2021, ISBN: 978-0-12-809633-8.
Tags: evolution, metagenomics, virus host interaction, viruses
@incollection{Pappas:20,
title = {Virus Bioinformatics},
author = {Nikolaos Pappas and Simon Roux and Martin Hölzer and Kevin Lamkiewicz and Florian Mock and Manja Marz and Bas E. Dutilh},
doi = {10.1016/B978-0-12-814515-9.00034-5},
isbn = {978-0-12-809633-8},
year = {2021},
date = {2021-01-01},
urldate = {2021-01-01},
booktitle = {Reference Module in Life Sciences},
volume = {1},
pages = {124-132},
publisher = {Elsevier},
abstract = {Since the discovery of computers, bioinformatics and computational biology have been instrumental in a wide range of discoveries in virology. These include early mathematical models of virus-host interaction, and more recently the analysis of viral nucleotide and protein sequences to track their function, epidemiology, and evolution. The genomics revolution has provided an unprecedented amount of sequence information from both viruses and their hosts. In this article, we discuss how bioinformatics allows viral sequence data to be analyzed and interpreted, including an overview of commonly used tools and examples of applications.},
keywords = {evolution, metagenomics, virus host interaction, viruses},
pubstate = {published},
tppubtype = {incollection}
}
2020
Hufsky, Franziska; Beerenwinkel, Niko; Meyer, Irmtraud M.; Roux, Simon; Cook, Georgia May; Kinsella, Cormac M.; Lamkiewicz, Kevin; Marquet, Mike; Nieuwenhuijse, David F.; Olendraite, Ingrida; Paraskevopoulou, Sofia; Young, Francesca; Dijkman, Ronald; Ibrahim, Bashar; Kelly, Jenna; Mercier, Philippe Le; Marz, Manja; Ramette, Alban; Thiel, Volker
The International Virus Bioinformatics Meeting 2020 Journal Article
In: Viruses, vol. 12, no. 12, pp. 1398, 2020.
Tags: classification, conference report, evolution, metagenomics, software, viruses
@article{Hufsky:20b,
title = {The International Virus Bioinformatics Meeting 2020},
author = {Franziska Hufsky and Niko Beerenwinkel and Irmtraud M. Meyer and Simon Roux and Georgia May Cook and Cormac M. Kinsella and Kevin Lamkiewicz and Mike Marquet and David F. Nieuwenhuijse and Ingrida Olendraite and Sofia Paraskevopoulou and Francesca Young and Ronald Dijkman and Bashar Ibrahim and Jenna Kelly and Philippe Le Mercier and Manja Marz and Alban Ramette and Volker Thiel},
doi = {10.3390/v12121398},
year = {2020},
date = {2020-12-06},
urldate = {2020-01-01},
journal = {Viruses},
volume = {12},
number = {12},
pages = {1398},
publisher = {MDPI AG},
abstract = {The International Virus Bioinformatics Meeting 2020 was originally planned to take place in Bern, Switzerland, in March 2020. However, the COVID-19 pandemic put a spoke in the wheel of almost all conferences to be held in 2020. After moving the conference to 8–9 October 2020, we got hit by the second wave and finally decided at short notice to go fully online. On the other hand, the pandemic has made us even more aware of the importance of accelerating research in viral bioinformatics. Advances in bioinformatics have led to improved approaches to investigate viral infections and outbreaks. The International Virus Bioinformatics Meeting 2020 has attracted approximately 120 experts in virology and bioinformatics from all over the world to join the two-day virtual meeting. Despite concerns being raised that virtual meetings lack possibilities for face-to-face discussion, the participants from this small community created a highly interactive scientific environment, engaging in lively and inspiring discussions and suggesting new research directions and questions. The meeting featured five invited and twelve contributed talks, on the four main topics: (1) proteome and RNAome of RNA viruses, (2) viral metagenomics and ecology, (3) virus evolution and classification and (4) viral infections and immunology. Further, the meeting featured 20 oral poster presentations, all of which focused on specific areas of virus bioinformatics. This report summarizes the main research findings and highlights presented at the meeting.},
keywords = {classification, conference report, evolution, metagenomics, software, viruses},
pubstate = {published},
tppubtype = {article}
}
Kalvari, Ioanna; Nawrocki, Eric P; Ontiveros-Palacios, Nancy; Argasinska, Joanna; Lamkiewicz, Kevin; Marz, Manja; Griffiths-Jones, Sam; Toffano-Nioche, Claire; Gautheret, Daniel; Weinberg, Zasha; Rivas, Elena; Eddy, Sean R; Finn, Robert D; Bateman, Alex; Petrov, Anton I
Rfam 14: expanded coverage of metagenomic, viral and microRNA families Journal Article
In: Nucleic Acids Res, vol. 49, no. D1, pp. D192–D200, 2020.
Tags: alignment, annotation, bacteria, coronavirus, database, metagenomics, ncRNAs, RNA / transcriptomics, software, viruses
@article{Kalvari:21,
title = {Rfam 14: expanded coverage of metagenomic, viral and microRNA families},
author = {Ioanna Kalvari and Eric P Nawrocki and Nancy Ontiveros-Palacios and Joanna Argasinska and Kevin Lamkiewicz and Manja Marz and Sam Griffiths-Jones and Claire Toffano-Nioche and Daniel Gautheret and Zasha Weinberg and Elena Rivas and Sean R Eddy and Robert D Finn and Alex Bateman and Anton I Petrov},
url = {https://rfam.org/},
doi = {10.1093/nar/gkaa1047},
year = {2020},
date = {2020-11-19},
urldate = {2020-11-19},
journal = {Nucleic Acids Res},
volume = {49},
number = {D1},
pages = {D192--D200},
publisher = {Oxford University Press (OUP)},
abstract = {Rfam is a database of RNA families where each of the 3444 families is represented by a multiple sequence alignment of known RNA sequences and a covariance model that can be used to search for additional members of the family. Recent developments have involved expert collaborations to improve the quality and coverage of Rfam data, focusing on microRNAs, viral and bacterial RNAs. We have completed the first phase of synchronising microRNA families in Rfam and miRBase, creating 356 new Rfam families and updating 40. We established a procedure for comprehensive annotation of viral RNA families starting with Flavivirus and Coronaviridae RNAs. We have also increased the coverage of bacterial and metagenome-based RNA families from the ZWD database. These developments have enabled a significant growth of the database, with the addition of 759 new families in Rfam 14. To facilitate further community contribution to Rfam, expert users are now able to build and submit new families using the newly developed Rfam Cloud family curation system. New Rfam website features include a new sequence similarity search powered by RNAcentral, as well as search and visualisation of families with pseudoknots. Rfam is freely available at https://rfam.org.},
keywords = {alignment, annotation, bacteria, coronavirus, database, metagenomics, ncRNAs, RNA / transcriptomics, software, viruses},
pubstate = {published},
tppubtype = {article}
}
Overholt, Will A.; Hölzer, Martin; Geesink, Patricia; Diezel, Celia; Marz, Manja; Küsel, Kirsten
Inclusion of Oxford Nanopore long reads improves all microbial and viral metagenome-assembled genomes from a complex aquifer system Journal Article
In: Environ Microbiol, vol. 22, no. 9, pp. 4000-4013, 2020.
Tags: assembly, DNA / genomics, groundwater, metagenomics, nanopore, viruses
@article{Overholt:20,
title = {Inclusion of Oxford Nanopore long reads improves all microbial and viral metagenome-assembled genomes from a complex aquifer system},
author = {Will A. Overholt and Martin Hölzer and Patricia Geesink and Celia Diezel and Manja Marz and Kirsten Küsel},
doi = {10.1111/1462-2920.15186},
year = {2020},
date = {2020-08-05},
urldate = {2020-08-05},
journal = {Environ Microbiol},
volume = {22},
number = {9},
pages = {4000-4013},
publisher = {Wiley},
abstract = {Assembling microbial and viral genomes from metagenomes is a powerful and appealing method to understand structure–function relationships in complex environments. To compare the recovery of genomes from microorganisms and their viruses from groundwater, we generated shotgun metagenomes with Illumina sequencing accompanied by long reads derived from the Oxford Nanopore Technologies (ONT) sequencing platform. Assembly and metagenome-assembled genome (MAG) metrics for both microbes and viruses were determined from an Illumina-only assembly, ONT-only assembly, and a hybrid assembly approach. The hybrid approach recovered 2× more mid to high-quality MAGs compared to the Illumina-only approach and 4× more than the ONT-only approach. A similar number of viral genomes were reconstructed using the hybrid and ONT methods, and both recovered nearly fourfold more viral genomes than the Illumina-only approach. While yielding fewer MAGs, the ONT-only approach generated MAGs with a high probability of containing rRNA genes, 3× higher than either of the other methods. Of the shared MAGs recovered from each method, the ONT-only approach generated the longest and least fragmented MAGs, while the hybrid approach yielded the most complete. This work provides quantitative data to inform a cost–benefit analysis of the decision to supplement shotgun metagenomic projects with long reads towards the goal of recovering genomes from environmentally abundant groups.},
keywords = {assembly, DNA / genomics, groundwater, metagenomics, nanopore, viruses},
pubstate = {published},
tppubtype = {article}
}
2019
Kallies, René; Hölzer, Martin; Toscan, Rodolfo Brizola; da Rocha, Ulisses Nunes; Anders, John; Marz, Manja; Chatzinotas, Antonis
Evaluation of Sequencing Library Preparation Protocols for Viral Metagenomic Analysis from Pristine Aquifer Groundwaters. Journal Article
In: Viruses, vol. 11, no. 6, pp. 484, 2019.
Tags: DNA / genomics, groundwater, metagenomics, viruses
@article{Kallies:19,
title = {Evaluation of Sequencing Library Preparation Protocols for Viral Metagenomic Analysis from Pristine Aquifer Groundwaters.},
author = {René Kallies and Martin Hölzer and Rodolfo Brizola Toscan and Ulisses Nunes da Rocha and John Anders and Manja Marz and Antonis Chatzinotas},
doi = {10.3390/v11060484},
year = {2019},
date = {2019-05-28},
urldate = {2019-01-01},
journal = {Viruses},
volume = {11},
number = {6},
pages = {484},
abstract = {Viral ecology of terrestrial habitats is yet to be extensively explored, in particular the terrestrial subsurface. One problem in obtaining viral sequences from groundwater aquifer samples is the relatively low amount of virus particles. As a result, the amount of extracted DNA may not be sufficient for direct sequencing of such samples. Here we compared three DNA amplification methods to enrich viral DNA from three pristine limestone aquifer assemblages of the Hainich Critical Zone Exploratory to evaluate potential bias created by the different amplification methods as determined by viral metagenomics. Linker amplification shotgun libraries resulted in lowest redundancy among the sequencing reads and showed the highest diversity, while multiple displacement amplification produced the highest number of contigs with the longest average contig size, suggesting a combination of these two methods is suitable for the successful enrichment of viral DNA from pristine groundwater samples. In total, we identified 27,173, 5,886 and 32,613 viral contigs from the three samples from which 11.92 to 18.65% could be assigned to taxonomy using blast. Among these, members of the order were the most abundant group (52.20 to 69.12%) dominated by and . Those, and the high number of unknown viral sequences, substantially expand the known virosphere.},
keywords = {DNA / genomics, groundwater, metagenomics, viruses},
pubstate = {published},
tppubtype = {article}
}
Hufsky, Franziska; Ibrahim, Bashar; Modha, Sejal; Clokie, Martha R. J.; Deinhardt-Emmer, Stefanie; Dutilh, Bas E.; Lycett, Samantha; Simmonds, Peter; Thiel, Volker; Abroi, Aare; Adriaenssens, Evelien M.; Escalera-Zamudio, Marina; Kelly, Jenna Nicole; Lamkiewicz, Kevin; Lu, Lu; Susat, Julian; Sicheritz, Thomas; Robertson, David L.; Marz, Manja
The Third Annual Meeting of the European Virus Bioinformatics Center Journal Article
In: Viruses, vol. 11, no. 5, pp. 420, 2019.
Tags: classification, conference report, evolution, metagenomics, software, virus host interaction, viruses
@article{Hufsky:19,
title = {The Third Annual Meeting of the European Virus Bioinformatics Center},
author = {Franziska Hufsky and Bashar Ibrahim and Sejal Modha and Martha R. J. Clokie and Stefanie Deinhardt-Emmer and Bas E. Dutilh and Samantha Lycett and Peter Simmonds and Volker Thiel and Aare Abroi and Evelien M. Adriaenssens and Marina Escalera-Zamudio and Jenna Nicole Kelly and Kevin Lamkiewicz and Lu Lu and Julian Susat and Thomas Sicheritz and David L. Robertson and Manja Marz},
doi = {10.3390/v11050420},
year = {2019},
date = {2019-05-05},
urldate = {2019-05-05},
journal = {Viruses},
volume = {11},
number = {5},
pages = {420},
publisher = {MDPI AG},
abstract = {The Third Annual Meeting of the European Virus Bioinformatics Center (EVBC) took place in Glasgow, United Kingdom, 28–29 March 2019. Virus bioinformatics has become central to virology research, and advances in bioinformatics have led to improved approaches to investigate viral infections and outbreaks, being successfully used to detect, control, and treat infections of humans and animals. This active field of research has attracted approximately 110 experts in virology and bioinformatics/computational biology from Europe and other parts of the world to attend the two-day meeting in Glasgow to increase scientific exchange between laboratory- and computer-based researchers. The meeting was held at the McIntyre Building of the University of Glasgow; a perfect location, as it was originally built to be a place for “rubbing your brains with those of other people”, as Rector Stanley Baldwin described it. The goal of the meeting was to provide a meaningful and interactive scientific environment to promote discussion and collaboration and to inspire and suggest new research directions and questions. The meeting featured eight invited and twelve contributed talks, on the four main topics: (1) systems virology, (2) virus-host interactions and the virome, (3) virus classification and evolution and (4) epidemiology, surveillance and evolution. Further, the meeting featured 34 oral poster presentations, all of which focused on specific areas of virus bioinformatics. This report summarizes the main research findings and highlights presented at the meeting. },
keywords = {classification, conference report, evolution, metagenomics, software, virus host interaction, viruses},
pubstate = {published},
tppubtype = {article}
}
Wegner, Carl-Eric; Gaspar, Michael; Geesink, Patricia; Herrmann, Martina; Marz, Manja; Küsel, Kirsten
Biogeochemical regimes in shallow aquifers reflect the metabolic coupling of elements of nitrogen, sulfur and carbon. Journal Article
In: Appl Environ Microbiol, vol. 85, no. 5, pp. e02346-18, 2019.
Tags: bacteria, groundwater, metagenomics
@article{Wegner:19,
title = {Biogeochemical regimes in shallow aquifers reflect the metabolic coupling of elements of nitrogen, sulfur and carbon.},
author = {Carl-Eric Wegner and Michael Gaspar and Patricia Geesink and Martina Herrmann and Manja Marz and Kirsten Küsel},
doi = {10.1128/AEM.02346-18},
year = {2019},
date = {2019-02-20},
urldate = {2019-01-01},
journal = {Appl Environ Microbiol},
volume = {85},
number = {5},
pages = {e02346-18},
abstract = {Near-surface groundwaters are prone to receive (in)organic matter input from their recharge areas and are known to harbour autotrophic microbial communities linked to nitrogen and sulfur metabolism. Here, we use multi-"omic" profiling to gain holistic insights into the turnover of inorganic nitrogen compounds, carbon fixation processes and organic matter processing in groundwater. We sampled microbial biomass from two superimposed aquifers via monitoring wells that follow groundwater flow from its recharge area through differences in hydrogeochemical settings and land use. Functional profiling revealed that groundwater microbiomes are mainly driven by nitrogen (nitrification, denitrification, anammox) and to a lesser extent sulfur cycling (sulfur oxidation and sulfate reduction), dependent on local hydrochemical differences. Surprisingly, the differentiation potential of the groundwater microbiome surpasses that of hydrochemistry for individual monitoring wells. Dominated by few phyla (Bacteroidetes, Proteobacteria, Planctomycetes, Thaumarchaeota), the taxonomic profiling of groundwater metagenomes and metatranscriptomes revealed pronounced differences between merely present microbiome members and those actively participating in community gene expression and biogeochemical cycling. Unexpectedly, we observed a constitutive expression of carbohydrate-active enzymes, encoded by different microbiome members, along with the groundwater flow path. The turnover of organic carbon apparently complements for lithoautotrophic carbon assimilation pathways mainly used by the groundwater microbiome dependent on the availability of oxygen and inorganic electron donors like ammonium. Groundwater is a key resource for drinking water production and irrigation. The interplay between geological setting, hydrochemistry, carbon storage and groundwater microbiome ecosystem functioning is crucial for our understanding of these important ecosystem services. 
We targeted the encoded and expressed metabolic potential of groundwater microbiomes along an aquifer transect that diversifies in terms of hydrochemistry and land use. Our results showed that the groundwater microbiome has a higher spatial differentiation potential than hydrochemistry.},
keywords = {bacteria, groundwater, metagenomics},
pubstate = {published},
tppubtype = {article}
}
Viehweger, Adrian; Krautwurst, Sebastian; Koenig, Brigitte; Marz, Manja
An encoding of genome content for machine learning Journal Article
In: bioRxiv, pp. 524280, 2019.
Tags: assembly, machine learning, metagenomics
@article{Viehweger:19,
title = {An encoding of genome content for machine learning},
author = {Adrian Viehweger and Sebastian Krautwurst and Brigitte Koenig and Manja Marz},
url = {https://github.com/phiweger/nanotext},
doi = {10.1101/524280},
year = {2019},
date = {2019-01-18},
urldate = {2019-01-18},
journal = {bioRxiv},
pages = {524280},
publisher = {Cold Spring Harbor Laboratory},
abstract = {An ever-growing number of metagenomes can be used for biomining and the study of microbial functions. The use of learning algorithms in this context has been hindered, because they often need input in the form of low-dimensional, dense vectors of numbers. We propose such a representation for genomes called nanotext that scales to very large data sets.
The underlying model is learned from a corpus of nearly 150 thousand genomes spanning 750 million protein domains. We treat the protein domains in a genome like words in a document, assuming that protein domains in a similar context have similar “meaning”. This meaning can be distributed by a neural net over a vector of numbers.
The resulting vectors efficiently encode function, preserve known phylogeny, capture subtle functional relationships and are robust against genome incompleteness. The “functional” distance between two vectors complements nucleotide-based distance, so that genomes can be identified as similar even though their nucleotide identity is low. nanotext can thus encode (meta)genomes for direct use in downstream machine learning tasks. We show this by predicting plausible culture media for metagenome assembled genomes (MAGs) from the Tara Oceans Expedition using their genome content only. nanotext is freely released under a BSD licence (https://github.com/phiweger/nanotext).},
keywords = {assembly, machine learning, metagenomics},
pubstate = {published},
tppubtype = {article}
}
