2025
Thomas, Christine; Brangsch, Hanka; Galeone, Valentina; Hölzer, Martin; Marz, Manja; Linde, Jörg
Accurately assembling nanopore sequencing data of highly pathogenic bacteria Journal Article
In: BMC Genomics, vol. 26, 2025.
Tags: assembly, bacteria, DNA / genomics, nanopore
@article{nokey_82,
title = {Accurately assembling nanopore sequencing data of highly pathogenic bacteria},
author = {Christine Thomas and Hanka Brangsch and Valentina Galeone and Martin Hölzer and Manja Marz and Jörg Linde},
doi = {10.1186/s12864-025-11793-6},
year = {2025},
date = {2025-08-28},
journal = {BMC Genomics},
volume = {26},
abstract = {Background: Bacterial genome exploration and outbreak analysis rely heavily on robust whole-genome sequencing and bioinformatics analysis. Widely-used genomic methods, such as genotyping and detection of genetic markers demand high sequencing accuracy and precise genome assembly for reliable results.
Methods: To assess the utility of nanopore sequencing for genotyping highly pathogenic bacteria with low mutation rates, we sequenced six reference strains using Oxford Nanopore Technologies (ONT) R10.4.1 chemistry and Illumina and evaluated different assembly strategies. The publicly available RefSeq assemblies were chosen as the ground truth. Publicly available sequencing data from key foodborne and public-health-related bacterial pathogens were examined to provide a broader context for the analysis.
Results: While for Bacillus (Ba.) anthracis an almost perfect assembly was achieved, results varied for other species. For Brucella (Br.) spp., the final assemblies comprised five to 46 different nucleotides in comparison to Sanger-sequenced references. For some key foodborne and public-health-related bacterial pathogens (Klebsiella (K.) variicola, Listeria spp., Mycobacterium (M.) tuberculosis, Staphylococcus (Sta.) aureus, and Streptococcus (Str.) pyogenes) perfect genomes were obtained. Enhanced basecalling models have generally improved assembly accuracy, however, for certain species such as Br. abortus, older models have produced higher accuracy. While long-read polishing mainly improves assembly quality with only one round needed, our results indicate that this process may also degrade assembly quality. Overall, 81% of the observed errors in ONT assemblies were located within coding sequences (CDS). Furthermore, we found that methylation caused 6.5% of the errors, and the bacterial methylation-aware medaka polishing model reduced the number of errors linked to methylation. Core-genome Multilocus Sequence Typing (cgMLST) analysis revealed allele differences in Ba. anthracis, Br. abortus, and Francisella (F.) tularensis for some assemblers, although with fewer than five allele differences. In the case of Br. melitensis, some assemblies included five allele differences, whereas for Br. suis the correct cgMLST alleles were observed.
Conclusions: Assemblies of nanopore data from pathogenic bacteria vary in quality across species and methods. However, errors persist in the final assemblies, including within cgMLST loci, influencing the reliability of outbreak predictions. Nevertheless, specific combinations of existing tools can generate perfect genome assemblies from bacterial ONT sequencing data for outbreak analysis without short-read polishing.},
keywords = {assembly, bacteria, DNA / genomics, nanopore},
pubstate = {published},
tppubtype = {article}
}
Lataretu, Marie; Krautwurst, Sebastian; Huska, Matthew R; Marquet, Mike; Viehweger, Adrian; Braun, Sascha D; Brandt, Christian; Hölzer, Martin
Targeted decontamination of sequencing data with CLEAN Journal Article
In: NAR Genomics and Bioinformatics, vol. 7, 2025.
Tags: assembly, metagenomics, nanopore, RNA / transcriptomics, software
@article{nokey_81,
title = {Targeted decontamination of sequencing data with CLEAN},
author = {Marie Lataretu and Sebastian Krautwurst and Matthew R Huska and Mike Marquet and Adrian Viehweger and Sascha D Braun and Christian Brandt and Martin Hölzer},
doi = {10.1093/nargab/lqaf105},
year = {2025},
date = {2025-07-04},
urldate = {2025-07-04},
journal = {NAR Genomics and Bioinformatics},
volume = {7},
abstract = {Many biological and medical questions are answered based on the analysis of sequence data. However, we can find contamination, artificial spike-ins, and overrepresented rRNA (ribosomal RNA) sequences in various read collections and assemblies. In particular, spike-ins used as controls, as those known from Illumina or Nanopore data, are often not considered as contaminants and also not appropriately removed during analyses. Additionally, removing human host DNA may be necessary for data protection and ethical considerations to ensure that individuals cannot be identified. We developed CLEAN, a pipeline to remove unwanted sequences from both long- and short-read sequencing techniques. While focusing on Illumina and Nanopore data with their technology-specific control sequences, the pipeline can also be used for host decontamination of metagenomic reads and assemblies, or the removal of rRNA from RNA-Seq data. The results are the purified sequences and sequences identified as contaminated with statistics summarized in a report. The output can be used directly in subsequent analyses, resulting in faster computations and improved results. Although decontamination seems mundane, many contaminants are routinely overlooked, cleaned by steps that are not fully reproducible or difficult to trace. CLEAN facilitates reproducible, platform-independent data analysis in genomics and transcriptomics and is freely available at https://github.com/rki-mf1/clean under a BSD3 license.},
keywords = {assembly, metagenomics, nanopore, RNA / transcriptomics, software},
pubstate = {published},
tppubtype = {article}
}
2023
Triebel, Sandra; Sachse, Konrad; Weber, Michael; Heller, Martin; Diezel, Celia; Hölzer, Martin; Schnee, Christiane; Marz, Manja
De novo genome assembly resolving repetitive structures enables genomic analysis of 35 European Mycoplasmopsis bovis strains Journal Article
In: BMC Genomics, vol. 24, iss. 1, no. 548, 2023, ISSN: 1471-2164.
Tags: assembly, bacteria, DNA / genomics, nanopore, phylogenetics
@article{nokey_44,
title = {\textit{De novo} genome assembly resolving repetitive structures enables genomic analysis of 35 European \textit{Mycoplasmopsis bovis} strains},
author = {Sandra Triebel and Konrad Sachse and Michael Weber and Martin Heller and Celia Diezel and Martin Hölzer and Christiane Schnee and Manja Marz},
doi = {10.1186/s12864-023-09618-5},
issn = {1471-2164},
year = {2023},
date = {2023-09-16},
urldate = {2023-09-16},
journal = {BMC Genomics},
volume = {24},
number = {548},
issue = {1},
abstract = {Mycoplasmopsis (M.) bovis, the agent of mastitis, pneumonia, and arthritis in cattle, harbors a small genome of approximately 1 Mbp. Combining data from Illumina and Nanopore technologies, we sequenced and assembled the genomes of 35 European strains and isolate DL422_88 from Cuba. While the high proportion of repetitive structures in M. bovis genomes represent a particular challenge, implementation of our own pipeline Mycovista (available on GitHub www.github.com/sandraTriebel/mycovista ) in a hybrid approach enabled contiguous assembly of the genomes and, consequently, improved annotation rates considerably. To put our European strain panel in a global context, we analyzed the new genome sequences together with 175 genome assemblies from public databases. Construction of a phylogenetic tree based on core genes of these 219 strains revealed a clustering pattern according to geographical origin, with European isolates positioned on clades 4 and 5. Genomic data allowing assignment of strains to tissue specificity or certain disease manifestations could not be identified. Seven strains isolated from cattle with systemic circular condition (SCC), still a largely unknown manifestation of M. bovis disease, were located on both clades 4 and 5. Pairwise association analysis revealed 108 genomic elements associated with a particular clade of the phylogenetic tree. Further analyzing these hits, 25 genes are functionally annotated and could be linked to a M. bovis protein, e.g. various proteases and nucleases, as well as ten variable surface lipoproteins (Vsps) and other surface proteins. These clade-specific genes could serve as useful markers in epidemiological and clinical surveys.},
keywords = {assembly, bacteria, DNA / genomics, nanopore, phylogenetics},
pubstate = {published},
tppubtype = {article}
}
Sachse, Konrad; Hölzer, Martin; Vorimore, Fabien; Barf, Lisa-Marie; Sachse, Carsten; Laroucau, Karine; Marz, Manja; Lamkiewicz, Kevin
Genomic analysis of 61 Chlamydia psittaci strains reveals extensive divergence associated with host preference Journal Article
In: BMC Genomics, vol. 24, iss. 1, pp. 288, 2023, ISSN: 1471-2164.
Tags: alignment, assembly, bacteria, DNA / genomics, phylogenetics
@article{nokey_35,
title = {Genomic analysis of 61 \textit{Chlamydia psittaci} strains reveals extensive divergence associated with host preference},
author = {Konrad Sachse and Martin Hölzer and Fabien Vorimore and Lisa-Marie Barf and Carsten Sachse and Karine Laroucau and Manja Marz and Kevin Lamkiewicz},
doi = {10.1186/s12864-023-09370-w},
issn = {1471-2164},
year = {2023},
date = {2023-05-29},
urldate = {2023-05-29},
journal = {BMC Genomics},
volume = {24},
issue = {1},
pages = {288},
abstract = {Background
Chlamydia (C.) psittaci, the causative agent of avian chlamydiosis and human psittacosis, is a genetically heterogeneous species. Its broad host range includes parrots and many other birds, but occasionally also humans (via zoonotic transmission), ruminants, horses, swine and rodents. To assess whether there are genetic markers associated with host tropism we comparatively analyzed whole-genome sequences of 61 C. psittaci strains, 47 of which carry a 7.6-kbp plasmid.
Results
Following clean-up, reassembly and polishing of poorly assembled genomes from public databases, phylogenetic analyses using C. psittaci whole-genome sequence alignment revealed four major clades within this species. Clade 1 represents the most recent lineage comprising 40/61 strains and contains 9/10 of the psittacine strains, including type strain 6BC, and 10/13 of human isolates. Strains from different non-psittacine hosts clustered in Clades 2–4. We found that clade membership correlates with typing schemes based on SNP types, ompA genotypes, multilocus sequence types as well as plasticity zone (PZ) structure and host preference. Genome analysis also revealed that i) sequence variation in the major outer membrane porin MOMP can result in 3D structural changes of immunogenic domains, ii) past host change of Clade 3 and 4 strains could be associated with loss of MAC/perforin in the PZ, rather than the large cytotoxin, iii) the distinct phylogeny of atypical strains (Clades 3 and 4) is also reflected in their repertoire of inclusion proteins (Inc family) and polymorphic membrane proteins (Pmps).
Conclusions
Our study identified a number of genomic features that can be correlated with the phylogeny and host preference of C. psittaci strains. Our data show that intra-species genomic divergence is associated with past host change and includes deletions in the plasticity zone, structural variations in immunogenic domains and distinct repertoires of virulence factors.},
keywords = {alignment, assembly, bacteria, DNA / genomics, phylogenetics},
pubstate = {published},
tppubtype = {article}
}
Erkes, Annett; Grove, René P; Žarković, Milena; Krautwurst, Sebastian; Koebnik, Ralf; Morgan, Richard D; Wilson, Geoffrey G; Hölzer, Martin; Marz, Manja; Boch, Jens; Grau, Jan
Assembling highly repetitive Xanthomonas TALomes using Oxford Nanopore sequencing Journal Article
In: BMC Genomics, vol. 24, iss. 1, pp. 151, 2023.
Tags: assembly, DNA / genomics, nanopore
@article{nokey,
title = {Assembling highly repetitive Xanthomonas TALomes using Oxford Nanopore sequencing},
author = {Annett Erkes and René P Grove and Milena Žarković and Sebastian Krautwurst and Ralf Koebnik and Richard D Morgan and Geoffrey G Wilson and Martin Hölzer and Manja Marz and Jens Boch and Jan Grau},
doi = {10.1186/s12864-023-09228-1},
year = {2023},
date = {2023-03-27},
journal = {BMC Genomics},
volume = {24},
issue = {1},
pages = {151},
abstract = {Background: Most plant-pathogenic Xanthomonas bacteria harbor transcription activator-like effector (TALE) genes, which function as transcriptional activators of host plant genes and support infection. The entire repertoire of up to 29 TALE genes of a Xanthomonas strain is also referred to as TALome. The DNA-binding domain of TALEs is comprised of highly conserved repeats and TALE genes often occur in gene clusters, which precludes the assembly of TALE-carrying Xanthomonas genomes based on standard sequencing approaches.
Results: Here, we report the successful assembly of the 5 Mbp genomes of five Xanthomonas strains from Oxford Nanopore Technologies (ONT) sequencing data. For one of these strains, Xanthomonas oryzae pv. oryzae (Xoo) PXO35, we illustrate why Illumina short reads and longer PacBio reads are insufficient to fully resolve the genome. While ONT reads are perfectly suited to yield highly contiguous genomes, they suffer from a specific error profile within homopolymers. To still yield complete and correct TALomes from ONT assemblies, we present a computational correction pipeline specifically tailored to TALE genes, which yields at least comparable accuracy as Illumina-based polishing. We further systematically assess the ONT-based pipeline for its multiplexing capacity and find that, combined with computational correction, the complete TALome of Xoo PXO35 could have been reconstructed from less than 20,000 ONT reads.
Conclusions: Our results indicate that multiplexed ONT sequencing combined with a computational correction of TALE genes constitutes a highly capable tool for characterizing the TALomes of huge collections of Xanthomonas strains in the future.},
keywords = {assembly, DNA / genomics, nanopore},
pubstate = {published},
tppubtype = {article}
}
2021
Martín-Hernández, Giselle C; Müller, Bettina; Chmielarz, Mikołaj; Brandt, Christian; Hölzer, Martin; Viehweger, Adrian; Passoth, Volkmar
Chromosome-level genome assembly and transcriptome-based annotation of the oleaginous yeast Rhodotorula toruloides CBS 14 Journal Article
In: Genomics, vol. 113, no. 6, pp. 4022-4027, 2021.
Tags: annotation, assembly, DNA / genomics, fungi, nanopore
@article{Martín-Hernández2021,
title = {Chromosome-level genome assembly and transcriptome-based annotation of the oleaginous yeast Rhodotorula toruloides CBS 14},
author = {Giselle C Martín-Hernández and Bettina Müller and Mikołaj Chmielarz and Christian Brandt and Martin Hölzer and Adrian Viehweger and Volkmar Passoth},
doi = {10.1016/j.ygeno.2021.10.006},
year = {2021},
date = {2021-10-11},
urldate = {2021-10-11},
journal = {Genomics},
volume = {113},
number = {6},
pages = {4022-4027},
abstract = {Rhodotorula toruloides is an oleaginous yeast with high biotechnological potential. In order to understand the molecular physiology of lipid synthesis in R. toruloides and to advance metabolic engineering, a high-resolution genome is required. We constructed a genome draft of R. toruloides CBS 14, using a hybrid assembly approach, consisting of short and long reads generated by Illumina and Nanopore sequencing, respectively. The genome draft consists of 23 contigs and 3 scaffolds, with a N50 length of 1,529,952 bp, thus largely representing chromosomal organization. The total size of the genome is 20,534,857 bp and the overall GC content is 61.83%. Transcriptomic data from different growth conditions was used to aid species-specific gene annotation. We annotated 9464 genes and identified 11,691 transcripts. Furthermore, we demonstrated the presence of a potential plasmid, an extrachromosomal circular structure of about 11 kb with a copy number about three times as high as the other chromosomes.},
keywords = {annotation, assembly, DNA / genomics, fungi, nanopore},
pubstate = {published},
tppubtype = {article}
}
Damme, Renaud Van; Hölzer, Martin; Viehweger, Adrian; Müller, Bettina; Bongcam-Rudloff, Erik; Brandt, Christian
Metagenomics workflow for hybrid assembly, differential coverage binning, metatranscriptomics and pathway analysis (MUFFIN) Journal Article
In: PLOS Comput Biol, vol. 17, no. 2, pp. e1008716, 2021.
Tags: annotation, assembly, classification, DNA / genomics, metagenomics, RNA / transcriptomics, software
@article{VanDamme:21,
title = {Metagenomics workflow for hybrid assembly, differential coverage binning, metatranscriptomics and pathway analysis (MUFFIN)},
author = {Renaud Van Damme and Martin Hölzer and Adrian Viehweger and Bettina Müller and Erik Bongcam-Rudloff and Christian Brandt},
editor = {Mihaela Pertea},
url = {https://github.com/RVanDamme/MUFFIN},
doi = {10.1371/journal.pcbi.1008716},
year = {2021},
date = {2021-02-09},
urldate = {2021-02-09},
journal = {PLOS Comput Biol},
volume = {17},
number = {2},
pages = {e1008716},
publisher = {Public Library of Science (PLoS)},
abstract = {Metagenomics has redefined many areas of microbiology. However, metagenome-assembled genomes (MAGs) are often fragmented, primarily when sequencing was performed with short reads. Recent long-read sequencing technologies promise to improve genome reconstruction. However, the integration of two different sequencing modalities makes downstream analyses complex. We, therefore, developed MUFFIN, a complete metagenomic workflow that uses short and long reads to produce high-quality bins and their annotations. The workflow is written by using Nextflow, a workflow orchestration software, to achieve high reproducibility and fast and straightforward use. This workflow also produces the taxonomic classification and KEGG pathways of the bins and can be further used for quantification and annotation by providing RNA-Seq data (optionally). We tested the workflow using twenty biogas reactor samples and assessed the capacity of MUFFIN to process and output relevant files needed to analyze the microbial community and their function. MUFFIN produces functional pathway predictions and, if RNA-Seq data are provided, de novo metatranscript annotations across the metagenomic sample and for each bin. MUFFIN is available on GitHub under GNUv3 licence: https://github.com/RVanDamme/MUFFIN.},
keywords = {annotation, assembly, classification, DNA / genomics, metagenomics, RNA / transcriptomics, software},
pubstate = {published},
tppubtype = {article}
}
Krautwurst, Sebastian; Dijkman, Ronald; Thiel, Volker; Krumbholz, Andi; Marz, Manja
Direct RNA Sequencing for Complete Viral Genomes Book Section
In: Frishman, Dmitrij; Marz, Manja (Ed.): Virus Bioinformatics, CRC Press, 2021.
Tags: assembly, DNA / genomics, nanopore, nucleic acid modifications, RNA / transcriptomics, viruses
@incollection{Krautwurst:21,
title = {Direct RNA Sequencing for Complete Viral Genomes},
author = {Sebastian Krautwurst and Ronald Dijkman and Volker Thiel and Andi Krumbholz and Manja Marz},
editor = {Dmitrij Frishman and Manja Marz},
url = {https://www.taylorfrancis.com/chapters/edit/10.1201/9781003097679-3/direct-rna-sequencing-complete-viral-genomes-sebastian-krautwurst-ronald-dijkman-volker-thiel-andi-krumbholz-manja-marz},
year = {2021},
date = {2021-01-01},
urldate = {2021-01-01},
booktitle = {Virus Bioinformatics},
publisher = {CRC Press},
abstract = {Determination of nucleotide sequences present in biological samples (termed “sequencing”) has become a key method in almost all fields of bioscience, including virology. Since the advent of high-throughput sequencing (“second-generation sequencing”), it is possible to sequence millions of DNA fragments (“reads”) in parallel at very high accuracy, enabling the inference of single nucleotide polymorphisms (SNPs) between virus strains.
In this chapter, we provide details on how the long-read sequencing technologies (“third-generation sequencing”) which were developed in recent years have expanded the toolkit for researchers beyond the possibilities of short-read sequencing, with a focus on virus sequencing. With increased read lengths, it is possible to sequence full viral transcripts and genomes in single contiguous reads, enabling detailed studies of transcript isoforms, haplotypes, and viral quasispecies. In comparison, long-read technologies have generally higher raw read error rates, but an accurate assembly of transcripts and genomes is facilitated or made unnecessary due to the long contiguous sequences. One of the technologies, namely nanopore sequencing, also uniquely allows for direct RNA sequencing without the need for the creation or amplification of complementary DNA. This enables accurate capture of RNA content in a sample “as is,” e.g., in cells infected by RNA viruses. The protocol also leaves RNA modifications intact, which can be inferred during sequencing. Nanopore sequencing can be implemented at low costs and with constant genome coverage using cDNA amplicon sequencing methods, e.g., for highly parallel screening during virus outbreaks.},
keywords = {assembly, DNA / genomics, nanopore, nucleic acid modifications, RNA / transcriptomics, viruses},
pubstate = {published},
tppubtype = {incollection}
}
2020
Hölzer, Martin
A decade of de novo transcriptome assembly: Are we there yet? Journal Article
In: Mol Ecol Resour, vol. 21, no. 1, pp. 11-13, 2020.
Tags: assembly, review, RNA / transcriptomics
@article{Hoelzer:20,
title = {A decade of de novo transcriptome assembly: Are we there yet?},
author = {Martin Hölzer},
doi = {10.1111/1755-0998.13268},
year = {2020},
date = {2020-10-08},
urldate = {2020-01-01},
journal = {Mol Ecol Resour},
volume = {21},
number = {1},
pages = {11-13},
publisher = {Wiley},
abstract = {A decade ago, de novo transcriptome assembly evolved as a versatile and powerful approach to make evolutionary assumptions, analyse gene expression, and annotate novel transcripts, in particular, for non-model organisms lacking an appropriate reference genome. Various tools have been developed to generate a transcriptome assembly, and even more computational methods depend on the results of these tools for further downstream analyses. In this issue of Molecular Ecology Resources, Freedman et al. (Mol Ecol Resourc 2020) present a comprehensive analysis of errors in de novo transcriptome assemblies across public data sets and different assembly methods. They focus on two implicit assumptions that are often violated: First, the assembly presents an unbiased view of the transcriptome. Second, the expression estimates derived from the assembly are reasonable, albeit noisy, approximations of the relative frequency of expressed transcripts. They show that appropriate filtering can reduce this bias but can also lead to the loss of a reasonable number of highly expressed transcripts. Thus, to partly alleviate the noise in expression estimates, they propose a new normalization method called length-rescaled CPM. Remarkably, the authors found considerable distortions at the nucleotide level, which leads to an underestimation of diversity in transcriptome assemblies. The study by Freedman et al. (Mol Ecol Resourc 2020) clearly shows that we have not yet reached “high-quality” in the field of transcriptome assembly. Above all, it helps researchers be aware of these problems and filter and interpret their transcriptome assembly data appropriately and with caution.},
keywords = {assembly, review, RNA / transcriptomics},
pubstate = {published},
tppubtype = {article}
}
Overholt, Will A.; Hölzer, Martin; Geesink, Patricia; Diezel, Celia; Marz, Manja; Küsel, Kirsten
Inclusion of Oxford Nanopore long reads improves all microbial and viral metagenome-assembled genomes from a complex aquifer system Journal Article
In: Environ Microbiol, vol. 22, no. 9, pp. 4000-4013, 2020.
Tags: assembly, DNA / genomics, groundwater, metagenomics, nanopore, viruses
@article{Overholt:20,
title = {Inclusion of Oxford Nanopore long reads improves all microbial and viral metagenome-assembled genomes from a complex aquifer system},
author = {Will A. Overholt and Martin Hölzer and Patricia Geesink and Celia Diezel and Manja Marz and Kirsten Küsel},
doi = {10.1111/1462-2920.15186},
year = {2020},
date = {2020-08-05},
urldate = {2020-08-05},
journal = {Environ Microbiol},
volume = {22},
number = {9},
pages = {4000-4013},
publisher = {Wiley},
abstract = {Assembling microbial and viral genomes from metagenomes is a powerful and appealing method to understand structure–function relationships in complex environments. To compare the recovery of genomes from microorganisms and their viruses from groundwater, we generated shotgun metagenomes with Illumina sequencing accompanied by long reads derived from the Oxford Nanopore Technologies (ONT) sequencing platform. Assembly and metagenome-assembled genome (MAG) metrics for both microbes and viruses were determined from an Illumina-only assembly, ONT-only assembly, and a hybrid assembly approach. The hybrid approach recovered 2× more mid to high-quality MAGs compared to the Illumina-only approach and 4× more than the ONT-only approach. A similar number of viral genomes were reconstructed using the hybrid and ONT methods, and both recovered nearly fourfold more viral genomes than the Illumina-only approach. While yielding fewer MAGs, the ONT-only approach generated MAGs with a high probability of containing rRNA genes, 3× higher than either of the other methods. Of the shared MAGs recovered from each method, the ONT-only approach generated the longest and least fragmented MAGs, while the hybrid approach yielded the most complete. This work provides quantitative data to inform a cost–benefit analysis of the decision to supplement shotgun metagenomic projects with long reads towards the goal of recovering genomes from environmentally abundant groups.},
keywords = {assembly, DNA / genomics, groundwater, metagenomics, nanopore, viruses},
pubstate = {published},
tppubtype = {article}
}
2019
Mostajo, Nelly F.; Lataretu, Marie; Krautwurst, Sebastian; Mock, Florian; Desirò, Daniel; Lamkiewicz, Kevin; Collatz, Maximilian; Schoen, Andreas; Weber, Friedemann; Marz, Manja; Hölzer, Martin
A comprehensive annotation and differential expression analysis of short and long non-coding RNAs in 16 bat genomes Journal Article
In: NAR Genomics Bioinf, vol. 2, no. 1, pp. lqz006, 2019.
Tags: annotation, assembly, differential expression analysis, evolution, ncRNAs, RNA / transcriptomics, virus host interaction, viruses
@article{Mostajo:20,
title = {A comprehensive annotation and differential expression analysis of short and long non-coding RNAs in 16 bat genomes},
author = {Nelly F. Mostajo and Marie Lataretu and Sebastian Krautwurst and Florian Mock and Daniel Desirò and Kevin Lamkiewicz and Maximilian Collatz and Andreas Schoen and Friedemann Weber and Manja Marz and Martin Hölzer},
url = {https://www.rna.uni-jena.de/supplements/bats/index.html},
doi = {10.1093/nargab/lqz006},
year = {2019},
date = {2019-09-30},
urldate = {2019-09-30},
journal = {NAR Genomics Bioinf},
volume = {2},
number = {1},
pages = {lqz006},
abstract = {Although bats are increasingly becoming the focus of scientific studies due to their unique properties, these exceptional animals are still among the least studied mammals. Assembly quality and completeness of bat genomes vary a lot and especially non-coding RNA (ncRNA) annotations are incomplete or simply missing. Accordingly, standard bioinformatics pipelines for gene expression analysis often ignore ncRNAs such as microRNAs or long antisense RNAs. The main cause of this problem is the use of incomplete genome annotations. We present a complete screening for ncRNAs within 16 bat genomes. NcRNAs affect a remarkable variety of vital biological functions, including gene expression regulation, RNA processing, RNA interference and, as recently described, regulatory processes in viral infections. Within all investigated bat assemblies, we annotated 667 ncRNA families including 162 snoRNAs and 193 miRNAs as well as rRNAs, tRNAs, several snRNAs and lncRNAs, and other structural ncRNA elements. We validated our ncRNA candidates by six RNA-Seq data sets and show significant expression patterns that have never been described before in a bat species on such a large scale. Our annotations will be usable as a resource (rna.uni-jena.de/supplements/bats) for deeper studying of bat evolution, ncRNAs repertoire, gene expression and regulation, ecology and important host–virus interactions.},
keywords = {annotation, assembly, differential expression analysis, evolution, ncRNAs, RNA / transcriptomics, virus host interaction, viruses},
pubstate = {published},
tppubtype = {article}
}
Viehweger, Adrian; Krautwurst, Sebastian; Lamkiewicz, Kevin; Madhugiri, Ramakanth; Ziebuhr, John; Hölzer, Martin; Marz, Manja
Direct RNA nanopore sequencing of full-length coronavirus genomes provides novel insights into structural variants and enables modification analysis Journal Article
In: Genome Res, vol. 29, pp. 1545-1554, 2019.
Tags: assembly, coronavirus, nanopore, nucleic acid modifications, RNA / transcriptomics, viruses
@article{Viehweger:19a,
title = {Direct RNA nanopore sequencing of full-length coronavirus genomes provides novel insights into structural variants and enables modification analysis},
author = {Adrian Viehweger and Sebastian Krautwurst and Kevin Lamkiewicz and Ramakanth Madhugiri and John Ziebuhr and Martin Hölzer and Manja Marz},
doi = {10.1101/gr.247064.118},
year = {2019},
date = {2019-08-22},
urldate = {2019-08-22},
journal = {Genome Res},
volume = {29},
pages = {1545-1554},
publisher = {Cold Spring Harbor Laboratory},
abstract = {Sequence analyses of RNA virus genomes remain challenging owing to the exceptional genetic plasticity of these viruses. Because of high mutation and recombination rates, genome replication by viral RNA-dependent RNA polymerases leads to populations of closely related viruses, so-called “quasispecies.” Standard (short-read) sequencing technologies are ill-suited to reconstruct large numbers of full-length haplotypes of (1) RNA virus genomes and (2) subgenome-length (sg) RNAs composed of noncontiguous genome regions. Here, we used a full-length, direct RNA sequencing (DRS) approach based on nanopores to characterize viral RNAs produced in cells infected with a human coronavirus. By using DRS, we were able to map the longest (∼26-kb) contiguous read to the viral reference genome. By combining Illumina and Oxford Nanopore sequencing, we reconstructed a highly accurate consensus sequence of the human coronavirus (HCoV)-229E genome (27.3 kb). Furthermore, by using long reads that did not require an assembly step, we were able to identify, in infected cells, diverse and novel HCoV-229E sg RNAs that remain to be characterized. Also, the DRS approach, which circumvents reverse transcription and amplification of RNA, allowed us to detect methylation sites in viral RNAs. Our work paves the way for haplotype-based analyses of viral quasispecies by showing the feasibility of intra-sample haplotype separation. Even though several technical challenges remain to be addressed to exploit the potential of the nanopore technology fully, our work illustrates that DRS may significantly advance genomic studies of complex virus populations, including predictions on long-range interactions in individual full-length viral RNA haplotypes.},
keywords = {assembly, coronavirus, nanopore, nucleic acid modifications, RNA / transcriptomics, viruses},
pubstate = {published},
tppubtype = {article}
}
Hölzer, Martin; Marz, Manja
De novo transcriptome assembly: A comprehensive cross-species comparison of short-read RNA-Seq assemblers Journal Article
In: GigaScience, vol. 8, no. 5, pp. giz039, 2019.
Tags: assembly, RNA / transcriptomics
@article{Hoelzer:19,
title = {\textit{De novo} transcriptome assembly: A comprehensive cross-species comparison of short-read RNA-Seq assemblers},
author = {Martin Hölzer and Manja Marz},
doi = {10.1093/gigascience/giz039},
year = {2019},
date = {2019-05-11},
urldate = {2019-01-01},
journal = {GigaScience},
volume = {8},
number = {5},
pages = {giz039},
publisher = {Oxford University Press (OUP)},
abstract = {Background: In recent years, massively parallel complementary DNA sequencing (RNA sequencing [RNA-Seq]) has emerged as a fast, cost-effective, and robust technology to study entire transcriptomes in various manners. In particular, for non-model organisms and in the absence of an appropriate reference genome, RNA-Seq is used to reconstruct the transcriptome de novo. Although the de novo transcriptome assembly of non-model organisms has been on the rise recently and new tools are frequently developing, there is still a knowledge gap about which assembly software should be used to build a comprehensive de novo assembly.
Results: Here, we present a large-scale comparative study in which 10 de novo assembly tools are applied to 9 RNA-Seq data sets spanning different kingdoms of life. Overall, we built >200 single assemblies and evaluated their performance on a combination of 20 biological-based and reference-free metrics. Our study is accompanied by a comprehensive and extensible Electronic Supplement that summarizes all data sets, assembly execution instructions, and evaluation results. Trinity, SPAdes, and Trans-ABySS, followed by Bridger and SOAPdenovo-Trans, generally outperformed the other tools compared. Moreover, we observed species-specific differences in the performance of each assembler. No tool delivered the best results for all data sets.
Conclusions: We recommend a careful choice and normalization of evaluation metrics to select the best assembling results as a critical step in the reconstruction of a comprehensive de novo transcriptome assembly.},
keywords = {assembly, RNA / transcriptomics},
pubstate = {published},
tppubtype = {article}
}
Viehweger, Adrian; Krautwurst, Sebastian; Koenig, Brigitte; Marz, Manja
An encoding of genome content for machine learning Journal Article
In: bioRxiv, pp. 524280, 2019.
Tags: assembly, machine learning, metagenomics
@article{Viehweger:19,
title = {An encoding of genome content for machine learning},
author = {Adrian Viehweger and Sebastian Krautwurst and Brigitte Koenig and Manja Marz},
url = {https://github.com/phiweger/nanotext},
doi = {10.1101/524280},
year = {2019},
date = {2019-01-18},
urldate = {2019-01-18},
journal = {bioRxiv},
pages = {524280},
publisher = {Cold Spring Harbor Laboratory},
abstract = {An ever-growing number of metagenomes can be used for biomining and the study of microbial functions. The use of learning algorithms in this context has been hindered, because they often need input in the form of low-dimensional, dense vectors of numbers. We propose such a representation for genomes called nanotext that scales to very large data sets.
The underlying model is learned from a corpus of nearly 150 thousand genomes spanning 750 million protein domains. We treat the protein domains in a genome like words in a document, assuming that protein domains in a similar context have similar “meaning”. This meaning can be distributed by a neural net over a vector of numbers.
The resulting vectors efficiently encode function, preserve known phylogeny, capture subtle functional relationships and are robust against genome incompleteness. The “functional” distance between two vectors complements nucleotide-based distance, so that genomes can be identified as similar even though their nucleotide identity is low. nanotext can thus encode (meta)genomes for direct use in downstream machine learning tasks. We show this by predicting plausible culture media for metagenome assembled genomes (MAGs) from the Tara Oceans Expedition using their genome content only. nanotext is freely released under a BSD licence (https://github.com/phiweger/nanotext).},
keywords = {assembly, machine learning, metagenomics},
pubstate = {published},
tppubtype = {article}
}
2017
Möbius, Petra; Nordsiek, Gabriele; Hölzer, Martin; Jarek, Michael; Marz, Manja; Köhler, Heike
Complete Genome Sequence of JII-1961, a Bovine Mycobacterium avium subsp. paratuberculosis Field Isolate from Germany Journal Article
In: Genome Announc, vol. 5, no. 34, 2017.
Tags: assembly, bacteria, DNA / genomics
@article{Moebius:17,
title = {Complete Genome Sequence of JII-1961, a Bovine \textit{Mycobacterium avium} subsp. \textit{paratuberculosis} Field Isolate from Germany},
author = {Petra Möbius and Gabriele Nordsiek and Martin Hölzer and Michael Jarek and Manja Marz and Heike Köhler},
doi = {10.1128/genomeA.00870-17},
year = {2017},
date = {2017-08-24},
urldate = {2017-01-01},
journal = {Genome Announc},
volume = {5},
number = {34},
abstract = {Mycobacterium avium subsp. paratuberculosis causes Johne’s disease in ruminants and was also detected in nonruminant species, including human beings, and in milk products. We announce here the 4.829-Mb complete genome sequence of the cattle-type strain JII-1961 from Germany, which is very similar to cattle-type strains recovered from different continents.},
keywords = {assembly, bacteria, DNA / genomics},
pubstate = {published},
tppubtype = {article}
}
2016
Hölzer, Martin; Laroucau, Karine; Creasy, Heather Huot; Ott, Sandra; Vorimore, Fabien; Bavoil, Patrik M.; Marz, Manja; Sachse, Konrad
Whole-Genome Sequence of Chlamydia gallinacea Type Strain 08-1274/3 Journal Article
In: Genome Announc, vol. 4, no. 4, 2016.
Tags: assembly, bacteria, DNA / genomics
@article{Hoelzer:16a,
title = {Whole-Genome Sequence of \textit{Chlamydia gallinacea} Type Strain 08-1274/3},
author = {Martin Hölzer and Karine Laroucau and Heather Huot Creasy and Sandra Ott and Fabien Vorimore and Patrik M. Bavoil and Manja Marz and Konrad Sachse},
doi = {10.1128/genomeA.00708-16},
year = {2016},
date = {2016-07-21},
urldate = {2016-07-21},
journal = {Genome Announc},
volume = {4},
number = {4},
abstract = {The recently introduced bacterial species Chlamydia gallinacea is known to occur in domestic poultry and other birds. Its potential as an avian pathogen and zoonotic agent is under investigation. The whole-genome sequence of its type strain, 08-1274/3, consists of a 1,059,583-bp chromosome with 914 protein-coding sequences (CDSs) and a plasmid (p1274) comprising 7,619 bp with 9 CDSs.},
keywords = {assembly, bacteria, DNA / genomics},
pubstate = {published},
tppubtype = {article}
}
2015
Möbius, Petra; Hölzer, Martin; Felder, Marius; Nordsiek, Gabriele; Groth, Marco; Köhler, Heike; Reichwald, Kathrin; Platzer, Matthias; Marz, Manja
Comprehensive insights in the Mycobacterium avium subsp. paratuberculosis genome using new WGS data of sheep strain JIII-386 from Germany Journal Article
In: Genome Biol Evol, vol. 7, no. 9, pp. 2585–2601, 2015.
Tags: annotation, assembly, bacteria, DNA / genomics
@article{Moebius:15,
title = {Comprehensive insights in the \textit{Mycobacterium avium} subsp. \textit{paratuberculosis} genome using new WGS data of sheep strain JIII-386 from Germany},
author = {Petra Möbius and Martin Hölzer and Marius Felder and Gabriele Nordsiek and Marco Groth and Heike Köhler and Kathrin Reichwald and Matthias Platzer and Manja Marz},
doi = {10.1093/gbe/evv154},
year = {2015},
date = {2015-09-17},
urldate = {2015-09-17},
journal = {Genome Biol Evol},
volume = {7},
number = {9},
pages = {2585--2601},
abstract = {Mycobacterium avium (M. a.) subsp. paratuberculosis (MAP) - the etiologic agent of Johne's disease - affects cattle, sheep and other ruminants worldwide. To decipher phenotypic differences among sheep and cattle strains (belonging to MAP-S [Type-I/III] respectively MAP-C [Type-II]) comparative genome analysis needs data from diverse isolates originating from different geographic regions of the world. The current study presents the so far best assembled genome of a MAP-S-strain: sheep isolate JIII-386 from Germany. One newly sequenced cattle isolate (JII-1961, Germany), four published MAP strains of MAP-C and MAP-S from U.S. and Australia and M. a. subsp. hominissuis (MAH) strain 104 were used for assembly improvement and comparisons. All genomes were annotated by BacProt and results compared with NCBI annotation. Corresponding protein-coding sequences (CDSs) were detected, but also CDSs that were exclusively determined either by NCBI or BacProt. A new Shine-Dalgarno sequence motif (5'AGCTGG3') was extracted. Novel CDSs including PE-PGRS family protein genes and about 80 non-coding RNAs exhibiting high sequence conservation are presented. Previously found genetic differences between MAP-types are partially revised. Four out of ten assumed MAP-S-specific large sequence polymorphism regions (LSPs) are still present in MAP-C strains; new LSPs were identified. Independently of the regional origin of the strains, the number of individual CDSs and single nucleotide variants confirm the strong similarity of MAP-C strains and show higher diversity among MAP-S strains. This study gives ambiguous results regarding the hypothesis that MAP-S is the evolutionary intermediate between MAH and MAP-C, but it clearly shows a higher similarity of MAP to MAH than to M. intracellulare.},
keywords = {annotation, assembly, bacteria, DNA / genomics},
pubstate = {published},
tppubtype = {article}
}
2014
Bauer, Eugen; Salem, Hassan; Marz, Manja; Vogel, Heiko; Kaltenpoth, Martin
Transcriptomic immune response of the cotton stainer Dysdercus fasciatus to experimental elimination of vitamin-supplementing intestinal symbionts Journal Article
In: PLoS One, vol. 9, pp. e114865, 2014.
Tags: annotation, assembly, bacteria, differential expression analysis, insects, RNA / transcriptomics
@article{Bauer:14,
title = {Transcriptomic immune response of the cotton stainer \textit{Dysdercus fasciatus} to experimental elimination of vitamin-supplementing intestinal symbionts},
author = {Eugen Bauer and Hassan Salem and Manja Marz and Heiko Vogel and Martin Kaltenpoth},
url = {http://www.ebi.ac.uk/ena/data/view/PRJEB6171},
doi = {10.1371/journal.pone.0114865},
year = {2014},
date = {2014-12-09},
urldate = {2014-12-09},
journal = {PLoS One},
volume = {9},
pages = {e114865},
abstract = {The acquisition and vertical transmission of bacterial symbionts plays an important role in insect evolution and ecology. However, the molecular mechanisms underlying the stable maintenance and control of mutualistic bacteria remain poorly understood. The cotton stainer Dysdercus fasciatus harbours the actinobacterial symbionts Coriobacterium glomerans and Gordonibacter sp. in its midgut. The symbionts supplement limiting B vitamins and thereby significantly contribute to the host's fitness. In this study, we experimentally disrupted the symbionts' vertical transmission route and performed comparative transcriptomic analyses of genes expressed in the gut of aposymbiotic (symbiont-free) and control individuals to study the host immune response in presence and absence of the mutualists. Annotation of assembled cDNA reads identified a considerable number of genes involved in the innate immune system, including different protein isoforms of several immune effector proteins (specifically i-type lysozyme, defensin, hemiptericin, and pyrrhocoricin), suggesting the possibility for a highly differentiated response towards the complex resident microbial community. Gene expression analyses revealed a constitutive expression of transcripts involved in signal transduction of the main insect immune pathways, but differential expression of certain antimicrobial peptide genes. Specifically, qPCRs confirmed the significant down-regulation of c-type lysozyme and up-regulation of hemiptericin in aposymbiotic individuals. The high expression of c-type lysozyme in symbiont-containing bugs may serve to lyse symbiont cells and thereby harvest B-vitamins that are necessary for subsistence on the deficient diet of Malvales seeds. 
Our findings suggest a sophisticated host response to perturbation of the symbiotic gut microbiota, indicating that the innate immune system not only plays an important role in combating pathogens, but also serves as a communication interface between host and symbionts.},
keywords = {annotation, assembly, bacteria, differential expression analysis, insects, RNA / transcriptomics},
pubstate = {published},
tppubtype = {article}
}
Wehner, Stefanie; Damm, Katrin; Hartmann, Roland K; Marz, Manja
Dissemination of 6S RNA among bacteria Journal Article
In: RNA Biol, vol. 11, pp. 1467–1478, 2014.
Tags: assembly, bacteria, ncRNAs, RNA / transcriptomics, RNA structure
@article{Wehner:14,
title = {Dissemination of 6S RNA among bacteria},
author = {Stefanie Wehner and Katrin Damm and Roland K Hartmann and Manja Marz},
doi = {10.4161/rna.29894},
year = {2014},
date = {2014-10-31},
urldate = {2014-10-31},
journal = {RNA Biol},
volume = {11},
pages = {1467--1478},
abstract = {6S RNA is a highly abundant small non-coding RNA widely spread among diverse bacterial groups. By competing with DNA promoters for binding to RNA polymerase (RNAP), the RNA regulates transcription on a global scale. RNAP produces small product RNAs derived from 6S RNA as template, which rearranges the 6S RNA structure leading to dissociation of 6S RNA:RNAP complexes. Although 6S RNA has been experimentally analysed in detail for some species, such as Escherichia coli and Bacillus subtilis, and was computationally predicted in many diverse bacteria, a complete and up-to-date overview of the distribution among all bacteria is missing. In this study we searched with new methods for 6S RNA genes in all currently available bacterial genomes. We ended up with a set of 1,750 6S RNA genes, of which 1,367 are novel and bona fide, distributed among 1,610 bacteria, and had a few tentative candidates among the remaining 510 assembled bacterial genomes accessible. We were able to confirm two tentative candidates by Northern blot analysis. We extended 6S RNA genes of the Flavobacteriia significantly in length compared to the present Rfam entry. We describe multiple homologs of 6S RNAs (including split 6S RNA genes) and performed a detailed synteny analysis.},
keywords = {assembly, bacteria, ncRNAs, RNA / transcriptomics, RNA structure},
pubstate = {published},
tppubtype = {article}
}
Schwartze, Volker U.; Winter, Sascha; Shelest, Ekaterina; Marcet-Houben, Marina; Horn, Fabian; Wehner, Stefanie; Linde, Jörg; Valiante, Vito; Sammeth, Michael; Riege, Konstantin; Nowrousian, Minou; Kaerger, Kerstin; Jacobsen, Ilse D.; Marz, Manja; Brakhage, Axel A.; Gabaldón, Toni; Böcker, Sebastian; Voigt, Kerstin
Gene expansion shapes genome architecture in the human pathogen Lichtheimia corymbifera: an evolutionary genomics analysis in the ancient terrestrial mucorales (Mucoromycotina) Journal Article
In: PLoS Genet, vol. 10, pp. e1004496, 2014.
Tags: ancient DNA, assembly, evolution, fungi, RNA / transcriptomics, splicing
@article{Schwartze:14,
title = {Gene expansion shapes genome architecture in the human pathogen \textit{Lichtheimia corymbifera}: an evolutionary genomics analysis in the ancient terrestrial mucorales (Mucoromycotina)},
author = {Volker U. Schwartze and Sascha Winter and Ekaterina Shelest and Marina Marcet-Houben and Fabian Horn and Stefanie Wehner and Jörg Linde and Vito Valiante and Michael Sammeth and Konstantin Riege and Minou Nowrousian and Kerstin Kaerger and Ilse D. Jacobsen and Manja Marz and Axel A. Brakhage and Toni Gabaldón and Sebastian Böcker and Kerstin Voigt},
doi = {10.1371/journal.pgen.1004496},
year = {2014},
date = {2014-08-14},
urldate = {2014-08-14},
journal = {PLoS Genet},
volume = {10},
pages = {e1004496},
abstract = {Lichtheimia species are the second most important cause of mucormycosis in Europe. To provide broader insights into the molecular basis of the pathogenicity-associated traits of the basal Mucorales, we report the full genome sequence of L. corymbifera and compared it to the genome of Rhizopus oryzae, the most common cause of mucormycosis worldwide. The genome assembly encompasses 33.6 MB and 12,379 protein-coding genes. This study reveals four major differences of the L. corymbifera genome to R. oryzae: (i) the presence of a highly elevated number of gene duplications which are unlike R. oryzae not due to whole genome duplication (WGD), (ii) despite the relatively high incidence of introns, alternative splicing (AS) is not frequently observed for the generation of paralogs and in response to stress, (iii) the content of repetitive elements is strikingly low (<5%), (iv) L. corymbifera is typically haploid. Novel virulence factors were identified which may be involved in the regulation of the adaptation to iron-limitation, e.g. LCor01340.1 encoding a putative siderophore transporter and LCor00410.1 involved in the siderophore metabolism. Genes encoding the transcription factors LCor08192.1 and LCor01236.1, which are similar to GATA type regulators and to calcineurin regulated CRZ1, respectively, indicate an involvement of the calcineurin pathway in the adaptation to iron limitation. Genes encoding MADS-box transcription factors are elevated up to 11 copies compared to the 1-4 copies usually found in other fungi. More findings are: (i) lower content of tRNAs, but unique codons in L. corymbifera, (ii) Over 25% of the proteins are apparently specific for L. corymbifera. (iii) L. corymbifera contains only 2/3 of the proteases (known to be essential virulence factors) in comparison to R. oryzae. On the other hand, the number of secreted proteases, however, is roughly twice as high as in R. oryzae.},
keywords = {ancient DNA, assembly, evolution, fungi, RNA / transcriptomics, splicing},
pubstate = {published},
tppubtype = {article}
}
Lechner, Marcus; Nickel, Astrid I.; Wehner, Stefanie; Riege, Konstantin; Wieseke, Nicolas; Beckmann, Benedikt M.; Hartmann, Roland K.; Marz, Manja
Genomewide comparison and novel ncRNAs of Aquificales Journal Article
In: BMC Genomics, vol. 15, pp. 522, 2014.
Tags: alignment, annotation, assembly, bacteria, classification, ncRNAs, phylogenetics
@article{Lechner:14,
title = {Genomewide comparison and novel ncRNAs of Aquificales},
author = {Marcus Lechner and Astrid I. Nickel and Stefanie Wehner and Konstantin Riege and Nicolas Wieseke and Benedikt M. Beckmann and Roland K. Hartmann and Manja Marz},
doi = {10.1186/1471-2164-15-522},
year = {2014},
date = {2014-06-25},
urldate = {2014-06-25},
journal = {BMC Genomics},
volume = {15},
pages = {522},
abstract = {The Aquificales are a diverse group of thermophilic bacteria that thrive in terrestrial and marine hydrothermal environments. They can be divided into the families Aquificaceae, Desulfurobacteriaceae and Hydrogenothermaceae. Although eleven fully sequenced and assembled genomes are available, only little is known about this taxonomic order in terms of RNA metabolism. In this work, we compare the available genomes, extend their protein annotation, identify regulatory sequences, annotate non-coding RNAs (ncRNAs) of known function, predict novel ncRNA candidates, show idiosyncrasies of the genetic decoding machinery, present two different types of transfer-messenger RNAs and variations of the CRISPR systems. Furthermore, we performed a phylogenetic analysis of the Aquificales based on entire genome sequences, and extended this by a classification among all bacteria using 16S rRNA sequences and a set of orthologous proteins. Combining several in silico features (e.g. conserved and stable secondary structures, GC-content, comparison based on multiple genome alignments) with an in vivo dRNA-seq transcriptome analysis of Aquifex aeolicus, we predict roughly 100 novel ncRNA candidates in this bacterium. We have here re-analyzed the Aquificales, a group of bacteria thriving in extreme environments, sharing the feature of a small, compact genome with a reduced number of protein and ncRNA genes. We present several classical ncRNAs and riboswitch candidates. By combining in silico analysis with dRNA-seq data of A. aeolicus we predict nearly 100 novel ncRNA candidates.},
keywords = {alignment, annotation, assembly, bacteria, classification, ncRNAs, phylogenetics},
pubstate = {published},
tppubtype = {article}
}
2013
Wehner, Stefanie; Dörrich, Anja K; Ciba, Philipp; Wilde, Annegret; Marz, Manja
pRNA: NoRC-associated RNA of rRNA operons Journal Article
In: RNA Biol, vol. 11, pp. 3–9, 2013.
Tags: alignment, assembly, ncRNAs, RNA / transcriptomics, RNA structure
@article{Wehner:14b,
title = {pRNA: NoRC-associated RNA of rRNA operons},
author = {Stefanie Wehner and Anja K Dörrich and Philipp Ciba and Annegret Wilde and Manja Marz},
doi = {10.4161/rna.27448},
year = {2013},
date = {2013-12-20},
urldate = {2013-12-20},
journal = {RNA Biol},
volume = {11},
pages = {3--9},
abstract = {Promoter-associated RNAs (pRNAs) are a family of ~90-100 nt-long divergent RNAs overlapping the promoter of the rRNA (rDNA) operon. pRNA transcripts interact with TIP5, a component of the chromatin remodeling complex NoRC, which recruits enzymes for heterochromatin formation and mediates silencing of rRNA genes. Here we present a comprehensive analysis of pRNA homologs, including different versions per species, as result of in silico studies in available metazoan genome assemblies. Comparative sequence analysis and secondary structure prediction ended up in two possible secondary structures, which let us assume a possible dual function of pRNAs for regulation of rRNA operons. Furthermore, we validated parts of our computational predictions experimentally by RT-PCR and sequencing. A representative seed alignment of the pRNA family, annotated with possible secondary structures was released to the Rfam database.},
keywords = {alignment, assembly, ncRNAs, RNA / transcriptomics, RNA structure},
pubstate = {published},
tppubtype = {article}
}
2010
Dalloul, Rami A.; Long, Julie A.; Zimin, Aleksey V.; Aslam, Luqman; Beal, Kathryn; Blomberg, Le Ann; Bouffard, Pascal; Burt, David W.; Crasta, Oswald; Crooijmans, Richard P. M. A.; Cooper, Kristal; Coulombe, Roger A.; De, Supriyo; Delany, Mary E.; Dodgson, Jerry B.; Dong, Jennifer J.; Evans, Clive; Frederickson, Karin M.; Flicek, Paul; Florea, Liliana; Folkerts, Otto; Groenen, Martien A. M.; Harkins, Tim T.; Herrero, Javier; Hoffmann, Steve; Megens, Hendrik-Jan; Jiang, Andrew; Jong, Pieter; Kaiser, Pete; Kim, Heebal; Kim, Kyu-Won; Kim, Sungwon; Langenberger, David; Lee, Mi-Kyung; Lee, Taeheon; Mane, Shrinivasrao; Marcais, Guillaume; Marz, Manja; McElroy, Audrey P.; Modise, Thero; Nefedov, Mikhail; Notredame, Cédric; Paton, Ian R.; Payne, William S.; Pertea, Geo; Prickett, Dennis; Puiu, Daniela; Qiao, Dan; Raineri, Emanuele; Ruffier, Magali; Salzberg, Steven L.; Schatz, Michael C.; Scheuring, Chantel; Schmidt, Carl J.; Schroeder, Steven; Searle, Stephen M. J.; Smith, Edward J.; Smith, Jacqueline; Sonstegard, Tad S.; Stadler, Peter F.; Tafer, Hakim; Tu, Zhijian Jake; Van Tassell, Curtis P.; Vilella, Albert J.; Williams, Kelly P.; Yorke, James A.; Zhang, Liqing; Zhang, Hong-Bin; Zhang, Xiaojun; Zhang, Yang; Reed, Kent M.
Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis Journal Article
In: PLoS Biol, vol. 8, 2010.
Tags: alignment, annotation, assembly, DNA / genomics, ncRNAs
@article{Dalloul:10,
title = {Multi-platform next-generation sequencing of the domestic turkey (\textit{Meleagris gallopavo}): genome assembly and analysis},
author = {Rami A. Dalloul and Julie A. Long and Aleksey V. Zimin and Luqman Aslam and Kathryn Beal and Le Ann Blomberg and Pascal Bouffard and David W. Burt and Oswald Crasta and Richard P. M. A. Crooijmans and Kristal Cooper and Roger A. Coulombe and Supriyo De and Mary E. Delany and Jerry B. Dodgson and Jennifer J. Dong and Clive Evans and Karin M. Frederickson and Paul Flicek and Liliana Florea and Otto Folkerts and Martien A. M. Groenen and Tim T. Harkins and Javier Herrero and Steve Hoffmann and Hendrik-Jan Megens and Andrew Jiang and Pieter Jong and Pete Kaiser and Heebal Kim and Kyu-Won Kim and Sungwon Kim and David Langenberger and Mi-Kyung Lee and Taeheon Lee and Shrinivasrao Mane and Guillaume Marcais and Manja Marz and Audrey P. McElroy and Thero Modise and Mikhail Nefedov and Cédric Notredame and Ian R. Paton and William S. Payne and Geo Pertea and Dennis Prickett and Daniela Puiu and Dan Qiao and Emanuele Raineri and Magali Ruffier and Steven L. Salzberg and Michael C. Schatz and Chantel Scheuring and Carl J. Schmidt and Steven Schroeder and Stephen M. J. Searle and Edward J. Smith and Jacqueline Smith and Tad S. Sonstegard and Peter F. Stadler and Hakim Tafer and Zhijian Jake Tu and Curtis P. Van Tassell and Albert J. Vilella and Kelly P. Williams and James A. Yorke and Liqing Zhang and Hong-Bin Zhang and Xiaojun Zhang and Yang Zhang and Kent M. Reed},
doi = {10.1371/journal.pbio.1000475},
year = {2010},
date = {2010-09-07},
urldate = {2010-09-07},
journal = {PLoS Biol},
volume = {8},
abstract = {A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest.},
keywords = {alignment, annotation, assembly, DNA / genomics, ncRNAs},
pubstate = {published},
tppubtype = {article}
}
