2025
Eulenfeld, Tom; Triebel, Sandra; Marz, Manja
AnchoRNA: Full virus genome alignments through conserved anchor regions Journal Article
In: bioRxiv, 2025.
Tags: alignment, phylogenetics, software, viruses
@article{nokey_67,
title = {AnchoRNA: Full virus genome alignments through conserved anchor regions},
author = {Tom Eulenfeld and Sandra Triebel and Manja Marz},
doi = {10.1101/2025.01.30.635689},
year = {2025},
date = {2025-12-15},
urldate = {2025-02-01},
journal = {bioRxiv},
abstract = {Multiple sequence alignment of full viral genomes can be challenging due to factors such as long sequences, large insertions/deletions (spanning several 100 nucleotides), large number of sequences, sequence divergence, and high computational complexity in particular when computing alignments based on RNA secondary structures. Standard alignment methods often face these issues, in particular when processing highly variable sequences or when specific phylogenetic analysis is required on selected subsequences.
We present an algorithm to determine high quality anchors that define partitions of sequences and guide the alignment of viral genomes to respect well conserved, and therefore functionally significant, regions. This new approach is implemented in the Python-based command line tool AnchoRNA, which is designed to identify conserved regions, or anchors, within coding sequences. By default, anchors are searched in translated coding sequences accounting for high mutation rates in viral genomes. AnchoRNA enhances the accuracy and efficiency of full-genome alignment by focusing on these crucial conserved regions. AnchoRNA guided alignments are systematically compared to the results of 3 alignment programs. Utilizing a dataset of 55 representative Pestivirus genomes, AnchoRNA identified 55 anchors that are used for guiding the alignment process. The incorporation of these anchors led to improvements across tested alignment tools, highlighting the effectiveness of AnchoRNA in enhancing alignment quality, especially in viral genomes.},
keywords = {alignment, phylogenetics, software, viruses},
pubstate = {published},
tppubtype = {article}
}
Triebel, Sandra; Eulenfeld, Tom; Ontiveros-Palacios, Nancy; Sweeney, Blake; Tautz, Norbert; Marz, Manja
First full-genome alignment representative for the genus Pestivirus Journal Article
In: bioRxiv, 2025.
Tags: alignment, evolution, phylogenetics, RNA structure, RNA-RNA interactions, viruses
@article{nokey_77,
title = {First full-genome alignment representative for the genus \textit{Pestivirus}},
author = {Sandra Triebel and Tom Eulenfeld and Nancy Ontiveros-Palacios and Blake Sweeney and Norbert Tautz and Manja Marz},
url = {https://doi.org/10.5281/zenodo.15490752},
doi = {10.1101/2025.05.22.655560},
year = {2025},
date = {2025-05-27},
journal = {bioRxiv},
abstract = {The members of the genus Pestivirus in the family Flaviviridae comprise economically important pathogens of livestock like classical swine fever virus (CSFV) and bovine viral diarrhea virus (BVDV). Intense research over the last years revealed that at least 11 recognized and eight proposed pestivirus species exist. The single-stranded, positive-sense RNA genome encodes one large polyprotein which is processed by viral and cell-derived proteases into 12 mature proteins. Besides its protein-coding function, the RNA genome also contains RNA secondary structures with critical importance for various stages of the viral life cycle. Some of those RNA secondary structures, like the internal ribosome entry site (IRES) and a 3’ stem-loop essential for genome replication, had already been studied for a few individual pestiviruses.
In this study, we provide the first genome-wide multiple sequence alignment (MSA) including all known pestivirus species (accepted and tentative). Moreover, we performed a comprehensive analysis of RNA secondary structures phylogenetically conserved across the complete genus. While showing well-described structures, like a 5’ stem-loop structure, the IRES element, and the 3’ stem loop SL I to be conserved between all pestiviruses, other RNA secondary structures in the 3’ untranslated region (UTR) were only conserved in subsets of the species. We identified 29 novel phylogenetically conserved RNA secondary structures in the protein-coding region, with so far unresolved functional importance. The microRNA binding site for miR-17 was previously known in species A, B, and C; in this study, we identified it in ten additional species, but not in species K, S, Q, and R. Another interesting finding is the identification of a putative long-distance RNA interaction between the IRES and the 3’ end of the genome. These results together with the now available comprehensive multiple sequence alignment including all 19 pestivirus species, represent a valuable resource for future research and diagnostic purposes.},
keywords = {alignment, evolution, phylogenetics, RNA structure, RNA-RNA interactions, viruses},
pubstate = {published},
tppubtype = {article}
}
Collatz, Maximilian; Braun, Sascha D.; Reinicke, Martin; Müller, Elke; Monecke, Stefan; Ehricht, Ralf
AssayBLAST: A Bioinformatic Tool for In Silico Analysis of Molecular Multiparameter Assays Journal Article
In: Applied Biosciences, vol. 4, 2025.
Tags: alignment, DNA / genomics, software
@article{nokey_89,
title = {AssayBLAST: A Bioinformatic Tool for In Silico Analysis of Molecular Multiparameter Assays},
author = {Maximilian Collatz and Sascha D. Braun and Martin Reinicke and Elke Müller and Stefan Monecke and Ralf Ehricht},
doi = {10.3390/applbiosci4020018},
year = {2025},
date = {2025-04-01},
journal = {Applied Biosciences},
volume = {4},
abstract = {Accurate primer and probe design is essential for molecular applications, including PCR, qPCR, and molecular multiparameter assays like microarrays. The novel software tool AssayBLAST addresses this need by simulating interactions between oligonucleotides and target sequences. AssayBLAST handles large sets of primer and probe sequences simultaneously and supports comprehensive assay designs by allowing users to identify off-target binding, calculate melting temperatures, and ensure strand specificity, a critical but often overlooked aspect. AssayBLAST performs two optimized BLAST-based searches for each primer or probe sequence, checking the forward and reverse strands for off-target interactions and strand-specific binding accuracy. The results are compiled into a mapping table containing binding sites, mismatches, and strand orientation, allowing users to validate large sets of oligonucleotides across predefined custom databases for a complete and optimal theoretical assay design. AssayBLAST was evaluated against experimental Staphylococcus aureus microarray data, achieving 97.5% accuracy in predicting probe–target hybridization outcomes. This high accuracy demonstrates the method’s effectiveness in reliably using BLAST hits and mismatch counts to predict microarray results. AssayBLAST provides a reliable, scalable solution for in silico primer and probe validation, effectively supporting large-scale assay designs and optimizations. Its accurate prediction of hybridization outcomes demonstrates its utility in enhancing the efficiency and reliability of molecular assays.},
keywords = {alignment, DNA / genomics, software},
pubstate = {published},
tppubtype = {article}
}
2023
Sachse, Konrad; Hölzer, Martin; Vorimore, Fabien; Barf, Lisa-Marie; Sachse, Carsten; Laroucau, Karine; Marz, Manja; Lamkiewicz, Kevin
Genomic analysis of 61 Chlamydia psittaci strains reveals extensive divergence associated with host preference Journal Article
In: BMC Genomics, vol. 24, iss. 1, pp. 288, 2023, ISSN: 1471-2164.
Tags: alignment, assembly, bacteria, DNA / genomics, phylogenetics
@article{nokey_35,
title = {Genomic analysis of 61 \textit{Chlamydia psittaci} strains reveals extensive divergence associated with host preference},
author = {Konrad Sachse and Martin Hölzer and Fabien Vorimore and Lisa-Marie Barf and Carsten Sachse and Karine Laroucau and Manja Marz and Kevin Lamkiewicz},
doi = {10.1186/s12864-023-09370-w},
issn = {1471-2164},
year = {2023},
date = {2023-05-29},
urldate = {2023-05-29},
journal = {BMC Genomics},
volume = {24},
issue = {1},
pages = {288},
abstract = {Background
Chlamydia (C.) psittaci, the causative agent of avian chlamydiosis and human psittacosis, is a genetically heterogeneous species. Its broad host range includes parrots and many other birds, but occasionally also humans (via zoonotic transmission), ruminants, horses, swine and rodents. To assess whether there are genetic markers associated with host tropism, we comparatively analyzed whole-genome sequences of 61 C. psittaci strains, 47 of which carry a 7.6-kbp plasmid.
Results
Following clean-up, reassembly and polishing of poorly assembled genomes from public databases, phylogenetic analyses using C. psittaci whole-genome sequence alignment revealed four major clades within this species. Clade 1 represents the most recent lineage comprising 40/61 strains and contains 9/10 of the psittacine strains, including type strain 6BC, and 10/13 of human isolates. Strains from different non-psittacine hosts clustered in Clades 2–4. We found that clade membership correlates with typing schemes based on SNP types, ompA genotypes, multilocus sequence types as well as plasticity zone (PZ) structure and host preference. Genome analysis also revealed that i) sequence variation in the major outer membrane porin MOMP can result in 3D structural changes of immunogenic domains, ii) past host change of Clade 3 and 4 strains could be associated with loss of MAC/perforin in the PZ, rather than the large cytotoxin, iii) the distinct phylogeny of atypical strains (Clades 3 and 4) is also reflected in their repertoire of inclusion proteins (Inc family) and polymorphic membrane proteins (Pmps).
Conclusions
Our study identified a number of genomic features that can be correlated with the phylogeny and host preference of C. psittaci strains. Our data show that intra-species genomic divergence is associated with past host change and includes deletions in the plasticity zone, structural variations in immunogenic domains and distinct repertoires of virulence factors.},
keywords = {alignment, assembly, bacteria, DNA / genomics, phylogenetics},
pubstate = {published},
tppubtype = {article}
}
2022
Collatz, Maximilian; Braun, Sascha D.; Monecke, Stefan; Ehricht, Ralf
ConsensusPrime—A Bioinformatic Pipeline for Ideal Consensus Primer Design Journal Article
In: BioMedInformatics, vol. 2, 2022.
Tags: alignment, DNA / genomics, software
@article{nokey_91,
title = {ConsensusPrime—A Bioinformatic Pipeline for Ideal Consensus Primer Design},
author = {Maximilian Collatz and Sascha D. Braun and Stefan Monecke and Ralf Ehricht},
doi = {10.3390/biomedinformatics2040041},
year = {2022},
date = {2022-11-24},
urldate = {2022-11-24},
journal = {BioMedInformatics},
volume = {2},
abstract = {Background: High-quality oligonucleotides for molecular amplification and detection procedures of diverse target sequences depend on sequence homology. Processing input sequences and identifying homogeneous regions in alignments can be carried out by hand only if they are small and contain sequences of high similarity. Finding the best regions for large and inhomogeneous alignments needs to be automated.
Results: The ConsensusPrime pipeline was developed to sort out redundant and technically interfering data in multiple sequence alignments and detect the most homologous regions from multiple sequences. It automates the prediction of optimal consensus primers for molecular analytical and sequence-based procedures/assays.
Conclusion: ConsensusPrime is a fast and easy-to-use pipeline for predicting optimal consensus primers that is executable on local systems without depending on external resources and web services. An implementation in a Docker image ensures platform-independent executability and installability despite the combination of multiple programs. The source code and installation instructions are publicly available on GitHub.},
keywords = {alignment, DNA / genomics, software},
pubstate = {published},
tppubtype = {article}
}
2020
Kalvari, Ioanna; Nawrocki, Eric P; Ontiveros-Palacios, Nancy; Argasinska, Joanna; Lamkiewicz, Kevin; Marz, Manja; Griffiths-Jones, Sam; Toffano-Nioche, Claire; Gautheret, Daniel; Weinberg, Zasha; Rivas, Elena; Eddy, Sean R; Finn, Robert D; Bateman, Alex; Petrov, Anton I
Rfam 14: expanded coverage of metagenomic, viral and microRNA families Journal Article
In: Nucleic Acids Res, vol. 49, no. D1, pp. D192–D200, 2020.
Tags: alignment, annotation, bacteria, coronavirus, database, metagenomics, ncRNAs, RNA / transcriptomics, software, viruses
@article{Kalvari:21,
title = {Rfam 14: expanded coverage of metagenomic, viral and microRNA families},
author = {Ioanna Kalvari and Eric P Nawrocki and Nancy Ontiveros-Palacios and Joanna Argasinska and Kevin Lamkiewicz and Manja Marz and Sam Griffiths-Jones and Claire Toffano-Nioche and Daniel Gautheret and Zasha Weinberg and Elena Rivas and Sean R Eddy and Robert D Finn and Alex Bateman and Anton I Petrov},
url = {https://rfam.org/},
doi = {10.1093/nar/gkaa1047},
year = {2020},
date = {2020-11-19},
urldate = {2020-11-19},
journal = {Nucleic Acids Res},
volume = {49},
number = {D1},
pages = {D192--D200},
publisher = {Oxford University Press (OUP)},
abstract = {Rfam is a database of RNA families where each of the 3444 families is represented by a multiple sequence alignment of known RNA sequences and a covariance model that can be used to search for additional members of the family. Recent developments have involved expert collaborations to improve the quality and coverage of Rfam data, focusing on microRNAs, viral and bacterial RNAs. We have completed the first phase of synchronising microRNA families in Rfam and miRBase, creating 356 new Rfam families and updating 40. We established a procedure for comprehensive annotation of viral RNA families starting with Flavivirus and Coronaviridae RNAs. We have also increased the coverage of bacterial and metagenome-based RNA families from the ZWD database. These developments have enabled a significant growth of the database, with the addition of 759 new families in Rfam 14. To facilitate further community contribution to Rfam, expert users are now able to build and submit new families using the newly developed Rfam Cloud family curation system. New Rfam website features include a new sequence similarity search powered by RNAcentral, as well as search and visualisation of families with pseudoknots. Rfam is freely available at https://rfam.org.},
keywords = {alignment, annotation, bacteria, coronavirus, database, metagenomics, ncRNAs, RNA / transcriptomics, software, viruses},
pubstate = {published},
tppubtype = {article}
}
Hölzer, Martin; Marz, Manja
PoSeiDon: a Nextflow pipeline for the detection of evolutionary recombination events and positive selection Journal Article
In: Bioinformatics, vol. 37, no. 7, pp. 1018–1020, 2020.
Tags: alignment, evolution, phylogenetics, software
@article{Hoelzer:20a,
title = {PoSeiDon: a Nextflow pipeline for the detection of evolutionary recombination events and positive selection},
author = {Martin Hölzer and Manja Marz},
editor = {Alfonso Valencia},
url = {https://github.com/rnajena/poseidon},
doi = {10.1093/bioinformatics/btaa695},
year = {2020},
date = {2020-07-31},
urldate = {2020-07-31},
journal = {Bioinformatics},
volume = {37},
number = {7},
pages = {1018--1020},
publisher = {Oxford University Press (OUP)},
abstract = {PoSeiDon is an easy-to-use pipeline that helps researchers to find recombination events and sites under positive selection in protein-coding sequences. By entering homologous sequences, PoSeiDon builds an alignment, estimates a best-fitting substitution model and performs a recombination analysis followed by the construction of all corresponding phylogenies. Finally, significantly positive selected sites are detected according to different models for the full alignment and possible recombination fragments. The results of PoSeiDon are summarized in a user-friendly HTML page providing all intermediate results and the graphical representation of recombination events and positively selected sites.
},
keywords = {alignment, evolution, phylogenetics, software},
pubstate = {published},
tppubtype = {article}
}
2016
Fricke, Markus; Marz, Manja
Prediction of conserved long-range RNA-RNA interactions in full viral genomes Journal Article
In: Bioinformatics, vol. 32, no. 19, pp. 2928–2935, 2016.
Tags: alignment, RNA / transcriptomics, RNA structure, RNA-RNA interactions, software, viruses
@article{Fricke:16,
title = {Prediction of conserved long-range RNA-RNA interactions in full viral genomes},
author = {Markus Fricke and Manja Marz},
url = {http://www.rna.uni-jena.de/en/supplements/lriscan/},
doi = {10.1093/bioinformatics/btw323},
year = {2016},
date = {2016-06-10},
urldate = {2016-06-10},
journal = {Bioinformatics},
volume = {32},
number = {19},
pages = {2928--2935},
abstract = {Long-range RNA-RNA interactions (LRIs) play an important role in viral replication; however, only a few of these interactions are known and only for a small number of viral species. Up to now, it has been impossible to screen a full viral genome for LRIs experimentally or in silico. Most known LRIs are cross-reacting structures (pseudoknots) undetectable by most bioinformatical tools. We present LRIscan, a tool for the LRI prediction in full viral genomes based on a multiple genome alignment. We confirmed 14 out of 16 experimentally known and evolutionarily conserved LRIs in genome alignments of HCV, Tombusviruses, Flaviviruses and HIV-1. We provide several promising new interactions, which include compensatory mutations and are highly conserved in all considered viral sequences. Furthermore, we provide reactivity plots highlighting the hot spots of predicted LRIs. Source code and binaries of LRIscan are freely available for download at http://www.rna.uni-jena.de/en/supplements/lriscan/, implemented in Ruby/C++ and supported on Linux and Windows. Contact: manja@uni-jena.de. Supplementary data are available at Bioinformatics online.},
keywords = {alignment, RNA / transcriptomics, RNA structure, RNA-RNA interactions, software, viruses},
pubstate = {published},
tppubtype = {article}
}
2015
Fricke, Markus; Dünnes, Nadia; Zayas, Margarita; Bartenschlager, Ralf; Niepmann, Michael; Marz, Manja
Conserved RNA secondary structures and long-range interactions in hepatitis C viruses Journal Article
In: RNA, vol. 21, pp. 1219–1232, 2015.
Tags: alignment, RNA / transcriptomics, RNA structure, RNA-RNA interactions, viruses
@article{Fricke:15,
title = {Conserved RNA secondary structures and long-range interactions in hepatitis C viruses},
author = {Markus Fricke and Nadia Dünnes and Margarita Zayas and Ralf Bartenschlager and Michael Niepmann and Manja Marz},
doi = {10.1261/rna.049338.114},
year = {2015},
date = {2015-05-11},
urldate = {2015-05-11},
journal = {RNA},
volume = {21},
pages = {1219--1232},
abstract = {Hepatitis C virus (HCV) is a hepatotropic virus with a plus-strand RNA genome of ∼9600 nt. Due to error-prone replication by its RNA-dependent RNA polymerase (RdRp) residing in nonstructural protein 5B (NS5B), HCV isolates are grouped into seven genotypes with several subtypes. By using whole-genome sequences of 106 HCV isolates and secondary structure alignments of the plus-strand genome and its minus-strand replication intermediate, we established refined secondary structures of the 5' untranslated region (UTR), the cis-acting replication element (CRE) in NS5B, and the 3' UTR. We propose an alternative structure in the 5' UTR, conserved secondary structures of 5B stem-loop (SL)1 and 5BSL2, and four possible structures of the X-tail at the very 3' end of the HCV genome. We predict several previously unknown long-range interactions, most importantly a possible circularization interaction between distinct elements in the 5' and 3' UTR, reminiscent of the cyclization elements of the related flaviviruses. Based on analogy to these viruses, we propose that the 5'-3' UTR base-pairing in the HCV genome might play an important role in viral RNA replication. These results may have important implications for our understanding of the nature of the cis-acting RNA elements in the HCV genome and their possible role in regulating the mutually exclusive processes of viral RNA translation and replication.},
keywords = {alignment, RNA / transcriptomics, RNA structure, RNA-RNA interactions, viruses},
pubstate = {published},
tppubtype = {article}
}
2014
Lechner, Marcus; Nickel, Astrid I.; Wehner, Stefanie; Riege, Konstantin; Wieseke, Nicolas; Beckmann, Benedikt M.; Hartmann, Roland K.; Marz, Manja
Genomewide comparison and novel ncRNAs of Aquificales Journal Article
In: BMC Genomics, vol. 15, pp. 522, 2014.
Tags: alignment, annotation, assembly, bacteria, classification, ncRNAs, phylogenetics
@article{Lechner:14,
title = {Genomewide comparison and novel ncRNAs of Aquificales},
author = {Marcus Lechner and Astrid I. Nickel and Stefanie Wehner and Konstantin Riege and Nicolas Wieseke and Benedikt M. Beckmann and Roland K. Hartmann and Manja Marz},
doi = {10.1186/1471-2164-15-522},
year = {2014},
date = {2014-06-25},
urldate = {2014-06-25},
journal = {BMC Genomics},
volume = {15},
pages = {522},
abstract = {The Aquificales are a diverse group of thermophilic bacteria that thrive in terrestrial and marine hydrothermal environments. They can be divided into the families Aquificaceae, Desulfurobacteriaceae and Hydrogenothermaceae. Although eleven fully sequenced and assembled genomes are available, only little is known about this taxonomic order in terms of RNA metabolism. In this work, we compare the available genomes, extend their protein annotation, identify regulatory sequences, annotate non-coding RNAs (ncRNAs) of known function, predict novel ncRNA candidates, show idiosyncrasies of the genetic decoding machinery, present two different types of transfer-messenger RNAs and variations of the CRISPR systems. Furthermore, we performed a phylogenetic analysis of the Aquificales based on entire genome sequences, and extended this by a classification among all bacteria using 16S rRNA sequences and a set of orthologous proteins. Combining several in silico features (e.g. conserved and stable secondary structures, GC-content, comparison based on multiple genome alignments) with an in vivo dRNA-seq transcriptome analysis of Aquifex aeolicus, we predict roughly 100 novel ncRNA candidates in this bacterium. We have here re-analyzed the Aquificales, a group of bacteria thriving in extreme environments, sharing the feature of a small, compact genome with a reduced number of protein and ncRNA genes. We present several classical ncRNAs and riboswitch candidates. By combining in silico analysis with dRNA-seq data of A. aeolicus we predict nearly 100 novel ncRNA candidates.},
keywords = {alignment, annotation, assembly, bacteria, classification, ncRNAs, phylogenetics},
pubstate = {published},
tppubtype = {article}
}
2013
Wehner, Stefanie; Dörrich, Anja K; Ciba, Philipp; Wilde, Annegret; Marz, Manja
pRNA: NoRC-associated RNA of rRNA operons Journal Article
In: RNA Biol, vol. 11, pp. 3–9, 2013.
Tags: alignment, assembly, ncRNAs, RNA / transcriptomics, RNA structure
@article{Wehner:14b,
title = {pRNA: NoRC-associated RNA of rRNA operons},
author = {Stefanie Wehner and Anja K Dörrich and Philipp Ciba and Annegret Wilde and Manja Marz},
doi = {10.4161/rna.27448},
year = {2013},
date = {2013-12-20},
urldate = {2013-12-20},
journal = {RNA Biol},
volume = {11},
pages = {3--9},
abstract = {Promoter-associated RNAs (pRNAs) are a family of ~90-100 nt-long divergent RNAs overlapping the promoter of the rRNA (rDNA) operon. pRNA transcripts interact with TIP5, a component of the chromatin remodeling complex NoRC, which recruits enzymes for heterochromatin formation and mediates silencing of rRNA genes. Here we present a comprehensive analysis of pRNA homologs, including different versions per species, as a result of in silico studies in available metazoan genome assemblies. Comparative sequence analysis and secondary structure prediction yielded two possible secondary structures, which lead us to assume a possible dual function of pRNAs for regulation of rRNA operons. Furthermore, we validated parts of our computational predictions experimentally by RT-PCR and sequencing. A representative seed alignment of the pRNA family, annotated with possible secondary structures, was released to the Rfam database.},
keywords = {alignment, assembly, ncRNAs, RNA / transcriptomics, RNA structure},
pubstate = {published},
tppubtype = {article}
}
Wieseke, Nicolas; Lechner, Marcus; Ludwig, Marcus; Marz, Manja
POMAGO: Multiple Genome-Wide Alignment Tool for Bacteria Proceedings Article
In: Cai, Zhipeng; Eulenstein, Oliver; Janies, Daniel; Schwartz, Daniel (Ed.): Proceedings of the 9th International Symposium on Bioinformatics Research and Applications (ISBRA 2013), Charlotte, NC, USA, May 20-22, 2013., pp. 249–260, Springer, 2013.
Tags: alignment, bacteria, phylogenetics, RNA structure, software
@inproceedings{Wieseke:13,
title = {POMAGO: Multiple Genome-Wide Alignment Tool for Bacteria},
author = {Nicolas Wieseke and Marcus Lechner and Marcus Ludwig and Manja Marz},
editor = {Zhipeng Cai and Oliver Eulenstein and Daniel Janies and Daniel Schwartz},
url = {http://www.rna.uni-jena.de/supplements/pomago},
doi = {10.1007/978-3-642-38036-5_25},
year = {2013},
date = {2013-01-01},
urldate = {2013-01-01},
booktitle = {Proceedings of the 9th International Symposium on Bioinformatics Research and Applications (ISBRA 2013), Charlotte, NC, USA, May 20-22, 2013.},
volume = {7875},
number = {1},
pages = {249--260},
publisher = {Springer},
series = {Lecture Notes in Computer Science},
abstract = {Multiple Genome-wide Alignments are a first crucial step to compare genomes. Gain and loss of genes, duplications and genomic rearrangements are challenging problems that aggravate with increasing phylogenetic distances. We describe a multiple genome-wide alignment tool for bacteria, called POMAGO, which is based on orthologous genes and their syntenic information determined by Proteinortho. This strategy enables POMAGO to efficiently define anchor points even across wide phylogenetic distances and outperform existing approaches in this field of application. The given set of orthologous genes is enhanced by several cleaning and completion steps, including the addition of previously undetected orthologous genes. Protein-coding genes are aligned on nucleotide and protein level, whereas intergenic regions are aligned on nucleotide level only. We tested and compared our program on three very different sets of bacteria that exhibit different degrees of phylogenetic distances: 1) 15 closely related, well examined and described E. coli species, 2) six more divergent Aquificales, as putative basal bacteria, and 3) a set of eight extremely divergent species, distributed among the whole phylogenetic tree of bacteria. POMAGO is written in a modular way which allows extending or even exchanging algorithms in different stages of the alignment process. Intergenic regions might for instance be aligned using an RNA secondary structure aware algorithm rather than relying on sequence data alone. The software is freely available from
},
keywords = {alignment, bacteria, phylogenetics, RNA structure, software},
pubstate = {published},
tppubtype = {inproceedings}
}
2011
Marz, Manja; Gruber, Andreas R.; Höner zu Siederdissen, Christian; Amman, Fabian; Badelt, Stefan; Bartschat, Sebastian; Bernhart, Stephan H.; Beyer, Wolfgang; Kehr, Stephanie; Lorenz, Ronny; Tanzer, Andrea; Yusuf, Dilmurat; Tafer, Hakim; Hofacker, Ivo L.; Stadler, Peter F.
Animal snoRNAs and scaRNAs with exceptional structures Journal Article
In: RNA Biol, vol. 8, pp. 938–946, 2011.
Abstract | Links | BibTeX | Tags: alignment, ncRNAs, RNA / transcriptomics, RNA structure
@article{Marz:11,
title = {Animal snoRNAs and scaRNAs with exceptional structures},
author = {Manja Marz and Andreas R. Gruber and Christian {Höner zu Siederdissen} and Fabian Amman and Stefan Badelt and Sebastian Bartschat and Stephan H. Bernhart and Wolfgang Beyer and Stephanie Kehr and Ronny Lorenz and Andrea Tanzer and Dilmurat Yusuf and Hakim Tafer and Ivo L. Hofacker and Peter F. Stadler},
doi = {10.4161/rna.8.6.16603},
year = {2011},
date = {2011-11-01},
urldate = {2011-11-01},
journal = {RNA Biol},
volume = {8},
pages = {938--946},
abstract = {The overwhelming majority of small nucleolar RNAs (snoRNAs) fall into two clearly defined classes characterized by distinctive secondary structures and sequence motifs. A small group of diverse ncRNAs, however, shares the hallmarks of one or both classes of snoRNAs but differs substantially from the norm in some respects. Here, we compile the available information on these exceptional cases, conduct a thorough homology search throughout the available metazoan genomes, provide improved and expanded alignments, and investigate the evolutionary histories of these ncRNA families as well as their mutual relationships.},
keywords = {alignment, ncRNAs, RNA / transcriptomics, RNA structure},
pubstate = {published},
tppubtype = {article}
}
Lechner, Marcus; Findeiss, Sven; Steiner, Lydia; Marz, Manja; Stadler, Peter F; Prohaska, Sonja J
Proteinortho: detection of (co-)orthologs in large-scale analysis Journal Article
In: BMC Bioinf, vol. 12, pp. 124, 2011.
Abstract | Links | BibTeX | Tags: alignment, bacteria, phylogenetics, proteins, software
@article{Lechner:11,
title = {Proteinortho: detection of (co-)orthologs in large-scale analysis},
author = {Marcus Lechner and Sven Findeiss and Lydia Steiner and Manja Marz and Peter F Stadler and Sonja J Prohaska},
url = {http://bioinf.pharmazie.uni-marburg.de/supplements/proteinortho/},
doi = {10.1186/1471-2105-12-124},
year = {2011},
date = {2011-04-28},
urldate = {2011-04-28},
journal = {BMC Bioinf},
volume = {12},
pages = {124},
abstract = {Orthology analysis is an important part of data analysis in many areas of bioinformatics such as comparative genomics and molecular phylogenetics. The ever-increasing flood of sequence data, and hence the rapidly increasing number of genomes that can be compared simultaneously, calls for efficient software tools as brute-force approaches with quadratic memory requirements become infeasible in practice. The rapid pace at which new data become available, furthermore, makes it desirable to compute genome-wide orthology relations for a given dataset rather than relying on relations listed in databases. The program Proteinortho described here is a stand-alone tool that is geared towards large datasets and makes use of distributed computing techniques when run on multi-core hardware. It implements an extended version of the reciprocal best alignment heuristic. We apply Proteinortho to compute orthologous proteins in the complete set of all 717 eubacterial genomes available at NCBI at the beginning of 2009. We identified thirty proteins present in 99% of all bacterial proteomes. Proteinortho significantly reduces the required amount of memory for orthology analysis compared to existing tools, allowing such computations to be performed on off-the-shelf hardware.},
keywords = {alignment, bacteria, phylogenetics, proteins, software},
pubstate = {published},
tppubtype = {article}
}
Li, Andrew X; Marz, Manja; Qin, Jing; Reidys, Christian M
RNA-RNA interaction prediction based on multiple sequence alignments Journal Article
In: Bioinformatics, vol. 27, pp. 456–463, 2011.
Abstract | Links | BibTeX | Tags: alignment, evolution, RNA structure, RNA-RNA interactions
@article{Li:11,
title = {RNA-RNA interaction prediction based on multiple sequence alignments},
author = {Andrew X Li and Manja Marz and Jing Qin and Christian M Reidys},
url = {http://www.combinatorics.cn/cbpc/ripalign.html},
doi = {10.1093/bioinformatics/btq659},
year = {2011},
date = {2011-01-01},
urldate = {2011-01-01},
journal = {Bioinformatics},
volume = {27},
pages = {456--463},
abstract = {Many computerized methods for RNA-RNA interaction structure prediction have been developed. Recently, $O(N^6)$ time and $O(N^4)$ space dynamic programming algorithms have become available that compute the partition function of RNA-RNA interaction complexes. However, few of these methods incorporate the knowledge concerning related sequences, thus relevant evolutionary information is often neglected from the structure determination. Therefore, it is of considerable practical interest to introduce a method taking into consideration both thermodynamic stability and sequence/structure covariation. We present the a priori folding algorithm ripalign, whose input consists of two (given) multiple sequence alignments (MSA). ripalign outputs (i) the partition function, (ii) base pairing probabilities, (iii) hybrid probabilities and (iv) a set of Boltzmann-sampled suboptimal structures consisting of canonical joint structures that are compatible with the alignments. Compared to the single sequence-pair folding algorithm rip, ripalign requires negligible additional memory resource but offers much better sensitivity and specificity, once alignments of suitable quality are given. ripalign additionally allows structure constraints to be incorporated as input parameters. The algorithm described here is implemented in C as part of the rip package.},
keywords = {alignment, evolution, RNA structure, RNA-RNA interactions},
pubstate = {published},
tppubtype = {article}
}
2010
Dalloul, Rami A.; Long, Julie A.; Zimin, Aleksey V.; Aslam, Luqman; Beal, Kathryn; Blomberg, Le Ann; Bouffard, Pascal; Burt, David W.; Crasta, Oswald; Crooijmans, Richard P. M. A.; Cooper, Kristal; Coulombe, Roger A.; De, Supriyo; Delany, Mary E.; Dodgson, Jerry B.; Dong, Jennifer J.; Evans, Clive; Frederickson, Karin M.; Flicek, Paul; Florea, Liliana; Folkerts, Otto; Groenen, Martien A. M.; Harkins, Tim T.; Herrero, Javier; Hoffmann, Steve; Megens, Hendrik-Jan; Jiang, Andrew; Jong, Pieter; Kaiser, Pete; Kim, Heebal; Kim, Kyu-Won; Kim, Sungwon; Langenberger, David; Lee, Mi-Kyung; Lee, Taeheon; Mane, Shrinivasrao; Marcais, Guillaume; Marz, Manja; McElroy, Audrey P.; Modise, Thero; Nefedov, Mikhail; Notredame, Cédric; Paton, Ian R.; Payne, William S.; Pertea, Geo; Prickett, Dennis; Puiu, Daniela; Qioa, Dan; Raineri, Emanuele; Ruffier, Magali; Salzberg, Steven L.; Schatz, Michael C.; Scheuring, Chantel; Schmidt, Carl J.; Schroeder, Steven; Searle, Stephen M. J.; Smith, Edward J.; Smith, Jacqueline; Sonstegard, Tad S.; Stadler, Peter F.; Tafer, Hakim; Tu, Zhijian Jake; Van Tassell, Curtis P.; Vilella, Albert J.; Williams, Kelly P.; Yorke, James A.; Zhang, Liqing; Zhang, Hong-Bin; Zhang, Xiaojun; Zhang, Yang; Reed, Kent M.
Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis Journal Article
In: PLoS Biol, vol. 8, 2010.
Abstract | Links | BibTeX | Tags: alignment, annotation, assembly, DNA / genomics, ncRNAs
@article{Dalloul:10,
title = {Multi-platform next-generation sequencing of the domestic turkey (\textit{Meleagris gallopavo}): genome assembly and analysis},
author = {Rami A. Dalloul and Julie A. Long and Aleksey V. Zimin and Luqman Aslam and Kathryn Beal and Le Ann Blomberg and Pascal Bouffard and David W. Burt and Oswald Crasta and Richard P. M. A. Crooijmans and Kristal Cooper and Roger A. Coulombe and Supriyo De and Mary E. Delany and Jerry B. Dodgson and Jennifer J. Dong and Clive Evans and Karin M. Frederickson and Paul Flicek and Liliana Florea and Otto Folkerts and Martien A. M. Groenen and Tim T. Harkins and Javier Herrero and Steve Hoffmann and Hendrik-Jan Megens and Andrew Jiang and Pieter Jong and Pete Kaiser and Heebal Kim and Kyu-Won Kim and Sungwon Kim and David Langenberger and Mi-Kyung Lee and Taeheon Lee and Shrinivasrao Mane and Guillaume Marcais and Manja Marz and Audrey P. McElroy and Thero Modise and Mikhail Nefedov and Cédric Notredame and Ian R. Paton and William S. Payne and Geo Pertea and Dennis Prickett and Daniela Puiu and Dan Qioa and Emanuele Raineri and Magali Ruffier and Steven L. Salzberg and Michael C. Schatz and Chantel Scheuring and Carl J. Schmidt and Steven Schroeder and Stephen M. J. Searle and Edward J. Smith and Jacqueline Smith and Tad S. Sonstegard and Peter F. Stadler and Hakim Tafer and Zhijian Jake Tu and Curtis P. Van Tassell and Albert J. Vilella and Kelly P. Williams and James A. Yorke and Liqing Zhang and Hong-Bin Zhang and Xiaojun Zhang and Yang Zhang and Kent M. Reed},
doi = {10.1371/journal.pbio.1000475},
year = {2010},
date = {2010-09-07},
urldate = {2010-09-07},
journal = {PLoS Biol},
volume = {8},
abstract = {A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest.},
keywords = {alignment, annotation, assembly, DNA / genomics, ncRNAs},
pubstate = {published},
tppubtype = {article}
}
2009
Jones, Thomas A; Otto, Wolfgang; Marz, Manja; Eddy, Sean R; Stadler, Peter F
A survey of nematode SmY RNAs Journal Article
In: RNA Biol, vol. 6, pp. 5–8, 2009.
Abstract | Links | BibTeX | Tags: alignment, ncRNAs, RNA / transcriptomics, RNA structure, splicing
@article{Jones:09,
title = {A survey of nematode SmY RNAs},
author = {Thomas A Jones and Wolfgang Otto and Manja Marz and Sean R Eddy and Peter F Stadler},
doi = {10.4161/rna.6.1.7634},
year = {2009},
date = {2009-01-01},
urldate = {2009-01-01},
journal = {RNA Biol},
volume = {6},
pages = {5--8},
abstract = {SmY RNAs are a family of approximately 70-90 nt small nuclear RNAs found in nematodes. In C. elegans, SmY RNAs copurify in a small ribonucleoprotein (snRNP) complex related to the SL1 and SL2 snRNPs that are involved in nematode mRNA trans-splicing. Here we describe a comprehensive computational analysis of SmY RNA homologs found in the currently available genome sequences. We identify homologs in all sequenced nematode genomes in class Chromadorea. We are unable to identify homologs in a more distantly related nematode species, Trichinella spiralis (class: Dorylaimia), and in representatives of non-nematode phyla that use trans-splicing. Using comparative RNA sequence analysis, we infer a conserved consensus SmY RNA secondary structure consisting of two stems flanking a consensus Sm protein binding site. A representative seed alignment of the SmY RNA family, annotated with the inferred consensus secondary structure, has been deposited with the Rfam RNA families database.},
keywords = {alignment, ncRNAs, RNA / transcriptomics, RNA structure, splicing},
pubstate = {published},
tppubtype = {article}
}
Ingalls, Todd; Martius, Georg; Marz, Manja; Prohaska, Sonja J.
Converting DNA to music: ComposAlign Proceedings Article
In: Proceedings of the German Conference on Bioinformatics (GCB 2009), pp. 93–104, 2009.
Abstract | Links | BibTeX | Tags: alignment
@inproceedings{Ingalls:09,
title = {Converting DNA to music: ComposAlign},
author = {Todd Ingalls and Georg Martius and Manja Marz and Sonja J. Prohaska},
url = {https://dl.gi.de/handle/20.500.12116/20313},
year = {2009},
date = {2009-01-01},
urldate = {2009-01-01},
booktitle = {Proceedings of the German Conference on Bioinformatics (GCB 2009)},
volume = {P-157},
pages = {93--104},
series = {GI Lecture Notes in Informatics},
abstract = {Alignments are among the most important data types in the field of comparative genomics. They can be abstracted to a character matrix derived from aligned sequences. A variety of biological questions forces the researcher to inspect these alignments. Our tool, called COMPOSALIGN, was developed to sonify large-scale genomic data. The resulting musical composition is based on COMMON MUSIC and allows the mapping of genes to motifs and species to instruments. It enables the researcher to listen to a musical representation of the genome-wide alignment, in contrast to a bioinformatician's usual sight-oriented work at the computer.},
keywords = {alignment},
pubstate = {published},
tppubtype = {inproceedings}
}
