EpiDope: Prediction of B-cell epitopes from amino acid sequences
  • Fast genome-wide search
  • Interactive graphical results
  • Docker support
  • M. Collatz, F. Mock, E. Barth, M. Hölzer, K. Sachse, and M. Marz, “EpiDope: a deep neural network for linear B-cell epitope prediction,” Bioinformatics, 2020. doi:10.1093/bioinformatics/btaa773
VIDHOP: virus host prediction

VIDHOP is a fast and accurate deep learning approach for viral host prediction, which is based on the viral genome sequence only. VIDHOP allows highly accurate predictions while using only fractions (100–400 bp) of the viral genome sequences. VIDHOP also allows the user to train and use models for other viruses.

  • From input FASTA to host prediction in seconds.
  • F. Mock, A. Viehweger, E. Barth, and M. Marz, “VIDHOP, viral host prediction with deep learning,” Bioinformatics, 2020. doi:10.1093/bioinformatics/btaa705
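
The description above mentions predictions from short fractions (100–400 bp) of a viral genome. As a rough illustration of what preparing such input could look like, the following sketch cuts a sequence into fixed-length fragments and one-hot encodes them. The function names, the non-overlapping split, and the encoding are assumptions made for this example and are not taken from the VIDHOP code base.

```python
ALPHABET = "ACGT"  # assumed nucleotide alphabet; other symbols become all-zero rows


def split_into_fragments(sequence, fragment_length=100):
    """Cut a sequence into consecutive, non-overlapping fragments.

    A trailing piece shorter than fragment_length is dropped.
    """
    usable = len(sequence) - len(sequence) % fragment_length
    return [sequence[i:i + fragment_length]
            for i in range(0, usable, fragment_length)]


def one_hot(fragment):
    """Encode each base as a 4-element 0/1 row in the order A, C, G, T."""
    return [[1 if base == letter else 0 for letter in ALPHABET]
            for base in fragment.upper()]


if __name__ == "__main__":
    genome = "ACGT" * 100 + "AC"          # toy sequence of 402 bases
    fragments = split_into_fragments(genome, 100)
    print(len(fragments))                  # number of 100 bp fragments
    print(one_hot("ACGN"))                 # N is encoded as [0, 0, 0, 0]
```

Each encoded fragment could then be passed to a trained model; how VIDHOP itself selects and encodes subsequences should be checked in the publication.
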
RNAflow: Simple RNA-Seq differential gene expression pipeline using Nextflow
RNA-Seq enables the identification and quantification of RNA molecules, often with the aim of detecting differentially expressed genes (DEGs). Although RNA-Seq has evolved into a standard technique, there is no universal gold standard for the computational analysis of these data. On top of that, previous studies have demonstrated the irreproducibility of RNA-Seq studies.

RNAflow is a portable, scalable, and parallelizable Nextflow RNA-Seq pipeline to detect DEGs, which ensures a high level of reproducibility. The pipeline automatically takes care of common pitfalls, such as ribosomal RNA removal and low abundance gene filtering. Apart from various visualizations for the DEG results, we incorporated downstream pathway analysis for common species such as Homo sapiens and Mus musculus.

  • M. Lataretu and M. Hölzer, “RNAflow: an effective and simple RNA-seq differential gene expression pipeline using Nextflow,” Genes, vol. 11, iss. 12, p. 1487, 2020. doi:10.3390/genes11121487
SIM: SilentMutations
SilentMutations (SIM) can analyze the effect of multiple point mutations on the secondary structures of two interacting viral RNAs. It simulates destructive and compensatory mutants of two key regions from a single-stranded RNA, which can then be utilized for the combinatorial in vitro analysis of RNA-RNA interactions.
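
To make the idea of a silent mutation concrete, the sketch below enumerates all synonymous variants of a short coding RNA, that is, every sequence that encodes the same amino acids under the standard genetic code. It only illustrates the concept; the function names are hypothetical, and SIM's own mutant selection (which also evaluates RNA secondary structure) is not reproduced here.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, listed in UCAG order (first base varies slowest).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_codons(codon):
    """Return all codons that encode the same amino acid as the given codon."""
    amino_acid = CODON_TABLE[codon]
    return [c for c, aa in CODON_TABLE.items() if aa == amino_acid]


def synonymous_variants(rna):
    """Enumerate every in-frame RNA sequence with an unchanged translation.

    The input length must be a multiple of three. The number of variants
    grows exponentially, so this is only practical for short regions.
    """
    if len(rna) % 3:
        raise ValueError("sequence length must be a multiple of three")
    rna = rna.upper().replace("T", "U")
    choices = [synonymous_codons(rna[i:i + 3]) for i in range(0, len(rna), 3)]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    print(synonymous_variants("AUGUUU"))  # Met-Phe: only the Phe codon can vary
```

A tool along these lines would then fold each candidate and keep the mutants that disrupt or restore the interaction of interest.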

  • D. Desiro, M. Hölzer, B. Ibrahim, and M. Marz, “SilentMutations (SIM): a tool for analyzing long-range RNA-RNA interactions in viral genomes and structured RNAs,” Virus Res, 2018. doi:10.1016/j.virusres.2018.11.005, PMID: 30439394
    Abstract: A single nucleotide change in the coding region can alter the amino acid sequence of a protein. In consequence, natural or artificial sequence changes in viral RNAs may have various effects not only on protein stability, function and structure but also on viral replication. In recent decades, several tools have been developed to predict the effect of mutations in structured RNAs such as viral genomes or non-coding RNAs. Some tools use multiple point mutations and also take coding regions into account. However, none of these tools was designed to specifically simulate the effect of mutations on viral long-range interactions. Here, we developed SilentMutations (SIM), an easy-to-use tool to analyze the effect of multiple point mutations on the secondary structures of two interacting viral RNAs. The tool can simulate disruptive and compensatory mutants of two interacting single-stranded RNAs. This allows a fast and accurate assessment of key regions potentially involved in functional long-range RNA-RNA interactions and will eventually help virologists and RNA-experts to design appropriate experiments. SIM only requires two interacting single-stranded RNA regions as input. The output is a plain text file containing the most promising mutants and a graphical representation of all interactions. We applied our tool on two experimentally validated influenza A virus and hepatitis C virus interactions and we were able to predict potential double mutants for in vitro validation experiments. The source code and documentation of SIM are freely available at github.com/desiro/silentMutations.
    Keywords: Codon Mutation; Double-mutant; RNA secondary structure; RNA virus; Virology; Virus Bioinformatics; silent mutation
GORAP: Genomewide ncRNA Annotation Pipeline
GORAP is a pipeline for automated non-coding RNA annotation based on BioPerl, Infernal, Blast, RNAmmer, tRNAscan, Bcheck, RAxML, CRT, Mafft, Samtools.

Features & Facts
  • Input: FASTA file(s)
  • Uses in-house filters (e.g. phylogeny based) and TPM/FPKM computation from BAM files
  • Offers RNome based phylogeny reconstruction
  • Rootless installation for Linux and Unix
  • Requirements: internet, gcc, wget, Perl, make
  • Easy installer including all necessary software, libraries and up-to-date databases (Rfam, NCBI Taxonomy, Silva)
  • Source @GitHub
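
The TPM/FPKM computation mentioned in the feature list follows widely used formulas, which can be sketched as below. This is a minimal illustration that assumes per-gene read counts have already been extracted from the BAM files; the function names are made up for this example, and GORAP's own implementation may differ in details such as effective length handling.

```python
def fpkm(counts, lengths):
    """Fragments per kilobase of transcript per million mapped fragments.

    counts:  mapped fragments per gene
    lengths: gene lengths in bases, in the same order as counts
    """
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths)]


def tpm(counts, lengths):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths)]
    scale = sum(rates)
    if scale == 0:
        return [0.0 for _ in counts]
    return [rate / scale * 1e6 for rate in rates]


if __name__ == "__main__":
    counts = [10, 20, 30]
    lengths = [1000, 2000, 3000]
    print(fpkm(counts, lengths))
    print(tpm(counts, lengths))   # TPM values always sum to one million
```

Because TPM values sum to the same total in every sample, they are usually easier to compare across samples than FPKM values.
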
PoSeiDon: Positive Selection Detection and Recombination Analysis
PoSeiDon is an easy-to-use pipeline to detect significant positively selected sites and possible recombination events in an alignment of multiple coding sequences.

Features & Facts
  • Input: nucleotide coding sequences as one multiple FASTA file
  • Assigns a unique ID that can be used to access all data when calculations are finished
  • GitHub page
  • PoSeiDon now runs with Nextflow and Docker: nextflow run hoelzer/poseidon --help
PCAGO: Principal component analysis for RNA-Seq read counts
PCAGO is an interactive web service that helps you analyze your RNA-Seq read counts with principal component analysis (PCA) and clustering.

Features & Facts
  • read count normalization
  • download annotations and GO terms for your genes
  • tool to find gene variance cut-off for PCA
  • R. Gerst and M. Hölzer, “PCAGO: an interactive web service to analyze RNA-seq data with principal component analysis,” bioRxiv, p. 433078, 2018. doi:10.1101/433078
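
One way to think about a gene variance cut-off before PCA is to rank genes by their variance across samples and keep the smallest top-ranked set that explains a chosen share of the total variance. The sketch below does this on log2-transformed counts. The transformation, the threshold rule, and the function name are assumptions chosen for illustration and do not describe PCAGO's actual procedure.

```python
import math
from statistics import pvariance


def variance_cutoff(counts, keep_fraction=0.9):
    """Select the top-variance genes that cover keep_fraction of total variance.

    counts: dict mapping gene name to a list of read counts (one per sample)
    Returns the selected gene names, ordered from highest to lowest variance.
    """
    variances = {
        gene: pvariance([math.log2(value + 1) for value in values])
        for gene, values in counts.items()
    }
    total = sum(variances.values())
    if total == 0:
        return []
    ranked = sorted(variances, key=variances.get, reverse=True)
    selected, running = [], 0.0
    for gene in ranked:
        selected.append(gene)
        running += variances[gene]
        if running >= keep_fraction * total:
            break
    return selected


if __name__ == "__main__":
    toy_counts = {"g1": [0, 3, 15], "g2": [7, 7, 7], "g3": [1, 1, 3]}
    print(variance_cutoff(toy_counts, keep_fraction=0.9))
```

Genes with constant expression contribute nothing to the principal components, so removing them mainly reduces noise and computation time.
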
LRIscan: Long range RNA-RNA interactions
LRIscan is no longer maintained.
LRIscan is a tool to predict conserved, genome-wide long range RNA-RNA interactions based on a multiple sequence alignment in only a few hours on an average computer.

  • M. Fricke and M. Marz, “Prediction of conserved long-range RNA-RNA interactions in full viral genomes,” Bioinformatics, vol. 32, iss. 19, pp. 2928–2935, 2016. doi:10.1093/bioinformatics/btw323, PMID: 27288498
    Abstract: Long-range RNA-RNA interactions (LRIs) play an important role in viral replication, however, only a few of these interactions are known and only for a small number of viral species. Up to now, it has been impossible to screen a full viral genome for LRIs experimentally or in silico. Most known LRIs are cross-reacting structures (pseudoknots) undetectable by most bioinformatical tools. We present LRIscan, a tool for the LRI prediction in full viral genomes based on a multiple genome alignment. We confirmed 14 out of 16 experimentally known and evolutionary conserved LRIs in genome alignments of HCV, Tombusviruses, Flaviviruses and HIV-1. We provide several promising new interactions, which include compensatory mutations and are highly conserved in all considered viral sequences. Furthermore, we provide reactivity plots highlighting the hot spots of predicted LRIs. Source code and binaries of LRIscan are freely available for download at http://www.rna.uni-jena.de/en/supplements/lriscan/, implemented in Ruby/C++ and supported on Linux and Windows. Contact: manja@uni-jena.de. Supplementary data are available at Bioinformatics online.
    Keywords: Computer Simulation; Genome, Viral; RNA, Viral; Sequence Analysis, RNA; Software
VrAP: Viral Assembly Pipeline
VrAP is no longer maintained.
VrAP is a viral assembly pipeline based on the genome assembler SPAdes combined with an additional read correction and several filter steps. VrAP classifies the contigs to distinguish host from viral sequences by annotation and ORF density scores.

Features & Facts
  • new ORF density method to identify viruses without any sequence homology to known references
  • tested on real datasets generated with different sequencing technologies
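
An ORF density score can be pictured as the fraction of a contig that is covered by open reading frames. The sketch below computes such a fraction over all six reading frames. The definition used here (ATG to stop codon, minimum ORF length, coverage fraction) and all names are assumptions for illustration only; VrAP's actual scoring method is not described in this text and may differ.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_orfs(seq, min_len=30):
    """Yield (start, end) of ATG..stop ORFs in the three forward frames.

    end is exclusive and includes the stop codon.
    """
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if pos + 3 - start >= min_len:
                    yield start, pos + 3
                start = None


def orf_density(seq, min_len=30):
    """Fraction of contig positions covered by an ORF on either strand."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    covered = set()
    for start, end in find_orfs(seq, min_len):
        covered.update(range(start, end))
    reverse = seq.translate(COMPLEMENT)[::-1]
    for start, end in find_orfs(reverse, min_len):
        # Map reverse-strand coordinates back onto the forward strand.
        covered.update(range(n - end, n - start))
    return len(covered) / n


if __name__ == "__main__":
    contig = "CCCC" + "ATG" + "GCC" * 10 + "TAA" + "CCCC"
    print(orf_density(contig))
```

Viral genomes tend to be densely packed with coding sequence, which is the intuition behind using such a score when no homologous reference is available.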


RNAgraphdist: Graph distance between two bases
RNAgraphdist is no longer maintained.
RNAgraphdist finds the shortest graph distance between two bases i and j.
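
For a single secondary structure, the graph distance between two bases can be illustrated with a breadth-first search on a graph whose edges are the backbone bonds and the base pairs. The sketch below assumes a pseudoknot-free structure in dot-bracket notation and 0-based positions; the function names are made up for this example. It is not the RNAgraphdist algorithm, which considers the distribution over the whole Boltzmann ensemble.

```python
from collections import deque


def pair_table(structure):
    """Map each paired position to its partner for a dot-bracket string."""
    stack, pairs = [], {}
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            partner = stack.pop()
            pairs[pos] = partner
            pairs[partner] = pos
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def graph_distance(structure, i, j):
    """Shortest path length between bases i and j (0-based).

    Neighbours of a base are its backbone neighbours and, if it is paired,
    its pairing partner. Every edge has length one.
    """
    pairs = pair_table(structure)
    n = len(structure)
    dist = {i: 0}
    queue = deque([i])
    while queue:
        node = queue.popleft()
        if node == j:
            return dist[node]
        neighbours = [node - 1, node + 1]
        if node in pairs:
            neighbours.append(pairs[node])
        for nxt in neighbours:
            if 0 <= nxt < n and nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None  # j is not a valid position


if __name__ == "__main__":
    hairpin = "((((....))))"
    print(graph_distance(hairpin, 0, 11))   # 1: the two ends are base-paired
    print(graph_distance(hairpin, 0, 5))    # 5: along the backbone into the loop
```

In the unfolded chain the two ends of this hairpin would be 11 steps apart, which shows how base pairing shortens distances between distant sequence positions.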

Features & Facts
  • Handles thousands of input constraints
  • Plots all results with gnuplot
  • Optimized for multi-cores
  • Runtime complexity: O(n log n)
  • J. Qin, M. Fricke, M. Marz, P. F. Stadler, and R. Backofen, “Graph-distance distribution of the Boltzmann ensemble of RNA secondary structures,” Algorithms Mol Biol, vol. 9, p. 19, 2014. doi:10.1186/1748-7188-9-19, PMID: 25285153
    Abstract: Large RNA molecules are often composed of multiple functional domains whose spatial arrangement strongly influences their function. Pre-mRNA splicing, for instance, relies on the spatial proximity of the splice junctions that can be separated by very long introns. Similar effects appear in the processing of RNA virus genomes. Albeit a crude measure, the distribution of spatial distances in thermodynamic equilibrium harbors useful information on the shape of the molecule that in turn can give insights into the interplay of its functional domains. Spatial distance can be approximated by the graph-distance in RNA secondary structure. We show here that the equilibrium distribution of graph-distances between a fixed pair of nucleotides can be computed in polynomial time by means of dynamic programming. While a naïve implementation would yield recursions with a very high time complexity of O(n^6 D^5) for sequence length n and D distinct distance values, it is possible to reduce this to O(n^4) for practical applications in which predominantly small distances are of interest. Further reductions, however, seem to be difficult. Therefore, we introduced sampling approaches that are much easier to implement. They are also theoretically favorable for several real-life applications, in particular since these primarily concern long-range interactions in very large RNA molecules. The graph-distance distribution can be computed using a dynamic programming approach. Although a crude approximation of reality, our initial results indicate that the graph-distance can be related to the smFRET data. The additional file and the software of our paper are available from http://www.rna.uni-jena.de/RNAgraphdist.html.
    Keywords: Boltzmann distribution; Graph-distance; Partition function; Pre-mRNA splicing; smFRET
  • R. Backofen, M. Fricke, M. Marz, J. Qin, and P. F. Stadler, “Distribution of graph-distances in Boltzmann ensembles of RNA secondary structures,” in International Workshop on Algorithms in Bioinformatics, 2013, pp. 112–125.
POMAGO: Multiple Genome Aligner
POMAGO is no longer maintained.
POMAGO is a multiple genome aligner designed for, but not limited to, bacterial genomes.

Features & Facts
  • Based on the whole set of all known bacterial orthologous genes and their syntenic information determined by Proteinortho
Proteinortho: Orthology detection tool
Proteinortho is no longer maintained.
Proteinortho is a tool to detect orthologous proteins across hundreds of species.

Features & Facts
  • Small memory footprint
  • Optimized for multi-core and cluster environments
  • Runtime complexity: O(n^2)
Galculator: Nucleotide counter for FASTA files
Galculator is no longer maintained.
Galculator is a nucleotide counter for FASTA files that counts mononucleotide frequencies, dinucleotide frequencies, and gapped dinucleotide frequencies (XnY).

Features & Facts
  • Small constant memory footprint
  • Handles hundreds of petabytes if necessary
  • Runtime complexity: O(n)
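
The combination of linear runtime and a small constant memory footprint can be achieved by streaming the FASTA file and keeping only a short sliding window of recent bases. The following sketch counts mono-, di-, and gapped dinucleotides (X, then n arbitrary bases, then Y) in that way. It is an illustration under these assumptions with a hypothetical function name, not Galculator's implementation.

```python
import io
from collections import Counter, deque


def count_frequencies(handle, gap=1):
    """Count mono-, di- and gapped dinucleotides in a FASTA stream.

    handle: iterable of lines (for example an open file)
    gap:    number of bases between X and Y in the gapped pattern XnY
    Memory use depends only on the alphabet size and the gap, not on the
    input length, and each base is processed exactly once.
    """
    mono, di, gapped = Counter(), Counter(), Counter()
    window = deque(maxlen=gap + 2)   # holds X, the gap bases, and Y
    for line in handle:
        if line.startswith(">"):
            window.clear()           # never count across record boundaries
            continue
        for base in line.strip().upper():
            mono[base] += 1
            window.append(base)
            if len(window) >= 2:
                di[window[-2] + base] += 1
            if len(window) == gap + 2:
                gapped[window[0] + base] += 1
    return mono, di, gapped


if __name__ == "__main__":
    fasta = io.StringIO(">seq1\nACGT\nAC\n")
    mono, di, gapped = count_frequencies(fasta, gap=1)
    print(dict(mono))
    print(dict(di))
    print(dict(gapped))   # keys are X + Y, for example "AG" for the pattern A.G
```

Dividing each count by the sum of its counter turns the counts into frequencies.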