Download
- Download from GitHub
Features
- Fast genome-wide search
- Interactive graphical results
- Docker support
Download
- Download from GitHub
Reference
Collatz, Maximilian; Mock, Florian; Barth, Emanuel; Hölzer, Martin; Sachse, Konrad; Marz, Manja
EpiDope: A Deep Neural Network for linear B-cell epitope prediction Journal Article
In: Bioinformatics, vol. 37, no. 4, pp. 448–455, 2020.
@article{Collatz:20,
title = {EpiDope: A Deep Neural Network for linear B-cell epitope prediction},
author = {Maximilian Collatz and Florian Mock and Emanuel Barth and Martin Hölzer and Konrad Sachse and Manja Marz},
editor = {Lenore Cowen},
url = {https://github.com/rnajena/EpiDope},
doi = {10.1093/bioinformatics/btaa773},
year = {2020},
date = {2020-09-11},
urldate = {2020-09-11},
journal = {Bioinformatics},
volume = {37},
number = {4},
pages = {448--455},
publisher = {Oxford University Press (OUP)},
abstract = {By binding to specific structures on antigenic proteins, the so-called epitopes, B-cell antibodies can neutralize pathogens. The identification of B-cell epitopes is of great value for the development of specific serodiagnostic assays and the optimization of medical therapy. However, identifying diagnostically or therapeutically relevant epitopes is a challenging task that usually involves extensive laboratory work. In this study, we show that the time, cost and labor-intensive process of epitope detection in the lab can be significantly reduced using in silico prediction.
Here, we present EpiDope, a python tool which uses a deep neural network to detect linear B-cell epitope regions on individual protein sequences. With an area under the curve between 0.67 ± 0.07 in the receiver operating characteristic curve, EpiDope exceeds all other currently used linear B-cell epitope prediction tools. Our software is shown to reliably predict linear B-cell epitopes of a given protein sequence, thus contributing to a significant reduction of laboratory experiments and costs required for the conventional approach.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Features
- From input FASTA to host prediction in seconds.
Download
- Download from GitHub
Reference
Mock, Florian; Viehweger, Adrian; Barth, Emanuel; Marz, Manja
VIDHOP, viral host prediction with Deep Learning Journal Article
In: Bioinformatics, vol. 37, no. 3, pp. 318–325, 2020.
@article{Mock:20,
title = {VIDHOP, viral host prediction with Deep Learning},
author = {Florian Mock and Adrian Viehweger and Emanuel Barth and Manja Marz},
editor = {Jinbo Xu},
url = {https://github.com/rnajena/vidhop},
doi = {10.1093/bioinformatics/btaa705},
year = {2020},
date = {2020-08-10},
urldate = {2020-08-10},
journal = {Bioinformatics},
volume = {37},
number = {3},
pages = {318--325},
publisher = {Oxford University Press (OUP)},
abstract = {Zoonosis, the natural transmission of infections from animals to humans, is a far-reaching global problem. The recent outbreaks of Zikavirus, Ebolavirus and Coronavirus are examples of viral zoonosis, which occur more frequently due to globalization. In case of a virus outbreak, it is helpful to know which host organism was the original carrier of the virus to prevent further spreading of viral infection. Recent approaches aim to predict a viral host based on the viral genome, often in combination with the potential host genome and arbitrarily selected features. These methods are limited in the number of different hosts they can predict or the accuracy of the prediction.
Here, we present a fast and accurate deep learning approach for viral host prediction, which is based on the viral genome sequence only. We tested our deep neural network (DNN) on three different virus species (influenza A virus, rabies lyssavirus and rotavirus A). We achieved for each virus species an AUC between 0.93 and 0.98, allowing highly accurate predictions while using only fractions (100–400 bp) of the viral genome sequences. We show that deep neural networks are suitable to predict the host of a virus, even with a limited amount of sequences and highly unbalanced available data. The trained DNNs are the core of our virus–host prediction tool VIrus Deep learning HOst Prediction (VIDHOP). VIDHOP also allows the user to train and use models for other viruses.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
RNAflow is a portable, scalable, and parallelizable Nextflow RNA-Seq pipeline to detect differentially expressed genes (DEGs), which assures a high level of reproducibility. The pipeline automatically takes care of common pitfalls, such as ribosomal RNA removal and low abundance gene filtering. Apart from various visualizations for the DEG results, we incorporated downstream pathway analysis for common species such as Homo sapiens and Mus musculus.
Download
- Download from GitHub
Reference
Lataretu, Marie; Hölzer, Martin
RNAflow: An Effective and Simple RNA-Seq Differential Gene Expression Pipeline Using Nextflow Journal Article
In: Genes, vol. 11, no. 12, pp. 1487, 2020.
@article{Lataretu:20,
title = {RNAflow: An Effective and Simple RNA-Seq Differential Gene Expression Pipeline Using Nextflow},
author = {Marie Lataretu and Martin Hölzer},
url = {https://github.com/hoelzer-lab/rnaflow},
doi = {10.3390/genes11121487},
year = {2020},
date = {2020-12-10},
urldate = {2020-01-01},
journal = {Genes},
volume = {11},
number = {12},
pages = {1487},
publisher = {MDPI AG},
abstract = {RNA-Seq enables the identification and quantification of RNA molecules, often with the aim of detecting differentially expressed genes (DEGs). Although RNA-Seq evolved into a standard technique, there is no universal gold standard for these data’s computational analysis. On top of that, previous studies proved the irreproducibility of RNA-Seq studies. Here, we present a portable, scalable, and parallelizable Nextflow RNA-Seq pipeline to detect DEGs, which assures a high level of reproducibility. The pipeline automatically takes care of common pitfalls, such as ribosomal RNA removal and low abundance gene filtering. Apart from various visualizations for the DEG results, we incorporated downstream pathway analysis for common species as Homo sapiens and Mus musculus. We evaluated the DEG detection functionality while using qRT-PCR data serving as a reference and observed a very high correlation of the logarithmized gene expression fold changes.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Download
- Download from GitHub
Reference
Desiro, Daniel; Hölzer, Martin; Ibrahim, Bashar; Marz, Manja
SilentMutations (SIM): a tool for analyzing long-range RNA-RNA interactions in viral genomes and structured RNAs Journal Article
In: Virus Res, vol. 260, pp. 135-141, 2018.
@article{Desiro:18,
title = {SilentMutations (SIM): a tool for analyzing long-range RNA-RNA interactions in viral genomes and structured RNAs},
author = {Daniel Desiro and Martin Hölzer and Bashar Ibrahim and Manja Marz},
url = {https://github.com/desiro/silentMutations},
doi = {10.1016/j.virusres.2018.11.005},
year = {2018},
date = {2018-11-12},
urldate = {2018-11-12},
journal = {Virus Res},
volume = {260},
pages = {135--141},
abstract = {A single nucleotide change in the coding region can alter the amino acid sequence of a protein. In consequence, natural or artificial sequence changes in viral RNAs may have various effects not only on protein stability, function and structure but also on viral replication. In recent decades, several tools have been developed to predict the effect of mutations in structured RNAs such as viral genomes or non-coding RNAs. Some tools use multiple point mutations and also take coding regions into account. However, none of these tools was designed to specifically simulate the effect of mutations on viral long-range interactions. Here, we developed SilentMutations (SIM), an easy-to-use tool to analyze the effect of multiple point mutations on the secondary structures of two interacting viral RNAs. The tool can simulate disruptive and compensatory mutants of two interacting single-stranded RNAs. This allows a fast and accurate assessment of key regions potentially involved in functional long-range RNA-RNA interactions and will eventually help virologists and RNA-experts to design appropriate experiments. SIM only requires two interacting single-stranded RNA regions as input. The output is a plain text file containing the most promising mutants and a graphical representation of all interactions. We applied our tool on two experimentally validated influenza A virus and hepatitis C virus interactions and we were able to predict potential double mutants for in vitro validation experiments. The source code and documentation of SIM are freely available at github.com/desiro/silentMutations.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Features & Facts
- Input: FASTA file(s)
- Uses in-house filters (e.g. phylogeny-based) and TPM/FPKM computation from BAM files
- Offers RNome-based phylogeny reconstruction
- Rootless installation for Linux and Unix
- Requirements: internet, gcc, wget, Perl, make
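The TPM/FPKM computation mentioned in the list above can be sketched as follows. This is a minimal illustration, not the tool's own code: it assumes that read counts per feature have already been extracted from the BAM files, and the function names `tpm` and `fpkm` as well as the example features are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million (TPM) from read counts and feature lengths in bp."""
    # Step 1: reads per kilobase, which removes the feature-length bias.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    # Step 2: rescale so that all values of one sample sum to one million.
    per_million = sum(rpk.values()) / 1e6
    if per_million == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: value / per_million for gene, value in rpk.items()}


def fpkm(counts, lengths_bp):
    """Fragments per kilobase of feature per million mapped fragments (FPKM)."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: counts[gene] * 1e9 / (lengths_bp[gene] * total) for gene in counts}


# Hypothetical counts and lengths for three ncRNA features (assumption).
counts = {"rnaA": 100, "rnaB": 300, "rnaC": 600}
lengths_bp = {"rnaA": 1000, "rnaB": 1500, "rnaC": 3000}

for name, values in (("TPM", tpm(counts, lengths_bp)), ("FPKM", fpkm(counts, lengths_bp))):
    print(name, {gene: round(v, 1) for gene, v in values.items()})
```

TPM values of one sample always sum to one million, which makes them easier to compare across samples than FPKM values.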
Download
- Easy installer including all necessary software, libraries, and up-to-date databases (Rfam, NCBI Taxonomy, Silva)
- Source @GitHub
Features & Facts
- Input: nucleotide coding sequences as one multiple FASTA file
- Assigns a unique ID that can be used to access all data once the calculations are finished
Download
- GitHub page
- PoSeiDon now runs with Nextflow and Docker:
nextflow run hoelzer/poseidon --help
Reference
Hölzer, Martin; Marz, Manja
PoSeiDon: a Nextflow pipeline for the detection of evolutionary recombination events and positive selection Journal Article
In: Bioinformatics, vol. 37, no. 7, pp. 1018-1020, 2020.
@article{Hoelzer:20a,
title = {PoSeiDon: a Nextflow pipeline for the detection of evolutionary recombination events and positive selection},
author = {Martin Hölzer and Manja Marz},
editor = {Alfonso Valencia},
url = {https://github.com/rnajena/poseidon},
doi = {10.1093/bioinformatics/btaa695},
year = {2020},
date = {2020-07-31},
urldate = {2020-07-31},
journal = {Bioinformatics},
volume = {37},
number = {7},
pages = {1018--1020},
publisher = {Oxford University Press (OUP)},
abstract = {PoSeiDon is an easy-to-use pipeline that helps researchers to find recombination events and sites under positive selection in protein-coding sequences. By entering homologous sequences, PoSeiDon builds an alignment, estimates a best-fitting substitution model and performs a recombination analysis followed by the construction of all corresponding phylogenies. Finally, significantly positive selected sites are detected according to different models for the full alignment and possible recombination fragments. The results of PoSeiDon are summarized in a user-friendly HTML page providing all intermediate results and the graphical representation of recombination events and positively selected sites.
},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Features & Facts
- read count normalization
- download annotations and GO terms for your genes
- tool to find gene variance cut-off for PCA
Download
- Go to web server
- Source code @GitHub
- Run PCAGO as standalone desktop application using the Electron framework
Reference
Gerst, Ruman; Hölzer, Martin
PCAGO: An interactive web service to analyze RNA-Seq data with principal component analysis Journal Article
In: bioRxiv, pp. 433078, 2018.
@article{Gerst:18,
title = {PCAGO: An interactive web service to analyze RNA-Seq data with principal component analysis},
author = {Ruman Gerst and Martin Hölzer},
url = {https://github.com/rnajena/pcago-unified},
doi = {10.1101/433078},
year = {2018},
date = {2018-10-03},
urldate = {2018-10-03},
journal = {bioRxiv},
pages = {433078},
publisher = {Cold Spring Harbor Laboratory},
abstract = {The initial characterization and clustering of biological samples is a critical step in the analysis of any transcriptomics study. In many studies, principal component analysis (PCA) is the clustering algorithm of choice to predict the relationship of samples or cells based solely on differential gene expression. In addition to the pure quality evaluation of the data, a PCA can also provide initial insights into the biological background of an experiment and help researchers to interpret the data and design the subsequent computational steps accordingly. However, to avoid misleading clusterings and interpretations, an appropriate selection of the underlying gene sets to build the PCA and the choice of the most fitting principal components for the visualization are crucial parts. Here, we present PCAGO, an easy-to-use and interactive tool to analyze gene quantification data derived from RNA sequencing experiments with PCA. The tool includes features such as read-count normalization, filtering of read counts by gene annotation, and various visualization options. In addition, PCAGO helps to select appropriate parameters such as the number of genes and principal components to create meaningful visualizations.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
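The idea behind a gene variance cut-off for PCA can be sketched as follows. This is only an illustration under stated assumptions, not PCAGO's implementation: genes are ranked by their variance across samples, and the cut-off is taken as the smallest number of top-ranked genes that covers a chosen fraction of the total variance. The function names, the criterion, and the toy expression values are hypothetical.

```python
from statistics import pvariance


def top_variant_genes(expression, n_genes):
    """Return the n_genes gene IDs with the highest variance across samples."""
    variances = {gene: pvariance(values) for gene, values in expression.items()}
    # Sort by decreasing variance; ties are broken by gene ID for determinism.
    ranked = sorted(variances, key=lambda gene: (-variances[gene], gene))
    return ranked[:n_genes]


def genes_for_variance_fraction(expression, fraction):
    """Smallest number of top-variance genes covering `fraction` of total variance."""
    variances = sorted((pvariance(v) for v in expression.values()), reverse=True)
    total = sum(variances)
    if total == 0:
        return 0
    cumulative = 0.0
    for n_genes, variance in enumerate(variances, start=1):
        cumulative += variance
        if cumulative / total >= fraction:
            return n_genes
    return len(variances)


# Hypothetical normalized expression values: gene -> one value per sample.
expression = {
    "g1": [1, 1, 1, 1],
    "g2": [0, 10, 0, 10],
    "g3": [4, 6, 4, 6],
    "g4": [5, 5, 6, 5],
}

cutoff = genes_for_variance_fraction(expression, 0.99)
print(cutoff, top_variant_genes(expression, cutoff))
```

Only the selected genes would then be passed on to the PCA, so that genes with near-constant expression do not dilute the components.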
Download
Reference
Fricke, Markus; Marz, Manja
Prediction of conserved long-range RNA-RNA interactions in full viral genomes Journal Article
In: Bioinformatics, vol. 32, no. 19, pp. 2928–2935, 2016.
@article{Fricke:16,
title = {Prediction of conserved long-range RNA-RNA interactions in full viral genomes},
author = {Markus Fricke and Manja Marz},
url = {http://www.rna.uni-jena.de/en/supplements/lriscan/},
doi = {10.1093/bioinformatics/btw323},
year = {2016},
date = {2016-06-10},
urldate = {2016-06-10},
journal = {Bioinformatics},
volume = {32},
number = {19},
pages = {2928--2935},
abstract = {Long-range RNA-RNA interactions (LRIs) play an important role in viral replication; however, only a few of these interactions are known and only for a small number of viral species. Up to now, it has been impossible to screen a full viral genome for LRIs experimentally or in silico. Most known LRIs are cross-reacting structures (pseudoknots) undetectable by most bioinformatical tools. We present LRIscan, a tool for the LRI prediction in full viral genomes based on a multiple genome alignment. We confirmed 14 out of 16 experimentally known and evolutionarily conserved LRIs in genome alignments of HCV, Tombusviruses, Flaviviruses and HIV-1. We provide several promising new interactions, which include compensatory mutations and are highly conserved in all considered viral sequences. Furthermore, we provide reactivity plots highlighting the hot spots of predicted LRIs. Source code and binaries of LRIscan are freely available for download at http://www.rna.uni-jena.de/en/supplements/lriscan/, implemented in Ruby/C++ and supported on Linux and Windows. Contact: manja@uni-jena.de. Supplementary data are available at Bioinformatics online.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Features & Facts
- new ORF density method to identify viruses without any sequence homology to known references
- tested on real datasets generated with different sequencing technologies
Download
Features & Facts
- Handles thousands of input constraints
- Plots all results with gnuplot
- Optimized for multi-cores
- Runtime complexity: O(n log n)
Download
References
Qin, Jing; Fricke, Markus; Marz, Manja; Stadler, Peter F; Backofen, Rolf
Graph-distance distribution of the Boltzmann ensemble of RNA secondary structures Journal Article
In: Algorithms Mol Biol, vol. 9, pp. 19, 2014.
@article{Qin:14,
title = {Graph-distance distribution of the Boltzmann ensemble of RNA secondary structures},
author = {Jing Qin and Markus Fricke and Manja Marz and Peter F Stadler and Rolf Backofen},
url = {http://www.rna.uni-jena.de/RNAgraphdist.html},
doi = {10.1186/1748-7188-9-19},
year = {2014},
date = {2014-09-11},
urldate = {2014-09-11},
journal = {Algorithms Mol Biol},
volume = {9},
pages = {19},
abstract = {Large RNA molecules are often composed of multiple functional domains whose spatial arrangement strongly influences their function. Pre-mRNA splicing, for instance, relies on the spatial proximity of the splice junctions that can be separated by very long introns. Similar effects appear in the processing of RNA virus genomes. Albeit a crude measure, the distribution of spatial distances in thermodynamic equilibrium harbors useful information on the shape of the molecule that in turn can give insights into the interplay of its functional domains. Spatial distance can be approximated by the graph-distance in RNA secondary structure. We show here that the equilibrium distribution of graph-distances between a fixed pair of nucleotides can be computed in polynomial time by means of dynamic programming. While a naïve implementation would yield recursions with a very high time complexity of $O(n^6 D^5)$ for sequence length n and D distinct distance values, it is possible to reduce this to $O(n^4)$ for practical applications in which predominantly small distances are of interest. Further reductions, however, seem to be difficult. Therefore, we introduced sampling approaches that are much easier to implement. They are also theoretically favorable for several real-life applications, in particular since these primarily concern long-range interactions in very large RNA molecules. The graph-distance distribution can be computed using a dynamic programming approach. Although a crude approximation of reality, our initial results indicate that the graph-distance can be related to the smFRET data. The additional file and the software of our paper are available from http://www.rna.uni-jena.de/RNAgraphdist.html.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
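The graph-distance described in the abstract above can be illustrated for a single secondary structure. The sketch below is not the dynamic programming or sampling approach of the paper, which considers the whole Boltzmann ensemble. It only computes the shortest path between two nucleotides in one dot-bracket structure by breadth-first search, assuming that backbone bonds and base pairs both count as edges of length 1. The function names are hypothetical.

```python
from collections import deque


def parse_dot_bracket(structure):
    """Return a dict mapping each paired position to its partner (0-based)."""
    stack, pairs = [], {}
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def graph_distance(structure, start, end):
    """Shortest path between two nucleotides over backbone and base-pair edges."""
    n = len(structure)
    if not (0 <= start < n and 0 <= end < n):
        raise ValueError("positions must lie within the structure")
    pairs = parse_dot_bracket(structure)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        if i == end:
            return distance[i]
        # Neighbours: backbone predecessor and successor, plus the pairing partner.
        neighbours = [i - 1, i + 1]
        if i in pairs:
            neighbours.append(pairs[i])
        for j in neighbours:
            if 0 <= j < n and j not in distance:
                distance[j] = distance[i] + 1
                queue.append(j)
    raise ValueError("end position is not reachable")


hairpin = "((((....))))"
print(graph_distance(hairpin, 0, 11))   # both ends are paired with each other
print(graph_distance("." * 12, 0, 11))  # unstructured chain of the same length
```

The two calls show how base pairing brings sequence-distant nucleotides close together: the ends of the hairpin are direct neighbours in the graph, whereas in the unstructured chain the path has to follow the entire backbone.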
Features & Facts
- Based on the whole set of all known bacterial orthologous genes and their syntenic information determined by Proteinortho
Download
References
Wieseke, Nicolas; Lechner, Marcus; Ludwig, Marcus; Marz, Manja
POMAGO: Multiple Genome-Wide Alignment Tool for Bacteria Proceedings Article
In: Cai, Zhipeng; Eulenstein, Oliver; Janies, Daniel; Schwartz, Daniel (Ed.): Proceedings of the 9th International Symposium on Bioinformatics Research and Applications (ISBRA 2013), Charlotte, NC, USA, May 20-22, 2013., pp. 249-260, Springer, 2013.
@inproceedings{Wieseke:13,
title = {POMAGO: Multiple Genome-Wide Alignment Tool for Bacteria},
author = {Nicolas Wieseke and Marcus Lechner and Marcus Ludwig and Manja Marz},
editor = {Zhipeng Cai and Oliver Eulenstein and Daniel Janies and Daniel Schwartz},
url = {http://www.rna.uni-jena.de/supplements/pomago},
doi = {10.1007/978-3-642-38036-5_25},
year = {2013},
date = {2013-01-01},
urldate = {2013-01-01},
booktitle = {Proceedings of the 9th International Symposium on Bioinformatics Research and Applications (ISBRA 2013), Charlotte, NC, USA, May 20-22, 2013.},
volume = {7875},
number = {1},
pages = {249--260},
publisher = {Springer},
series = {Lecture Notes in Computer Science},
abstract = {Multiple Genome-wide Alignments are a first crucial step to compare genomes. Gain and loss of genes, duplications and genomic rearrangements are challenging problems that aggravate with increasing phylogenetic distances. We describe a multiple genome-wide alignment tool for bacteria, called POMAGO, which is based on orthologous genes and their syntenic information determined by Proteinortho. This strategy enables POMAGO to efficiently define anchor points even across wide phylogenetic distances and outperform existing approaches in this field of application. The given set of orthologous genes is enhanced by several cleaning and completion steps, including the addition of previously undetected orthologous genes. Protein-coding genes are aligned on nucleotide and protein level, whereas intergenic regions are aligned on nucleotide level only. We tested and compared our program at three very different sets of bacteria that exhibit different degrees of phylogenetic distances: 1) 15 closely related, well examined and described E. coli species, 2) six more divergent Aquificales, as putative basal bacteria, and 3) a set of eight extreme divergent species, distributed among the whole phylogenetic tree of bacteria. POMAGO is written in a modular way which allows extending or even exchanging algorithms in different stages of the alignment process. Intergenic regions might for instance be aligned using an RNA secondary structure aware algorithm rather than to rely on sequence data alone. The software is freely available from
},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Features & Facts
- Small memory footprint
- Optimized for multi-core and cluster environments
- Runtime complexity: O(n^2)
Download
References
Lechner, Marcus; Findeiss, Sven; Steiner, Lydia; Marz, Manja; Stadler, Peter F; Prohaska, Sonja J
Proteinortho: detection of (co-)orthologs in large-scale analysis Journal Article
In: BMC Bioinf, vol. 12, pp. 124, 2011.
@article{Lechner:11,
title = {Proteinortho: detection of (co-)orthologs in large-scale analysis},
author = {Marcus Lechner and Sven Findeiss and Lydia Steiner and Manja Marz and Peter F Stadler and Sonja J Prohaska},
url = {http://bioinf.pharmazie.uni-marburg.de/supplements/proteinortho/},
doi = {10.1186/1471-2105-12-124},
year = {2011},
date = {2011-04-28},
urldate = {2011-04-28},
journal = {BMC Bioinf},
volume = {12},
pages = {124},
abstract = {Orthology analysis is an important part of data analysis in many areas of bioinformatics such as comparative genomics and molecular phylogenetics. The ever-increasing flood of sequence data, and hence the rapidly increasing number of genomes that can be compared simultaneously, calls for efficient software tools as brute-force approaches with quadratic memory requirements become infeasible in practise. The rapid pace at which new data become available, furthermore, makes it desirable to compute genome-wide orthology relations for a given dataset rather than relying on relations listed in databases. The program Proteinortho described here is a stand-alone tool that is geared towards large datasets and makes use of distributed computing techniques when run on multi-core hardware. It implements an extended version of the reciprocal best alignment heuristic. We apply Proteinortho to compute orthologous proteins in the complete set of all 717 eubacterial genomes available at NCBI at the beginning of 2009. We identified thirty proteins present in 99% of all bacterial proteomes. Proteinortho significantly reduces the required amount of memory for orthology analysis compared to existing tools, allowing such computations to be performed on off-the-shelf hardware.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
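The abstract above states that Proteinortho implements an extended version of the reciprocal best alignment heuristic. As a rough illustration, the plain heuristic (without Proteinortho's extensions) can be sketched as follows. The score dictionaries stand in for pairwise alignment results, and the function names and protein IDs are hypothetical.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    `scores` maps (query, target) pairs to an alignment score, e.g. a bit score.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical alignment scores between the proteomes of species A and B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50, ("a2", "b2"): 180, ("a3", "b1"): 90}
b_vs_a = {("b1", "a1"): 195, ("b1", "a3"): 85, ("b2", "a2"): 175, ("b2", "a1"): 40}

print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy example `a3` finds `b1` as its best hit, but `b1` prefers `a1`, so the pair is not reciprocal and is dropped. Only the best hit per query is kept, so the full score matrix does not have to be retained.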
Features & Facts
- Small constant memory footprint
- Handles hundreds of petabytes if necessary
- Runtime complexity: O(n)