VrAP is based on the genome assembler SPAdes, combined with an additional read correction step and several filter steps. The pipeline classifies the contigs to distinguish host from viral sequences using annotation and ORF density scores. With the new ORF density method, viruses can be identified without any sequence homology to known references. We tested VrAP on real datasets generated with different sequencing technologies and identified new viruses representing new species and even new genera and families.
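To give an intuition for what an ORF density score can measure, here is a minimal sketch. It is not VrAP's scoring code: the definition used (fraction of contig bases covered by a stop-free stretch of at least min_len nucleotides in any of the six reading frames) and the default min_len of 300 are assumptions made for illustration only.

```python
# Illustrative ORF density. The definition below is an assumption for
# demonstration purposes and is not taken from the VrAP source code.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orf_intervals(seq, min_len=300):
    """Yield (start, end) of stop-free stretches >= min_len nt in the three forward frames."""
    seq = seq.upper()
    for frame in range(3):
        start = frame
        pos = frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in STOP_CODONS:
                if pos - start >= min_len:
                    yield (start, pos)
                start = pos + 3
            pos += 3
        # Open stretch that runs to the end of the contig.
        if pos - start >= min_len:
            yield (start, pos)


def orf_density(seq, min_len=300):
    """Fraction of bases covered by an open stretch on either strand."""
    n = len(seq)
    if n == 0:
        return 0.0
    covered = [False] * n
    for s, e in orf_intervals(seq, min_len):
        for i in range(s, e):
            covered[i] = True
    for s, e in orf_intervals(reverse_complement(seq), min_len):
        # Map reverse-strand coordinates back to the forward strand.
        for i in range(n - e, n - s):
            covered[i] = True
    return sum(covered) / n
```

Viral genomes tend to be densely packed with coding sequence, so a score of this kind is plausible as a homology-free signal; the real pipeline may define and weight it differently.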

Download vrap.tar.gz


Extract vrap.tar.gz, e.g. with: tar -xzf vrap.tar.gz

Dependency list:

  • Python3
  • Biopython

The pipeline expects a 64-bit Linux system. On other systems, the included tools need to be recompiled.
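The requirements above can be checked before the first run with a small script like the following. This is a sketch and not part of VrAP; the function names are hypothetical. It relies on the fact that Biopython is imported under the module name Bio, and it approximates the 64-bit check by testing the Python interpreter rather than the operating system.

```python
import importlib.util
import platform
import sys


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def environment_report():
    """Summarize whether the documented requirements appear to be met."""
    return {
        "python3": sys.version_info.major >= 3,
        # Approximation: checks the interpreter's word size, not the kernel's.
        "linux_64bit": platform.system() == "Linux" and sys.maxsize > 2**32,
        # Biopython is installed under the import name "Bio".
        "missing_modules": missing_modules(["Bio"]),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```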

It is recommended to add the VrAP main directory to the global PATH variable: PATH=$PATH:/global/path/to/vrap/main/directory/

To show the help page: ./vrap.py -h

The simplest run is:

vrap.py -o output/dir -1 pair_1.fq

or, if you have paired-end reads:

vrap.py -o output/dir -1 pair_1.fq -2 pair_2.fq

VrAP only accepts FASTQ files or compressed FASTQ files. Using the -o option to set an output directory is also recommended.
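Because only FASTQ input is accepted, it can be useful to sanity-check a read file before starting a long run. The sketch below is not part of VrAP and its function names are hypothetical. It only inspects the first record, and it assumes that compressed input means gzip, since the supported compression formats are not specified here.

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file, transparently handling gzip compression."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":  # gzip magic bytes
        return gzip.open(path, "rt")
    return open(path, "rt")


def looks_like_fastq(path):
    """Check that the first record has the four-line FASTQ layout."""
    with open_maybe_gzip(path) as fh:
        lines = [fh.readline().rstrip("\r\n") for _ in range(4)]
    header, seq, plus, qual = lines
    return (
        header.startswith("@")
        and plus.startswith("+")
        and len(seq) > 0
        and len(seq) == len(qual)
    )
```

A FASTA file passed by mistake fails the check because its first line starts with ">" instead of "@".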

If a genome of your host is available, you can supply its FASTA file with -r for host read depletion:

vrap.py -o output/dir -1 pair_1.fq -2 pair_2.fq -r reference.fasta

If you want to extend the viral search database, you can add a FASTA file with -v. Note that the pipeline already comes with a comprehensive viral database, which is downloaded on the first run.

vrap.py -o output/dir -1 pair_1.fq -2 pair_2.fq -v viral_db.fasta

To update the included viral database, use the -u option:

vrap.py -u

You can also add other search databases as BLAST databases with -n (nucleotide database, e.g. nt) or -a (protein database, e.g. nr).

vrap.py -o output/dir -1 pair_1.fq -2 pair_2.fq -n nt_blast_db -a aa_blast_db
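When VrAP is run on many samples, the command lines shown above can be assembled from a script. The following is a sketch with a hypothetical helper, build_vrap_command, that uses only the options described in this document (-o, -1, -2, -r, -v, -n, -a). That all of these options can be combined in one call is an assumption.

```python
import subprocess


def build_vrap_command(outdir, reads_1, reads_2=None, host_fasta=None,
                       viral_fasta=None, nucl_db=None, prot_db=None,
                       executable="vrap.py"):
    """Build an argument list for vrap.py from the documented options."""
    cmd = [executable, "-o", outdir, "-1", reads_1]
    optional = [
        ("-2", reads_2),      # second read file for paired-end data
        ("-r", host_fasta),   # host genome for read depletion
        ("-v", viral_fasta),  # extra viral sequences
        ("-n", nucl_db),      # nucleotide BLAST database
        ("-a", prot_db),      # protein BLAST database
    ]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, value]
    return cmd


if __name__ == "__main__":
    cmd = build_vrap_command("output/dir", "pair_1.fq", "pair_2.fq",
                             host_fasta="reference.fasta")
    print(" ".join(cmd))
    # Uncomment to launch the pipeline (requires vrap.py on the PATH):
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.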


VrAP writes vrap_contig.fasta with the assembled contigs and vrap_summary.csv with possible annotations for all contigs. The summary uses the following abbreviations:


qleng = contig length

orf_d = ORF density

ident = sequence identity to target/reference

qcov = contig coverage by BLAST target/reference

tcov = target/reference coverage by contig

tlength = target/reference length

tid = target/reference id

tname = target/reference name

mean_eval = mean BLAST E-value of all hits to the corresponding target/reference
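The summary can be post-processed with a few lines of Python, for example to shortlist long contigs with a high ORF density. This sketch assumes that vrap_summary.csv is comma-separated and that its header row uses the abbreviations listed above; neither is confirmed here. The function names are hypothetical, and the thresholds must be chosen by the user because the scale of orf_d is not documented in this section.

```python
import csv


def load_summary(path, delimiter=","):
    """Read the summary into a list of dicts keyed by column name."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh, delimiter=delimiter))


def to_float(value):
    """Convert a CSV field to float, returning None for empty or invalid fields."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def select_contigs(rows, min_orf_density, min_length):
    """Keep rows meeting both thresholds, sorted by ORF density (highest first)."""
    selected = []
    for row in rows:
        orf_d = to_float(row.get("orf_d"))
        qleng = to_float(row.get("qleng"))
        if orf_d is None or qleng is None:
            continue  # skip rows with missing or non-numeric values
        if orf_d >= min_orf_density and qleng >= min_length:
            selected.append(row)
    return sorted(selected, key=lambda r: float(r["orf_d"]), reverse=True)
```

A call such as select_contigs(load_summary("output/dir/vrap_summary.csv"), min_orf_density=0.8, min_length=1000) would return the matching rows; the values 0.8 and 1000 are placeholders, not recommended settings.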