
E-Mail: shahram.saghaei@uni-jena.de
Room: 4008
Phone: +49-3641-9-46486
Publications
2025
Saghaei, Shahram; Siemers, Malte; Ossetek, Kilian L; Richter, Stephan; Edwards, Robert A; Roux, Simon; Zielezinski, Andrzej; Dutilh, Bas E; Marz, Manja; Cassman, Noriko A
VirJenDB: a FAIR (meta)data and bioinformatics platform for all viruses (Journal Article)
In: Nucleic Acids Research, 2025.
@article{nokey_97,
title = {VirJenDB: a FAIR (meta)data and bioinformatics platform for all viruses},
author = {Shahram Saghaei and Malte Siemers and Kilian L Ossetek and Stephan Richter and Robert A Edwards and Simon Roux and Andrzej Zielezinski and Bas E Dutilh and Manja Marz and Noriko A Cassman},
doi = {10.1093/nar/gkaf1224},
year = {2025},
date = {2025-12-17},
journal = {Nucleic Acids Research},
abstract = {High-throughput sequencing has generated an unprecedented volume of data. However, researcher-submitted data in repositories requires extensive curation and quality control for reuse. These tasks are hindered by the multiplicity of repositories, the sheer volume of the data, and the complexity of virus (meta)data curation. To address these challenges, VirJenDB offers a user-friendly platform to facilitate versioned, community-driven curation, and ontology development. Virus sequences were ingested from 16 sources, including $\sim$200 fields of metadata or standards, covering taxonomy, sample, and host information. Up to 85 metadata fields have undergone at least one round of curation, and are linked to 15.4 million virus sequences, with 88 \% from those infecting eukaryotes and the remaining infecting prokaryotes. Subsets were created, including a novel collection of 0.91 million viral operational taxonomic unit (vOTU) sequences across all viruses, while keeping the original sequences from each vOTU to facilitate downstream analyses, e.g. sequence variation. The VirJenDB web portal (https://www.virjendb.org) provides HTTPS and Application Programming Interface (API) access to the sequence datasets and metadata, offering a search engine, filtering, download, visualizations, and documentation. VirJenDB aims to connect the phage and eukaryotic virus research communities by supporting webtool integration, meta-analyses, and metadata schema extensions.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
2023
Ritsch, Muriel; Cassman, Noriko A.; Saghaei, Shahram; Marz, Manja
Navigating the Landscape: A Comprehensive Review of Current Virus Databases (Journal Article)
In: Viruses, vol. 15, iss. 9, art. no. 1834, 2023, ISSN: 1999-4915.
@article{nokey_43,
title = {Navigating the Landscape: A Comprehensive Review of Current Virus Databases},
author = {Muriel Ritsch and Noriko A. Cassman and Shahram Saghaei and Manja Marz},
doi = {10.3390/v15091834},
issn = {1999-4915},
year = {2023},
date = {2023-08-29},
journal = {Viruses},
volume = {15},
number = {9},
pages = {1834},
abstract = {Viruses are abundant and diverse entities that have important roles in public health, ecology, and agriculture. The identification and surveillance of viruses rely on an understanding of their genome organization, sequences, and replication strategy. Despite technological advancements in sequencing methods, our current understanding of virus diversity remains incomplete, highlighting the need to explore undiscovered viruses. Virus databases play a crucial role in providing access to sequences, annotations and other metadata, and analysis tools for studying viruses. However, there has not been a comprehensive review of virus databases in the last five years. This study aimed to fill this gap by identifying 24 active virus databases and included an extensive evaluation of their content, functionality and compliance with the FAIR principles. In this study, we thoroughly assessed the search capabilities of five database catalogs, which serve as comprehensive repositories housing a diverse array of databases and offering essential metadata. Moreover, we conducted a comprehensive review of different types of errors, encompassing taxonomy, names, missing information, sequences, sequence orientation, and chimeric sequences, with the intention of empowering users to effectively tackle these challenges. We expect this review to aid users in selecting suitable virus databases and other resources, and to help databases in error management and improve their adherence to the FAIR principles. The databases listed here represent the current knowledge of viruses and will help aid users find databases of interest based on content, functionality, and scope. The use of virus databases is integral to gaining new insights into the biology, evolution, and transmission of viruses, and developing new strategies to manage virus outbreaks and preserve global health.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
