Viruses in groundwater
The principal aim of the CRC AquaDiva is to increase our understanding of the links between the surface and the subsurface, especially how organisms inhabiting the subsurface critical zone reflect and influence their physical, ecological, and geochemical environment. In project A06, we will determine the diversity of previously known viruses in groundwater by high-throughput sequencing of viral genomes. A challenge in virology is the identification of previously undetected viruses, which we will tackle with our new approach to virus assembly. We complement the approach by investigating the differences between the metatranscriptomes of the different sampling sites. Finally, we will determine the broadly unknown virus decay in groundwater to gain insights into communication pauses with other organisms.
Grants: CRC 1076 — AquaDiva: A6: Viral Diversity, Viral de novo Assembly, and Viral Decay in Groundwater
In: Environ Microbiome, vol. 16, no. 1, pp. 24, 2021.
In: Environ Microbiol, vol. 22, no. 9, pp. 4000-4013, 2020.
In: Viruses, vol. 11, no. 6, pp. 484, 2019.
In: Appl Environ Microbiol, vol. 85, no. 5, pp. e02346-18, 2019.
In: J Proteomics, vol. 152, pp. 153–160, 2016.
Ecology and species barriers in emerging viral diseases
Emerging viruses existing in animal reservoirs may cause epidemic or epizootic diseases if transmitted to humans or livestock. While we understand the pathogenicity and epidemiology of prototypic emerging viral diseases, we know little about the mechanisms driving virus emergence from animal reservoirs. To move ahead, we need to generalize our view of emerging viruses, taking into consideration the ecology of viruses in their natural reservoirs. We hypothesize that small mammals, mainly bats and rodents, constitute the most relevant virus reservoirs due to their large group sizes, population density, mixing, and turnover, as well as their exposure to arthropod vectors.
Grants: DFG SPP 1596: Ecology and Species Barriers in Emerging Viral Diseases
In: NAR Genomics Bioinf, vol. 2, no. 1, pp. lqz006, 2019.
In: Sci Rep, vol. 6, pp. 34589, 2016.
Limiting viral outbreaks with Machine Learning
Zoonosis, the natural transmission of infections from animals to humans, is a far-reaching global problem that occurs more often due to globalization. In the case of a virus outbreak, it is helpful to know which host organism was the original carrier of the virus so that people can be separated from these hosts. The sooner this happens, the greater the chances of limiting an outbreak.
As a fast method of predicting the original host, we are working on the ability to predict the host of a virus based on the viral genome sequence. Due to the lack of knowledge about virus adaptation, it is difficult to find practical features for machine learning methods. With this in mind, we apply deep learning methods because they do not require predefined features and are one of today’s most powerful machine learning methods.
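To illustrate what "no predefined features" can mean in practice, the following is a minimal sketch of how a genome sequence could be turned into a raw numerical input for a neural network. It is an assumption for illustration only and not the pipeline used in this project; the function name `one_hot_encode`, the fixed input length, and the treatment of ambiguous bases are hypothetical choices.

```python
from typing import List

ALPHABET = "ACGT"  # assumed nucleotide alphabet; U is mapped to T below


def one_hot_encode(sequence: str, length: int) -> List[List[int]]:
    """Encode a nucleotide sequence as a `length` x 4 one-hot matrix.

    Hypothetical helper: longer sequences are truncated, shorter ones are
    padded with all-zero rows, and ambiguous bases (e.g. N) become
    all-zero rows as well.
    """
    seq = sequence.upper().replace("U", "T")[:length]
    matrix = []
    for base in seq:
        row = [0] * len(ALPHABET)
        idx = ALPHABET.find(base)
        if idx >= 0:  # known base: set exactly one position
            row[idx] = 1
        matrix.append(row)
    # Pad with zero rows so that every input has the same shape.
    matrix.extend([0] * len(ALPHABET) for _ in range(length - len(seq)))
    return matrix


if __name__ == "__main__":
    for row in one_hot_encode("ACGUN", 6):
        print(row)
```

A matrix like this could be passed directly to a convolutional or recurrent network, which then learns its own sequence features during training.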
People involved: Florian Mock
Past people involved: Adrian Viehweger
Viruses in host genomes
Overall, little is known about the composition of the human genome. For example, it is assumed that 50% consists of repetitive sequences whose function is still undescribed. These include SINEs and LINEs, which are thought to be of viral origin. Another component of the genome is the rarely described endogenous viral elements (EVEs). These can be divided into rarely analyzed retrotransposons and other previously undescribed viral elements. Normally, retroviruses infect somatic cells and integrate their genetic material into the host genome. Infrequently, retroviruses integrate into germline cells, in which case the viral genome can be passed on to offspring through the germline. This process of integration is called endogenization. It was long assumed that only retroviruses could integrate into the host genome, since integration is a necessary step in their life cycle. Surprisingly, several non-retroviral elements have already been found in different genomes, but these are isolated confirmed cases, and there is no detailed catalog of EVEs. Little is known about the integration of non-retroviral EVEs. It has been frequently reported that 8% of the human genome consists of viral elements. However, this number refers only to retroviruses and is almost 20 years old. At that time, much less was known about viruses and their replication mechanisms. Sequencing techniques and bioinformatic programs have also evolved enormously since then. It is therefore very surprising that, despite the description of individual non-retroviral elements in the human genome, there has been no fundamental systematic revalidation of its viral elements.
In this project, we address three fundamental questions: (1) How many viral elements are there in the human genome? (2) Are they functional? (3) Are the viral fragments accumulated over a human lifetime, or are they essentially inherited?
People involved: Muriel Ritsch
RNA structures and functions in viruses
Non-coding RNAs (ncRNAs) are known regulatory elements in organisms from all kingdoms. The secondary structure of an RNA is often linked to its function. When looking at a viral genome (especially an RNA genome), it only makes sense that viruses make strong use of ncRNAs to bypass the host’s immune response, regulate their own genes, or stop the expression of host genes. We are concerned with the analysis of conserved RNA structures in well-described virus families and the de novo prediction of potentially functional structural elements in lesser-known families. We use combinations of machine learning, clustering, and homology-based methods. The identification of functional structural elements could help to develop new antiviral therapies in the future, as important replication mechanisms of the virus could be disrupted.
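To make the link between sequence and secondary structure concrete, the sketch below implements the classic Nussinov base-pair maximization. This is a textbook simplification chosen for illustration and is not one of the methods used in this project; realistic predictions rely on thermodynamic energy models. The function name `nussinov` and the `min_loop` parameter (minimum number of unpaired bases inside a hairpin) are assumptions.

```python
# Canonical Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> int:
    """Return the maximum number of nested base pairs in `seq`.

    dp[i][j] holds the best pair count for the subsequence seq[i..j].
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            # case 2: position j pairs with some k, leaving >= min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(nussinov("GGGAAAUCCC"))
```

The cubic running time of this recursion is one reason why genome-wide structure screens typically work on sliding windows rather than on whole genomes.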
In: J Virol, vol. 94, no. 21, 2020.
In: Virology, vol. 517, pp. 44–55, 2017.
Deciphering the RNA genome packaging code of influenza A viruses
Currently, bioinformatics tools are not specifically designed for viruses. However, viruses have unique features that require specific bioinformatics tools to trace virus-host interactions. For example, the number of sequences in a quasispecies is extremely high due to the high mutation rate, but only a few interact again with the host cells. Some viruses, such as IAV or as used by AG, are segmented RNA viruses, which urgently require tools with specific features: tools for RNA viruses should include standardized secondary structure predictions, leading to the RNA-RNA interaction prediction needed to understand the packaging of segmented RNA viruses.
The aims of this project are to develop a bioinformatics tool to predict RNA-RNA interactions as packaging signals for segmented viruses, such as IAV; to develop a virus-specific full-genome multiple sequence alignment algorithm to track the quasispecies; and to establish RNA-RNA interaction sets and, more importantly, non-interaction sets.
Collaborators: Roland Marquet, Andreas Henke
Grants: HORIZON 2020 MSCA ITN — VIROINF: Understanding (harmful) virus-host interactions by linking virology and bioinformatics.
Virus Database, interface and quality control
The NFDI4Microbiota consortium comprises 10 German partner institutions (including FSU Jena) and aims to build a centralized infrastructure with services for microbiome research. Viruses are a fundamental part of the microbiome, and their investigation requires specialized tools and resources. Here at FSU Jena, we are building a virus genome sequence database encompassing all viruses, which will be used by virologists, viral ecologists, and others worldwide in accordance with the FAIR principles. We will do this by consulting the global network of virus experts, integrating expert knowledge and existing database structures, and incorporating international metadata standards. We plan to curate and provide an interface to access the virus genome sequences from public repositories, e.g., the European Nucleotide Archive (ENA), the Sequence Read Archive (SRA), and GenBank/NCBI viruses. Further, we will offer visualization, analysis, and sharing of user-uploaded virus data, ensuring data protection for embargoed and private datasets.
High-Quality Alignments of Viruses
Multiple sequence alignments (MSAs) reveal homologous regions of input sequences and thus serve as a starting point for phylogenetic analyses at the molecular level. High-quality MSAs can be used to assess the conservation of primary sequence and even secondary structure. Alignments of viral sequences are particularly challenging due to the high mutation rate of viruses. Our goal is to create high-quality alignments of viral families and clades by first selecting representative genomes of a dataset by clustering. Then, the alignment is created using current homology-based methods. The final alignment is then used to predict conserved RNA secondary structures using an integer linear programming (ILP) approach. High-quality alignments of viruses can give insights into differences and similarities at the DNA/RNA level within a virus family. At the same time, providing such information, for example as the results of codon corrections (RNA sequence to protein) or compensatory mutations (RNA sequence to RNA secondary structure), is a major difficulty.
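The first step, selecting representative genomes by clustering, can be sketched as a greedy procedure on k-mer sets. This is only an illustrative assumption and not the clustering method used in this project; the function names, the k-mer size `k`, and the similarity `threshold` are hypothetical.

```python
from typing import Dict, List, Set


def kmer_set(seq: str, k: int = 3) -> Set[str]:
    """Return the set of all overlapping k-mers of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pick_representatives(genomes: Dict[str, str], k: int = 3,
                         threshold: float = 0.8) -> List[str]:
    """Greedily keep a genome only if it is dissimilar to all kept ones.

    A genome whose k-mer Jaccard similarity to an existing representative
    reaches `threshold` is treated as a member of that cluster and skipped.
    """
    reps = []  # list of (name, k-mer set) for the chosen representatives
    for name, seq in genomes.items():
        kmers = kmer_set(seq, k)
        if all(jaccard(kmers, rep_kmers) < threshold for _, rep_kmers in reps):
            reps.append((name, kmers))
    return [name for name, _ in reps]


if __name__ == "__main__":
    toy = {"a": "ACGTACGTAC", "b": "ACGTACGTAC", "c": "TTTTTTTTTT"}
    print(pick_representatives(toy))
```

The result of a greedy scheme like this depends on the input order, so in practice genomes would usually be sorted first, for example by length or quality.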
Improvement of ONT basecaller
Detecting RNA modifications with nanopore sequencing
RNA modifications such as the highly abundant N6-methyladenosine (m6A) are known to be an important aspect of RNA biology. For example, m6A modification has been shown to be involved in the regulation of mRNA processing, but also in RNA virus replication and translation. Second-generation sequencing methods for m6A detection are limited to position-only inference on known reference sequences. Nanopore direct RNA sequencing enables the assessment of the modification status of individual reads at single-nucleotide resolution, but current detection models are still limited to position-only inference. We aim to use deep neural networks for de novo modification detection on nanopore data that achieves high accuracy at single-read, single-nucleotide resolution.
People involved: Sebastian Krautwurst
Direct RNA Sequencing for Complete Viral Genomes
In: Frishman, Dmitrij; Marz, Manja (Ed.): Virus Bioinformatics, CRC Press, 2021.
In: Genome Res, vol. 29, pp. 1545-1554, 2019.
Identifying DNA methylation biomarkers using nanopore sequencing
Carcinogenesis is associated with DNA methylation changes. In particular, the methylation of cytosine to 5-methylcytosine (5mC) in regions with numerous 5’-cytosine-phosphate-guanine-3’ (CpG) occurrences, so-called CpG islands, plays a role here. These DNA methylation changes already occur at an early stage of cancer. Additionally, it is relatively simple to develop molecular biological tests once the regions of differential methylation are known. DNA methylation is therefore a compelling basis for cancer screening. Nanopore sequencing makes it possible to identify DNA base modifications (e.g., 5mC) at nucleotide resolution. We aim to develop a workflow to identify DNA methylation biomarkers for cancer subtypes that are not yet well studied.
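As a small illustration of what "regions with numerous CpG occurrences" means computationally, the sketch below computes the GC content and the observed/expected CpG ratio of a sequence. The default thresholds follow the commonly cited Gardiner-Garden and Frommer criteria (length of at least 200 bp, GC content above 50%, observed/expected ratio above 0.6); they are an assumption for illustration and not necessarily the definition used in this project. The function names are hypothetical.

```python
from typing import Tuple


def cpg_stats(seq: str) -> Tuple[float, float]:
    """Return (GC content, observed/expected CpG ratio) of a DNA sequence.

    observed/expected = (#CpG * length) / (#C * #G)
    """
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides on the given strand
    gc_content = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_oe: float = 0.6) -> bool:
    """Apply the assumed length, GC-content, and obs/exp thresholds."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content > min_gc and obs_exp > min_oe


if __name__ == "__main__":
    print(cpg_stats("CGCGCGCG"))
    print(looks_like_cpg_island("CG" * 100))
```

In a real workflow, a check like this would be applied in sliding windows along the genome and combined with the per-read methylation calls from nanopore data.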
Collaborations: oncgnostics GmbH
Workflow development for HTS data
High-throughput sequencing (HTS) of DNA and RNA has become a standard procedure in molecular biology. Widely used methods include next-generation sequencing (NGS) of short reads, offered primarily by Illumina, and third-generation sequencing (TGS) of long reads, e.g., using nanopore sequencing from Oxford Nanopore Technologies (ONT). With decreasing costs, technological improvements, and wider use, the number of HTS datasets keeps growing. The amount and size of the data can make the analysis difficult. We are developing reproducible, scalable, and portable workflows for processing and analyzing HTS data by deploying workflow management frameworks such as Nextflow. The goal is to provide easy-to-use workflows with state-of-the-art tools for different HTS types and applications.
Cell-free RNA sequencing
Cell-free RNA is present in the blood of every human being as a result of vesicular secretion from the cells of the human body. Sequencing cell-free RNA has been very promising for the diagnosis of several diseases ranging from cancer to cardiovascular diseases. In contrast to protein-based biomarkers, a huge advantage of RNA is that it can be amplified. In recent years, technological developments have enabled the amplification and sequencing of tiny amounts of RNA, even from single cells. Another advantage of RNA over DNA (which can also be amplified) is that it is continuously shed from cells. In contrast, DNA exits cells only when the cell is dying. Hence, cell-free RNA holds the promise of continuous probing of transcriptional changes in the cells of the human body. We aim to apply methods from the field of single-cell RNA sequencing to sequence cell-free RNA in different medically relevant contexts. This approach enables the sequencing of minute amounts of RNA, and our expertise in sequencing very small amounts of RNA from cells or even phase-separated condensates will facilitate the detection of even the smallest amounts of RNA from blood or other body fluids.
Antibiotic resistance in the Ganges river valley
Antibiotics increasingly fail to treat a growing number of medical conditions due to antimicrobial resistance. This trend is especially acute in developing countries such as India, where broad resistances are known to have emerged. It is known that densely populated cities can drive the emergence and spread of antimicrobial resistance through, for example, industrial production sites, wastewater management practices, and other cultural characteristics. Proximity to waterways or associated water collections seems especially relevant.
To identify controllable drivers of resistance emergence and spread, we investigate two cities on the river Ganges in India: Allahabad (now Prayagraj) and Kanpur. We also investigate the effect of human interference by analyzing samples before and after Kumbh Mela, one of the largest religious gatherings in the world, which is held in Prayagraj. This will allow us to discern naturally occurring resistance from resistance created by humans.
Grants: BMBF – DBT Cooperative Science Program: Development of metagenomics assisted surveillance tools for tracking antibiotic resistance in river bodies — A study in the Ganges river valley (NANOLOG)
Prediction of antibiotic susceptibility profiles from whole-genome sequencing
In order to limit the spread of pathogenic drug-resistant bacteria and to maintain treatment options, the analysis of clinical samples and their antimicrobial resistance (AMR) profiles is essential. Particularly in low-resource settings, a timely analysis of AMR profiles is often impaired by lengthy culturing procedures for antibiotic susceptibility testing or a lack of laboratory capacity. Because of its relatively low cost, the possibility of real-time data analysis, and its portability, the Oxford Nanopore Technologies MinION sequencing platform appears to be well suited for pathogen genomic analyses, especially in light of an upcoming less error-prone technology for the platform. We developed the pipeline CholerAegon for the in silico prediction of AMR profiles of Vibrio cholerae genomes assembled from long and/or short sequencing reads. We aim to adapt our pipeline to other pathogenic microorganisms.
People involved: Sebastian Krautwurst
Collaborations: Kathrin Schuldt, Valeria Fuesslin
In: Front Microbiol, vol. 13, pp. 909692, 2022.
Epigenetic profiling of aging mouse brain at base resolution
Recent studies have shown that epigenetics, especially 5-methylcytosine (5mC), plays a pivotal role in aging. Along these lines, previous studies have reported diverse epigenetic profiles among different cell types, such as neurons and oligodendrocytes, of the same individual. Besides methylation, DNA undergoes various other types of epigenetic modification. It remains to be investigated whether these modifications change with aging and can thus also serve as alternative reliable molecular markers of the epigenetic age of an individual. Thus, it is essential to identify variations in other epigenetic modifications of DNA in specific cell types from the same individual. We plan to study various modifications in a single chain reaction using long-read sequencing on the MinION platform from ONT.
People involved: Akash Srivastava
The role of non-coding RNAs in human placental development
Inside the placenta, the fetal syncytiotrophoblast forms the interface between fetus and mother, from which exosomes and microvesicles are permanently released into the maternal circulation. These particles contain fetal proteins and ncRNAs for communication with neighboring and distant maternal cells. The number, size, and content of these particles may reflect or predict placental disorders. Several severe pregnancy pathologies, including preeclampsia, are human-specific and their pathomechanisms are not yet understood.
To date, most examples of ncRNAs that have been identified to be specific for fetal tissues, such as the placenta, are members of the group of microRNAs (miRNAs). Long ncRNAs have only been marginally investigated. We need to expand the knowledge about ncRNAs in the placenta and ncRNAs released from it to revolutionize the understanding of regulation processes inside the placenta and of fetal-maternal communication.
Grants: DFG MA 5082/9-1: Embryonale nicht-kodierende RNAs in der menschlichen Plazenta und dem mütterlichen Blutkreislauf
The Role of Non-Coding RNAs in the Human Placenta
In: Cells, vol. 11, iss. 9, pp. 1588, 2022.
In: Placenta, vol. 88, pp. 20–27, 2019.
In: J Virol, vol. 93, no. 16, 2019.
In: bioRxiv, pp. 410381, 2018, (Now published in Placenta: https://doi.org/10.1016/j.placenta.2019.09.005).
KL 5 Trophoblast-immune cell communication via microRNA transported in extracellular vesicles
In: Pregnancy Hypertens, vol. 9, pp. 5, 2017.
Bioinformatics support for researchers of the FSU and associated research institutes
The Bioinformatics Core Facility Jena (BiC) provides free support for researchers of the Friedrich Schiller University and associated research institutes in Jena at all stages of bioinformatics analysis. The support we offer ranges from consultations, basic bioinformatics services, and scientific workshops to full research collaborations. For our basic bioinformatics services, we have established modern, standardized workflows for numerous tasks in the field of high-throughput analysis and related research areas, starting with data quality control up to the final visualization of the results. For special applications, where our standardized methods reach their limits or a deeper interpretation of the results is desired, we offer individually adapted solutions in the form of full research collaboration. In the end, our aim is to contribute to the interdisciplinarity and development of life science research projects in Jena. Through our still-growing network of scientific partners, we have the opportunity to work on a huge variety of different topics.
Statistical modeling of genomic and transcriptomic data
In the last two decades, one revolutionary advancement in biotechnology has followed another, leading the life sciences into the big data era. However, despite the availability of vast amounts of different biological data, we still lack sufficient statistical models and methods to accurately process and evaluate these data. We aim to develop specialized statistical tools in the context of genomics (e.g., fuzzy k-mers) and transcriptomics (e.g., accurate modeling of read count distributions). We work on different aspects of statistical analysis, from the theoretical problem formulation to the implementation of statistical models and the appropriate visualization of results.
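As one example of what modeling read count distributions can involve, the sketch below estimates the overdispersion of read counts with a simple method-of-moments fit of a negative binomial model, in which the variance is assumed to be mean + dispersion * mean². This is a generic textbook approach shown for illustration, not the statistical model developed in this project; the function name `estimate_dispersion` is hypothetical.

```python
import statistics
from typing import Sequence


def estimate_dispersion(counts: Sequence[int]) -> float:
    """Method-of-moments estimate of the negative binomial dispersion.

    Assumes variance = mean + dispersion * mean**2. A result of 0.0 means
    the counts show no more variance than a Poisson model would predict.
    """
    if len(counts) < 2:
        raise ValueError("need at least two replicate counts")
    mean = statistics.mean(counts)
    var = statistics.variance(counts)  # unbiased sample variance
    if mean == 0 or var <= mean:
        return 0.0  # no overdispersion detectable
    return (var - mean) / mean ** 2


if __name__ == "__main__":
    # Toy read counts of one gene across five replicates (made-up numbers).
    print(estimate_dispersion([0, 2, 4, 10, 4]))
```

With only a handful of replicates, such per-gene estimates are very noisy, which is why established tools usually share information across genes when estimating dispersion.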
People involved: Emanuel Barth