01.09.2017 | Best Paper Award at WABI 2017

The article “Analysis of min-hashing for variant tolerant DNA read mapping” by Jens Quedenfeld (now at TU Munich) and Sven Rahmann has received the Best Paper Award at the Workshop on Algorithms in Bioinformatics (WABI) 2017, held in Cambridge, MA, USA, August 20-23, 2017.

The authors consider an important question, as DNA read mapping has become a ubiquitous task in bioinformatics. New technologies provide ever longer DNA reads (several thousand basepairs), although at comparatively high error rates (up to 15%), and the reference genome is increasingly not considered as a simple string over ACGT anymore, but as a complex object containing known genetic variants in the population. Conventional indexes based on exact seed matches, in particular the suffix array based FM index, struggle with these changing conditions, so other methods are being considered, and one such alternative is locality sensitive hashing. Here we examine the question whether including single nucleotide polymorphisms (SNPs) in a min-hashing index is beneficial. The answer depends on the population frequency of the SNP, and we analyze several models (from simple to complex) that provide precise answers to this question under various assumptions. Our results also provide sensitivity and specificity values for min-hashing based read mappers and may be used to understand dependencies between the parameters of such methods. This article may provide a theoretical foundation for a new generation of read mappers.
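To make the idea of min-hashing concrete, here is a minimal sketch in Python. It is not the index analyzed in the article: the q-gram length, the number of hash functions, the use of salted BLAKE2 hashes, and all function names are assumptions chosen for illustration only. The fraction of matching signature positions estimates the Jaccard similarity between the q-gram sets of two sequences.

```python
import hashlib


def qgrams(seq, q):
    """Return the set of all length-q substrings (q-grams) of seq."""
    return {seq[i:i + q] for i in range(len(seq) - q + 1)}


def hash_value(gram, seed):
    """Deterministic 64-bit hash of a q-gram; the seed selects the hash function."""
    digest = hashlib.blake2b(
        gram.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(seq, q=8, num_hashes=64):
    """For each hash function, keep the minimum hash over all q-grams.

    Assumes len(seq) >= q, so that the q-gram set is not empty.
    """
    grams = qgrams(seq, q)
    return [min(hash_value(g, seed) for g in grams) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions where both sequences share the minimum."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    # Hypothetical reference window and a read that differs at one position.
    reference = "ACGTACGTTAGCCGATAGGCTTAACGGATCCA"
    read = "ACGTACGTTAGCCGATTGGCTTAACGGATCCA"
    sig_ref = minhash_signature(reference)
    sig_read = minhash_signature(read)
    print(estimated_jaccard(sig_ref, sig_read))
```

A single substitution changes up to q of the q-grams, which lowers the expected number of matching signature positions; this is the kind of effect a variant-aware index would have to account for.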

The article can be freely accessed in the WABI conference proceedings (Proceedings of the 17th International Workshop on Algorithms in Bioinformatics (WABI 2017), Russell Schwartz and Knut Reinert (Eds.), LIPICS Vol. 88).

This work is part of subproject C1 of the collaborative research center SFB 876.

22.08.2017 | Publication: A hybrid parameter estimation algorithm for beta mixtures and applications to methylation state classification

Christopher Schröder and Sven Rahmann
Algorithms for Molecular Biology
DOI 10.1186/s13015-017-0112-1

The beta distribution is a continuous probability distribution that takes values in the unit interval [0,1]. It has been used in several bioinformatics applications to model data that naturally takes values between 0 and 1, such as relative frequencies, probabilities, absolute correlation coefficients, or DNA methylation levels of CpG dinucleotides or longer genomic regions. One of the most prominent applications is the estimation of false discovery rates (FDRs) from p-value distributions after multiple tests by fitting a beta-uniform mixture. By linear scaling, beta distributions can be used to model any quantity that takes values in a finite interval [L, U] ⊂ R. We show that maximum likelihood estimation (MLE) has significant disadvantages for beta distributions. The main problem is that the likelihood function is not finite (for almost all parameter values) if any of the observed data points are x_i = 0 or x_i = 1.
For mixture distributions, MLE frequently results in a non-concave problem with many local maxima, and one uses heuristics that return a local optimum from given starting parameters. Because MLE is already problematic for a single beta distribution, EM does not work for beta mixtures unless ad-hoc corrections are made. We therefore propose a new algorithm for parameter estimation in beta mixtures that we call iterated method of moments.
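As a small illustration of why moment-based estimation avoids the problem with observations at 0 or 1, the following sketch applies the classical method of moments to a single beta distribution. It is not the iterated algorithm for mixtures proposed in the article; the function name and the use of the population variance are assumptions made for this example.

```python
import statistics


def beta_method_of_moments(data):
    """Estimate (a, b) of a beta distribution by matching mean and variance.

    Uses the identities mean = a / (a + b) and
    variance = a * b / ((a + b)^2 * (a + b + 1)).
    Observations equal to 0 or 1 cause no difficulty here.
    """
    m = statistics.fmean(data)
    v = statistics.pvariance(data, mu=m)  # population variance (an assumption)
    if not 0.0 < m < 1.0:
        raise ValueError("sample mean must lie strictly between 0 and 1")
    if not 0.0 < v < m * (1.0 - m):
        raise ValueError("sample variance must lie in (0, m * (1 - m))")
    common = m * (1.0 - m) / v - 1.0  # equals a + b
    return m * common, (1.0 - m) * common


if __name__ == "__main__":
    # Hypothetical methylation levels, including the boundary values 0 and 1.
    levels = [0.0, 0.05, 0.1, 0.5, 0.9, 0.95, 1.0]
    print(beta_method_of_moments(levels))
```

The estimate depends only on the sample mean and variance, so boundary observations enter like any other value instead of producing an infinite likelihood.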


28.07.2017 | Publication: Regions of common inter-individual DNA methylation differences in human monocytes: genetic basis and potential function

Christopher Schröder*, Elsa Leitão*, Stefan Wallner, Gerd Schmitz, Ludger Klein-Hitpass, Anupam Sinha, Karl-Heinz Jöckel, Stefanie Heilmann-Heimbach, Per Hoffmann, Markus M. Nöthen, Michael Steffens, Peter Ebert, Sven Rahmann and Bernhard Horsthemke
* Contributed equally
Epigenetics & Chromatin 2017

There is increasing evidence for inter-individual methylation differences at CpG dinucleotides in the human genome, but the regional extent and function of these differences have not yet been studied in detail. For identifying regions of common methylation differences, we used whole genome bisulfite sequencing data of monocytes from five donors and a novel bioinformatic strategy.

We identified 157 differentially methylated regions (DMRs) with four or more CpGs, almost none of which has been described before. The DMRs fall into different chromatin states, where methylation is inversely correlated with active, but not repressive histone marks. However, methylation is not correlated with the expression of associated genes. High-resolution single nucleotide polymorphism (SNP) genotyping of the five donors revealed evidence for a role of cis-acting genetic variation in establishing methylation patterns. To validate this finding in a larger cohort, we performed genome-wide association studies (GWAS) using SNP genotypes and 450k array methylation data from blood samples of 1128 individuals. Only 30/157 (19%) DMRs include at least one 450k CpG, which shows that these arrays miss a large proportion of DNA methylation variation. In most cases, the GWAS peak overlapped the CpG position, and these regions are enriched for CREB group, NF-1, Sp100 and CTCF binding motifs. In two cases, there was tentative evidence for a trans-effect by KRAB zinc finger proteins.

Allele-specific DNA methylation occurs in discrete chromosomal regions and is driven by genetic variation in cis and trans, but in general has little effect on gene expression.

28.07.2016 | Poster and Workshop at ECCB

A second poster from our working group will be presented at ECCB 2016. Daniela Beisser will present a poster about Taxonomic assignment of protist metatranscriptome sequences. She will also present the topic during the ECCB workshop “W11 – Recent Computational Advances in Metagenomics (RCAM’16)” on 4th September. See the workshop website for more information.

Taxonomic assignment of protist metatranscriptome sequences
Daniela Beisser, Nadine Graupner, Lars Grossmann, Jens Boenigk and Sven Rahmann

Next generation sequencing (NGS) technologies are increasingly applied to analyse complex microbial ecosystems by mRNA sequencing of whole communities, also known as metatranscriptome sequencing. In principle, each sequenced mRNA makes it possible both to identify the species of origin and to assign a function to the transcribed gene. While the functional information is sufficiently covered by databases such as Uniprot, NCBI, KEGG and many others, species identification is currently limited by incomplete reference databases. Inferring the community composition from metatranscriptomic samples is thus still a difficult problem. At the moment, most analyses are restricted to prokaryotic communities, which enjoy better database coverage, or to communities of few known species with sequenced genomes, or to a combination of rRNA and mRNA sequencing. However, the latter approach does not allow taxonomic and functional information to be linked directly.

Our approach focuses on an accurate assignment of taxonomic groups to metatranscriptomic reads. We constructed a custom database that comprises all major eukaryotic groups, developed a stand-alone tool to assign reads with a low false discovery rate and created a workflow for complete metatranscriptome analysis. The workflow covers all bioinformatic steps: preprocessing of the raw data, taxonomic and functional assignment, and visualisation of the results.

28.07.2016 | Poster about EAGLE at ECCB

A poster about the Exome Analysis GraphicaL Environment (EAGLE) was accepted for ECCB 2016 in The Hague. Felix Mölder will present the poster there.

EAGLE: an easy-to-use web-based exome analysis environment
Christopher Schröder, Felix Mölder, Christoph Stahl and Sven Rahmann

High throughput exome sequencing is a widely used technology for deciphering mutations in the coding regions of a genome at relatively low cost. While bioinformatics analyses of exome sequencing data mostly agree on best practices regarding the analysis steps, the called genomic variants depend on the chosen parameters and the applied filtering. We present EAGLE, a software tool that combines a best practices variant calling workflow with a web frontend. By storing the called variant information in HDF5 files (instead of SQL databases), EAGLE allows filtering and parameter tuning in almost real time. This enables iterative tuning of thresholds, or the selection of different samples for filtering by medical PIs via the web interface. The web interface presents metadata, annotations, quality control data and statistics to facilitate a comprehensive data analysis on different levels.

July 2016 | mundo reports on the “Data Driven Materials Design” project

The current special issue of the research magazine mundo on the topic “Materials Chain” contains a report on the project Data Driven Materials Design. The project, funded by the Mercator Research Center Ruhr (MERCUR) from 01.10.2012 to 30.09.2014, was dedicated to the systematic design of new materials through interdisciplinary collaboration between materials science and computer science. It was a cooperation between the Faculties of Physics and Astronomy (Prof. Drautz) and of Mechanical Engineering (Prof. Ludwig) of Ruhr-Universität Bochum and two computer science chairs, one at TU Dortmund (Prof. Morik) and one at the University of Duisburg-Essen (Prof. Rahmann), which contributed data mining and high-throughput analysis, respectively.

19.07.2016 | New master's thesis on distinguishing true genetic variants from systematic sequencing errors

Till Hartmann, who previously worked at the Chair of Genome Informatics as a student assistant on the dinopy project, will write his master's thesis in computer science at the chair.
The topic is to distinguish systematic sequencing errors from true genetic variants in high-throughput sequencing data with the help of machine learning methods. The thesis is part of subproject C1 “Feature selection in high-dimensional data for risk prognosis in oncology” of SFB 876 “Providing Information by Resource-Constrained Data Analysis”.
Welcome and good luck, Till!

05.07.2016 | Article about epigenetics of monocyte to macrophage differentiation accepted in “Epigenetics & Chromatin”

Christopher Schröder, Daniela Beißer and Sven Rahmann from the Genome Informatics group contributed to novel insights about epigenetic changes during cell differentiation. The article will appear soon in the renowned “Epigenetics & Chromatin” journal (IF 4.873) by BioMedCentral.

Epigenetic dynamics of monocyte to macrophage differentiation
by Stefan Wallner, Christopher Schröder, Elsa Leitão, Tea Berulava, Claudia Haak, Daniela Beißer, Sven Rahmann, Andreas S Richter, Thomas Manke, Ulrike Böhnisch, Laura Arrigoni, Sebastian Fröhler, Filippos Klironomos, Wei Chen, Nikolaus Rajewsky, Fabian Müller, Peter Ebert, Thomas Lengauer, Matthias Barann, Philip Rosenstiel, Gilles Gasparoni, Karl Nordström, Jörn Walter, Benedikt Brors, Gideon Zipprich, Bärbel Felder, Ludger Klein-Hitpass, Corinna Attenberger, Gerd Schmitz, Bernhard Horsthemke

Monocyte to macrophage differentiation involves major biochemical and structural changes. In order to elucidate the role of gene regulatory changes during this process, we used high-throughput sequencing to analyze the complete transcriptome and epigenome of human monocytes that were differentiated in vitro by addition of colony stimulating factor 1 (CSF1) in serum-free medium. Numerous mRNAs and miRNAs were significantly up- or downregulated. More than 100 discrete DNA regions, most often far away from transcription start sites, were rapidly demethylated by the ten-eleven translocation (TET) enzymes, became nucleosome-free and gained histone marks indicative of active enhancers. These regions were unique for macrophages and associated with genes involved in the regulation of the actin cytoskeleton, phagocytosis and innate immune response. In summary, we have discovered a phagocytic gene network that is repressed by DNA methylation in monocytes and rapidly de-repressed after the onset of macrophage differentiation.

01.07.2016 | New research project: OsteoSys

Starting on the first of July this year, the Genome Informatics group is taking part in a new project to elucidate the molecular causes of complications in osteoporosis therapy.
The overarching goal of the project, which is planned to run for four years, is to establish a personalized therapy.
The consortium of scientists and companies is coordinated by Prof. Nina Babel (Transplantation Immunology, Marienhospital Herne, University Hospital of Ruhr-Universität Bochum).
Based on its extensive experience in data analysis, the Genome Informatics group will work on the analysis of bioinformatics data at the genetic and epigenetic level. For this work it receives funding of €235,188.

The project is funded by the European Regional Development Fund and the Leitmarkt Agentur NRW.

01.06.2016 | Article about estimation of beta mixture parameters accepted at WABI 2016

An article by Christopher Schröder and Sven Rahmann about estimating parameters of beta mixture models, which has applications in determining the methylation state of genomic regions, has been accepted at WABI 2016 and will be presented at the conference in Aarhus (Denmark), August 22-24, 2016. The paper will be available in the WABI 2016 proceedings (LNBI series, Springer Verlag) in August 2016.

A hybrid parameter estimation algorithm for beta mixtures and applications to methylation state classification
by Christopher Schröder and Sven Rahmann

Mixtures of beta distributions have previously been shown to be a flexible tool for modeling data with values on the unit interval, such as methylation levels. However, maximum likelihood parameter estimation with beta distributions suffers from problems because of singularities in the log-likelihood function if some observations take the values 0 or 1. While ad-hoc corrections have been proposed to mitigate this problem, we propose a different approach to parameter estimation for beta mixtures where such problems do not arise in the first place. Our algorithm has significant computational advantages over the maximum-likelihood-based EM algorithm. As an application, we demonstrate that methylation state classification is more accurate when using adaptive thresholds from beta mixtures than non-adaptive thresholds on observed methylation levels.
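The singularities mentioned above can be seen directly by evaluating the beta log-density at the boundary. The following sketch is an illustration only; the function names and the explicit endpoint handling are assumptions for this example and are not taken from the paper.

```python
import math


def beta_log_density(x, a, b):
    """Log-density of Beta(a, b) at x in [0, 1], with explicit endpoint handling."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def term(exponent, base):
        # exponent * log(base), extended to base == 0
        if base > 0.0:
            return exponent * math.log(base)
        if exponent == 0.0:
            return 0.0
        return -math.inf if exponent > 0.0 else math.inf

    return log_norm + term(a - 1.0, x) + term(b - 1.0, 1.0 - x)


def log_likelihood(data, a, b):
    """Sum of log-densities; not finite as soon as one term is infinite."""
    return sum(beta_log_density(x, a, b) for x in data)


if __name__ == "__main__":
    print(log_likelihood([0.2, 0.7], 2.0, 3.0))       # finite
    print(log_likelihood([0.0, 0.2, 0.7], 2.0, 3.0))  # minus infinity (a > 1)
    print(log_likelihood([0.0, 0.2, 0.7], 0.5, 3.0))  # plus infinity (a < 1)
```

A single observation at 0 makes the log-likelihood minus infinity for every a > 1 and plus infinity for every a < 1, so the maximum likelihood estimate is determined by the boundary value rather than by the bulk of the data.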

New project “OsteoSys” within the NRW Leitmarkt competition

Presumably starting in July of this year, the Leitmarkt Agentur NRW will fund a project to elucidate the molecular causes of complications in osteoporosis therapy with the help of bioinformatics methods. Osteoporosis is a widespread disease in which the bones become prone to fractures with increasing age; women are particularly affected.

The consortium of scientists and companies is coordinated by Prof. Nina Babel (Transplantation Immunology, Marienhospital Herne, University Hospital of Ruhr-Universität Bochum). The Genome Informatics group receives funding of up to 235,000 euros over a total of three years. Project leader Prof. Sven Rahmann is enthusiastic: “In this project we are taking a big step towards personalized medicine. If the project is successful, we will be able to design tailored therapies for different population groups. Until then, however, we have a long way to go with molecular experiments and data analyses.”
With its extensive experience in this area, the Chair of Genome Informatics considers itself well prepared for this challenge.

Workshop as part of the Schnupperuni at TU Dortmund

Marianna and Nina will offer a workshop on “Genome assembly – how bioinformaticians decode the blueprint of life” as part of the Schnupperuni (taster university) at TU Dortmund. The Schnupperuni is aimed at students in the upper years of secondary school (gymnasiale Oberstufe), who get the opportunity to explore different degree programs for one week. The Schnupperuni takes place from 15. – 19.08.2016; registration is currently still possible.

2 May 2016 | Genome Informatics delegation visits the Institute of Applied Microbiology at RWTH Aachen

As part of the BMBF-funded e:Bio project “YeastScent”, Elias Kuthe and Sven Rahmann visited the Blank Lab at the iAMB of RWTH Aachen and discussed the progress of the various work packages with other project partners.
The project aims to detect changes in yeast metabolism during fermentation processes quickly and non-invasively on the basis of ion mobility spectrometry. The project runs until summer of next year.
Elias Kuthe presented methods for peak detection and peak analysis that he has developed.

25.04.2016 | Publication: SimLoRD – Simulation of Long Read Data

Bianca Stöcker; Johannes Köster; Sven Rahmann
Bioinformatics 2016;

SimLoRD is a read simulator for third generation sequencing reads and is currently focused on the Pacific Biosciences SMRT error model.

Third generation sequencing methods provide longer reads than second generation methods and have distinct error characteristics.
In a SMRT library the sequenced DNA fragments are circular with adapter sequences between forward and backward strand, and a fragment may be sequenced multiple times in a single run. For a single pass through the sequence (subread), the error rate is high, but it is possible to calculate a consensus after multiple passes (circular consensus sequence read, CCS). Thus the error rate of CCSs decreases with the number of passes.
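The effect of multiple passes on the consensus error rate can be illustrated with a deliberately simplified toy model. This is not the SimLoRD error model: it assumes independent per-pass errors with the same probability at a given position, a majority vote over an odd number of passes, and the pessimistic convention that all erroneous passes agree with each other. The function name is hypothetical.

```python
from math import comb


def majority_error(passes, p):
    """Probability that a majority vote over `passes` independent passes is wrong.

    Toy model: each pass is wrong at a given position with probability p,
    and the consensus is wrong if more than half of the passes are wrong.
    Only odd numbers of passes are accepted so that ties cannot occur.
    """
    if passes < 1 or passes % 2 == 0:
        raise ValueError("passes must be a positive odd number")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return sum(
        comb(passes, k) * p**k * (1.0 - p) ** (passes - k)
        for k in range(passes // 2 + 1, passes + 1)
    )


if __name__ == "__main__":
    # Hypothetical per-pass error probability of 15%.
    for n in (1, 3, 5, 7, 9):
        print(n, majority_error(n, 0.15))
```

Under these assumptions the consensus error drops quickly as the number of passes grows, which is the qualitative behaviour described above; a realistic simulator has to model insertions, deletions and substitutions separately.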

We analyzed public data from Pacific Biosciences (PacBio) SMRT sequencing, developed an error model and implemented it in a new read simulator called SimLoRD. Reads are simulated from both strands of a provided or randomly generated reference sequence. It offers options to choose the read length distribution and to model error probabilities depending on the number of passes through the sequencer. The new error model makes SimLoRD the most realistic SMRT read simulator available.

SimLoRD is available open source at http://bitbucket.org/genomeinformatics/simlord/ and installable via Bioconda.

April 2016 | Project course “Bioinformatik” at TU Dortmund

Marianna D’Addario and Sven Rahmann offer the project course (Fachprojekt) “Bioinformatik” at TU Dortmund. The course deals with the design of oligonucleotides, e.g. for nanotechnology, and is aimed at bachelor students. It will presumably be offered again in the winter semester 2016/2017.

February 2016 | New staff member in Essen

Elias Kuthe has been part of our team in Essen since February 2016. He will analyze ion mobility spectrometry (IMS) data for the “YeastScent” project. Elias is also interested in optimization; he worked for a long time as a student assistant at the Chair of Discrete Optimization (LS V, Mathematics, TU Dortmund). He is currently working on a Julia implementation of the Fused LASSO Signal Approximator.

Poster about the EAGLE tool at University Hospital Essen Science Day 2015

Exome Analysis GraphicaL Environment (EAGLE)

Christopher Schröder, Christoph Stahl, Felix Mölder, André Janowicz, Jasmin Beygo, Marcel Martin, Sven Rahmann

The Exome Analysis GraphicaL Environment (EAGLE) combines a best practices variant calling workflow with a web frontend. By storing the called information in specifically structured HDF5 files, EAGLE allows filtering and parameter tuning in almost real time. This enables iterative tuning of thresholds, or the selection of different samples for filtering by non-computer scientists via the web interface.

06.11.2015 | Publication: Human TLR8 senses UR/URR motifs in bacterial and mitochondrial RNA

EMBO Reports 2015 Dec; 16(12): 1656–1663.

Toll‐like receptor (TLR) 13 and TLR2 are the major sensors of Gram‐positive bacteria in mice. TLR13 recognizes Sa19, a specific 23S ribosomal (r) RNA‐derived fragment and bacterial modification of Sa19 ablates binding to TLR13, and to antibiotics such as erythromycin. Similarly, RNase A‐treated Staphylococcus aureus activate human peripheral blood mononuclear cells (PBMCs) only via TLR2, implying single‐stranded (ss) RNA as major stimulant.