Category Archives: sven-rahmann

01.09.2017 | Best Paper Award at WABI 2017

The article “Analysis of min-hashing for variant tolerant DNA read mapping” by Jens Quedenfeld (now at TU Munich) and Sven Rahmann has received the Best Paper Award at the Workshop on Algorithms in Bioinformatics (WABI) 2017, held in Cambridge, MA, USA, August 20-23, 2017.

The authors consider an important question, as DNA read mapping has become a ubiquitous task in bioinformatics. New technologies provide ever longer DNA reads (several thousand base pairs), although at comparatively high error rates (up to 15%), and the reference genome is increasingly considered not as a simple string over ACGT, but as a complex object containing known genetic variants in the population. Conventional indexes based on exact seed matches, in particular the suffix-array-based FM index, struggle with these changing conditions, so other methods are being considered; one such alternative is locality sensitive hashing. The article examines whether including single nucleotide polymorphisms (SNPs) in a min-hashing index is beneficial. The answer depends on the population frequency of the SNP, and the authors analyze several models (from simple to complex) that provide precise answers to this question under various assumptions. The results also provide sensitivity and specificity values for min-hashing based read mappers and may be used to understand dependencies between the parameters of such methods. This article may provide a theoretical foundation for a new generation of read mappers.
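As a rough illustration of the min-hashing idea mentioned above, the following minimal sketch computes a min-hash signature over the q-grams of a sequence and uses it to estimate the Jaccard similarity between a reference window and a read that differs by one substitution. It does not reproduce the models analyzed in the article; the function names, the q-gram length `q=5`, the number of hash functions and the example sequences are all assumptions chosen for illustration.

```python
import hashlib


def qgrams(seq, q):
    """Return the set of all length-q substrings of seq."""
    return {seq[i:i + q] for i in range(len(seq) - q + 1)}


def _hash(qgram, seed):
    """Deterministic 64-bit hash of a q-gram under a given seed."""
    data = f"{seed}:{qgram}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(seq, q=5, num_hashes=64):
    """For each seeded hash function, keep the minimum hash over all q-grams."""
    grams = qgrams(seq, q)
    return [min(_hash(g, seed) for g in grams) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree (estimates Jaccard similarity)."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def jaccard(a, b, q=5):
    """Exact Jaccard similarity of the two q-gram sets, for comparison."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    return len(ga & gb) / len(ga | gb)


# Hypothetical example data: a reference window and a read with one substitution.
reference = "ACGTACGGTCATGCATTGACCGATAGCTAGGCTA"
read_with_snp = reference[:16] + "A" + reference[17:]

sig_ref = minhash_signature(reference)
sig_read = minhash_signature(read_with_snp)
print("exact Jaccard:    ", jaccard(reference, read_with_snp))
print("min-hash estimate:", estimate_jaccard(sig_ref, sig_read))
```

A single substitution destroys up to q of the q-grams, which lowers the chance that two signatures agree at a given position. Under this simplified view, one way to make an index tolerant of a known SNP would be to hash the q-grams of both alleles.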

The article can be freely accessed in the WABI conference proceedings (Proceedings of the 17th International Workshop on Algorithms in Bioinformatics (WABI 2017), Russell Schwartz and Knut Reinert (Eds.), LIPIcs Vol. 88).

This work is part of subproject C1 of the collaborative research center SFB 876.

22.08.2017 | Publication: A hybrid parameter estimation algorithm for beta mixtures and applications to methylation state classification

Christopher Schröder and Sven Rahmann
Algorithms for Molecular Biology
DOI 10.1186/s13015-017-0112-1

The beta distribution is a continuous probability distribution that takes values in the unit interval [0,1]. It has been used in several bioinformatics applications to model data that naturally takes values between 0 and 1, such as relative frequencies, probabilities, absolute correlation coefficients, or DNA methylation levels of CpG dinucleotides or longer genomic regions. One of the most prominent applications is the estimation of false discovery rates (FDRs) from p-value distributions after multiple tests by fitting a beta-uniform mixture. By linear scaling, beta distributions can be used to model any quantity that takes values in a finite interval [L, U] ⊂ ℝ. We show that maximum likelihood estimation (MLE) has significant disadvantages for beta distributions. The main problem is that the likelihood function is not finite (for almost all parameter values) if any of the observed data points are x_i = 0 or x_i = 1.
For mixture distributions, MLE frequently results in a non-concave problem with many local maxima, and one uses heuristics that return a local optimum from given starting parameters. Because MLE is already problematic for a single beta distribution, EM does not work for beta mixtures unless ad-hoc corrections are made. We therefore propose a new algorithm for parameter estimation in beta mixtures that we call the iterated method of moments.
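The boundary problem described above can be made concrete for a single beta distribution. The sketch below is not the iterated method of moments from the article; it only shows, under assumed function names and made-up example data, that the log-likelihood diverges as soon as one observation equals 0, while the standard moment estimator (matching sample mean and variance) still yields finite parameters.

```python
import math
from statistics import fmean, pvariance


def _term(exponent, x):
    """exponent * log(x), with the limit for x = 0 made explicit."""
    if x > 0.0:
        return exponent * math.log(x)
    if exponent == 0.0:
        return 0.0
    return -math.inf if exponent > 0.0 else math.inf


def beta_log_likelihood(data, a, b):
    """Log-likelihood of data under Beta(a, b); not finite at x = 0 or x = 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return sum(log_norm + _term(a - 1.0, x) + _term(b - 1.0, 1.0 - x) for x in data)


def beta_method_of_moments(data):
    """Estimate (a, b) of one beta distribution from the sample mean and variance."""
    m = fmean(data)
    v = pvariance(data, mu=m)
    if not 0.0 < v < m * (1.0 - m):
        raise ValueError("sample variance must lie in (0, m * (1 - m))")
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common


# Hypothetical methylation-like levels, including an exact 0.
levels = [0.0, 0.1, 0.2, 0.4, 0.3]
print(beta_log_likelihood(levels, 2.0, 3.0))  # diverges because of the 0
print(beta_method_of_moments(levels))         # finite parameter estimates
```

Because the moment estimator only uses averages of the data, observations at exactly 0 or 1 need no special treatment.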

 

28.07.2017 | Publication: Regions of common inter-individual DNA methylation differences in human monocytes: genetic basis and potential function

Christopher Schröder*, Elsa Leitão*, Stefan Wallner, Gerd Schmitz, Ludger Klein-Hitpass, Anupam Sinha, Karl-Heinz Jöckel, Stefanie Heilmann-Heimbach, Per Hoffmann, Markus M. Nöthen, Michael Steffens, Peter Ebert, Sven Rahmann and Bernhard Horsthemke
* Contributed equally
Epigenetics & Chromatin 2017
doi:10.1186/s13072-017-0144-2

There is increasing evidence for inter-individual methylation differences at CpG dinucleotides in the human genome, but the regional extent and function of these differences have not yet been studied in detail. For identifying regions of common methylation differences, we used whole genome bisulfite sequencing data of monocytes from five donors and a novel bioinformatic strategy.

We identified 157 differentially methylated regions (DMRs) with four or more CpGs, almost none of which has been described before. The DMRs fall into different chromatin states, where methylation is inversely correlated with active, but not repressive histone marks. However, methylation is not correlated with the expression of associated genes. High-resolution single nucleotide polymorphism (SNP) genotyping of the five donors revealed evidence for a role of cis-acting genetic variation in establishing methylation patterns. To validate this finding in a larger cohort, we performed genome-wide association studies (GWAS) using SNP genotypes and 450k array methylation data from blood samples of 1128 individuals. Only 30/157 (19%) DMRs include at least one 450k CpG, which shows that these arrays miss a large proportion of DNA methylation variation. In most cases, the GWAS peak overlapped the CpG position, and these regions are enriched for CREB group, NF-1, Sp100 and CTCF binding motifs. In two cases, there was tentative evidence for a trans-effect by KRAB zinc finger proteins.

Allele-specific DNA methylation occurs in discrete chromosomal regions and is driven by genetic variation in cis and trans, but in general has little effect on gene expression.

05.07.2016 | Article about epigenetics of monocyte to macrophage differentiation accepted in “Epigenetics & Chromatin”

Christopher Schröder, Daniela Beißer and Sven Rahmann from the Genome Informatics group contributed to novel insights about epigenetic changes during cell differentiation. The article will appear soon in the renowned “Epigenetics & Chromatin” journal (IF 4.873) by BioMed Central.

Epigenetic dynamics of monocyte to macrophage differentiation
by Stefan Wallner, Christopher Schröder, Elsa Leitão, Tea Berulava, Claudia
Haak, Daniela Beißer, Sven Rahmann, Andreas S Richter, Thomas Manke,
Ulrike Böhnisch, Laura Arrigoni, Sebastian Fröhler, Filippos Klironomos,
Wei Chen, Nikolaus Rajewsky, Fabian Müller, Peter Ebert, Thomas
Lengauer, Matthias Barann, Philip Rosenstiel, Gilles Gasparoni, Karl
Nordström, Jörn Walter, Benedikt Brors, Gideon Zipprich, Bärbel Felder,
Ludger Klein-Hitpass, Corinna Attenberger, Gerd Schmitz, Bernhard Horsthemke

Abstract:
Monocyte to macrophage differentiation involves major biochemical and
structural changes. In order to elucidate the role of gene regulatory
changes during this process, we used high-throughput sequencing to
analyze the complete transcriptome and epigenome of human monocytes that
were differentiated in vitro by addition of colony stimulating factor 1
(CSF1) in serum-free medium. Numerous mRNAs and miRNAs were
significantly up- or downregulated. More than 100 discrete DNA regions,
most often far away from transcription start sites, were rapidly
demethylated by the ten-eleven translocation (TET) enzymes, became
nucleosome-free and gained histone marks indicative of active enhancers.
These regions were unique for macrophages and associated with genes
involved in the regulation of the actin cytoskeleton, phagocytosis and
innate immune response. In summary, we have discovered a phagocytic gene
network that is repressed by DNA methylation in monocytes and rapidly
de-repressed after the onset of macrophage differentiation.

01.06.2016 | Article about estimation of beta mixture parameters accepted at WABI 2016

An article by Christopher Schröder and Sven Rahmann about estimating parameters of beta mixture models, which has applications in determining the methylation state of genomic regions, has been accepted at WABI 2016 and will be presented at the conference in Aarhus (Denmark), August 22-24, 2016. The paper will be available in the WABI 2016 proceedings (LNBI series, Springer Verlag) in August 2016.

A hybrid parameter estimation algorithm for beta mixtures and applications to methylation state classification
by Christopher Schröder and Sven Rahmann

Abstract:
Mixtures of beta distributions have previously been shown to be a flexible tool for modeling data with values on the unit interval, such as methylation levels. However, maximum likelihood parameter estimation with beta distributions suffers from problems because of singularities in the log-likelihood function if some observations take the values 0 or 1. While ad-hoc corrections have been proposed to mitigate this problem, we propose a different approach to parameter estimation for beta mixtures where such problems do not arise in the first place. Our algorithm has significant computational advantages over the maximum-likelihood-based EM algorithm. As an application, we demonstrate that methylation state classification is more accurate when using adaptive thresholds from beta mixtures than non-adaptive thresholds on observed methylation levels.

New project “OsteoSys” as part of the NRW lead market competition (Leitmarktwettbewerb)

Starting presumably in July of this year, the Leitmarkt Agentur NRW will fund a project that uses bioinformatics methods to elucidate the molecular causes of complications in osteoporosis therapy. Osteoporosis is a widespread disease in which bones become prone to fractures with increasing age; women are affected most.

The consortium of scientists and companies is coordinated by Prof. Nina Babel (Transplantation Immunology, Marienhospital Herne, University Hospital of the Ruhr-Universität Bochum). The Genome Informatics group will receive funding of up to 235,000 euros over a total of three years. Project leader Prof. Sven Rahmann is enthusiastic: “In this project we are taking a big step towards personalized medicine. If the project succeeds, we can design tailored therapies for different population groups. Until then, however, a long road of molecular experiments and data analyses lies ahead of us.”
With its extensive experience in this area, the Chair of Genome Informatics considers itself well prepared for this challenge.

02.05.2016 | Genome Informatics delegation visits the Institute of Applied Microbiology at RWTH Aachen

As part of the BMBF-funded e:Bio project “YeastScent”, Elias Kuthe and Sven Rahmann visited the Blank Lab at the iAMB of RWTH Aachen and discussed progress on the various work packages with other project partners.
The project aims to detect changes in yeast metabolism during fermentation processes quickly and non-invasively, based on ion mobility spectrometry. It runs until the summer of next year.
Elias Kuthe presented the methods for peak detection and analysis that he has developed.

25.04.2016 | Publication: SimLoRD – Simulation of Long Read Data

Bianca Stöcker; Johannes Köster; Sven Rahmann
Bioinformatics 2016;
10.1093/bioinformatics/btw286

SimLoRD is a read simulator for third generation sequencing reads and is currently focused on the Pacific Biosciences SMRT error model.

Third generation sequencing methods provide longer reads than second generation methods and have distinct error characteristics.
In a SMRT library the sequenced DNA fragments are circular with adapter sequences between forward and backward strand, and a fragment may be sequenced multiple times in a single run. For a single pass through the sequence (subread), the error rate is high, but it is possible to calculate a consensus after multiple passes (circular consensus sequence read, CCS). Thus the error rate of CCSs decreases with the number of passes.
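Why the consensus error rate falls with the number of passes can be illustrated with a deliberately simplified model. The sketch below is not SimLoRD's error model: it assumes that each pass reads a given base incorrectly with the same independent probability `p`, that all wrong passes agree with each other (a worst case), and that the consensus is a simple majority vote over an odd number of passes. The function name and the value `p = 0.15` are assumptions for illustration.

```python
from math import comb


def majority_error(p, passes):
    """Probability that more than half of the passes are wrong at one position."""
    return sum(
        comb(passes, k) * p**k * (1.0 - p) ** (passes - k)
        for k in range(passes // 2 + 1, passes + 1)
    )


# Assumed per-pass error probability; odd pass counts avoid ties in the vote.
p = 0.15
for n in (1, 3, 5, 7):
    print(n, majority_error(p, n))
```

With one pass the consensus error equals the per-pass error, and each additional pair of passes lowers it further under these assumptions.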

We analyzed public data from Pacific Biosciences (PacBio) SMRT sequencing, developed an error model and implemented it in a new read simulator called SimLoRD. Reads are simulated from both strands of a provided or randomly generated reference sequence. It offers options to choose the read length distribution and to model error probabilities depending on the number of passes through the sequencer. The new error model makes SimLoRD the most realistic SMRT read simulator available.

SimLoRD is available open source at http://bitbucket.org/genomeinformatics/simlord/ and installable via Bioconda.

April 2016 | Bioinformatics project course (Fachprojekt) at TU Dortmund

Marianna D’Addario and Sven Rahmann offer the project course “Bioinformatics” at TU Dortmund. The course deals with the design of oligonucleotides, e.g. for nanotechnology, and is aimed at Bachelor's students. It is expected to be offered again in the winter semester 2016/2017.

Poster about the EAGLE tool at University Hospital Essen Science Day 2015

Exome Analysis GraphicaL Environment (EAGLE)

Christopher Schröder, Christoph Stahl, Felix Mölder, André Janowicz, Jasmin Beygo, Marcel Martin, Sven Rahmann

The Exome Analysis GraphicaL Environment (EAGLE) combines a best-practices variant calling workflow with a web frontend. By storing the called information in specifically structured HDF5 files, EAGLE allows filtering and parameter tuning in almost real time. This enables non-computer scientists to iteratively tune thresholds or select different samples for filtering via the web interface.

Poster at GCB 2015 on mutational landscapes of relapsing neuroblastoma

Bioinformatics Analysis of Heterogenous Data Reveals Characteristic Mutational Landscapes of Neuroblastoma Relapses, GCB 2015 in Dortmund

Marc Schulte, Johannes Köster, Daniela Beisser, Corinna Ernst, Christopher Schröder, Alexander Schramm and Sven Rahmann

Neuroblastoma is a malignancy of the developing sympathetic nervous system that causes 15% of childhood cancer-related mortality. However, in the vast majority of cases, death results not from the initial disease manifestation but rather from metastasis or recurrence.

Systematic search for genomic alterations in primary neuroblastomas has shown low genetic complexity, with significant mutations in only a very few genes. This study explored the genomic landscape of relapsing neuroblastoma in order to evaluate ‘driver’ mutations to be exploited as therapeutic targets.

Poster on Exomate at GfH 2014 in Essen

Exomate: an easy to use exome sequencing analysis pipeline

Christopher Schröder, Johannes Köster, Christoph Stahl, Sebastian Venier, Sven Rahmann, Marcel Martin

Exomate is an exome-sequencing pipeline with a web frontend. It automates most steps needed to go from FASTQ files to variant calls, puts the calls and metadata about patients, samples, etc. into a database and then allows interactive analysis via a web frontend. It is primarily designed for easy use and has already been used in various studies [1,2,3].

[1] Martin, M. et al., 2013. Exome sequencing identifies recurrent somatic mutations in EIF1AX and SF3B1 in uveal melanoma with disomy 3. Nat. Genet. 45, 933–936.

[2] Czeschik, J.C. et al., 2013. Clinical and mutation data in 12 patients with the clinical diagnosis of Nager syndrome. Hum. Genet. 132, 885–898.

[3] Voigt, C. et al., 2013. Oto-facial syndrome and esophageal atresia, intellectual disability and zygomatic anomalies – expanding the phenotypes associated with EFTUD2 mutations. Orphanet J. Rare Dis. 8, 110.

 

Christopher Schröder presents a poster at Science Day 2013, University Hospital Essen

Target identification for metabolic engineering

Christopher Schröder, Sven Rahmann

In metabolic engineering by gene knockouts, one searches for genes controlling metabolic reactions that should be removed from a metabolic network in order to optimize the yield of a desired metabolite.

Conventionally, this is done by undirected mutagenesis followed by selection of the population with the best efficiency.

Unrean et al. developed a simple algorithm that predicts reaction targets directly, avoiding the high cost of this uncontrolled process. It is based on elementary modes: non-decomposable sequences of metabolite transformation flows in the network.

We substantially improved the algorithm and applied it to a network of Escherichia coli to show the improved results.