22.08.2017 | Publication: A hybrid parameter estimation algorithm for beta mixtures and applications to methylation state classification

Christopher Schröder and Sven Rahmann
Algorithms for Molecular Biology
DOI 10.1186/s13015-017-0112-1

The beta distribution is a continuous probability distribution that takes values in the unit interval [0, 1]. It has been used in several bioinformatics applications to model data that naturally takes values between 0 and 1, such as relative frequencies, probabilities, absolute correlation coefficients, or DNA methylation levels of CpG dinucleotides or longer genomic regions. One of the most prominent applications is the estimation of false discovery rates (FDRs) from p-value distributions after multiple tests by fitting a beta-uniform mixture. By linear scaling, beta distributions can be used to model any quantity that takes values in a finite interval [L, U] ⊂ ℝ. We show that maximum likelihood estimation (MLE) has significant disadvantages for beta distributions. The main problem is that the likelihood function is not finite (for almost all parameter values) if any of the observed data points are x_i = 0 or x_i = 1.
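The boundary problem can be made concrete with a small sketch. The function names `beta_logpdf` and `log_likelihood` are illustrative and not taken from the paper; the code simply evaluates the standard beta log-density and uses the convention 0 * log(0) = 0, so that only the uniform case a = b = 1 stays finite at the boundaries.

```python
import math

def beta_logpdf(x, a, b):
    """Log-density of Beta(a, b) at x in [0, 1] (illustrative helper)."""
    def term(coef, y):
        # coef * log(y), with the convention 0 * log(0) = 0
        if coef == 0.0:
            return 0.0
        if y == 0.0:
            # log(0) = -inf; the sign of coef decides the direction
            return -math.inf if coef > 0 else math.inf
        return coef * math.log(y)

    # log of the beta function B(a, b)
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return term(a - 1.0, x) + term(b - 1.0, 1.0 - x) - log_norm

def log_likelihood(data, a, b):
    """Sum of log-densities over all observations."""
    return sum(beta_logpdf(x, a, b) for x in data)

# Interior data points give a finite log-likelihood.
print(math.isfinite(log_likelihood([0.2, 0.5, 0.9], 2.0, 3.0)))
# A single observation at 0 makes it non-finite for a != 1.
print(log_likelihood([0.0, 0.5], 2.0, 3.0))
print(log_likelihood([0.0, 0.5], 0.5, 3.0))
```

With a > 1 the observation at 0 drives the log-likelihood to minus infinity, and with a < 1 to plus infinity, so an optimizer has no meaningful objective to work with in either case.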
For mixture distributions, MLE frequently results in a non-concave problem with many local maxima, and one uses heuristics, such as the expectation-maximization (EM) algorithm, that return a local optimum from given starting parameters. Because MLE is already problematic for a single beta distribution, EM does not work for beta mixtures unless ad hoc corrections are made. We therefore propose a new algorithm for parameter estimation in beta mixtures, which we call the iterated method of moments.
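As a rough illustration of why moments are attractive here, the following sketch fits a single beta distribution by matching the sample mean and variance. It is not the paper's full iterated algorithm for mixtures; the function name `beta_moments_fit` and the optional `weights` argument (a stand-in for per-component weights in a mixture setting) are assumptions made for this example.

```python
def beta_moments_fit(data, weights=None):
    """Estimate (a, b) of a beta distribution by the method of moments.

    Hypothetical helper: 'weights' is an assumed extension that lets each
    observation count fractionally; by default all weights are 1.
    """
    if weights is None:
        weights = [1.0] * len(data)
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, data)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, data)) / total
    # A beta distribution requires 0 < var < mean * (1 - mean).
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("moments are not compatible with a beta distribution")
    # Solve mean = a / (a + b) and var = mean * (1 - mean) / (a + b + 1).
    phi = mean * (1.0 - mean) / var - 1.0  # phi = a + b
    return mean * phi, (1.0 - mean) * phi

# Observations at exactly 0 and 1 cause no difficulty here.
a, b = beta_moments_fit([0.0, 0.25, 0.5, 0.75, 1.0])
print(a, b)
```

Because only sums of x and of squared deviations are needed, observations at the boundaries enter the estimate like any other value, with no logarithm to diverge.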