April 13th
4:00-5:00 p.m.
MDS 220
Refreshments: 312 MDS building
Title:
Differential expression in RNA-seq
Abstract:
Recent developments in RNA-sequencing (RNA-seq) technology have led to a rapid increase in gene expression data in the form of counts. RNA-seq can be used for a variety of applications; however, identifying differential expression (DE) remains a key task in functional genomics. A number of statistical methods have been proposed for detecting DE in RNA-seq data, and several leading methods share a common feature: the negative binomial (gamma-Poisson mixture) model. Where they differ is in how the variance, or dispersion, of the gamma distribution is modeled and estimated. We evaluated several large public RNA-seq datasets and found that the dispersion estimated by existing methods does not adequately capture the heterogeneity of biological variance among samples. We present a new empirical Bayes shrinkage estimate of the dispersion parameters and demonstrate improved DE detection.
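To make the model concrete: in a gamma-Poisson mixture, each sample's Poisson rate is itself drawn from a gamma distribution with mean mu and variance phi * mu^2, so the observed counts have variance mu + phi * mu^2, where phi is the dispersion. The standard-library Python sketch below simulates that mixture, computes a simple method-of-moments dispersion estimate, and pulls per-gene estimates toward their common mean. The function names and the fixed `weight` parameter are hypothetical illustrations of the general idea of shrinkage; a genuine empirical Bayes method estimates the amount of shrinkage from the data, and this is not the estimator presented in the talk.

```python
import math
import random
import statistics


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for moderate rates (lam well below ~700)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_gamma_poisson(mu: float, phi: float, n: int, rng: random.Random) -> list[int]:
    """Draw n counts from a gamma-Poisson (negative binomial) mixture.

    Each rate is lambda ~ Gamma(shape=1/phi, scale=mu*phi), so E[lambda] = mu
    and Var[lambda] = phi * mu**2; each count is then Poisson(lambda).
    """
    return [sample_poisson(rng.gammavariate(1.0 / phi, mu * phi), rng) for _ in range(n)]


def moment_dispersion(counts: list[int]) -> float:
    """Method-of-moments estimate: Var = m + phi*m^2  =>  phi = (s^2 - m) / m^2, clipped at 0."""
    m = statistics.fmean(counts)
    if m == 0.0:
        return 0.0  # an all-zero gene carries no information about dispersion
    s2 = statistics.variance(counts)
    return max((s2 - m) / (m * m), 0.0)


def shrink_dispersions(phis: list[float], weight: float) -> list[float]:
    """HYPOTHETICAL linear shrinkage toward the across-gene mean dispersion.

    weight=1 keeps the raw per-gene estimates; weight=0 gives every gene the
    same dispersion. The fixed weight stands in for a data-driven one.
    """
    target = statistics.fmean(phis)
    return [weight * p + (1.0 - weight) * target for p in phis]


rng = random.Random(1)
counts = sample_gamma_poisson(mu=10.0, phi=0.5, n=20000, rng=rng)
print(statistics.variance(counts))  # should be near 10 + 0.5 * 10**2 = 60
print(moment_dispersion(counts))    # should be near 0.5
print(shrink_dispersions([0.1, 0.5, 0.9], weight=0.5))  # approximately [0.3, 0.5, 0.7]
```

Intermediate weights trade the noise of gene-by-gene estimates against the bias of forcing one dispersion on all genes; how that trade-off is made is exactly where dispersion-shrinkage methods differ.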
Time permitting, I will also present an ongoing project to integrate protein-binding ChIP-seq data into eQTL mapping. A hierarchical model is developed for the data integration: the prior probability that a SNP is associated with a gene can be modeled as a function of the protein-binding profiles surrounding the SNP. Model parameters and posterior probabilities can be estimated via an EM-type algorithm.
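A minimal sketch of this kind of EM fit, under simplifying assumptions that are ours rather than the speaker's: each SNP-gene pair is summarized by one association z-score that follows N(0, 1) if the pair is not associated and N(0, alt_sd^2) if it is; each SNP has a single binding covariate x (for example, 1 if it lies in a ChIP-seq peak); and the prior probability of association is a logistic function of x. The E-step computes posterior association probabilities, and the M-step refits the logistic prior to those soft labels with a few Newton-Raphson steps. The function names, `alt_sd`, and the simulated data are hypothetical.

```python
import math
import random


def normal_pdf(z: float, sd: float) -> float:
    return math.exp(-0.5 * (z / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def sigmoid(t: float) -> float:
    # Numerically stable logistic function
    if t >= 0.0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def em_binding_prior(z, x, alt_sd=3.0, n_iter=100, newton_steps=3):
    """EM for a two-group model with a binding-dependent prior.

    z[i]: association z-score for SNP-gene pair i.
    x[i]: binding covariate for the SNP (hypothetical summary of ChIP-seq data).
    Prior P(associated_i) = sigmoid(b0 + b1 * x[i]).
    Returns (b0, b1, posterior probabilities of association).
    """
    b0, b1 = 0.0, 0.0
    post = [0.0] * len(z)
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is associated
        for i, (zi, xi) in enumerate(zip(z, x)):
            prior = sigmoid(b0 + b1 * xi)
            f1 = prior * normal_pdf(zi, alt_sd)
            f0 = (1.0 - prior) * normal_pdf(zi, 1.0)
            post[i] = f1 / (f1 + f0)
        # M-step: logistic regression of the soft labels on x (Newton-Raphson, warm start)
        for _ in range(newton_steps):
            g0 = g1 = h00 = h01 = h11 = 0.0
            for ri, xi in zip(post, x):
                p = sigmoid(b0 + b1 * xi)
                w = p * (1.0 - p)
                g0 += ri - p            # gradient of the expected complete-data log-likelihood
                g1 += (ri - p) * xi
                h00 += w                # negative Hessian entries
                h01 += w * xi
                h11 += w * xi * xi
            det = h00 * h11 - h01 * h01
            if det <= 0.0:
                break  # x carries no variation; the slope is not identifiable
            b0 += (h11 * g0 - h01 * g1) / det
            b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1, post


# Simulated example: bound SNPs (x = 1) are more likely to be associated.
rng = random.Random(7)
x, z = [], []
for _ in range(2000):
    xi = 1.0 if rng.random() < 0.3 else 0.0
    associated = rng.random() < sigmoid(-3.0 + 2.5 * xi)
    z.append(rng.gauss(0.0, 3.0 if associated else 1.0))
    x.append(xi)

b0, b1, post = em_binding_prior(z, x, alt_sd=3.0)
print(round(b0, 2), round(b1, 2))
```

Warm-starting the Newton steps from the previous iteration's coefficients keeps each M-step cheap. For very large z-scores the null density underflows, so a production version would compute the E-step in log space.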