549a Gibbs Sampling for Transcription Network Verification Using Gene Expression and ChIP-Chip Data

Mark P. Brynildsen and James C. Liao. Chemical Engineering, UCLA, 5531 Boelter Hall, Los Angeles, CA 90095

The reliability of transcriptional regulatory analyses depends heavily on the accuracy of the transcription network and the expression data. When ChIP-chip binding data are available, they are the preferred basis for constructing a transcription network. However, because of experimental noise and the environmental sensitivity of binding, ChIP-chip data carry a degree of uncertainty. As a further complication, gene expression data are themselves considerably noisy. To address these challenges and provide reliable data for transcriptional regulatory analyses, we seek to identify those genes whose binding and expression data corroborate one another. Analyses based on these genes are expected to lead to more meaningful biological interpretations.

Previous methods have been devised to document the disparity between ChIP-chip and gene expression data (Gao et al 2004; Boulesteix and Strimmer 2005; Ruan and Zhang 2005). Using gene expression data under the assumption that transcription factor activity should correlate with target gene expression, Gao et al 2004 concluded that, on average, 42% of the binding targets identified by ChIP-chip data in Saccharomyces cerevisiae are not true regulatory targets. Following the same assumption, Boulesteix and Strimmer 2005 documented an environmental dependence in the false positive rate of ChIP-chip data (stress response: 27%, cell cycle: 68%). In addition, Ruan and Zhang 2005 used decision trees to investigate how well ChIP-chip data predict whether genes are up-regulated, down-regulated, or unchanged under stress and cell cycle conditions. While all of these approaches have had success in detecting instances where ChIP-chip and gene expression data agree, each makes key assumptions that may not be valid under all circumstances. For Gao et al 2004 and Boulesteix and Strimmer 2005, the assumption of correlation between transcription factor activity and target gene expression may be valid for singly regulated genes, but may not hold for genes controlled by multiple regulators. For Ruan and Zhang 2005, the implicit assumption that transcriptional regulation is an on/off event relative to a basal state ignores more complicated regulation, such as a meaningful spectrum of induced and repressed states. We have recently developed a method that identifies instances of agreement between ChIP-chip and gene expression data, allows for a lack of correlation between transcription factor activity and target gene expression when the gene is controlled by more than one regulator, and allows differential gene expression to be more than an on/off event relative to a basal state.
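The multi-regulator caveat discussed above can be illustrated with a small synthetic sketch in Python (NumPy). All quantities here are assumptions chosen for illustration: the two transcription factor activity profiles, the control strengths (1.0, 0.3, -1.0), and the noise level are made-up values, not data or parameters from this work, and the ordinary least-squares fit stands in for any multi-regulator model rather than the method described in this abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
n_conditions = 200

# Hypothetical log-activity profiles of two transcription factors (TFs).
tfa1 = rng.normal(size=n_conditions)
tfa2 = rng.normal(size=n_conditions)
noise_sd = 0.1  # assumed measurement noise level

# Gene regulated only by TF1: expression tracks TF1 activity.
gene_single = 1.0 * tfa1 + noise_sd * rng.normal(size=n_conditions)

# Gene regulated by both TFs: weakly activated by TF1, strongly repressed by TF2.
gene_dual = 0.3 * tfa1 - 1.0 * tfa2 + noise_sd * rng.normal(size=n_conditions)


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


# Pairwise correlation with TF1 alone.
r_single = pearson(tfa1, gene_single)
r_dual_tf1 = pearson(tfa1, gene_dual)

# Joint linear fit of the doubly regulated gene on both TF activities.
X = np.column_stack([tfa1, tfa2])
coef, *_ = np.linalg.lstsq(X, gene_dual, rcond=None)
residual = gene_dual - X @ coef
r2_dual = 1.0 - residual.var() / gene_dual.var()

print(f"singly regulated gene vs TF1: r = {r_single:.2f}")
print(f"doubly regulated gene vs TF1: r = {r_dual_tf1:.2f}")
print(f"doubly regulated gene, joint fit on TF1 and TF2: R^2 = {r2_dual:.2f}")
```

In this toy setting the doubly regulated gene is a genuine target of TF1, yet its pairwise correlation with TF1 is weak, while a model that accounts for both regulators explains most of its variation. A correlation-based filter could therefore misclassify such a gene as a false positive.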

Our approach uses Gibbs sampling, Bayesian statistics, robust regression, and concepts from Network Component Analysis (Liao et al 2003) to identify those genes whose ChIP-chip binding data and gene expression data support one another. To demonstrate the utility of this approach, we have analyzed S. cerevisiae data from a variety of environmental conditions to provide a dynamic perspective of transcriptional regulation.
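For context, the Network Component Analysis model of Liao et al 2003 can be summarized as a constrained matrix decomposition of log expression ratios. The symbols below (E, A, P, N, M, L) are notation chosen here for illustration, and this abstract does not specify how the decomposition enters the Gibbs sampling scheme, so the block should be read as background rather than as a statement of the method.

```latex
\[
  E \approx A\,P, \qquad
  E \in \mathbb{R}^{N \times M}, \quad
  A \in \mathbb{R}^{N \times L}, \quad
  P \in \mathbb{R}^{L \times M},
\]
\[
  a_{ij} = 0 \quad \text{whenever transcription factor } j
  \text{ is not connected to gene } i \text{ in the network.}
\]
```

Here E holds log expression ratios for N genes across M conditions, A holds the control strengths of L transcription factors on each gene, and P holds the transcription factor activities. The zero pattern of A is the network topology, which is where ChIP-chip binding data would supply the connectivity.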

Boulesteix, A.L., Strimmer, K. (2005). Predicting transcription factor activities from combined analysis of microarray and ChIP data: a partial least squares approach. Theoretical Biology and Medical Modelling, 2:23.

Gao, F., Foat, B., Bussemaker, H. (2004). Defining transcriptional networks through integrative modeling of mRNA expression and transcription factor binding data. BMC Bioinformatics, 5:31.

Liao, J.C., Boscolo, R., Yang, Y.L., Tran, L.M., Sabatti, C., Roychowdhury, V.P. (2003). Network component analysis: reconstruction of regulatory signals in biological systems. Proc Natl Acad Sci USA, 100(26):15522-7.

Ruan, J., Zhang, W. (2005). CAGER: classification analysis of gene expression regulation using multiple information sources. BMC Bioinformatics, 6(1):114.