Unsupervised Learning From Genomics Data

This page provides links to the statistical models and computational procedures we developed for identifying and characterizing multivariate patterns in genomics data. Our methods range from procedures for identifying groups of genes with similar expression patterns (i.e., cluster analysis) to integrative models for deciphering gene expression regulation by jointly identifying co-expressed genes and the similar patterns of regulatory events driving co-expression (e.g., transcription factor binding). The statistical models are Bayesian semiparametric models that use Dirichlet process priors (i.e., infinite mixture models). We also developed methods for functionally annotating and interactively viewing results, implemented in our CLEAN (CLustering ENrichment Analysis) framework. To access the software and learn more, click on the links below.