COMP 790-124 (Fall 2011) — Machine Learning in Computational Biology

Modern techniques in machine learning and their application to computational biology problems.

Organizational

Time: Tuesdays and Thursdays, 12:30-1:45
Place: Sitterson 011
Prerequisites: Linear algebra, Probability or Statistics, Biology, some programming (Matlab/R/Python)
Instructor: Vladimir Jojic (vjojic@cs.unc.edu)
Office hours: SN 319 Tuesdays 2pm-3pm

Overview

Rapid accumulation of biological data, enabled by novel measurement technologies, necessitates innovation in data analysis. Machine learning is a growing field that has found numerous applications ranging from basic biology to personalized medicine. Whether discovering signatures of cancer or recommending the best treatment, the modeling and analysis paradigms of machine learning have been fruitfully applied. This course aims to introduce you to the basics of machine learning and its application to burning questions in biology and medicine.

Structure

The course aims to engage you in solving computational biology problems using machine learning. To achieve this, the course will consist of three components:

  1. Lectures covering ML and comp bio applications
  2. Student-led discussion of relevant papers
  3. Student project or a written survey of machine learning/comp bio literature

A project that yields a novel and exciting prediction may be selected for experimental validation either commercially (AssayDepot) or by collaborators at UNC.

Grading

  • 3 credits:
  • 1 credit:
    • Paper presentation: 60%
    • Discussion participation: 40%

Topics covered

Machine Learning

  1. Linear models for regression/classification (+sparse)
  2. Mixture and hierarchical models
  3. Subspace models: factor analysis, PCA (+sparse)
  4. Graphical models: inference and learning
  5. Expectation Maximization and variants (including variational approximations)
  6. Structured models: chains (HMM), trees (phylo- and ontogenies)
  7. Structure learning in Gaussian models
Depending on interest and time, we may cover:
  • Max margin approaches
  • Bayesian nonparametrics
  • Random projections and compressed sensing

Computational Biology applications

  1. Motif discovery
  2. Regulatory network reconstruction
  3. QTL
  4. Epitope prediction
  5. Modeling vaccine and drug responses
  6. Metagenomics
  7. Epigenetics
Depending on interest and the papers you choose to present, we may cover additional subjects.

Audience

Students from Computer Science, Bioinformatics, and Biology are welcome. I encourage joint projects between students from complementary disciplines.

Textbook

There is no textbook for this course, but you may find the following helpful:
  1. "Pattern Recognition and Machine Learning," Chris M. Bishop
  2. "Probabilistic Graphical Models," Daphne Koller and Nir Friedman
  3. "The Elements of Statistical Learning: Data Mining, Inference, and Prediction," T. Hastie, R. Tibshirani, J. Friedman (download)
  4. "Information Theory, Inference, and Learning Algorithms," David MacKay (download)
  5. "Bioinformatics: The Machine Learning Approach," Pierre Baldi, Søren Brunak

Links

Reading assignments

Pick an unassigned paper from this list; e-mail me and I will update the list. First come, first served.

Slides

Date | Topic | Slides | Code
8/23 | Organizational | Intro [PDF] |
8/25 | Linear regression | Linear Regression, Ridge, Lasso [PDF] | Coordinate ascent [Matlab]
8/30 | Linear regression | Linear Regression, Elastic Net, FLasso, ADMM [PDF]; Example project proposal [Zip] | LinReg Elastic Net coordinate ascent [Matlab]
9/1 | Logistic regression | Logistic regression, ridge [PDF] | LogReg Gradient ascent [Matlab]
9/6 | Logistic regression | Quadratic approximations, Logistic regression and lasso/elastic net [PDF] | LogReg Elastic Net coordinate ascent [Matlab]; LogReg Elastic Net interior point [Matlab]
9/8 | Logistic regression wrap up; PCA | ROC, AUC, Logistic regression apps; PCA [PDF] |
9/13 | EM | Info Theory, EM, MoG, MoPWM [PDF] | Mixture of Gaussians [Matlab]; Mixture of PWMs [Matlab]
9/15 | EM | EM, K-means, Factor Analysis [PDF] | Factor Analysis [Matlab]
9/20 | Graphical Models | Representations [PDF] |
9/22 | Student presentations | Slides |
9/27 | Student presentations | Slides |
9/29 | Graphical Models | Inference [PDF] |
10/4 | Graphical Models | Variational inference and Learning [PDF] |
10/6 | Graphical Models | HMM, CRF [PDF] |
10/11 | Graphical Models | Convex optimization, GMRFs, structure learning [PDF] |
10/13 | Student presentations | Slides (one-time move to FB141) |
10/18 | Discriminative methods | Decision trees, Boosting, Bagging, Random Forests [PDF] |
10/25 | Discriminative methods | SVM, Max Margin methods [PDF] |
10/27 | Active learning | Semisupervised and Active learning [PDF] |
11/1 | Student presentations | Slides |
11/3 | Bayesian approaches | Hierarchical models, MCMC [PDF] |
11/8 | Bayesian approaches | Gaussian Processes [PDF] |
11/10 | Bayesian approaches | LDA, Dirichlet Processes [PDF] |
11/15 | Student presentations | Slides |
11/17 | Student presentations | Slides |
11/22 | Bayesian approaches | Variational Bayes [PDF] |
11/29 | Sparse coding | Compressed sensing and random projections [PDF] |
12/1 | Student presentations | Slides |
12/6 | Student presentations | Slides |