When | Who Presents / What is Due | Description |
|
Kobus | Introductory Lecture |
|
Kobus |
Chapter one of Pattern Recognition and Machine Learning by Chris Bishop,
up to and including section 1.5.4. Details in section 1.2.6 are less critical, but the
meaning of equation 1.68 is important.
Key concepts that you should be working towards understanding are generative statistical models and the basics of Bayesian statistics. Those who need to brush up on probability may also want to have a look at the following introduction to probability. Alternatively, you could consult your favorite probability text, or Wikipedia. |
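If it helps to make the generative-model-plus-Bayesian-update idea concrete, here is a minimal coin-flipping sketch in Python. This is my own illustration, not code from Bishop; the prior counts and the data are made up.

    # Bayesian inference for a coin's heads probability (toy example).
    # Prior: Beta(a, b); likelihood: Bernoulli; posterior: Beta(a + heads, b + tails).
    import numpy as np
    from scipy.stats import beta

    a, b = 2.0, 2.0                      # Beta prior pseudo-counts (illustrative choice)
    flips = [1, 0, 1, 1, 0, 1, 1, 1]     # 1 = heads, 0 = tails (made-up data)
    heads = sum(flips)
    tails = len(flips) - heads

    posterior = beta(a + heads, b + tails)
    # The posterior predictive for the next flip, in the same spirit as Bishop's
    # predictive distribution: average the likelihood over the posterior. For a
    # Bernoulli likelihood this is just the posterior mean of theta.
    print("p(next flip is heads) =", posterior.mean())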
|
Kobus |
Sections 8.1 and 8.2 of
chapter eight of
Pattern Recognition and Machine Learning by Chris Bishop.
Section 8.1.4 is perhaps less important if this is your first time through such
material.
A key underlying concept that you should make sure you understand is conditional independence. If you are surfing the web for additional material, keywords include graphical models, Bayes nets, and conditional independence. I recommend checking out some of these videos here, with perhaps this one being the most useful. (I have only watched the first few minutes of about half of them.) |
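To see conditional independence in action, here is a small numerical check on a made-up chain A -> B -> C (my own toy example, not from Bishop). Because the joint factorizes as p(a) p(b|a) p(c|b), A and C must be independent given B, and brute-force enumeration confirms it.

    # Verify that A is independent of C given B for the chain A -> B -> C.
    import itertools
    import numpy as np

    pA = np.array([0.6, 0.4])                   # p(a)
    pB_A = np.array([[0.7, 0.3], [0.2, 0.8]])   # p(b|a), rows indexed by a
    pC_B = np.array([[0.9, 0.1], [0.5, 0.5]])   # p(c|b), rows indexed by b

    joint = np.zeros((2, 2, 2))                 # joint p(a, b, c)
    for a, b, c in itertools.product(range(2), repeat=3):
        joint[a, b, c] = pA[a] * pB_A[a, b] * pC_B[b, c]

    for b in range(2):
        p_ac_given_b = joint[:, b, :] / joint[:, b, :].sum()
        p_a_given_b = p_ac_given_b.sum(axis=1)
        p_c_given_b = p_ac_given_b.sum(axis=0)
        # p(a, c | b) factors as p(a | b) p(c | b) for every value of b.
        assert np.allclose(p_ac_given_b, np.outer(p_a_given_b, p_c_given_b))
    print("A and C are conditionally independent given B")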
|
Ernesto |
Sections 8.3 through 8.4.3 of
chapter eight of
Pattern Recognition and Machine Learning by Chris Bishop.
If you are getting bored of Bishop, another introduction is linked here. Turnin key for response (on lectura): cs645_04 |
|
Andrew |
Sections 8.4.4 through the end of
chapter eight of
Pattern Recognition and Machine Learning by Chris Bishop.
If you are getting bored of Bishop, another introduction is linked here. Turnin key for response (on lectura): cs645_05 |
|
Maria |
Here we will study Gaussian mixture models (GMMs) and expectation maximization
(EM). GMMs are the canonical statistical clustering model. Hence, if you are
thinking of clustering for your project, you will want to study this topic
carefully. The basic model is relatively easy to understand, and using EM for
inference on it is also relatively straightforward. However, the reading
includes a lot of background theory. If you are not already familiar with
this topic, your goal should be to use the reading and/or other material
on the web to understand the basic model and how EM is used to fit it.
The reading is from chapter nine of Pattern Recognition and Machine Learning by Chris Bishop. I expect the first paragraph will be a bit obscure for many---you can ignore it; it will make sense once you understand the rest of the material. Focus on 9.1 and 9.2 if this is your first exposure to the topic. Section 9.3 through the end of 9.3.2 is also worth a look. Turnin key for response (on lectura): cs645_06 |
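If you want to see the updates run, below is a minimal EM fit of a two-component 1-D Gaussian mixture, written from the standard E-step/M-step equations of section 9.2 (my own sketch; the data and the initialization are made up).

    # EM for a two-component 1-D Gaussian mixture.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])

    pi = np.array([0.5, 0.5])        # mixing weights
    mu = np.array([-1.0, 1.0])       # component means
    sigma = np.array([1.0, 1.0])     # component standard deviations

    for _ in range(100):
        # E-step: responsibilities r[n, k] = p(component k | x_n).
        dens = np.stack([p * norm.pdf(x, m, s)
                         for p, m, s in zip(pi, mu, sigma)], axis=1)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from weighted sufficient statistics.
        Nk = r.sum(axis=0)
        pi = Nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / Nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)

    print("weights:", pi, "means:", mu, "stdevs:", sigma)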
|
Raquel |
Here we will study hidden Markov models (HMMs). This is an important model, and
a good one to consider for projects that involve sequence data represented as a
sequence of state changes. For example, HMMs are often used for tasks like
gesture or action recognition.
Like GMMs, HMMs are common and well understood, and there is much additional material about them on the web. We will keep things simple and use Bishop for common ground. (This will be the LAST reading from Bishop.) The reading is up to 13.2.5 of chapter thirteen of Pattern Recognition and Machine Learning by Chris Bishop. Some parts, such as 13.2.4, are less important and can be skimmed, leaving time to learn the main points. Another reference for HMMs is Lawrence R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, 77(2), pp. 257-286, February 1989. Turnin key for response (on lectura): cs645_07 |
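As a concrete taste of HMM inference, here is the forward recursion for a tiny discrete HMM (my own sketch; all of the model numbers are made up). Note that for long sequences the unscaled alphas underflow, which is what the scaling in 13.2.4 addresses.

    # Forward algorithm: computes p(x_1, ..., x_T) by recursive marginalization
    # over the hidden states.
    import numpy as np

    A = np.array([[0.9, 0.1],     # A[i, j] = p(z_t = j | z_{t-1} = i)
                  [0.2, 0.8]])
    E = np.array([[0.8, 0.2],     # E[i, k] = p(x_t = k | z_t = i)
                  [0.3, 0.7]])
    init = np.array([0.5, 0.5])   # initial state distribution
    obs = [0, 0, 1, 0, 1, 1]      # an observation sequence of discrete symbols

    alpha = init * E[:, obs[0]]   # alpha_1(z) = p(x_1, z_1)
    for x in obs[1:]:
        # alpha_t(z) = p(x_t | z) * sum over z' of alpha_{t-1}(z') A[z', z]
        alpha = (alpha @ A) * E[:, x]
    print("p(observations) =", alpha.sum())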
|
Ray |
We will now move on to MCMC sampling. This is the start of a series of papers on
inference. This is also a good source of projects. In particular, you can create
a project, or extend the depth of any project, by proposing to study and compare
several inference approaches on real data. For example, EM vs. MH vs. HMC. (At
this point you may only know what the first acronym stands for.)
There is much material available on MCMC sampling, including chapter 11 of Bishop and Radford Neal's review, which is very good; you might want to read it in parallel with other sources. However, for common ground, we will use C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan, "An introduction to MCMC for machine learning," Machine Learning, vol. 50, pp. 5-43, 2003. The key material to prepare for this class is section 3 up to the end of section 3.5, with the key thing to learn being the Metropolis-Hastings (MH) algorithm (and the special case of Gibbs sampling). You might find parts of the paper before section 3 useful; read them as needed, but not at the expense of understanding MH. The paper goes on to discuss Monte Carlo EM (worthwhile), and then hybrid Monte Carlo and reversible jump, which are topics that we will do in more detail soon; I recommend reading the treatment in this paper to begin preparing for those future classes. To repeat: if you are new to this topic, the critical part is the Metropolis-Hastings algorithm. Please make sure you understand it. You may find it interesting that the MH algorithm was proposed as one of the top ten algorithms in scientific computing in this article. The blurb on MH is linked here. Turnin key for response (on lectura): cs645_08 |
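To make the algorithm concrete before class, here is a bare-bones MH sampler (my own sketch; the two-mode target and the step size are arbitrary choices). With a symmetric random-walk proposal, the Hastings correction cancels and the acceptance probability reduces to a ratio of target densities, which need only be known up to a constant.

    # Random-walk Metropolis-Hastings on a made-up two-mode target.
    import numpy as np

    rng = np.random.default_rng(1)

    def log_target(x):
        # Unnormalized log density of an equal mixture of N(-3, 1) and N(3, 1).
        return np.logaddexp(-0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2)

    x, step = 0.0, 1.0
    samples = []
    for _ in range(50_000):
        prop = x + step * rng.normal()
        # Accept with probability min(1, p(prop) / p(x)); q cancels by symmetry.
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop
        samples.append(x)

    samples = np.array(samples[5_000:])     # discard burn-in
    print("sample mean:", samples.mean())   # ~0 by the symmetry of the target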
|
Swami |
Hybrid Monte Carlo (HMC).
HMC has proven to be quite effective in one of the vision lab's projects, so it is worth studying. The mapping to physical systems might seem confusing, but it is worth getting used to---it is mostly notation. Start with section 3.6 of C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan, "An introduction to MCMC for machine learning," Machine Learning, vol. 50, pp. 5-43, 2003. Then read Neal, R. M. (1994), "An improved acceptance procedure for the hybrid Monte Carlo algorithm," Journal of Computational Physics, vol. 111, pp. 194-203. This paper provides a nice intro to HMC and an extension that might be interesting to experiment with. Page 7 onwards is not so important. Swami also found a fun applet showing some MCMC methods; note, however, that intuition gained there may not apply to difficult high-dimensional search spaces. Turnin key for response (on lectura): cs645_09 |
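Here is a minimal sketch of a single HMC transition, assuming a standard 2-D Gaussian target (my own illustration, not code from either paper; the step size and trajectory length are arbitrary tuning choices). Position plays the role of the sample, momentum is the auxiliary Gaussian variable, and the leapfrog integrator approximately simulates the physical dynamics.

    import numpy as np

    rng = np.random.default_rng(2)

    def U(q):      return 0.5 * q @ q   # potential energy = -log target (std normal)
    def grad_U(q): return q

    def hmc_step(q, eps=0.1, L=20):
        p = rng.normal(size=q.shape)            # resample momentum
        q_new, p_new = q.copy(), p.copy()
        p_new -= 0.5 * eps * grad_U(q_new)      # initial half step for momentum
        for _ in range(L):
            q_new += eps * p_new                # full step for position
            p_new -= eps * grad_U(q_new)        # full step for momentum ...
        p_new += 0.5 * eps * grad_U(q_new)      # ... trimmed back to a half step
        # Metropolis accept/reject on the change in total energy (Hamiltonian).
        dH = (U(q_new) + 0.5 * p_new @ p_new) - (U(q) + 0.5 * p @ p)
        return q_new if np.log(rng.uniform()) < -dH else q

    q = np.zeros(2)
    draws = []
    for _ in range(5_000):
        q = hmc_step(q)
        draws.append(q)
    print("sample covariance:\n", np.cov(np.array(draws).T))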
|
Tentative project selection |
By this date, you need to inform Kobus of your project plans. Most likely you
will either have chosen your project long before this time or you will be in
intense discussion with Kobus.
You need to hand in a text file containing a few sentences declaring what your project is. In particular, you need to give me a rough idea of what the generative statistical model will be---enough so I can be sure that the proposal due in a few weeks will in fact have a generative statistical model in it. If you are working with a partner, let me know who it is. Turnin key for a few sentences on project topic: cs645_topic |
|
Qiyam |
Reversible Jump MCMC.
RJMCMC is a canonical way to handle the very important issue of model selection
where the number of parameters is not known (often the case).
Start with section 3.7 of C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan, "An introduction to MCMC for machine learning," Machine Learning, vol. 50, pp. 5-43, 2003. Figure 17 is very useful, as is the example at the end of the section. The material in 3.7 serves as a good summary of this topic, and if you can master it, you are in good shape. Then read Green, P. (2003), "Trans-dimensional Markov chain Monte Carlo," in Highly Structured Stochastic Systems. The main focus will be section 2, which overlaps with the summary in Andrieu et al. The other material is further discussion of trans-dimensional sampling, and is less important, especially if you are new to sampling. However, using sampling for model selection is an important and difficult area, and some discussion of the larger issues in class will hopefully occur, so please give them some thought. Ideally, we will collectively come to understand the basic why and how of Green's method, along with some general understanding of the larger issues. I think this supersedes the original paper (which is covered in section two and put into more context in the 2003 paper). But in case it is useful, the original is Green, P. (1995), "Reversible jump Markov chain Monte Carlo computation and Bayesian model determination," Biometrika 82, 711-732.
In particular, aim to get comfortable with dimension matching and the role of the Jacobian term in the acceptance ratio; a toy example is sketched below.
Turnin key for response (on lectura): cs645_10 |
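To make the dimension-matching machinery concrete, here is a toy reversible-jump sampler that switches between a zero-mean model and a free-mean model for scalar data (entirely my own illustration; the priors, data, and move structure are assumptions). Because the jump proposes the new mean from its prior, the prior and proposal densities cancel and the Jacobian is 1, so the acceptance ratio reduces to a likelihood ratio.

    # Toy RJMCMC over two models of data y:
    #   M0: y_i ~ N(0, 1)                  (no free parameters)
    #   M1: y_i ~ N(mu, 1), mu ~ N(0, 10)  (one free parameter)
    # Equal prior probability on the two models is assumed.
    import numpy as np

    rng = np.random.default_rng(3)
    y = rng.normal(1.5, 1.0, size=20)        # made-up data with true mean 1.5

    def loglik(mu):
        return -0.5 * np.sum((y - mu) ** 2)  # log likelihood up to a constant

    k, mu, counts = 0, 0.0, [0, 0]
    for _ in range(20_000):
        # Trans-dimensional move: birth (M0 -> M1) or death (M1 -> M0).
        if k == 0:
            mu_prop = rng.normal(0.0, np.sqrt(10.0))   # u = mu drawn from its prior
            if np.log(rng.uniform()) < loglik(mu_prop) - loglik(0.0):
                k, mu = 1, mu_prop
        else:
            if np.log(rng.uniform()) < loglik(0.0) - loglik(mu):
                k, mu = 0, 0.0
        # Ordinary within-model MH update of mu when in M1 (symmetric proposal).
        if k == 1:
            mu_prop = mu + 0.5 * rng.normal()
            log_r = loglik(mu_prop) - loglik(mu) + (mu**2 - mu_prop**2) / 20.0
            if np.log(rng.uniform()) < log_r:
                mu = mu_prop
        counts[k] += 1

    print("posterior p(M1 | y) is approximately", counts[1] / sum(counts))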
|
Kobus |
The format for this session will be a bit different. We will not have a
slide presentation. Rather, we will focus on discussing MCMC methods,
solidifying what we have read so far. To provide some basis for discussion, have
a look at the two papers listed below, which apply MCMC. We will not worry so
much about their technical details. Just focus on the main points: what is the
problem, what is the approach to solving it, and what might be the strengths and
weaknesses of the solution. For example, in the first paper, the main
contribution is two technical results relating to complexity and convergence.
You can ignore them if you like. For us, the paper simply serves as a good
introduction to using MCMC for multi-target tracking.
For each paper I have listed a discussion topic that came to mind. I have my opinions about them, but I want to know what you think. In addition, I will be calling on the audience to help summarize bits of the papers.
Paper one:
(Reference 20, linked here, is interesting, if a bit dense. An additional paper from the same group (linked here) solves one of the "open" questions listed in reference 20. A Google search reveals some additional recent papers that are also related. This is not an area I know much about. However, if someone looks at this material and thinks it is worth doing some of it as part of the class, we could consider it.) Discussion question: The number of tracks is not known. Yet reversible jump is not used. Why is it not needed? If it is not, does clustering permit an analogous interpretation? Is thinking about clustering in this way useful?
Paper two:
Discussion question: The prior over the number of clusters is uniform. Does that make sense? Why is the "answer" not simply the maximal number of clusters in all cases? Is there a flaw in this paper? Turnin key for response (on lectura): cs645_11 |
|
Ram |
Data-driven MCMC (part one).
In many problems, such as those in computer vision, MCMC can give good results because of problem structure: given the right representation, a global maximum dominates the space. However, searching, or even getting a reasonable chain started, is much improved by peeking at the data. Conditioning on the data is called data-driven MCMC. The concept has been around for a while, but it seems that it got named and claimed in the following paper: Z.W. Tu and S.C. Zhu, "Image Segmentation by Data-Driven Markov Chain Monte Carlo," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 657-673, May 2002. Unfortunately, I do not know of a suitable stand-alone paper focused on data-driven methods; the method is relatively application dependent. With the above paper, we will see MCMC being used to integrate many hammers applied simultaneously to a difficult problem. This paper is quite technical, so we will break it up into two parts, combining the second part with another paper. In the first part we will see the problem being set up and reversible jump moves being defined. Note that Green's paper is referenced, but these authors do not bother telling you much about how they compute the differential probability ratios. Recall that Green provided a nice way around things through space extension and the determinant of the Jacobian---I do not know the underlying details of what is done in this paper. We will read up to 4.2, and skim 4.3, which sets us up for Thursday. Turnin key for response (on lectura): cs645_12 |
|
Kyle |
Data-driven MCMC (part two).
Read the rest (i.e., section 4.3 onwards) of: Z.W. Tu and S.C. Zhu, "Image Segmentation by Data-Driven Markov Chain Monte Carlo," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 657-673, May 2002. In parallel, read the first eight pages of: R. Maciuca and S.C. Zhu, "How Do Heuristics Expedite Markov Chain Search," International Workshop on Statistical and Computational Theories of Vision, 2003. Turnin key for response (on lectura): cs645_13 |
noon |
Project proposal |
The project proposal is a substantive document explaining what your problem is,
why it is important, the literature context, what your proposed model is, and
some mention (probably not very detailed) of inference. There will be references.
This is a proposal, not a paper. However, the material will be useful for your
paper, provided that you rewrite it for a different audience. It will be
reviewed by your peers.
Turnin key for proposal (on lectura): cs645_proposal |
|
Javad |
The reading for this session is:
Kemp, C. and Tenenbaum, J. B. (2008). The discovery of structural form. Proceedings of the National Academy of Sciences, 105(31), 10687-10692. I see that the authors have put the data and code online here. (I have not looked at it myself.) Turnin key for response (on lectura): cs645_14 |
|
Kobus |
Reviewing and writing for reviewers
No response required. |
11:59 PM. |
Project proposal peer review |
Written reviews need to be submitted.
Your review assignments should appear in:
Turnin key for reviews (on lectura): cs645_proposal_reviews Additional comments on hard copy can be given back in person to the reviewees in class on March 10. |
|
Kobus |
Linear algebra for sampling
No response required. |
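The slides for these sessions are not linked, but one standard piece of "linear algebra for sampling" is drawing from a multivariate Gaussian by transforming standard normals with a Cholesky factor. The sketch below is a generic illustration, not necessarily what was covered; the mean and covariance are made up.

    # Draw from N(m, S) via x = m + L z, where S = L L^T and z ~ N(0, I).
    import numpy as np

    rng = np.random.default_rng(4)
    m = np.array([1.0, -2.0])
    S = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

    L = np.linalg.cholesky(S)         # lower triangular factor, S = L @ L.T
    z = rng.normal(size=(10_000, 2))  # rows are standard normal draws
    x = m + z @ L.T                   # rows are draws from N(m, S)

    print("sample mean:", x.mean(axis=0))
    print("sample covariance:\n", np.cov(x.T))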
|
Kobus |
Linear algebra for sampling (continued).
No response required. |
|
Spring Break | No class |
|
Spring Break | No class |
|
Ximing Yu |
Sebastian Thrun, Dieter Fox, Wolfram Burgard, and Frank Dellaert,
"Robust Monte Carlo localization for mobile robots,"
Artificial Intelligence, vol. 128, no. 1-2, pp. 99-141, May 2001.
Turnin key for response (on lectura): cs645_18 |
|
Tasneem |
D. Blei and J. Lafferty,
Topic Models,
In A. Srivastava and M. Sahami, editors, Text Mining: Theory and Applications. Taylor and Francis, in press.
Turnin key for response (on lectura): cs645_19 |
|
Federico |
Christopher D. Manning and
Hinrich Schütze,
Probabilistic Context Free Grammars,
Foundations of Statistical Natural Language Processing, Ch. 11,
pp 381-405, 1999, The MIT Press.
Turnin key for response (on lectura): cs645_20 |
|
Joseph Schlecht
(Guest lecture) |
No response required. |
|
Steve |
L. Zhu, Y. Chen, and A.L. Yuille,
"Unsupervised Learning of a Probabilistic Grammar-Markov Model for Objects and
Object-Classes,"
submitted to PAMI.
Turnin key for response (on lectura): cs645_22 |
|
Kobus |
Some tips on writing
No response required. |
|
Jon |
Luke S. Zettlemoyer and Michael Collins,
"Learning to Map Sentences to Logical Form: Structured Classification with
Probabilistic Categorial Grammars," UAI, 2005.
Turnin key for response (on lectura): cs645_24 |
|
Kobus |
Writing tips continued
No response required. |
|
Farnaz |
Tim Oates, Shailesh Doshi, and Fang Huang,
"Estimating Maximum Likelihood Parameters for Stochastic Context-Free Graph
Grammars," LNCS 2835, pp. 281-298, 2003,
and Sourav Mukherjee and Tim Oates, "Estimating Graph Parameters using Graph Grammars," submitted to ICML 09. Turnin key for response (on lectura): cs645_26 |
|
Kobus |
Words and pictures. No response required. But don't forget, your paper is due! |
|
Paper draft |
Your paper draft will be relatively complete except for final results. It will
have some preliminary results. It will be reviewed by your peers. If your paper
is weak, you will not necessarily get a bad grade, provided that you revise and
improve your paper based on the feedback!
Turnin key for your paper draft (on lectura): cs645_paper_draft |
|
Group (Kobus will lead) |
Adrian Barbu and Song-Chun Zhu,
"Generalizing Swendsen-Wang to Sampling Arbitrary Posterior Probabilities,"
IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), pp.
1239-1253, 2005.
For those who want to take a look at the original source (optional), it is: R.H. Swendsen and J.S. Wang, "Nonuniversal critical dynamics in Monte Carlo simulations," Phys. Rev. Lett., 58(2), pp. 86-88, 1987. Turnin key for response (on lectura): cs645_28 |
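For intuition about what Barbu and Zhu are generalizing, here is a minimal classic Swendsen-Wang sweep for the 2-D Ising model, written from the standard description of the original algorithm (my own sketch; the grid size and inverse temperature are arbitrary). Same-spin neighbors are bonded with probability 1 - exp(-2*beta), and each resulting cluster is flipped as a block with probability 1/2.

    import numpy as np

    rng = np.random.default_rng(5)
    N, beta = 32, 0.5
    spin = rng.choice([-1, 1], size=(N, N))

    def find(parent, i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    def sweep(spin):
        parent = list(range(N * N))         # union-find over lattice sites
        p_bond = 1.0 - np.exp(-2.0 * beta)
        for i in range(N):
            for j in range(N):
                for di, dj in ((1, 0), (0, 1)):           # right and down neighbors
                    i2, j2 = (i + di) % N, (j + dj) % N   # periodic boundaries
                    if spin[i, j] == spin[i2, j2] and rng.uniform() < p_bond:
                        a = find(parent, i * N + j)
                        b = find(parent, i2 * N + j2)
                        parent[a] = b                     # merge the two clusters
        flip = rng.uniform(size=N * N) < 0.5              # one coin per cluster root
        for i in range(N):
            for j in range(N):
                if flip[find(parent, i * N + j)]:
                    spin[i, j] *= -1
        return spin

    for _ in range(100):
        spin = sweep(spin)
    print("magnetization per site:", spin.mean())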
|
Kobus |
Words and Pictures continued.
No response required, but remember, peer review is due! |
|
Paper draft peer review |
Assignments are in:
        lectura:/local/cs645/paper_review_assignments/${USER}
Please put the user name of the person you are reviewing as a prefix to your file
name.
Turnin key for reviews (on lectura): cs645_paper_reviews
|
(Last class) |
Kobus |
Words and Pictures continued.
No response required. |
|
Complete paper |
In addition to overall quality, your final paper will be partly evaluated with
respect to where you were coming into the course, and how well you responded to
feedback.
Turnin key for your paper (on lectura): cs645_paper |
|
Some topics we did not get to |
Non-reversible MCMC (done in UQG) (Neal, 2004)
Short-cut MH (Neal, 2005)
Cross entropy (e.g., this tutorial)
Genetic programming (e.g., this field guide)
No free lunch theorems (e.g., this paper)
|