When | Who Presents / What is Due | Description |
|
Kobus | Introductory Lecture |
|
Kobus |
Chapter one of Pattern Recognition and Machine Learning by Chris Bishop,
up to and including section 1.5.4. Details in section 1.2.6 are less critical, but the
meaning of equation 1.68 is important.
Key concepts that you should be working towards understanding are generative statistical models and the basics of Bayesian statistics. Those who need to brush up on probability may also want to have a look at the following introduction to probability. Alternatively, you could consult your favorite probability text, or Wikipedia. |
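If it helps to make the generative-model-plus-Bayesian-update idea concrete, here is a minimal coin-flipping sketch in Python. This is my own illustration, not code from Bishop; the prior counts and the data are made up.

    # Bayesian inference for a coin's heads probability (toy example).
    # Prior: Beta(a, b); likelihood: Bernoulli; posterior: Beta(a + heads, b + tails).
    import numpy as np
    from scipy.stats import beta

    a, b = 2.0, 2.0                      # Beta prior pseudo-counts (illustrative choice)
    flips = [1, 0, 1, 1, 0, 1, 1, 1]     # 1 = heads, 0 = tails (made-up data)
    heads = sum(flips)
    tails = len(flips) - heads

    posterior = beta(a + heads, b + tails)
    # The posterior predictive for the next flip, in the same spirit as Bishop's
    # predictive distribution: average the likelihood over the posterior. For a
    # Bernoulli likelihood this is just the posterior mean of theta.
    print("p(next flip is heads) =", posterior.mean())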
|
Kobus |
Sections 8.1 and 8.2 of
chapter eight of
Pattern Recognition and Machine Learning by Chris Bishop.
Section 8.1.4 is perhaps less important if this is your first time through such
material.
A key underlying concept that you should make sure you understand is conditional independence. If you are surfing the web for additional material, keywords include graphical models, Bayes nets, and conditional independence. I recommend checking out some of these videos here, with perhaps this one being the most useful. (I have only watched the first few minutes of about half of them.) |
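To see conditional independence in action, here is a small numerical check on a made-up chain A -> B -> C (my own toy example, not from Bishop). Because the joint factorizes as p(a) p(b|a) p(c|b), A and C must be independent given B, and brute-force enumeration confirms it.

    # Verify that A is independent of C given B for the chain A -> B -> C.
    import itertools
    import numpy as np

    pA = np.array([0.6, 0.4])                   # p(a)
    pB_A = np.array([[0.7, 0.3], [0.2, 0.8]])   # p(b|a), rows indexed by a
    pC_B = np.array([[0.9, 0.1], [0.5, 0.5]])   # p(c|b), rows indexed by b

    joint = np.zeros((2, 2, 2))                 # joint p(a, b, c)
    for a, b, c in itertools.product(range(2), repeat=3):
        joint[a, b, c] = pA[a] * pB_A[a, b] * pC_B[b, c]

    for b in range(2):
        p_ac_given_b = joint[:, b, :] / joint[:, b, :].sum()
        p_a_given_b = p_ac_given_b.sum(axis=1)
        p_c_given_b = p_ac_given_b.sum(axis=0)
        # p(a, c | b) factors as p(a | b) p(c | b) for every value of b.
        assert np.allclose(p_ac_given_b, np.outer(p_a_given_b, p_c_given_b))
    print("A and C are conditionally independent given B")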
|
Ernesto |
Sections 8.3 through 8.4.3 of
chapter eight of
Pattern Recognition and Machine Learning by Chris Bishop.
If you are getting bored of Bishop, another introduction is linked here. Turnin key for response (on lectura): cs645_04 |
|
Andrew |
Sections 8.4.4 through the end of
chapter eight of
Pattern Recognition and Machine Learning by Chris Bishop.
If you are getting bored of Bishop, another introduction is linked here. Turnin key for response (on lectura): cs645_05 |
|
Maria |
Here we will study Gaussian mixture models (GMMs) and expectation maximization
(EM). GMMs are the canonical statistical clustering model. Hence, if you are
thinking of clustering for your project, you will want to study this topic
carefully. The basic model is relatively easy to understand, and using EM for
inference on it is also relatively straightforward. However, the reading
includes a lot of background theory. If you are not already familiar with
this topic, your goal should be to use the reading and/or other material
on the web to understand the basic model and how EM is used to fit it.
The reading is from chapter nine of Pattern Recognition and Machine Learning by Chris Bishop. I expect the first paragraph will be a bit obscure for many---you can ignore it; it will make sense once you understand the rest of the material. Focus on 9.1 and 9.2 if this is your first exposure to the topic. Section 9.3 through the end of 9.3.2 is also worth a look. Turnin key for response (on lectura): cs645_06 |
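If you want to see the updates run, below is a minimal EM fit of a two-component 1-D Gaussian mixture, written from the standard E-step/M-step equations of section 9.2 (my own sketch; the data and the initialization are made up).

    # EM for a two-component 1-D Gaussian mixture.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])

    pi = np.array([0.5, 0.5])        # mixing weights
    mu = np.array([-1.0, 1.0])       # component means
    sigma = np.array([1.0, 1.0])     # component standard deviations

    for _ in range(100):
        # E-step: responsibilities r[n, k] = p(component k | x_n).
        dens = np.stack([p * norm.pdf(x, m, s)
                         for p, m, s in zip(pi, mu, sigma)], axis=1)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from weighted sufficient statistics.
        Nk = r.sum(axis=0)
        pi = Nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / Nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)

    print("weights:", pi, "means:", mu, "stdevs:", sigma)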
|
Raquel |
Here we will study hidden Markov models (HMMs). This is an important model, and
a good one to consider for projects that involve sequence data represented as a
sequence of state changes. For example, HMMs are often used for tasks like
gesture or action recognition.
Like GMMs, HMMs are common and well understood, and there is much additional material about them on the web. We will keep things simple and use Bishop for common ground. (This will be the LAST reading from Bishop.) The reading is up to 13.2.5 of chapter thirteen of Pattern Recognition and Machine Learning by Chris Bishop. Some parts, such as 13.2.4, are less important and can be skimmed, leaving time to learn the main points. Another reference for HMMs is Lawrence R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, 77(2), pp. 257-286, February 1989. Turnin key for response (on lectura): cs645_07 |
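As a concrete taste of HMM inference, here is the forward recursion for a tiny discrete HMM (my own sketch; all of the model numbers are made up). Note that for long sequences the unscaled alphas underflow, which is what the scaling in 13.2.4 addresses.

    # Forward algorithm: computes p(x_1, ..., x_T) by recursive marginalization
    # over the hidden states.
    import numpy as np

    A = np.array([[0.9, 0.1],     # A[i, j] = p(z_t = j | z_{t-1} = i)
                  [0.2, 0.8]])
    E = np.array([[0.8, 0.2],     # E[i, k] = p(x_t = k | z_t = i)
                  [0.3, 0.7]])
    init = np.array([0.5, 0.5])   # initial state distribution
    obs = [0, 0, 1, 0, 1, 1]      # an observation sequence of discrete symbols

    alpha = init * E[:, obs[0]]   # alpha_1(z) = p(x_1, z_1)
    for x in obs[1:]:
        # alpha_t(z) = p(x_t | z) * sum over z' of alpha_{t-1}(z') A[z', z]
        alpha = (alpha @ A) * E[:, x]
    print("p(observations) =", alpha.sum())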
|
Ray |
We will now move on to MCMC sampling. This is the start of a series of papers on
inference. This is also a good source of projects. In particular, you can create
a project, or extend the depth of any project, by proposing to study and compare
several inference approaches on real data. For example, EM vs. MH vs. HMC. (At
this point you may only know what the first acronym stands for.)
There is much material available on MCMC sampling, including chapter 11 of Bishop and Radford Neal's review, which is very good; you might want to read it in parallel with other sources. However, for common ground, we will use C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan, "An introduction to MCMC for machine learning," Machine Learning, vol. 50, pp. 5-43, 2003. The key material to prepare for this class is section 3 up to the end of section 3.5, with the key thing to learn being the Metropolis-Hastings (MH) algorithm (and the special case of Gibbs sampling). You might find parts of the paper before section 3 useful; read them as needed, but not at the expense of understanding MH. The paper goes on to discuss Monte Carlo EM (worthwhile), and then hybrid Monte Carlo and reversible jump, which are topics that we will do in more detail soon; I recommend reading the treatment in this paper to begin preparing for those future classes. To repeat: if you are new to this topic, the critical part is the Metropolis-Hastings algorithm. Please make sure you understand it. You may find it interesting that the MH algorithm was proposed as one of the top ten algorithms in scientific computing in this article. The blurb on MH is linked here. Turnin key for response (on lectura): cs645_08 |
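To make the algorithm concrete before class, here is a bare-bones MH sampler (my own sketch; the two-mode target and the step size are arbitrary choices). With a symmetric random-walk proposal, the Hastings correction cancels and the acceptance probability reduces to a ratio of target densities, which need only be known up to a constant.

    # Random-walk Metropolis-Hastings on a made-up two-mode target.
    import numpy as np

    rng = np.random.default_rng(1)

    def log_target(x):
        # Unnormalized log density of an equal mixture of N(-3, 1) and N(3, 1).
        return np.logaddexp(-0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2)

    x, step = 0.0, 1.0
    samples = []
    for _ in range(50_000):
        prop = x + step * rng.normal()
        # Accept with probability min(1, p(prop) / p(x)); q cancels by symmetry.
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop
        samples.append(x)

    samples = np.array(samples[5_000:])     # discard burn-in
    print("sample mean:", samples.mean())   # ~0 by the symmetry of the target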
|
Swami |
Hybrid Monte Carlo (HMC).
HMC has proven to be quite effective in one of the vision lab's projects, so it is worth studying. The mapping to physical systems might seem confusing, but it is worth getting used to---it is mostly notation. Start with section 3.6 of C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan, "An introduction to MCMC for machine learning," Machine Learning, vol. 50, pp. 5-43, 2003. Then read Neal, R. M. (1994), "An improved acceptance procedure for the hybrid Monte Carlo algorithm," Journal of Computational Physics, vol. 111, pp. 194-203. This paper provides a nice intro to HMC and an extension that might be interesting to experiment with. Page 7 onwards is not so important. Swami also found a fun applet showing some MCMC methods; note, however, that intuition gained there may not apply to difficult high-dimensional search spaces. Turnin key for response (on lectura): cs645_09 |
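Here is a minimal sketch of a single HMC transition, assuming a standard 2-D Gaussian target (my own illustration, not code from either paper; the step size and trajectory length are arbitrary tuning choices). Position plays the role of the sample, momentum is the auxiliary Gaussian variable, and the leapfrog integrator approximately simulates the physical dynamics.

    import numpy as np

    rng = np.random.default_rng(2)

    def U(q):      return 0.5 * q @ q   # potential energy = -log target (std normal)
    def grad_U(q): return q

    def hmc_step(q, eps=0.1, L=20):
        p = rng.normal(size=q.shape)            # resample momentum
        q_new, p_new = q.copy(), p.copy()
        p_new -= 0.5 * eps * grad_U(q_new)      # initial half step for momentum
        for _ in range(L):
            q_new += eps * p_new                # full step for position
            p_new -= eps * grad_U(q_new)        # full step for momentum ...
        p_new += 0.5 * eps * grad_U(q_new)      # ... trimmed back to a half step
        # Metropolis accept/reject on the change in total energy (Hamiltonian).
        dH = (U(q_new) + 0.5 * p_new @ p_new) - (U(q) + 0.5 * p @ p)
        return q_new if np.log(rng.uniform()) < -dH else q

    q = np.zeros(2)
    draws = []
    for _ in range(5_000):
        q = hmc_step(q)
        draws.append(q)
    print("sample covariance:\n", np.cov(np.array(draws).T))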
|
Tentative project selection |
By this date, you need to inform Kobus of your project plans. Most likely you
will either have chosen your project long before this time or you will be in
intense discussion with Kobus.
You need to hand in a text file containing a few sentences declaring what your project is. In particular, you need to give me a rough idea of what the generative statistical model will be---enough so I can be sure that the proposal due in a few weeks will in fact have a generative statistical model in it. If you are working with a partner, let me know who it is. Turnin key for a few sentences on project topic: cs645_topic |
|
Qiyam |
Reversible Jump MCMC.
RJMCMC is a canonical way to handle the very important issue of model selection
where the number of parameters is not known (often the case).
Start with section 3.7 of C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan, "An introduction to MCMC for machine learning," Machine Learning, vol. 50, pp. 5-43, 2003. Figure 17 is very useful, as is the example at the end of the section. The material in 3.7 serves as a good summary of this topic, and if you can master it, you are in good shape. Then read Green, P. (2003), "Trans-dimensional Markov chain Monte Carlo," in Highly Structured Stochastic Systems. The main focus will be section 2, which overlaps with the summary in Andrieu et al. The other material is further discussion of trans-dimensional sampling, and is less important, especially if you are new to sampling. However, using sampling for model selection is an important and difficult area, and some discussion of the larger issues in class will hopefully occur, so please give them some thought. Ideally, we will collectively come to understand the basic why and how of Green's method, along with some general understanding of the larger issues. I think this supersedes the original paper (which is covered in section two and put into more context in the 2003 paper). But in case it is useful, the original is Green, P. (1995), "Reversible jump Markov chain Monte Carlo computation and Bayesian model determination," Biometrika 82, 711-732.
In particular, aim to get comfortable with dimension matching and the role of the Jacobian term in the acceptance ratio; a toy example is sketched below.
Turnin key for response (on lectura): cs645_10 |
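To make the dimension-matching machinery concrete, here is a toy reversible-jump sampler that switches between a zero-mean model and a free-mean model for scalar data (entirely my own illustration; the priors, data, and move structure are assumptions). Because the jump proposes the new mean from its prior, the prior and proposal densities cancel and the Jacobian is 1, so the acceptance ratio reduces to a likelihood ratio.

    # Toy RJMCMC over two models of data y:
    #   M0: y_i ~ N(0, 1)                  (no free parameters)
    #   M1: y_i ~ N(mu, 1), mu ~ N(0, 10)  (one free parameter)
    # Equal prior probability on the two models is assumed.
    import numpy as np

    rng = np.random.default_rng(3)
    y = rng.normal(1.5, 1.0, size=20)        # made-up data with true mean 1.5

    def loglik(mu):
        return -0.5 * np.sum((y - mu) ** 2)  # log likelihood up to a constant

    k, mu, counts = 0, 0.0, [0, 0]
    for _ in range(20_000):
        # Trans-dimensional move: birth (M0 -> M1) or death (M1 -> M0).
        if k == 0:
            mu_prop = rng.normal(0.0, np.sqrt(10.0))   # u = mu drawn from its prior
            if np.log(rng.uniform()) < loglik(mu_prop) - loglik(0.0):
                k, mu = 1, mu_prop
        else:
            if np.log(rng.uniform()) < loglik(0.0) - loglik(mu):
                k, mu = 0, 0.0
        # Ordinary within-model MH update of mu when in M1 (symmetric proposal).
        if k == 1:
            mu_prop = mu + 0.5 * rng.normal()
            log_r = loglik(mu_prop) - loglik(mu) + (mu**2 - mu_prop**2) / 20.0
            if np.log(rng.uniform()) < log_r:
                mu = mu_prop
        counts[k] += 1

    print("posterior p(M1 | y) is approximately", counts[1] / sum(counts))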
|
Kobus |
The format for this session will be a bit different. We will not have a
slide presentation. Rather, we will focus on discussing MCMC methods,
solidifying what we have read so far. To provide some basis for discussion, have
a look at the two papers listed below, which apply MCMC. We will not worry so
much about their technical details. Just focus on the main points: what is the
problem, what is the approach to solving it, and what might be the strengths and
weaknesses of the solution. For example, in the first paper, the main
contribution is two technical results relating to complexity and convergence.
You can ignore them if you like. For us, the paper simply serves as a good
introduction to using MCMC for multi-target tracking.
For each paper I have listed a discussion topic that came to mind. I have my opinions about them, but I want to know what you think. In addition, I will be calling on the audience to help summarize bits of the papers.
Paper one:
(Reference 20, linked here, is interesting, if a bit dense. An additional paper from the same group (linked here) solves one of the "open" questions listed in reference 20. A Google search reveals some additional recent papers that are also related. This is not an area I know much about. However, if someone looks at this material and thinks it is worth doing some of it as part of the class, we could consider it.) Discussion question: The number of tracks is not known. Yet reversible jump is not used. Why is it not needed? If it is not, does clustering permit an analogous interpretation? Is thinking about clustering in this way useful?
Paper two:
Discussion question: The prior over the number of clusters is uniform. Does that make sense? Why is the "answer" not simply the maximal number of clusters in all cases? Is there a flaw in this paper? Turnin key for response (on lectura): cs645_11 |
|
Ram |
Data-driven MCMC (part one).
In many problems, such as those in computer vision, MCMC can give good results because of problem structure: given the right representation, a global maximum dominates the space. However, searching, or even getting a reasonable chain started, is much improved by peeking at the data. Conditioning on the data is called data-driven MCMC. The concept has been around for a while, but it seems that it got named and claimed in the following paper: Z.W. Tu and S.C. Zhu, "Image Segmentation by Data-Driven Markov Chain Monte Carlo," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 657-673, May 2002. Unfortunately, I do not know of a suitable stand-alone paper focused on data-driven methods; the method is relatively application dependent. With the above paper, we will see MCMC being used to integrate many hammers applied simultaneously to a difficult problem. This paper is quite technical, so we will break it up into two parts, combining the second part with another paper. In the first part we will see the problem being set up and reversible jump moves being defined. Note that Green's paper is referenced, but these authors do not bother telling you much about how they compute the differential probability ratios. Recall that Green provided a nice way around things through space extension and the determinant of the Jacobian---I do not know the underlying details of what is done in this paper. We will read up to 4.2, and skim 4.3, which sets us up for Thursday. Turnin key for response (on lectura): cs645_12 |
|
Kyle |
Data-driven MCMC (part two).
Read the rest (i.e., section 4.3 onwards) of: Z.W. Tu and S.C. Zhu, "Image Segmentation by Data-Driven Markov Chain Monte Carlo," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 657-673, May 2002. In parallel, read the first eight pages of: R. Maciuca and S.C. Zhu, "How Do Heuristics Expedite Markov Chain Search," International Workshop on Statistical and Computational Theories of Vision, 2003. Turnin key for response (on lectura): cs645_13 |
noon |
Project proposal |
The project proposal is a substantive document explaining what your problem is,
why it is important, the literature context, what your proposed model is, and
some mention (probably not very detailed) of inference. There will be references.
This is a proposal, not a paper. However, the material will be useful for your
paper, provided that you rewrite it for a different audience. It will be
reviewed by your peers.
Turnin key for proposal (on lectura): cs645_proposal |
|
Javad |
The reading for this session is:
Kemp, C. and Tenenbaum, J. B. (2008). The discovery of structural form. Proceedings of the National Academy of Sciences, 105(31), 10687-10692. I see that the authors have put the data and code online here. (I have not looked at it myself.) Turnin key for response (on lectura): cs645_14 |
|
Kobus |
Reviewing and writing for reviewers
No response required. |
11:59 PM. |
Project proposal peer review |
Written reviews need to be submitted.
Your review assignments should appear in:
Turnin key for reviews (on lectura): cs645_proposal_reviews Additional comments on hard copy can be given back in person to the reviewees in class on March 10. |
|
Kobus |
Linear algebra for sampling
No response required. |
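The slides for these sessions are not linked, but one standard piece of "linear algebra for sampling" is drawing from a multivariate Gaussian by transforming standard normals with a Cholesky factor. The sketch below is a generic illustration, not necessarily what was covered; the mean and covariance are made up.

    # Draw from N(m, S) via x = m + L z, where S = L L^T and z ~ N(0, I).
    import numpy as np

    rng = np.random.default_rng(4)
    m = np.array([1.0, -2.0])
    S = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

    L = np.linalg.cholesky(S)         # lower triangular factor, S = L @ L.T
    z = rng.normal(size=(10_000, 2))  # rows are standard normal draws
    x = m + z @ L.T                   # rows are draws from N(m, S)

    print("sample mean:", x.mean(axis=0))
    print("sample covariance:\n", np.cov(x.T))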
|
Kobus |
Linear algebra for sampling (continued).
No response required. |
|
Spring Break | No class |
|
Spring Break | No class |
|
Ximing Yu |
Sebastian Thrun, Dieter Fox, Wolfram Burgard, and Frank Dellaert,
"Robust Monte Carlo localization for mobile robots,"
Artificial Intelligence, vol. 128, no. 1-2, pp. 99-141, May 2001.
Turnin key for response (on lectura): cs645_18 |
|
Tasneem |
D. Blei and J. Lafferty,
Topic Models,
In A. Srivastava and M. Sahami, editors, Text Mining: Theory and Applications. Taylor and Francis, in press.
Turnin key for response (on lectura): cs645_19 |
|
Federico |
Christopher D. Manning and
Hinrich Schütze,
Probabilistic Context Free Grammars,
Foundations of Statistical Natural Language Processing, Ch. 11,
pp 381-405, 1999, The MIT Press.
Turnin key for response (on lectura): cs645_20 |
|
Joseph Schlecht
(Guest lecture) |
No response required. |
|
Steve |
L. Zhu, Y. Chen, and A.L. Yuille,
"Unsupervised Learning of a Probabilistic Grammar-Markov Model for Objects and
Object-Classes,"
submitted to PAMI.
Turnin key for response (on lectura): cs645_22 |
|
Kobus |
Some tips on writing
No response required. |
|
Jon |
Luke S. Zettlemoyer and Michael Collins,
"Learning to Map Sentences to Logical Form: Structured Classification with
Probabilistic Categorial Grammars," UAI, 2005.
Turnin key for response (on lectura): cs645_24 |
|
Kobus |
Writing tips continued
No response required. |
|
Farnaz |
Tim Oates, Shailesh Doshi, and Fang Huang,
"Estimating Maximum Likelihood Parameters for Stochastic Context-Free Graph
Grammars," LNCS 2835, pp. 281-298, 2003,
and Sourav Mukherjee and Tim Oates, "Estimating Graph Parameters using Graph Grammars," submitted to ICML 09. Turnin key for response (on lectura): cs645_26 |
|
Kobus |
Words and pictures. No response required. But don't forget, your paper is due! |
|
Paper draft |
Your paper draft will be relatively complete except for final results. It will
have some preliminary results. It will be reviewed by your peers. If your paper
is weak, you will not necessarily get a bad grade, provided that you revise and
improve your paper based on the feedback!
Turnin key for your paper draft (on lectura): cs645_paper_draft |
|
Group (Kobus will lead) |
Adrian Barbu and Song-Chun Zhu,
"Generalizing Swendsen-Wang to Sampling Arbitrary Posterior Probabilities,"
IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), pp.
1239-1253, 2005.
For those who want to take a look at the original source (optional), it is: R.H. Swendsen and J.S. Wang, "Nonuniversal critical dynamics in Monte Carlo simulations," Phys. Rev. Lett., 58(2), pp. 86-88, 1987. Turnin key for response (on lectura): cs645_28 |
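For intuition about what Barbu and Zhu are generalizing, here is a minimal classic Swendsen-Wang sweep for the 2-D Ising model, written from the standard description of the original algorithm (my own sketch; the grid size and inverse temperature are arbitrary). Same-spin neighbors are bonded with probability 1 - exp(-2*beta), and each resulting cluster is flipped as a block with probability 1/2.

    import numpy as np

    rng = np.random.default_rng(5)
    N, beta = 32, 0.5
    spin = rng.choice([-1, 1], size=(N, N))

    def find(parent, i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    def sweep(spin):
        parent = list(range(N * N))         # union-find over lattice sites
        p_bond = 1.0 - np.exp(-2.0 * beta)
        for i in range(N):
            for j in range(N):
                for di, dj in ((1, 0), (0, 1)):           # right and down neighbors
                    i2, j2 = (i + di) % N, (j + dj) % N   # periodic boundaries
                    if spin[i, j] == spin[i2, j2] and rng.uniform() < p_bond:
                        a = find(parent, i * N + j)
                        b = find(parent, i2 * N + j2)
                        parent[a] = b                     # merge the two clusters
        flip = rng.uniform(size=N * N) < 0.5              # one coin per cluster root
        for i in range(N):
            for j in range(N):
                if flip[find(parent, i * N + j)]:
                    spin[i, j] *= -1
        return spin

    for _ in range(100):
        spin = sweep(spin)
    print("magnetization per site:", spin.mean())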
|
Kobus |
Words and Pictures continued.
No response required, but remember, peer review is due! |
|
Paper draft peer review |
Assignments are in:
        lectura:/local/cs645/paper_review_assignments/${USER}
Please put the user name of the person you are reviewing as a prefix to your file
name.
Turnin key for reviews (on lectura): cs645_paper_reviews
|
(Last class) |
Kobus |
Words and Pictures continued.
No response required. |
|
Complete paper |
In addition to overall quality, your final paper will be partly evaluated with
respect to where you were coming into the course, and how well you responded to
feedback.
Turnin key for your paper (on lectura): cs645_paper |
|
Some topics we did not get to |
Non-reversible MCMC (done in UQG) (Neal, 2004)
Short-cut MH (Neal, 2005)
Cross entropy (e.g., this tutorial)
Genetic programming (e.g., this field guide)
No free lunch theorems (e.g., this paper)
|