CSc 645


Advanced Topics in Algorithm Analysis (Spring 09)
Statistical modeling and inference

Quick Link to Class Schedule


Instructor Kobus Barnard
Office Hours TBA, by automated sign up.
Time and Place TR 3:30-4:45, Gould-Simpson 934
Description This course will explore current research relevant to using parametrized statistical models to extract meaning from complex data such as image data of objects, organs, and organisms. Such models can effectively encode important aspects of many problems while capturing variance across instances and measurement processes, thereby explaining observed data. Importantly, models can be fit to data using statistical inference in the Bayesian framework, and further, models for a category can be learned from groups of instances of the category. Learned category models can be fit to particular instances of the data to characterize them and/or identify the category that the instance belongs to.

As a concrete example, consider building a statistical geometric model for the form of a biological entity such as a particular plant species at a given development stage. The general form of such a model can be specified with the help of domain scientists, and the details of such a model can be inferred from images of plants from the species under study. Models learned in this way can be fit to image data of new instances. If the images are of specimens grown under a particular environmental condition, the process can be be used to gather quantitative information about the form of the plant under that condition. Thus one could, for example, extract quantitative data about the effect of drought on the geometric structure of that plant, and link that information to other data such as genetic data and crop yield.

The course is an advanced, research focused seminar course targeted at students who want to integrate this methodology into their research. The material will have a strong emphasis on the key technical issue of efficiently inferring good values for model parameters from real data. This generally becomes an optimization problem that can be addressed by advanced sampling and other methods. A significant fraction of the course time will be spent studying literature that suggests strategies for optimizing the kinds of functions that arise in this problem domain.

Topics (UNDER CONSTRUCTION) Modeling and representation: Graphical model notation, Bayesian networks, Bayesian statistics, stochastic grammar models, and image formation. Inference and optimization: Modern approaches to optimizing resulting posterior functions such as MCMC methods (e.g., Metropolis Hastings, reversible jump) and methods based on statistical physics such as Langevin sampling and accelerated molecular dynamics, with a focus on how our understanding of the construction of these functions can be exploited to make these methods effective.
Frequently asked question What is the relation of this topic to machine learning and the 2007 six hundred level course on the fundamentals of machine learning? This course can be considered a special topic in machine learning, and the area can be considered a subset of machine learning, broadly defined. The approach studied in this course is an increasingly popular way to set up learning from data in order to predict future events or classify new examples. However, there are many other useful techniques that are included in a more general machine learning syllabus. For example, discriminative methods can be very successful at classification without a good underlying model. However, in this course, we will focus on applications where models with good representation in the problem domain are arguably worth pursuing.
Prerequisites There are no formal prerequisites for this course other than the normal qualifications required to take graduate level computer science courses. However, the material is very mathematical, and students must be prepared to struggle with it and augment their reading with further study of relevant background material. Also, as mentioned above, the course is targeted towards students who intend to integrate the material into their research, and thus it will proceed at a relatively mature intellectual level, reflecting the overall goals of a research oriented course in an advanced topic area.
Required Text There are no required texts. All readings will be made available on-line.

Students interested in a general text that puts the approach in perspective might consider consulting "Pattern Recognition and Machine Learning", by Christopher Bishop.

Format This course is research focused. Students will be expected to read a number of current papers, present and lead the discussion for several of them, and do a project. Research oriented projects will be strongly encouraged. Group projects will be possible. Short presentations at several project stages will be required from each project group.

Before the day when specific papers will be presented, students will provide the presenter modest feedback on the reading material. This can include thoughts, summaries, comments, notes, worked examples, programming examples, and questions for discussion. The exact nature of the feedback is flexible, and should be chosen by the student to serve two purposes: 1) increase ones understanding of the material; and 2) provide material that help the presenters engage the class with the material.

Grading

Grading will be based on performance in the following four areas.

  • Weekly feedback (20%)
  • Participation (20%)
  • Presentations (20%)
  • Project work (40%)
  • A cumulative percentage of 90% guarantees an A, 80% guarantees a B, 70% a C, and 60% a D.

    Policies Good attendance is required. If you cannot make class due to travel or sickness, please let the instructor know, as missed classes can count against the participation grade. The degree of impact will be significant if there are more than three unexplained missed classes.

    Students will be expected to respect the University of Arizona's academic integrity policy.


    Class Schedule