CSc 645

Advanced Topics in Algorithms (Spring 2007)
Fundamentals of machine learning and statistical data analysis

Quick Link to Class Schedule


Time and Place TR 3:30-4:45, Gould-Simpson 701
Description Data from real-world processes is not random; rather, it has structure that can be characterized and exploited. Automated methods for doing so are critical for much current research, including sub-fields of artificial intelligence such as computer vision, and scientific data mining and analysis, which is particularly common in computational biology. This course will provide a modern grounding in these methods for those intending to use them in research. It will be structured as guided reading together with discussion and group problem solving.
Topics Review of probability, overview of statistical models, principal components analysis, independent components analysis, regression, graphical models, clustering, expectation maximization, hidden Markov models, Markov chain Monte Carlo methods, Metropolis-Hastings, particle filters, learning theory, support vector machines, kernel methods, neural networks.
Prerequisites There are no specific prerequisites for this course other than the normal qualifications required to take graduate level computer science courses. However, the material is quite mathematical, and students must be prepared to struggle with it.
Instructor Kobus Barnard
Office Hours TRF, 9:30-10:00, by automated sign up.
Required Text Most material will be drawn from: "Pattern Recognition and Machine Learning," by Christopher Bishop (required text).

Link to text book home page

Other useful references Naturally, the web is very useful for looking up isolated concepts and finding current research in the areas that we are studying. In addition, the following books are recommended:

  1. Trevor Hastie, Robert Tibshirani, and Jerome Friedman, "The Elements of Statistical Learning," Springer, 2001.
  2. Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin, "Bayesian Data Analysis," Second Edition, Chapman & Hall/CRC, 2004.
  3. Thomas M. Cover and Joy A. Thomas, "Elements of Information Theory," John Wiley & Sons, Inc., 1991.
Format The format of this class is unusual. Please read carefully.

This course strongly emphasizes independent study. In order to optimize learning for a wide range of backgrounds, it will be assumed that students will take charge of their own learning of the material. In particular, students will need to choose which details in the readings they want to learn, and which ones are best saved for another time.

Grading will largely be based on intelligently applied effort.

The major activity of the course will be working through much of the text. If needed, links to supplementary papers will be provided. The volume of required reading will be substantial.

Except for the week of Jan 16 (which will be a bit compressed), reading assignments will be available at least one week in advance. Before class on Tuesdays, students will need to hand in a short (e.g. 1/2 to 1 page) personalized summary of the readings. Ideally the summary will go beyond a simple set of "notes" in some way. The summary must be in sentences and paragraphs (as needed)---point form is not acceptable. Please proofread and spell check each summary. The summary should demonstrate some engagement with the material. Additional comments regarding the scope of what might appear in these summaries will be provided during the first class. The overall requirement over the course will allow for the equivalent of skipping two summaries, assuming a superlative job on the others.

Except for the week of Jan 16 (handled by the instructor), the class sessions for Tuesday and Thursday will be handled by up to three students with pairs likely being most effective. Students who wish to choose their own partners will need to do so relatively early in the semester. Each student will take part in leading between two and four classes (depending on final enrollment).

On Tuesday, the class sessions will focus on discussion related to the reading. The presenters will be encouraged to assume that the class has done the reading, has submitted summaries, and hence everyone has something to say. The presenters will have a large degree of latitude on how they handle the session. Further details regarding possible approaches will be discussed in the first class and will evolve as different groups try different things. However, it may help to consider that a main educational goal of these presentations is to help students learn how to engage others in discussion, which is more difficult than simply giving prepared presentations.

Each reading assignment will have some number of suggested exercises and/or programming assignments. These are due by class time on Thursday. The exercises will be graded very coarsely based on effort. The overall requirement over the course of the semester will be equivalent to putting sufficient effort towards 3/4 of the exercises (1/2 for ugrads). Hence there is some latitude for skipping ones that do not appear useful, or are completely impossible, and to allow for travel and sickness.

On Thursday, the class sessions will focus on the exercises and programming assignments. Additional discussion of the material may or may not also occur.

For each class, the instructors for the week will prepare a hard copy of a list of topics or problems (depending on the day) on which those attending can indicate the level of emphasis that they would like to see on each topic or problem. This will help deal with the fact that all the material cannot possibly be discussed in depth in the time allotted.

The instructors for the week will also be responsible for working with the course instructor to grade the problems (on effort), and for developing a file that has selected solutions and comments regarding the problems, as a function of what issues were difficult, and what was discussed in class.

Since the exercises will not be graded for correctness, verifying one's answers will have to be achieved by a combination of class discussion and studying the posted solutions.

The instructor will optionally take some time at the end of sessions to provide additional summarizing comments on the topics for the week. This will be a function of what has transpired during the session.

Students will be expected to attend class and participate in class discussion and group problem solving. The instructor will record attendance and participation over the course of the semester. Consistent participation over the course of the semester is necessary for good participation credit.

Grading

Due to a very wide range of backgrounds, and no TA support, grading will largely be based on intelligently applied effort in the following four areas.

  • Summaries (20%) [ The two worst efforts will be ignored ]
  • Homework (30%) [ Out of 3/4 maximum possible effort (1/2 for ugrads) ]
  • Presentations (20%)
  • Participation (30%)

A cumulative percentage of 90% guarantees an A, 80% guarantees a B, 70% a C, and 60% a D.
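
The weighting above can be sketched in a few lines of Python. This is only an illustration, not the instructor's actual procedure: the function names are hypothetical, every input is assumed to be a fraction between 0 and 1, the summaries are assumed to be averaged after dropping the two worst, and the homework is assumed to be scaled against the 3/4 (or 1/2) target and capped at full credit. Since the cutoffs are stated as guarantees, the letter returned should be read as a minimum.

```python
def cumulative_percentage(summary_scores, homework_effort, presentations,
                          participation, undergrad=False):
    """Hypothetical helper; all scores are assumed to be fractions in [0, 1]."""
    # Assumption: ignore the two worst summary efforts, then average the rest.
    kept = sorted(summary_scores)[2:]
    summaries = sum(kept) / len(kept) if kept else 0.0

    # Assumption: homework is measured against 3/4 of the maximum possible
    # effort (1/2 for undergraduates) and capped at full credit.
    target = 0.5 if undergrad else 0.75
    homework = min(homework_effort / target, 1.0)

    # Weights from the syllabus: 20% / 30% / 20% / 30%.
    return 100.0 * (0.20 * summaries + 0.30 * homework
                    + 0.20 * presentations + 0.30 * participation)


def guaranteed_grade(percentage):
    """Return the letter guaranteed by the stated cutoffs, or None below 60%."""
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if percentage >= cutoff:
            return letter
    return None
```

For example, under these assumptions a graduate student with perfect summaries, presentations, and participation who puts sufficient effort into 3/8 of the exercises earns half of the homework credit.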

    Turnin

    Summary and problem submissions need to be done with the turnin program. To use the turnin program, you must log onto the machine lec.cs.arizona.edu, as well as copy your files to a directory visible to that machine. To get an account on the CS machines (including lec), you need to use the "apply" process accessible from this link. Information about the turnin program is available by logging onto lec, and entering:

        man turnin
    
    Files submitted for the summaries and generic problems must be in PDF and follow the naming convention:
       summary_NN.pdf
       problems_NN.pdf
       
    Where NN is the week number, with the number of the week of Jan 16 being 01. The turnin name for week NN is cs645_NN. For example, to turn in the first summary, use:
        turnin cs645_01 summary_01.pdf
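
As a convenience, the naming convention can be generated rather than typed by hand. The sketch below is a hypothetical helper, not part of the turnin program; it assumes that problem files use the same cs645_NN turnin name as summaries for a given week, and it only builds the command string without running it.

```python
def submission_names(week):
    """Return (summary file, problems file, turnin name) for a week number."""
    nn = f"{week:02d}"  # the week of Jan 16 is week 1, written as 01
    return f"summary_{nn}.pdf", f"problems_{nn}.pdf", f"cs645_{nn}"


def turnin_command(week, kind="summary"):
    """Build the turnin command line for a summary or problems file."""
    if kind not in ("summary", "problems"):
        raise ValueError("kind must be 'summary' or 'problems'")
    summary, problems, assignment = submission_names(week)
    filename = summary if kind == "summary" else problems
    return f"turnin {assignment} {filename}"


print(turnin_command(1))  # reproduces the example command for the first summary
```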
    

    Some assignments will have other requirements for submission. For example, submitting computer code or images may be needed. Further instructions regarding submission formats will be made available at the time.

    As discussed in class, handwritten assignments turned in to the instructor's mailbox are an acceptable alternative to PDFs if electronic submission is inconvenient due to the amount of math. However, in this case, you still need to submit a PDF with the correct file name, containing your name, computer ID, assignment number, and a sentence saying:

        Submitted as hard-copy. 
    
    This will help the instructor keep track of what is submitted.

    Regardless of whether files are submitted electronically, please clearly identify the author, the author's CS login ID, and what the document is for (e.g. CS 645 problems for week NN).

    Policies Good attendance is required. If you cannot make class due to travel or sickness, please let the instructor know, as missed classes can count against the participation grade. The degree of impact will increase significantly after more than three unexplained missed classes.

    Homework and summaries need to be handed in at their appointed times to collect credit. Note that there is already sufficient "built in" leeway to account for a typical amount of missed school. Students are advised to overshoot the requirements so that if skipping a week becomes necessary, it will not affect their cumulative percentage.


    Class Schedule (was here, now linked)