Fundamentals of machine learning and statistical data analysis
|Time and Place||TR 2:00-3:00 p.m., Gould-Simpson 927|
|Description||Data from real-world processes is not random; rather, it has structure that can be characterized and exploited. Automated methods for doing so are critical for much current research, including sub-fields of artificial intelligence such as computer vision, and scientific data mining and analysis, which is particularly common in computational biology. This course will provide a modern grounding in these methods for those intending to use them in research. It will be structured as guided reading together with discussion and group problem solving.|
|Topics||Review of probability, overview of statistical models, principal component analysis, independent component analysis, regression, graphical models, clustering, expectation maximization, hidden Markov models, Markov chain Monte Carlo methods, Metropolis-Hastings, particle filters, learning theory, support vector machines, kernel methods, neural networks.|
|Prerequisites||There are no specific prerequisites for this course other than the normal qualifications required to take graduate level computer science courses. However, the material is quite mathematical, and students must be prepared to struggle with it.|
|Office Hours||TRF, 9:30-10:00, by automated sign up.|
|Required Text||Most material will be drawn from: Pattern Recognition and Machine Learning, by Christopher Bishop.|
|Other useful references||
Naturally, the web is very useful for looking up isolated concepts and
finding current research in the areas that we are studying. In addition, the
following books are recommended:
The format of this class is unusual. Please read carefully.
This course strongly emphasizes independent study. To optimize learning for a wide range of backgrounds, it will be assumed that students will take charge of their own learning of the material. In particular, students will need to choose which details in the readings they want to learn, and which ones are best saved for another time.
Grading will largely be based on intelligently applied effort.
The major activity of the course will be working through much of the text. If needed, links to supplementary papers will be provided. The volume of required reading will be substantial.
Reading assignments will be available at least one week in advance. Before class on Tuesdays, students will need to hand in a short (e.g. 1/2 to 1 page) personalized summary of the readings. Ideally the summary will go beyond a simple set of "notes" in some way. The summary must be written in sentences and paragraphs (as needed); point form is not acceptable. Please proofread and spell check each summary. The summary should demonstrate some engagement with the material. Additional comments regarding the scope of what might appear in these summaries will be provided during the first class. The overall requirement over the course will allow for the equivalent of skipping two summaries, assuming a superlative job on the others.
Each week, the Tuesday and Thursday class sessions will be handled by one student, who will be the lead presenter for that week. Students will take turns leading the discussion.
On Tuesday, the class sessions will focus on discussion related to the reading. The presenters will be encouraged to assume that the class has done the reading, has submitted summaries, and hence everyone has something to say. The presenters will have a large degree of latitude on how they handle the session. Further details regarding possible approaches will be discussed in the first class and will evolve as different groups try different things. However, it may help to consider that a main educational goal of these presentations is to help students learn how to engage others in discussion, which is more difficult than simply giving prepared presentations.
Each reading assignment will have some number of suggested exercises and/or programming assignments. These are due by class time on Thursday. The exercises will be graded very coarsely based on effort. The overall requirement over the course of the semester will be equivalent to putting sufficient effort towards 3/4 of the exercises. Hence there is some latitude for skipping ones that do not appear useful, or are completely impossible, and to allow for travel and sickness.
On Thursday, the class sessions will focus on the exercises and programming assignments. Additional discussion of the material may or may not also occur. Since the exercises will not be graded for correctness, verifying one's answers will have to be achieved by a combination of class discussion and studying the posted solutions.
Students will be expected to attend class and participate in class discussion and group problem solving. The instructor will record attendance and participation over the course of the semester. Consistent participation over the course of the semester is necessary for good participation credit.
Files submitted for the summaries and generic problems
must be in PDF and follow the naming convention:
summary_NN.pdf problems_NN.pdf
where NN is the week number, with the week of Sep 1 being week 01. Please email the documents to ranjini* before class on Tuesday and Thursday. Some assignments will have other requirements for submission. For example, submitting computer code or images may be needed. Further instructions regarding submission formats will be made available at the time.
For all submissions, please clearly identify the author, the author's CS login ID, and what the document is for (e.g. CS 699 problems for week NN).
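For those who prefer to generate file names rather than type them, the naming convention above can be sketched in Python roughly as follows. This is only an illustration, not part of the course requirements: the function names are made up for this example, the year must be supplied by the caller, and the code assumes that weeks run Monday to Sunday, which the convention itself does not specify.

```python
from datetime import date, timedelta


def week_one_monday(year: int) -> date:
    """Monday of the week containing Sep 1 (assumption: weeks run Monday to Sunday)."""
    sep1 = date(year, 9, 1)
    return sep1 - timedelta(days=sep1.weekday())


def week_number(day: date, year: int) -> int:
    """1-based course week for `day`; the week of Sep 1 counts as week 1."""
    offset = (day - week_one_monday(year)).days
    if offset < 0:
        raise ValueError("date is before the first week of the course")
    return offset // 7 + 1


def submission_name(kind: str, week: int) -> str:
    """Build a file name of the form summary_NN.pdf or problems_NN.pdf."""
    if kind not in ("summary", "problems"):
        raise ValueError("kind must be 'summary' or 'problems'")
    # Zero-pad the week number to two digits, as in the NN placeholder.
    return f"{kind}_{week:02d}.pdf"
```

For instance, submission_name("problems", 3) yields problems_03.pdf, and week_number can supply the week from the submission date.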
Good attendance is required. If you cannot make class due to travel or sickness,
please let the instructor know, as missed classes can count against the
Homework and summaries need to be handed in at their appointed times to collect credit. Note that there is already sufficient "built in" leeway to account for a typical amount of missed school. Students are advised to overshoot the requirements so that if skipping a week becomes necessary, it will not affect their cumulative percentage.