Based on feedback, we are keeping the new format. Further, we will experiment
  with having only one deadline, which, for this assignment, is Thursday.  No
  summaries are required for this week as the programming assignment is a bit
  longer than others. 
  
  NOTE A FEW CHANGES IN THE FOLLOWING PREAMBLE
  

  -----------------------------------------------------------------------------

  Unless otherwise specified, questions have unit value. The total value of the
  assignments from each week will vary substantively.
  
  Recall that assignments are graded rather loosely on effort, and that 3/4 of
  the total marks (1/2 for ugrads) over all assignments over all weeks
  represents 100%. This policy is in place partly to allow for error in the
  grading approach which, by necessity, is somewhat subjective, and needs to be
  done somewhat superficially. It is recommended (and requested) that you try to
  overshoot the 3/4 requirement, rather than worry about the details of how the
  grading is done. 

  Problems denoted EXTRA can be substituted for other problems, or done in
  addition, but they do not count towards the computation of the 3/4
  requirement. They may be discussed in class depending on time and interest.
  They are problems that I think might be useful, and likely be assigned if we
  had more time per chapter.
  
  Sometimes you will explicitly have to choose some of your own problems. Even
  when this is not the case, you can substitute some problems in the book for
  non-programming assignments if they appear more helpful to you. For now, limit
  the number of substitutions to 50% of what you hand in. This parameter may be
  increased or decreased as we go on.  
  
  You are encouraged to discuss the problems with your peers, but I would like
  individual final submissions demonstrating effort and understanding of what
  was done. If you end up working closely with someone on a problem set, make a
  note on your submission saying who it was. For programming assignments, each
  person should turn in their own work. 

  Since this is graduate level research course that is graded predominately on
  effort, I am confident that there will not be any problems with academic
  honesty. However, do note that non-negligible deviations are often
  surprisingly easy to spot, and can be verified by discussing the submitted
  solutions with the student. 
  
  -----------------------------------------------------------------------------

  Problems for Week 5.
  
  Total value is 11. 
  
  Due Thursday. It is possible that we will discuss the first 3 in Tuesday's
  class, so it is appreciated that the majority of the class has begun to think
  about them by Tuesday. 

  -----------------------------------------------------------------------------

  DOUBLE value. 
  1. Propose a problem where the concept of a missing variable is a useful way
     to think about the problem. There are many good ones. Try to think of one
     that is not simply clustering or GMM in disguise. Explain a bit your model.
     Try to draw a graphical representation for it. 

  DOUBLE value. 
  2. Fill in some of the details in the derivations of 9.17, 9.19, and 9.22. 

  3. 9.4. This tweaks EM to include a prior over parameters. 

  Value is SIX. 

  4. Implement EM for the Gaussian mixture model in the case of 
     diagonal covariance where the variance of each feature (here, X and Y) is
     the same for each clusters. IE, we are assuming that the clusters are
     Gaussian "balls" of various sizes. Make the number of clusters, K, easy to
     set.  You can deal with the problem described on page 434 by adding a
     small, fixed, amount of variance to the estimated variance at each
     iteration.  Your program should output the log-likelihood at each
     iteration. It must go up. If it does not always go up, then you have a
     bug.

     Use your program to cluster the 2D data linked here.

     Begin by plotting these points. First state how many clusters you think are
     reasonable. Then try to fit that many clusters.

     Initialization is always a problem with EM. You may want to experiment a
     bit. One method that should work on this data is to randomly assign each
     point to a cluster.

     You also want to make sure that you try to use sufficient iterations so
     that you are close to convergence, at least until you understand the
     problem (then you might back off the number of iterations to reduce the
     chances of over-fitting). For this data, I suggest that you push it towards
     100 iterations. This helps check that the log-likelihood always goes up. 

     Your program should assign a cluster label to each point based on the
     cluster that has maximum responsibility for explaining the point. In
     other words, the cluster that yields highest probability for that point.
     Your program should plot the points with different colors (or symbols) for
     each cluster assignment (see Figure 9.5, but no need to blend the colors
     unless you think it is fun to do). Do you think that the program generally
     works? Provide a plot of an example where it is working (unless you find
     that it never does). 

     Now try different numbers of clusters. If you increase the number of
     clusters sufficiently, you may find that not every cluster has a
     representative point. In such a case, you would have fewer colors/symbols
     than clusters. Regardless of whether you observe it, explain why this is
     possible.