Unless otherwise specified, questions have unit value. The total value of the
assignments will vary substantially from week to week.
Recall that assignments are graded rather loosely on effort, and that 3/4 of
the total marks (1/2 for ugrads) over all assignments over all weeks
represents 100%. This policy is in place partly to allow for error in the
grading approach, which is necessarily somewhat subjective and must be done
somewhat superficially. It is recommended (and requested) that you try to
overshoot the 3/4 requirement, rather than worry about the details of how the
grading is done.
Problems denoted EXTRA can be substituted for other problems, or done in
addition, but they do not count towards the computation of the 3/4
requirement. They may be discussed in class depending on time and interest.
They are problems that I think might be useful, and that would likely be
assigned if we had more time per chapter.
Sometimes you will explicitly have to choose some of your own problems. Even
when this is not the case, you can substitute some problems in the book for
non-programming assignments if they appear more helpful to you. For now, limit
the number of substitutions to 50% of what you hand in. This parameter may be
increased or decreased as we go on.
You are encouraged to discuss the problems with your peers, but I would like
individual final submissions demonstrating effort and understanding of what
was done. If you end up working closely with someone on a problem set, make a
note on your submission saying who it was. For programming assignments, each
person should turn in their own work.
Since this is a graduate-level research course that is graded predominantly
on effort, I am confident that there will not be any problems with academic
honesty. However, do note that non-negligible deviations are often
surprisingly easy to spot, and can be verified by discussing the submitted
solutions with the student.
-----------------------------------------------------------------------------
Problems for Week 8.
Total value is 9.
Due Thursday. It is possible that we will discuss some of the first ones in
Tuesday's class, so it would be appreciated if the majority of the class had
begun to think about them by Tuesday.
-----------------------------------------------------------------------------
1. First, if your memory on eigenvectors is hazy, review the material in
appendix C, starting at the bottom of page 698 (nothing needs to be handed
in). Second, some of the development of PCA makes better sense if you
realize that the eigenvalues of the covariance matrix are 1) real; and 2)
non-negative. Explain or show why this is the case.
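   A quick Matlab sanity check of the claim (not a proof, and nothing to hand
   in) might look something like this:

       % Sanity check: the eigenvalues of a covariance matrix should come
       % out real and non-negative (up to roundoff).
       D = randn(50, 4);                      % random data, rows are points
       D = D - repmat(mean(D, 1), 50, 1);     % subtract the mean
       C = (1/50) * (D' * D);                 % covariance matrix
       eig(C)                                 % all real, all >= 0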
2. Fill in the details needed to get equation 12.5.
3. Explain how a PCA-based model can be viewed as a generative model. Consider
   whether this interpretation provides any intuition as to when PCA makes
   sense.
For example, consider the case where a GMM with 3 clusters is a very good
model. Intuitively, would PCA be just as good?
4. (The following problem description is long, but the solution is not.)
   Finding the eigenvalues/eigenvectors of a matrix A is referred to as the
   eigenvector decomposition of A (A = V*E*V', writing E for the diagonal
   eigenvalue matrix to avoid a clash with the data matrix D below). Another
   important matrix factorization is the singular value decomposition (SVD).
   Here A = U*S*V', where S is a diagonal matrix of singular values, and U
   and V are orthogonal.
Importantly, A need not be square, and the SVD always exists. SVD is
available in Matlab as svd().
Let's put the N data points into the rows of a data matrix D. For convenience,
assume that the mean of the data has been subtracted, so that the mean of
the data is zero. Then, in the PCA development, we use the eigenvectors
from the eigenvector decomposition of the covariance matrix, C=(1/N)*D'*D.
Consider the SVD of the matrix D, and plug it into the covariance formula
to yield an alternative way to get what is needed for PCA. (This is useful
because SVD is a very stable computation.)
Having got this far, I suggest using Matlab to generate a random matrix D,
and verifying the above. In general, Matlab is a great tool for verifying
your understanding of the math.
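   For what it's worth, a minimal sketch of that verification (using the
   conventions above) might be:

       % Check: the right singular vectors of D should match the
       % eigenvectors of C = (1/N)*D'*D (up to sign and ordering), and
       % diag(S).^2 / N should match the eigenvalues.
       N = 100;  d = 5;
       D = randn(N, d);
       D = D - repmat(mean(D, 1), N, 1);      % zero-mean data
       C = (1/N) * (D' * D);
       [V1, E] = eig(C);
       [U, S, V2] = svd(D, 'econ');
       disp(sort(diag(E), 'descend')');       % eigenvalues of C
       disp((diag(S).^2 / N)');               % squared singular values / N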
5. Value is FIVE
For this problem, you will implement some aspects of PCA. While there may
exist software to do some of these tasks, for the purposes of this
assignment, you should implement things at the matrix level. Of course, you
can make use of facilities like eigenvalue solvers (e.g. Matlab eig() is
OK).
It is common to make the data have zero mean before doing PCA (some sources
even assume that this is part of PCA), especially when the data is all
positive, as is the case here. To promote consistency among solutions, let's
work with data that has zero mean (i.e., subtract the mean from any data
that you analyze with PCA).
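   One way to do this in Matlab (assuming the rows of D are the data points)
   is:

       D = D - repmat(mean(D, 1), size(D, 1), 1);   % zero-mean columns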
I have generated face and non-face data similar to that for assignment 3,
except that I have broken the images into a 12 x 12 grid (not 7 x 7).
Specifically, the 144 numbers refer to block averages of gray values, taken
one row at a time.
Some face data is linked here.
Some non-face data is linked here.
Do PCA on the face data. Make an image out of the first eigenvector. (This
is similar in spirit to figure 12.3). Is it reasonable? Do the same for the
non-face data. Are the images noticeably different?
(Depending on your background, creating images will either be trivial,
worth learning about, or not worth the trouble).
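   If it helps, a minimal Matlab sketch (assuming V holds the eigenvectors as
   columns, sorted by decreasing eigenvalue, and recalling that the 144
   numbers were taken one row at a time) is:

       img = reshape(V(:, 1), 12, 12)';   % transpose: reshape is column-major
       imagesc(img);  colormap(gray);  axis image;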
Combine the two data sets, and do PCA. Plot the eigenvalues in decreasing
order. If you had to choose a good number of eigenvalues to capture the
data, what would it be?
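   A sketch of the plot (with [V, E] from eig() of the combined-data
   covariance) might be:

       [lambda, idx] = sort(diag(E), 'descend');   % sorted eigenvalues
       V = V(:, idx);                              % reorder eigenvectors
       plot(lambda, 'o-');
       xlabel('component');  ylabel('eigenvalue');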
Compute the distortion as developed in section 12.1.2 for your chosen
number of eigenvalues. Verify computationally that the distortion is also
available from a sum of eigenvalues (i.e., equation 12.18).
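   A minimal sketch of that check (with M your chosen number of eigenvalues,
   V and lambda sorted in decreasing order as above, and D the centered
   combined data) might be:

       % M: your chosen number of components.
       Vm = V(:, 1:M);                       % top M eigenvectors
       Dp = D * Vm * Vm';                    % project down and reconstruct
       J1 = mean(sum((D - Dp).^2, 2));       % average squared error
       J2 = sum(lambda(M+1:end));            % should match J1 (equation 12.18)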
Plot the projections of the face and non-face points onto the two
dimensional subspace defined by the first two eigenvectors. The face and
non-face projected points should be distinguishable (e.g. different color).
Does this suggest a method for classification?
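   For the plot, a sketch (with hypothetical names Dface and Dnon for the two
   data sets, each with the combined mean subtracted, and V sorted as above)
   might be:

       % Dface, Dnon: hypothetical names for the centered face/non-face data.
       P  = V(:, 1:2);                       % first two eigenvectors
       Yf = Dface * P;                       % 2-d projections of the faces
       Yn = Dnon  * P;                       % ... and of the non-faces
       plot(Yf(:,1), Yf(:,2), 'r.', Yn(:,1), Yn(:,2), 'b.');
       xlabel('first component');  ylabel('second component');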
(While this is arguably a pretty naive approach, it did get some notoriety
in the early '90s. As usual, Wikipedia knows all: see this link.)