Based on feedback, we are keeping the new format. Further, we will experiment
  with having only one deadline, which, for this assignment, is Thursday.  No
  summaries are required for this week as the programming assignment is a bit
  longer than others. 
  
  NOTE A FEW CHANGES IN THE FOLLOWING PREAMBLE (from week 4). 
  
  -----------------------------------------------------------------------------
  Unless otherwise specified, questions have unit value. The total value of the
  assignments from each week will vary substantially.
  
  Recall that assignments are graded rather loosely on effort, and that 3/4 of
  the total marks (1/2 for ugrads) over all assignments over all weeks
  represents 100%. This policy is in place partly to allow for error in the
  grading approach which, by necessity, is somewhat subjective, and needs to be
  done somewhat superficially. It is recommended (and requested) that you try to
  overshoot the 3/4 requirement, rather than worry about the details of how the
  grading is done. 
  Problems denoted EXTRA can be substituted for other problems, or done in
  addition, but they do not count towards the computation of the 3/4
  requirement. They may be discussed in class depending on time and interest.
  They are problems that I think might be useful, and likely be assigned if we
  had more time per chapter.
  
  Sometimes you will explicitly have to choose some of your own problems. Even
  when this is not the case, you can substitute some problems in the book for
  non-programming assignments if they appear more helpful to you. For now, limit
  the number of substitutions to 50% of what you hand in. This parameter may be
  increased or decreased as we go on.  
  
  You are encouraged to discuss the problems with your peers, but I would like
  individual final submissions demonstrating effort and understanding of what
  was done. If you end up working closely with someone on a problem set, make a
  note on your submission saying who it was. For programming assignments, each
  person should turn in their own work. 
  Since this is a graduate-level research course that is graded predominantly on
  effort, I am confident that there will not be any problems with academic
  honesty. However, do note that non-negligible deviations are often
  surprisingly easy to spot, and can be verified by discussing the submitted
  solutions with the student. 
  
  -----------------------------------------------------------------------------
  Problems for Week 6.
  
  Total value is 10. 
  
  Due Thursday. We may discuss some of the first 4 in Tuesday's class, so it
  would be appreciated if most of the class had begun to think about them by
  then.
  -----------------------------------------------------------------------------
  1. Propose an application of an HMM-like model where the states have
     well-defined interpretations. Specify what the states and state emission
     probabilities might look like, if you were to model it. Describe how you
     would train the model, if you were able to label the states that occur in
     the training data observations. 
  2. Propose an application of an HMM-like model where the states have no obvious
     interpretation, and are thus truly hidden. Specify what the details of the
     model might look like. Describe how you would train the model. Propose a
     strategy if you were unsure about how many states to use, but had lots of
     training data. 
  3. Provide some of the details to derive 13.17. Hint: Use the indicator
     variable as exponents formalism. 
  4. Explain Viterbi (max-sum for HMM) intuitively, perhaps referring to 13.68.
     It looks to me that 13.68 has a typo in it. Do you agree, and if so, what
     is it? What is the difference between what Viterbi is good for and what the
     alpha-beta (or forward-backward, or Baum-Welch) algorithm is good for? 
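  As an illustration for problem 4, here is a minimal forward-backward sketch
  in Python on a made-up two-state HMM. The parameters PI, A, and B and the
  function name posterior_marginals are assumptions invented for this example;
  they do not come from PRML or from any problem on this sheet. The sketch
  computes the per-step posterior marginals p(state_t | all observations),
  which is the kind of quantity alpha-beta provides, whereas Viterbi returns a
  single jointly most probable state path. The recursions are unscaled, so it
  is only suitable for short sequences.

```python
# Toy two-state HMM. Every number below is made up for illustration and is
# unrelated to any problem in this assignment.
PI = [0.6, 0.4]                  # initial state probabilities
A = [[0.7, 0.3], [0.4, 0.6]]     # A[i][j] = p(next state j | current state i)
B = [[0.9, 0.1], [0.2, 0.8]]     # B[i][k] = p(observation k | state i)


def posterior_marginals(obs, pi=PI, a=A, b=B):
    """Return gamma, where gamma[t][i] = p(state_t = i | all observations)."""
    n, k = len(obs), len(pi)
    # Forward pass: alpha[t][i] = p(obs[0..t], state_t = i).
    alpha = [[pi[i] * b[i][obs[0]] for i in range(k)]]
    for t in range(1, n):
        alpha.append([
            b[j][obs[t]] * sum(alpha[t - 1][i] * a[i][j] for i in range(k))
            for j in range(k)
        ])
    # Backward pass: beta[t][i] = p(obs[t+1..n-1] | state_t = i).
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for i in range(k):
            beta[t][i] = sum(
                a[i][j] * b[j][obs[t + 1]] * beta[t + 1][j] for j in range(k)
            )
    evidence = sum(alpha[-1])    # p(all observations)
    return [
        [alpha[t][i] * beta[t][i] / evidence for i in range(k)]
        for t in range(n)
    ]


gamma = posterior_marginals([0, 0, 1, 1])
for t, row in enumerate(gamma):
    print(t, [round(p, 3) for p in row])
```

  Each printed row sums to one. Taking the largest entry in each row gives the
  individually most probable state at each step, which need not form the most
  probable path as a whole.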
  Value 6: 
  5. Consider four people having a conversation: Ann, Brigit, Chris, and Doug.
     You have a device that can detect when the end of a sentence occurs, and
     measure the average pitch of a sentence, but it can only categorize pitch
     into "high" and "low". The probabilities of measuring "high" pitch for a
     spoken sentence, conditioned on the speaker, are as follows:
        Ann:    0.9
        Brigit: 0.8
        Chris:  0.0
        Doug:   0.1
     Suppose that your device records the following start of a sequence
     (0=Low, 1=High)
      
        1-1-0-0 
     and that each person is equally likely to be talking. What is your MLE of
     the speaker sequence? 
     Now, suppose you have more information: 
        Ann never starts a conversation. And she never responds to women. Once
        she is talking there is a 40% chance the next sentence will be hers;
        otherwise it is equally likely that the next sentence will belong to
        any of the other three.
        
        Doug also never starts a conversation. The other two are equally likely
        to start a conversation. 
        Brigit likes to counter Chris (50% on a per-sentence basis), and talks
        a lot. In fact, there is an 80% chance that one of her sentences will
        be followed by another one by herself.
        Doug likes to counter Brigit.  Brigit's sentences will be followed by
        one of Doug's with 20% probability. No one else has the courage to speak
        after Brigit. 
        Regardless of when or why, Doug always speaks in one-sentence smart-ass
        comments. He then waits for someone else to speak. 
        Ann likes Doug's one-liners. Doug is followed by Ann 70% of the time,
        but never by Brigit. 
        Chris also likes the sound of his own voice, and 50% of the time, one of
        his sentences is followed by another from himself. He responds to Doug
        the 30% of the time that he can think of something to say faster than
        Ann can. 
     Would this change your MLE of the speaker sequence? Explain a bit
     (perhaps give some examples).
     Draw a transition diagram (no need to hand it in---save time by just doing
     it on a piece of paper) and transition matrix for the model (hand this in).
     Use the Viterbi algorithm to determine the MLE of the speaker sequence given
     the observation sequence linked here.
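  For reference when implementing problem 5, here is a minimal log-space
  Viterbi sketch in Python. It runs on a hypothetical two-state weather model,
  not on the speaker model above. STATES, START, TRANS, EMIT and the helper
  names are assumptions made up for this example, so you will need to
  substitute your own transition matrix, initial distribution, and emission
  probabilities. Working with log probabilities lets zero-probability
  transitions be represented as -inf without special cases.

```python
import math

# Hypothetical two-state model (NOT the speaker model from problem 5).
# All names and numbers here are invented for illustration only.
STATES = ["rainy", "sunny"]
START = [0.5, 0.5]                  # START[i] = p(first state is i)
TRANS = [[0.7, 0.3], [0.2, 0.8]]    # TRANS[i][j] = p(next j | current i)
EMIT = [[0.1, 0.9], [0.8, 0.2]]     # EMIT[i][k] = p(observation k | state i)


def safe_log(p):
    """Natural log, with log(0) mapped to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def check_rows(matrix, tol=1e-9):
    """Return True if every row of the matrix sums to one."""
    return all(abs(sum(row) - 1.0) < tol for row in matrix)


def viterbi(obs, start=START, trans=TRANS, emit=EMIT):
    """Return (most probable state path, its log joint probability)."""
    k = len(start)
    score = [safe_log(start[i]) + safe_log(emit[i][obs[0]]) for i in range(k)]
    back = []                       # back[t][j] = best predecessor of state j
    for o in obs[1:]:
        prev = score
        ptr = [
            max(range(k), key=lambda i: prev[i] + safe_log(trans[i][j]))
            for j in range(k)
        ]
        score = [
            prev[ptr[j]] + safe_log(trans[ptr[j]][j]) + safe_log(emit[j][o])
            for j in range(k)
        ]
        back.append(ptr)
    # Backtrack from the best final state.
    state = max(range(k), key=lambda i: score[i])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, max(score)


assert check_rows(TRANS) and check_rows(EMIT)
path, logp = viterbi([1, 1, 0])
print([STATES[i] for i in path])    # prints ['rainy', 'rainy', 'sunny']
print(round(math.exp(logp), 5))     # prints 0.06804
```

  Calling check_rows on your own transition matrix is a cheap way to catch a
  row that does not sum to one before running the decoder.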
  EXTRA
  7. PRML problem 13.5.