Based on feedback, we are keeping the new format. Further, we will experiment
with having only one deadline, which, for this assignment, is Thursday. No
summaries are required for this week as the programming assignment is a bit
longer than others.
NOTE A FEW CHANGES IN THE FOLLOWING PREAMBLE (from week 4).
-----------------------------------------------------------------------------
Unless otherwise specified, questions have unit value. The total value of the
assignments will vary substantially from week to week.
Recall that assignments are graded rather loosely on effort, and that 3/4 of
the total marks (1/2 for ugrads) over all assignments over all weeks
represents 100%. This policy is in place partly to allow for error in the
grading approach which, by necessity, is somewhat subjective, and needs to be
done somewhat superficially. It is recommended (and requested) that you try to
overshoot the 3/4 requirement, rather than worry about the details of how the
grading is done.
Problems denoted EXTRA can be substituted for other problems, or done in
addition, but they do not count towards the computation of the 3/4
requirement. They may be discussed in class depending on time and interest.
They are problems that I think might be useful, and that would likely be
assigned if we had more time per chapter.
Sometimes you will explicitly have to choose some of your own problems. Even
when this is not the case, you can substitute some problems in the book for
non-programming assignments if they appear more helpful to you. For now, limit
the number of substitutions to 50% of what you hand in. This parameter may be
increased or decreased as we go on.
You are encouraged to discuss the problems with your peers, but I would like
individual final submissions demonstrating effort and understanding of what
was done. If you end up working closely with someone on a problem set, make a
note on your submission saying who it was. For programming assignments, each
person should turn in their own work.
Since this is a graduate-level research course that is graded predominantly on
effort, I am confident that there will not be any problems with academic
honesty. However, do note that non-negligible deviations are often
surprisingly easy to spot, and can be verified by discussing the submitted
solutions with the student.
-----------------------------------------------------------------------------
Problems for Week 6.
Total value is 10.
Due Thursday. It is possible that we will discuss some of the first four
problems in Tuesday's class, so it would be appreciated if the majority of the
class has begun to think about them by then.
-----------------------------------------------------------------------------
1. Propose an application of an HMM-like model where the states have
   well-defined interpretations. Specify what the states and state emission
   probabilities might look like if you were to model it. Describe how you
   would train the model if you were able to label the states that occur in
   the training data observations.
2. Propose an application of an HMM-like model where the states have no
   obvious interpretation, and are thus truly hidden. Specify what the details
   of the model might look like. Describe how you would train the model.
   Propose a strategy for the case where you are unsure how many states to
   use, but have lots of training data.
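One point worth keeping in mind for the labeled-states scenario of problem 1:
when the state sequence is observed, maximum-likelihood training reduces to
normalized counting, with no EM needed (it is the hidden-state case of problem
2 that forces Baum-Welch). A minimal sketch, where the function name and the
(state, observation)-pair data format are my own choices, not anything from
the book:

```python
from collections import Counter

def estimate_hmm(labeled_seqs):
    """ML estimation of HMM parameters from fully labeled data.

    Each sequence is a list of (state, observation) pairs.  With the
    states observed, the ML estimates are just normalized counts of
    initial states, transitions, and emissions.
    """
    init, trans, emit = Counter(), Counter(), Counter()
    state_tot, trans_tot = Counter(), Counter()
    for seq in labeled_seqs:
        init[seq[0][0]] += 1
        for i, (s, o) in enumerate(seq):
            emit[(s, o)] += 1
            state_tot[s] += 1
            if i + 1 < len(seq):            # count transition s -> next state
                trans[(s, seq[i + 1][0])] += 1
                trans_tot[s] += 1
    n = len(labeled_seqs)
    pi = {s: c / n for s, c in init.items()}               # initial-state probs
    A = {k: c / trans_tot[k[0]] for k, c in trans.items()} # transition probs
    B = {k: c / state_tot[k[0]] for k, c in emit.items()}  # emission probs
    return pi, A, B
```

With this representation, transitions that never occur in the data simply have
no entry (probability zero); a real implementation would smooth the counts.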
3. Provide some of the details needed to derive equation 13.17. Hint: use the
   indicator-variables-as-exponents formalism.
4. Explain Viterbi (max-sum for HMMs) intuitively, perhaps referring to 13.68.
   It looks to me that 13.68 has a typo in it. Do you agree, and if so, what
   is it? What is the difference between what Viterbi is good for and what the
   alpha-beta (or forward-backward, or Baum-Welch) algorithm is good for?
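As a concrete reference point for the intuition asked for above, here is a
minimal log-space Viterbi sketch (the code structure and variable names are my
own, not from the book):

```python
import math

def viterbi(obs, states, pi, A, B):
    """Max-sum (Viterbi) decoding in log space.

    At each step we keep, for every state, the log-probability of the
    single best path ending there, plus a back-pointer.  This max is
    what distinguishes it from the forward (alpha) recursion, which
    sums over all paths instead.
    """
    log = lambda p: math.log(p) if p > 0 else float("-inf")
    # Initialization: best single-step path into each state.
    V = [{s: log(pi[s]) + log(B[s][obs[0]]) for s in states}]
    back = []
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in states:
            # Best predecessor of s, given the best paths so far.
            prev = max(states, key=lambda r: V[-1][r] + log(A[r][s]))
            col[s] = V[-1][prev] + log(A[prev][s]) + log(B[s][o])
            ptr[s] = prev
        V.append(col)
        back.append(ptr)
    # Backtrack from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path)), V[-1][last]
```

Working in log space avoids underflow on long sequences; the forward (alpha)
recursion has exactly the same shape, but with a sum in place of the max and
no back-pointers.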
Value 6:
5. Consider four people having a conversation: Ann, Brigit, Chris, and Doug.
You have a device that can detect when the end of a sentence occurs and the
average pitch of a sentence, but it can only categorize pitch into "high"
and "low". The posterior probabilities of measuring "high" pitch for a
sentence, conditioned on the speaker, are as follows:
Ann: 0.9
Brigit: 0.8
Chris: 0.0
Doug: 0.1
Suppose that your device records the following sequence from the start of a
conversation (0 = Low, 1 = High):
1-1-0-0
and that each person is equally likely to be talking. What is your MLE of
the speaker sequence?
Now, suppose you have more information:
Ann never starts a conversation, and she never responds to women. Once
she is talking, there is a 40% chance the next sentence will be hers;
otherwise it is equally likely that the next sentence will belong to any
of the other three.
Doug also never starts a conversation. The other two (Brigit and Chris)
are equally likely to start one.
Brigit likes to counter Chris (50% on a per-sentence basis), and talks a
lot. In fact, there is an 80% chance that one of her sentences will be
followed by another of her own.
Doug likes to counter Brigit. Brigit's sentences will be followed by
one of Doug's with 20% probability. No one else has the courage to speak
after Brigit.
Regardless of when or why, Doug always speaks in one-sentence smart-ass
comments. He then waits for someone else to speak.
Ann likes Doug's one-liners. Doug is followed by Ann 70% of the time,
but never by Brigit.
Chris also likes the sound of his own voice, and 50% of the time one of
his sentences is followed by another of his own. He responds to Doug in
the 30% of cases where he can think of something to say faster than Ann
can.
Would this change your MLE estimate of the speaker sequence? Explain a bit
(perhaps give some examples).
Draw a transition diagram (no need to hand it in---save time by just doing
it on a piece of paper) and transition matrix for the model (hand this in).
Use the Viterbi algorithm to determine the MLE of the speaker sequence given
the observation sequence linked here.
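A problem this small also admits a useful sanity check on a hand-run or coded
Viterbi: brute-force enumeration of every candidate state sequence, which is
trivial when |states|^T is tiny. A sketch (the function name is my own, and
the toy model used in the test is illustrative, not the speaker model above):

```python
import itertools

def brute_force_mle(obs, states, pi, A, B):
    """Enumerate all |states|**len(obs) state sequences, score each,
    and return the most probable one.  Only feasible for small
    problems, but handy for checking a Viterbi implementation."""
    best, best_p = None, -1.0
    for seq in itertools.product(states, repeat=len(obs)):
        p = pi[seq[0]] * B[seq[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[seq[t - 1]][seq[t]] * B[seq[t]][obs[t]]
        if p > best_p:
            best, best_p = list(seq), p
    return best, best_p
```

Since Viterbi computes exactly this maximum (just efficiently, via dynamic
programming), the two answers must agree.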
EXTRA
6. PRML problem 13.5.