The assignment was originally due Tuesday, April 5. An extension was granted until Thursday, April 7, with a few bonus marks available to those who handed in working code before the original due date. It is likely that the bonus points will only apply to the vision section. After Thursday, the penalty is 2*N% per day, where N is the number of people in the group.
This was a tricky piece of code to write, and kudos to everyone for their efforts. Setting up strategies for finding errors in this kind of code is difficult, and challenged many students who are quite used to writing code.
Some groups had trouble making the code run fast enough, and worked on a subset of the data instead. I think that most in this situation assumed that the problem was with their own code. I did not get any e-mail asking what the expected performance should be, or any requests for help with this particular problem.
Running on a subset does have problems. You have less data to learn from. Further, if you simply took a consecutive portion of the data, then you may not have had any examples of some kinds of images, because the images are not in random order: first there are mountains, then there are planes, then there are flowers, then there are cats, and so on. So training on a consecutive subset may well have left out a number of the types of images that were in the test set.
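Drawing the subset at random avoids this problem, since it keeps the image categories represented. As a minimal sketch (mine, not taken from any submission; image_features and image_words are placeholder names for however you stored the training data):

    num_images  = size(image_features, 1);    % one row per training image
    subset_size = round(num_images / 4);      % whatever fraction runs fast enough

    perm   = randperm(num_images);            % random ordering of the image indices
    subset = perm(1 : subset_size);           % random subset, so all categories show up

    train_features = image_features(subset, :);
    train_words    = image_words(subset, :);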
One problem with Matlab is that it encourages inefficient code. Of course, the fact that it is interpreted is a problem as well. It is possible to write reasonably fast Matlab code (and there is a compiler for it, for those who really like to swim upstream), but doing so requires understanding a number of things that Matlab hides from you, the cost of explicit loops in particular. I would have been happy to discuss the efficiency of the programs, but no one requested this.
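To make that concrete, here is an illustration of my own (not taken from any submission) of the kind of change that matters. Both versions normalize each row of a matrix so that it sums to one, an operation that comes up repeatedly when maintaining probability tables:

    P = rand(1000, 5000);                 % stand-in for a large probability table

    % Slow: an explicit loop, interpreted row by row.
    Q = zeros(size(P));
    for i = 1 : size(P, 1)
        Q(i, :) = P(i, :) / sum(P(i, :));
    end

    % Fast: one vectorized expression; the work happens in compiled
    % routines rather than in the interpreter.
    row_sums = sum(P, 2);
    R = P ./ repmat(row_sums, 1, size(P, 2));

The two versions give the same result, but on matrices of this size the vectorized one is typically many times faster, and differences of this kind account for much of the gap between a usable and an unusable running time.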
My simplistic Matlab implementation was not coded for speed, and takes about a minute and a half per iteration on a 2.4 GHz processor for 100 concepts on the entire training data. The process uses about 90 megabytes of RAM. My reasonably efficient C implementation takes less than 5 seconds per iteration, but this may be the result of some fancy tricks in that code. One group that did it in C reported a bit over a minute per iteration for 200 concepts; 100 concepts would take about half that.
Links to three Matlab files which implement a solution to the homework:
normalize_rows.m
train.m
use.m
The log-likelihood for each of the 30 iterations of the above code is tabulated below, along with the change (delta) from the previous iteration. If you handled certain scale factors differently, or chose a different data set, then you would have got something different. Several groups implemented exactly the same algorithm, and their log-likelihoods were very similar, with the differences being due to initialization.
    iteration   log-likelihood      delta
        1        -5.462e+05           --
        2        -1.659e+05      3.804e+05
        3         1.529e+05      3.188e+05
        4         2.769e+05      1.240e+05
        5         3.221e+05      4.523e+04
        6         3.471e+05      2.499e+04
        7         3.653e+05      1.816e+04
        8         3.787e+05      1.344e+04
        9         3.886e+05      9.934e+03
       10         3.960e+05      7.406e+03
       11         4.017e+05      5.615e+03
       12         4.061e+05      4.481e+03
       13         4.099e+05      3.796e+03
       14         4.134e+05      3.484e+03
       15         4.164e+05      3.008e+03
       16         4.192e+05      2.792e+03
       17         4.220e+05      2.783e+03
       18         4.244e+05      2.429e+03
       19         4.266e+05      2.143e+03
       20         4.285e+05      1.933e+03
       21         4.302e+05      1.694e+03
       22         4.317e+05      1.520e+03
       23         4.332e+05      1.455e+03
       24         4.346e+05      1.395e+03
       25         4.359e+05      1.343e+03
       26         4.371e+05      1.144e+03
       27         4.380e+05      9.577e+02
       28         4.389e+05      9.163e+02
       29         4.398e+05      8.288e+02
       30         4.405e+05      7.176e+02
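Watching these deltas is also a useful debugging tool: with a correct implementation, EM can never decrease the log-likelihood, so a negative delta indicates a bug. A quick check of this kind, assuming your code records the per-iteration log-likelihoods in a vector called ll (a name I am making up here), might look like:

    % ll: vector of log-likelihoods, one entry per EM iteration
    deltas = diff(ll);
    if any(deltas < 0)
        bad = find(deltas < 0);
        fprintf('LL decreased at iteration(s): %s\n', num2str(bad(:)' + 1));
    else
        fprintf('LL is non-decreasing over all %d iterations.\n', length(ll));
    end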