Due: Tuesday, March 22 (but late assignments accepted due to previous typo).
This meta-assignment is being graded on the basis of making a reasonable attempt. Thus no marks are taken off for incorrect answers. However, for those who want to confirm their understanding, answers are below:
I created some confusion by being a bit sloppy in the wording of the assignment. The correct implementation of some of the possible interpretations is quite tricky. For those who are interested, I have provided my derivations of the formulas used below.
Matlab has lots of tricks to make the expression of computations compact. In the extreme case, some students computed the answer to the first problem using the functions "mean" and "std". However, the answers below are meant to clarify what the computation actually is.
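For comparison, the compact version mentioned above might look like the following sketch. The matrix x here is a small made-up example, not the assignment data. By default std divides by (m - 1), and passing 1 as the second argument divides by m instead.

```matlab
% Hypothetical 4-by-2 data matrix (not the assignment data).
x = [1 2; 2 4; 3 6; 4 8];

m = size(x, 1);                 % number of rows (samples per column)

mean_x = mean(x)                % column means
sample_stdev_x = std(x)         % sample stdev: divides by (m - 1)
stdev_x = std(x, 1)             % stdev: divides by m
error_of_mean = sample_stdev_x / sqrt(m)
```

Each result is a row vector with one entry per column of x, so it can be compared directly with the loop-based versions.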
Compute the mean and the sample standard deviation of each column of the matrix
In the original version this read "error" not "deviation" which, quite rightfully, caused some confusion.
    % Simple method to compute the means of each column:
    [m,n] = size(x);
    for col = 1:n
        sum_x = 0;
        for row = 1:m
            sum_x = sum_x + x(row, col);
        end
        mean_x = sum_x / m
    end

A slightly different way, perhaps exposing some understanding in the weighted case:
    % Another way
    [m,n] = size(x);
    for col = 1:n
        sum_weight = 0;
        sum_x = 0;
        for row = 1:m
            % Every row gets the same weight here.
            weight = 1;
            sum_weight = sum_weight + weight;
            sum_x = sum_x + weight * x(row, col);
        end
        mean_x = sum_x / sum_weight
    end
The standard deviation is the square root of the variance. The variance is the expected value of the square of the deviation from the mean. When the probabilities (weights) are all the same, as in this case, this expected value is simply the average.
For the "sample" variance, divide by (m-1) instead of m (where m is the number of rows) when computing the average. The resulting correction factor of m/(m-1), as exposed by the algebra (see the formulas), compensates for the expected error in the sample mean.
If you interpreted this part of the question to mean "standard error of the
    % Simple method to compute the mean and standard deviations of each column:
    [m,n] = size(x);
    for col = 1:n
        sum_x = 0;
        sum_dev_sqr = 0;

        % First get the mean as above.
        for row = 1:m
            sum_x = sum_x + x(row, col);
        end
        mean_x = sum_x / m

        % Now the variance and standard deviations
        for row = 1:m
            sum_dev_sqr = sum_dev_sqr + (x(row, col) - mean_x)^2;
        end
        var_x = sum_dev_sqr / m ;
        sample_var_x = sum_dev_sqr / (m - 1);
        stdev_x = sqrt(var_x)
        sample_stdev_x = sqrt(sample_var_x)
        error_of_mean = sample_stdev_x / sqrt(m)
    end
Allowing for my sloppy wording, you will likely find your answers somewhere in here:
    mean            0.3772  0.4254  0.4567
    stdev           0.2612  0.2483  0.3240
    sample stdev    0.2680  0.2547  0.3324
    error of mean   0.0599  0.0570  0.0743
We can compute the answer using very little memory and only one pass of the data by keeping a running tally of the number of points and the sufficient statistics: the sum, and the sum of squares. These can be put into a relatively simple formula for the mean (obvious) and the sample stdev.
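A minimal sketch of this one-pass approach follows. It relies on the identity that the sum of squared deviations equals the sum of squares minus the squared sum divided by the count. The matrix x is a made-up example (not the assignment data), and the names count and sum_sqr_x are chosen here for illustration.

```matlab
% Hypothetical 4-by-2 data matrix (not the assignment data).
x = [1 2; 2 4; 3 6; 4 8];
[m, n] = size(x);

mean_x = zeros(1, n);
sample_stdev_x = zeros(1, n);

for col = 1:n
    count = 0;       % number of points seen so far
    sum_x = 0;       % running sum
    sum_sqr_x = 0;   % running sum of squares

    % Single pass: each value is read once and need not be stored.
    for row = 1:m
        value = x(row, col);
        count = count + 1;
        sum_x = sum_x + value;
        sum_sqr_x = sum_sqr_x + value^2;
    end

    mean_x(col) = sum_x / count;

    % Sum of squared deviations from the running totals:
    % sum((x - mean)^2) = sum(x^2) - (sum(x))^2 / count
    sum_dev_sqr = sum_sqr_x - sum_x^2 / count;
    sample_stdev_x(col) = sqrt(sum_dev_sqr / (count - 1));
end

mean_x
sample_stdev_x
```

One caveat with this form: when the mean is large compared with the spread of the data, the subtraction of two nearly equal quantities can lose precision, so the two-pass version may be the safer choice in that situation.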
Computing weighted mean and standard deviation
Again, sloppy wording in the original led to some confusion. The following computes all the possible variants that I can think of.
    % Simple method to compute the weighted means of each column:
    [m,n] = size(x);
    for col = 1:n
        sum_weight = 0;
        sum_x = 0;
        for row = 1:m
            weight = row;
            sum_weight = sum_weight + weight;
            sum_x = sum_x + weight * x(row, col);
        end
        mean_x = sum_x / sum_weight
    end
The variance is similar, and we take the sqrt() of the variance to get the answer. Again, getting the exact estimates was beyond the intended scope of the assignment, but I have implemented my answers to these as derived in the formulas for those interested.
    [m,n] = size(x);
    for col = 1:n
        sum_weight = 0;
        sum_x = 0;
        sum_dev_sqr = 0;
        sum_sqr_weight = 0;

        % First get the mean as before
        for row = 1:m
            weight = row;
            sum_weight = sum_weight + weight;
            sum_x = sum_x + weight * x(row, col);
            sum_sqr_weight = sum_sqr_weight + weight*weight;
        end
        mean_x = sum_x / sum_weight

        % Now to the weighted sum of squared deviations
        for row = 1:m
            weight = row;
            sum_dev_sqr = sum_dev_sqr + weight * (x(row, col) - mean_x)^2;
        end

        var_x = sum_dev_sqr / sum_weight ;   % The basic one (the one I meant).
        stdev_x = sqrt(var_x)
        incorrect_error_of_mean = stdev_x / sqrt(m)

        sample_var_x = sum_dev_sqr / (sum_weight - sum_sqr_weight/sum_weight);
        sample_stdev_x = sqrt(sample_var_x)
        error_of_mean = sample_stdev_x * sqrt(sum_sqr_weight / (sum_weight^2) )
    end
The answers I get (my original intention was that one would go for the first 2 to keep things simple):
    weighted mean            0.3295  0.4814  0.4492
    weighted stdev           0.2556  0.2183  0.3410
    divided by sqrt(20)      0.0571  0.0488  0.0763
    weighted sample stdev    0.2643  0.2257  0.3527
    weighted error of mean   0.0674  0.0576  0.0900

Some divided the second row by sqrt(20) due to the appearance that I was after the estimated error of the mean. However, I will argue that this is not the correct formula in this case. That answer corresponds to the case that all points are uniformly weighted, which is ideal, and thus it underestimates the error in more general cases. Consider, for example, the case that the weights were such that the first point had all the weight. Then sqrt(m) would be far too big a quantity to divide the stdev by.
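To make this argument concrete, the sketch below evaluates the factor that multiplies the sample stdev in the weighted code above, sqrt(sum_sqr_weight / sum_weight^2), for three weightings. The variable names are chosen here for illustration, and m = 20 is taken from the sqrt(20) used above.

```matlab
m = 20;

% Uniform weights: the factor reduces to the familiar 1/sqrt(m).
w_uniform = ones(m, 1);
factor_uniform = sqrt(sum(w_uniform.^2)) / sum(w_uniform)

% Weights as in the code above (weight = row).
w_row = (1:m)';
factor_row = sqrt(sum(w_row.^2)) / sum(w_row)

% Extreme case: the first point carries all the weight.
w_single = zeros(m, 1);
w_single(1) = 1;
factor_single = sqrt(sum(w_single.^2)) / sum(w_single)

% For reference, the uniform-weight value.
one_over_sqrt_m = 1 / sqrt(m)
```

The factor equals 1/sqrt(m) only for uniform weights. It is larger for weight = row, and it reaches 1 when a single point carries all the weight, which is the sense in which dividing by sqrt(m) underestimates the error.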
As in the non-weighted case, this can be done with very limited memory and only one pass of the data by keeping a few running totals.
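A minimal sketch of the weighted one-pass version is given below. It uses the identity that the weighted sum of squared deviations equals sum(w*x^2) - (sum(w*x))^2 / sum(w), and then applies the same final formulas as the two-pass code above. The matrix x is a made-up example (not the assignment data), and sum_wx and sum_wx_sqr are names introduced here for illustration.

```matlab
% Hypothetical 4-by-2 data matrix (not the assignment data).
x = [1 2; 2 4; 3 6; 4 8];
[m, n] = size(x);

mean_x = zeros(1, n);
stdev_x = zeros(1, n);
sample_stdev_x = zeros(1, n);
error_of_mean = zeros(1, n);

for col = 1:n
    sum_weight = 0;       % running sum of weights
    sum_sqr_weight = 0;   % running sum of squared weights
    sum_wx = 0;           % running sum of weight * x
    sum_wx_sqr = 0;       % running sum of weight * x^2

    % Single pass over the column.
    for row = 1:m
        weight = row;
        value = x(row, col);
        sum_weight = sum_weight + weight;
        sum_sqr_weight = sum_sqr_weight + weight^2;
        sum_wx = sum_wx + weight * value;
        sum_wx_sqr = sum_wx_sqr + weight * value^2;
    end

    mean_x(col) = sum_wx / sum_weight;

    % Weighted sum of squared deviations from the running totals.
    sum_dev_sqr = sum_wx_sqr - sum_wx^2 / sum_weight;

    stdev_x(col) = sqrt(sum_dev_sqr / sum_weight);
    sample_stdev_x(col) = sqrt(sum_dev_sqr / (sum_weight - sum_sqr_weight / sum_weight));
    error_of_mean(col) = sample_stdev_x(col) * sqrt(sum_sqr_weight / sum_weight^2);
end

mean_x
stdev_x
sample_stdev_x
error_of_mean
```

As with the unweighted one-pass formula, the subtraction can lose precision when the mean is large relative to the spread of the data.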