NAME
get_GMM_blk_compound_sym_cov - Finds a Gaussian mixture model (GMM) where the Gaussians have block compound symmetrical covariances
SYNOPSIS
#include "r/r_cluster.h"
Example compile flags (system dependent):
-DLINUX_X86_64 -DLINUX_X86_64_OPTERON -DGNU_COMPILER
-I/home/kobus/include
-L/home/kobus/misc/load/linux_x86_64_opteron -L/usr/lib/x86_64-linux-gnu
-lKJB -lfftw3 -lgsl -lgslcblas -ljpeg -lSVM -lstdc++ -lpthread -lSLATEC -lg2c -lacml -lacml_mv -lblas -lg2c -lncursesw
int get_GMM_blk_compound_sym_cov
(
const Int_vector_vector *block_diag_sizes_vvp,
const Matrix *feature_mp,
const Int_vector *held_out_indicator_vp,
const Vector *initial_a_vp,
const double initial_mu,
const double initial_sig_sqr,
const double tau_sig_sqr_ratio,
Vector **a_vpp,
double *mu_ptr,
double *sig_sqr_ptr,
Matrix **P_mpp,
double *log_likelihood_ptr,
double *held_out_log_likelihood_ptr,
int *num_iterations_ptr
);
DESCRIPTION
This routine fits a Gaussian mixture model (GMM) whose Gaussians have block
compound symmetrical covariances with shared parameters. Specifically, each
feature has the same mean (mu) and variance (sig^2 + tau^2). The covariance
between any pair of features is either tau^2 (when both features lie in the
same diagonal block) or 0. This holds for all the Gaussians in the mixture.
However, the block diagonal structure of the covariance matrix differs from
one Gaussian to another.
Further, it is assumed that the ratio (tau^2 / sig^2) is known beforehand.
This removes one degree of freedom from the model fit, but it enables closed
form solutions for the optimum parameter values (mu, sig^2) in the M-step of
EM. A suitable value for the ratio can be estimated using cross-validation.
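The covariance structure described above can be sketched in plain C. This is
an illustration only and is not part of the library: the function name
fill_blk_compound_sym_cov, the row-major array layout, and the return codes
are assumptions made for the example. It derives tau^2 from sig^2 and the
known ratio, then fills one component's M by M covariance matrix from its
list of block sizes.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a KJB routine). Fills cov (row-major, m by m)
 * with a block compound-symmetric covariance matrix. block_sizes lists the
 * diagonal block sizes from top to bottom and must sum to m.
 * Returns 0 on success and -1 on invalid arguments.
 */
int fill_blk_compound_sym_cov(const int *block_sizes, size_t num_blocks,
                              size_t m, double sig_sqr,
                              double tau_sig_sqr_ratio, double *cov)
{
    /* The ratio tau^2 / sig^2 is assumed known, so tau^2 follows from sig^2. */
    const double tau_sqr = tau_sig_sqr_ratio * sig_sqr;
    size_t total = 0, start = 0, b, i, j;

    if (block_sizes == NULL || cov == NULL) return -1;

    for (b = 0; b < num_blocks; b++) {
        if (block_sizes[b] <= 0) return -1;
        total += (size_t)block_sizes[b];
    }
    if (total != m) return -1;

    /* Features in different blocks have zero covariance. */
    for (i = 0; i < m * m; i++) cov[i] = 0.0;

    for (b = 0; b < num_blocks; b++) {
        const size_t end = start + (size_t)block_sizes[b];

        for (i = start; i < end; i++) {
            for (j = start; j < end; j++) {
                /* Variance on the diagonal, tau^2 elsewhere in the block. */
                cov[i * m + j] = (i == j) ? sig_sqr + tau_sqr : tau_sqr;
            }
        }
        start = end;
    }
    return 0;
}
```

For M = 4, block sizes {2, 2}, sig^2 = 1 and ratio 0.5, the diagonal entries
are 1.5, the within-block off-diagonal entries are 0.5, and all entries
linking the two blocks are 0.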
In particular, it fits:
p(x) = sum over i of a-sub-i * g(mu-vec, cov-sub-i, x)
where a-sub-i is the prior probability for mixture component (cluster) i,
mu-vec is the mean vector with all components equal to mu, cov-sub-i is the
covariance matrix for component i, and g(mu,cov,x) is a Gaussian with
mean mu and covariance cov.
The data matrix feature_mp is an N by M matrix where N is the number of data
points, and M is the number of features.
The argument block_diag_sizes_vvp specifies the block diagonal structures of
the covariances of the Gaussian components. The number of vectors in this
argument is equal to the number of mixture components (clusters), K. Each
vector is a list of sizes of the block diagonals from top to bottom in the
corresponding covariance matrix. For example, a Gaussian component whose
features are all independent is specified by a vector of M elements, each
equal to 1. A Gaussian component with two diagonal blocks of the same size
in its covariance matrix is specified by a vector with two elements, each
equal to M/2.
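To make the block size convention concrete, the following sketch maps a
feature index to the diagonal block that contains it. It uses plain C arrays
in place of the library's Int_vector type, and both function names are
hypothetical. Under the model described here, two distinct features of a
component have covariance tau^2 exactly when they fall in the same block.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a KJB routine). Returns the index of the
 * diagonal block containing the given zero-based feature index, or -1 if
 * the feature lies beyond the blocks or a block size is not positive.
 */
int block_index_of_feature(const int *block_sizes, size_t num_blocks,
                           size_t feature)
{
    size_t end = 0, b;

    for (b = 0; b < num_blocks; b++) {
        if (block_sizes[b] <= 0) return -1;
        end += (size_t)block_sizes[b];   /* One past the block's last feature. */
        if (feature < end) return (int)b;
    }
    return -1;
}

/* Returns 1 if both features lie in the same diagonal block, else 0. */
int features_share_block(const int *block_sizes, size_t num_blocks,
                         size_t f1, size_t f2)
{
    const int b1 = block_index_of_feature(block_sizes, num_blocks, f1);
    const int b2 = block_index_of_feature(block_sizes, num_blocks, f2);

    return (b1 >= 0 && b1 == b2) ? 1 : 0;
}
```

With M = 4, the vector {1, 1, 1, 1} places every feature in its own block, so
no two distinct features share a block. The vector {2, 2} groups features 0
and 1 in the first block and features 2 and 3 in the second.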
The argument tau_sig_sqr_ratio specifies the ratio (tau^2 / sig^2) mentioned
above.
initial_a_vp, initial_mu and initial_sig_sqr can be used to specify the
initial values of the parameters for EM.
The model parameters are put into *a_vpp, *mu_ptr, and *sig_sqr_ptr. Any
of a_vpp, mu_ptr, or sig_sqr_ptr can be NULL if that value is not needed.
If P_mpp is not NULL, then the soft clustering (cluster membership) for each
data point is returned. In that case, *P_mpp will be N by K.
RETURNS
If the routine fails (due to storage allocation), then ERROR is returned
with an error message being set. Otherwise NO_ERROR is returned.
NOTE
The covariance structure assumed in this routine is often referred to
as "block compound-symmetry" structure, especially in the mixed-models ANOVA
literature. It is useful for modeling data with repeated measures in ANOVA
using mixed models. For example, see:
http://www.asu.edu/sas/sasdoc/sashtml/stat/chap41/sect23.htm
http://www.tufts.edu/~gdallal/repeat2.htm
DISCLAIMER
This software is not adequately tested. It is recommended that
results are checked independently where appropriate.
AUTHOR
Prasad Gabbur, Kobus Barnard.
DOCUMENTER
Kobus Barnard
SEE ALSO
set_em_cluster_options
,
get_independent_GMM
,
get_independent_GMM_using_CEM
,
get_independent_GMM_with_shift
,
get_independent_GMM_with_shift_2
,
get_GMM_blk_compound_sym_cov_1