get_GMM_blk_compound_sym_cov - Finds a Gaussian mixture model (GMM) where the Gaussians have block compound


#include "r/r_cluster.h"

Example compile flags (system dependent):
   -L/home/kobus/misc/load/linux_x86_64_opteron -L/usr/lib/x86_64-linux-gnu
   -lKJB -lfftw3 -lgsl -lgslcblas -ljpeg -lSVM -lstdc++ -lpthread -lSLATEC -lg2c -lacml -lacml_mv -lblas -lg2c -lncursesw

int get_GMM_blk_compound_sym_cov
(
	const Int_vector_vector *block_diag_sizes_vvp,
	const Matrix *feature_mp,
	const Int_vector *held_out_indicator_vp,
	const Vector *initial_a_vp,
	const double initial_mu,
	const double initial_sig_sqr,
	const double tau_sig_sqr_ratio,
	Vector **a_vpp,
	double *mu_ptr,
	double *sig_sqr_ptr,
	Matrix **P_mpp,
	double *log_likelihood_ptr,
	double *held_out_log_likelihood_ptr,
	int *num_iterations_ptr
);


symmetrical covariances with shared parameters. Specifically, each feature has the same mean (mu) and variance (sig^2 + tau^2). The covariance between any pair of features is either tau^2 (when both features lie in the same diagonal block) or 0. This holds for all the Gaussians in the mixture. However, the block diagonal structure of the covariance matrix differs from one Gaussian to another. Further, the ratio (tau^2 / sig^2) is assumed to be known beforehand. This removes one degree of freedom from the model fit, but it enables closed-form solutions for the optimal parameter values (mu, sig^2) in the M-step of EM. A suitable value for the ratio can be estimated using cross-validation. In particular, the routine fits:
         p(x) = sum  a-sub-i *  g(mu-vec, cov-sub-i, x)
where a-sub-i is the prior probability for mixture component (cluster) i, mu-vec is the mean vector with all components equal to mu, cov-sub-i is the covariance matrix for component i, and g(mu,cov,x) is a Gaussian with mean mu and covariance cov.

The data matrix feature_mp is an N by M matrix, where N is the number of data points and M is the number of features. The argument block_diag_sizes_vvp specifies the block diagonal structures of the covariances of the Gaussian components. The number of vectors in this argument is equal to the number of mixture components (clusters), K. Each vector lists the sizes of the diagonal blocks, from top to bottom, in the corresponding covariance matrix. For example, the vector for a Gaussian component with all features independent has M elements, each equal to 1. A Gaussian component whose covariance matrix has two diagonal blocks of the same size is specified by a vector with two elements, each equal to M/2. The argument tau_sig_sqr_ratio specifies the ratio (tau^2 / sig^2) mentioned above.

The arguments initial_a_vp, initial_mu, and initial_sig_sqr can be used to specify the initial values of the parameters for EM. The model parameters are put into *a_vpp, *mu_ptr, and *sig_sqr_ptr. Any of a_vpp, mu_ptr, or sig_sqr_ptr can be NULL if that value is not needed. If P_mpp is not NULL, then the soft clustering (cluster membership) for each data point is returned. In that case, *P_mpp will be N by K.
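
The covariance structure and the block size encoding described above can be illustrated with a small stand-alone sketch. It does not use the KJB library or its types. The function name fill_blk_compound_sym_cov, its argument list, and the row-major array layout are assumptions made for this illustration only. Given the block sizes for one component, sig^2, and the ratio (tau^2 / sig^2), it fills in the M by M covariance matrix that those values imply.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of the KJB library): fill cov, an M x M
 * matrix stored row-major, with the block compound-symmetric covariance
 * implied by one list of diagonal block sizes.
 *
 * block_sizes lists the block sizes from top to bottom and must sum to M.
 * Diagonal entries are sig_sqr + tau_sqr, off-diagonal entries inside a
 * block are tau_sqr, and entries outside every block are 0, where
 * tau_sqr = tau_sig_sqr_ratio * sig_sqr.
 *
 * Returns 0 on success, or -1 if a size is not positive or the sizes do
 * not sum to M.
 */
int fill_blk_compound_sym_cov(const int *block_sizes, int num_blocks, int M,
                              double sig_sqr, double tau_sig_sqr_ratio,
                              double *cov)
{
    double tau_sqr = tau_sig_sqr_ratio * sig_sqr;
    int total = 0;
    int start = 0;
    int b, i, j;

    /* Validate the block sizes before writing anything. */
    for (b = 0; b < num_blocks; b++)
    {
        if (block_sizes[b] <= 0) return -1;
        total += block_sizes[b];
    }
    if (total != M) return -1;

    /* Features in different blocks are uncorrelated. */
    for (i = 0; i < M * M; i++) cov[i] = 0.0;

    /* Fill each diagonal block in turn. */
    for (b = 0; b < num_blocks; b++)
    {
        int end = start + block_sizes[b];

        for (i = start; i < end; i++)
        {
            for (j = start; j < end; j++)
            {
                cov[i * M + j] = (i == j) ? (sig_sqr + tau_sqr) : tau_sqr;
            }
        }
        start = end;
    }

    return 0;
}
```

With M = 4, block sizes {2, 2}, sig^2 = 1 and a ratio of 0.5, this gives 1.5 on the diagonal, 0.5 for the two feature pairs that share a block, and 0 elsewhere. Block sizes {1, 1, 1, 1} give a diagonal matrix, which is the case of all features independent.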


If the routine fails (due to storage allocation), then ERROR is returned with an error message set. Otherwise, NO_ERROR is returned.


The covariance structure assumed in this routine is often referred to as "block compound-symmetry" structure, especially in the mixed-models ANOVA literature. It is useful for modeling repeated-measures data in ANOVA using mixed models. For example, see:


This software is not adequately tested. It is recommended that results are checked independently where appropriate.


Prasad Gabbur, Kobus Barnard.


Kobus Barnard


set_em_cluster_options, get_independent_GMM, get_independent_GMM_using_CEM, get_independent_GMM_with_shift, get_independent_GMM_with_shift_2, get_GMM_blk_compound_sym_cov_1