get_GMM_blk_compound_sym_cov_1 - This routine is the same as the get_GMM_blk_compound_sym_cov with the added


#include "r/r_cluster.h"

Example compile flags (system dependent):
   -L/home/kobus/misc/load/linux_x86_64_opteron -L/usr/lib/x86_64-linux-gnu
   -lKJB -lfftw3 -lgsl -lgslcblas -ljpeg -lSVM -lstdc++ -lpthread -lSLATEC -lg2c -lacml -lacml_mv -lblas -lg2c -lncursesw

int get_GMM_blk_compound_sym_cov_1
(
	const Int_vector_vector *block_diag_sizes_vvp,
	const Matrix *feature_mp,
	const Int_vector *held_out_indicator_vp,
	const Vector *initial_a_vp,
	const double initial_mu,
	const double initial_sig_sqr,
	const double initial_tau_sig_sqr_ratio,
	Vector **a_vpp,
	double *mu_ptr,
	double *sig_sqr_ptr,
	double *tau_sig_sqr_ratio_ptr,
	Matrix **P_mpp,
	double *log_likelihood_ptr,
	double *held_out_log_likelihood_ptr,
	int *num_iterations_ptr
);


feature of estimating the tau_sig_sqr_ratio parameter automatically through gradient ascent on the expected complete log likelihood in the M step.

This routine finds a Gaussian mixture model (GMM) whose Gaussians have block compound-symmetric covariances with shared parameters. Specifically, every feature has the same mean (mu) and variance (sig^2 + tau^2), and the covariance between any pair of features is either tau^2 or 0. This holds for all the Gaussians in the mixture; however, the block diagonal structure of the covariance matrix differs from one Gaussian to another. Unlike the parent routine get_GMM_blk_compound_sym_cov, this routine does not assume that the ratio (tau^2 / sig^2) is known beforehand. In particular, it fits:
         p(x) = sum  a-sub-i *  g(mu-vec, cov-sub-i, x)
where a-sub-i is the prior probability for mixture component (cluster) i, mu-vec is the mean vector with all components equal to mu, cov-sub-i is the covariance matrix for component i, and g(mu,cov,x) is a Gaussian with mean mu and covariance cov.

The data matrix feature_mp is an N by M matrix, where N is the number of data points and M is the number of features.

The argument block_diag_sizes_vvp specifies the block diagonal structure of the covariance of each Gaussian component. The number of vectors in this argument is equal to the number of mixture components (clusters), K. Each vector lists the sizes of the diagonal blocks, from top to bottom, in the corresponding covariance matrix. For example, the vector for a Gaussian component whose features are all independent has M elements, each equal to 1. A Gaussian component with two diagonal blocks of the same size in its covariance matrix is specified by a vector with two elements, each equal to M/2.

The arguments initial_a_vp, initial_mu, initial_sig_sqr and initial_tau_sig_sqr_ratio can be used to specify the initial values of the parameters for EM.

The model parameters are put into *a_vpp, *mu_ptr, *sig_sqr_ptr and *tau_sig_sqr_ratio_ptr. Any of a_vpp, mu_ptr, sig_sqr_ptr or tau_sig_sqr_ratio_ptr can be NULL if that value is not needed. If P_mpp is not NULL, then the soft clustering (cluster membership) for each data point is returned. In that case, *P_mpp will be N by K.
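
To make the covariance structure described above concrete, the following is a minimal standalone sketch that expands one block-size vector into a dense covariance matrix. It does not use the KJB library: the function name build_blk_compound_sym_cov, the plain C arrays, and the 0 / -1 return codes are all hypothetical and chosen only for illustration. It also assumes that features in the same diagonal block have covariance tau^2, that features in different blocks have covariance 0, and that tau^2 = tau_sig_sqr_ratio * sig^2.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative helper (hypothetical, not part of the KJB library).
 *
 * Fills the M x M row-major array cov with a block compound-symmetric
 * covariance matrix:
 *     diagonal entries             : sig_sqr + tau_sqr
 *     off-diagonal, same block     : tau_sqr
 *     off-diagonal, different block: 0
 * where tau_sqr = tau_sig_sqr_ratio * sig_sqr (assumed definition).
 *
 * block_sizes lists the diagonal block sizes from top to bottom.
 * Returns 0 on success, or -1 if any block size is not positive or the
 * block sizes do not sum to M.
 */
int build_blk_compound_sym_cov(const int *block_sizes, int num_blocks,
                               int M, double sig_sqr,
                               double tau_sig_sqr_ratio, double *cov)
{
    double tau_sqr = tau_sig_sqr_ratio * sig_sqr;
    int total = 0;
    int start = 0;
    int b, i, j;

    assert(block_sizes != NULL && cov != NULL && M > 0);

    /* The block sizes must tile the diagonal exactly. */
    for (b = 0; b < num_blocks; b++) {
        if (block_sizes[b] <= 0) return -1;
        total += block_sizes[b];
    }
    if (total != M) return -1;

    /* Features in different blocks are uncorrelated. */
    for (i = 0; i < M * M; i++) cov[i] = 0.0;

    /* Fill each diagonal block in turn. */
    for (b = 0; b < num_blocks; b++) {
        int end = start + block_sizes[b];

        for (i = start; i < end; i++) {
            for (j = start; j < end; j++) {
                cov[i * M + j] = (i == j) ? (sig_sqr + tau_sqr) : tau_sqr;
            }
        }
        start = end;
    }

    return 0;
}
```

With M = 4, a block-size vector of {1, 1, 1, 1} gives a diagonal matrix (all features independent), while {2, 2} gives two 2 by 2 blocks, matching the two examples in the description of block_diag_sizes_vvp.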


If the routine fails (due to storage allocation), then ERROR is returned with an error message being set. Otherwise NO_ERROR is returned.


The covariance structure assumed in this routine is often referred to as "block compound-symmetry" structure especially in the mixed-models ANOVA literature. It is useful in modeling data with repeated measures in ANOVA using mixed-models. For example see:


This software is not adequately tested. It is recommended that results are checked independently where appropriate.


Prasad Gabbur, Kobus Barnard.


Kobus Barnard


set_em_cluster_options, get_independent_GMM, get_independent_GMM_using_CEM, get_independent_GMM_with_shift, get_independent_GMM_with_shift_2, get_GMM_blk_compound_sym_cov