get_independent_GMM_with_shift - Finds a Gaussian mixture model (GMM) for data possibly containing discrete random global shifts in feature dimensions.


#include "r/r_cluster.h"

Example compile flags (system dependent):
   -L/home/kobus/misc/load/linux_x86_64_opteron -L/usr/lib/x86_64-linux-gnu
   -lKJB -lfftw3 -lgsl -lgslcblas -ljpeg -lSVM -lstdc++ -lpthread -lSLATEC -lg2c -lacml -lacml_mv -lblas -lg2c -lncursesw

int get_independent_GMM_with_shift
(
	int min_shift,
	int max_shift,
	int num_clusters,
	const Matrix *feature_mp,
	const Vector *initial_delta_vp,
	const Vector *initial_a_vp,
	const Matrix *initial_u_mp,
	const Matrix *initial_var_mp,
	Vector **delta_vpp,
	Matrix **P_shift_mpp,
	Vector **a_vpp,
	Matrix **u_mpp,
	Matrix **var_mpp,
	Matrix **P_cluster_mpp
);


This routine finds a Gaussian mixture model (GMM) for data that may contain discrete random global shifts in the feature dimensions, on the assumption that the features are independent. It allows for the possibility of a data point being shifted by a random discrete amount after having been generated from its Gaussian. The shifts are assumed to be independent of the Gaussians from which the data points are generated, and to occur with wrap-around. The model is fit with EM. Some features are controlled via the set facility. In particular, it fits:
         p(x) = sum sum  a-sub-i * delta-sub-j * g(u-sub-i, v-sub-i, x(- s-sub-j))
                 i   j
where a-sub-i is the prior probability for mixture component (cluster) i, u-sub-i is the mean vector for component i, v-sub-i is the variance vector for component i, and g(u,v,x) is a Gaussian with diagonal covariance (i.e., the features are assumed to be independent, given the cluster). delta-sub-j is the prior probability of shift j, and x(- s-sub-j) indicates a global reverse (negative sign) shift of x by the amount corresponding to s-sub-j.

min_shift and max_shift specify the minimum and maximum amount of global discrete random shift a data point can experience after being generated from its Gaussian. Negative values indicate a shift to the left w.r.t. the assumed ordering of feature dimensions, and positive values indicate a shift to the right. The total number of possible shifts for any data point is therefore S = (max_shift - min_shift + 1). The argument num_clusters is the number of requested mixture components (clusters), K. The data matrix feature_mp is an N by M matrix, where N is the number of data points and M is the number of features.

The model parameters are put into *delta_vpp, *a_vpp, *u_mpp, and *var_mpp. Any of delta_vpp, a_vpp, u_mpp, or var_mpp may be NULL if that value is not needed. The vector *delta_vpp contains the inferred probability distribution over shifts, computed using all the training data points. It is of size S, and its elements can be viewed as shift priors. Similarly, the vector *a_vpp contains the inferred cluster priors. It is of size K. Both u-sub-i and v-sub-i are vectors, and they are put into the i'th row of *u_mpp and *var_mpp, respectively. These matrices are thus K by M.

If P_cluster_mpp is not NULL, then the soft clustering (cluster membership) for each data point is returned. In that case, *P_cluster_mpp will be N by K. If P_shift_mpp is not NULL, then the posterior probability distribution over the possible discrete shifts for each data point is returned. In that case, *P_shift_mpp will be N by S.

Initial values of the parameters, to be used as the starting point for the EM iterations, can be specified using initial_delta_vp, initial_a_vp, initial_u_mp, and initial_var_mp. If they are all NULL, then a random initialization scheme is used.


If the routine fails (due to a storage allocation failure), then ERROR is returned and an error message is set. Otherwise NO_ERROR is returned.


This software is not adequately tested. It is recommended that results be checked independently where appropriate.


Prasad Gabbur, Kobus Barnard.


Kobus Barnard


set_em_cluster_options, get_independent_GMM, get_independent_GMM_using_CEM, get_independent_GMM_with_shift_2, get_GMM_blk_compound_sym_cov, get_GMM_blk_compound_sym_cov_1