#include "r/r_cluster.h"

Example compile flags (system dependent):
    -DLINUX_X86_64 -DLINUX_X86_64_OPTERON -DGNU_COMPILER
    -I/home/kobus/include
    -L/home/kobus/misc/load/linux_x86_64_opteron -L/usr/lib/x86_64-linux-gnu
    -lKJB -lfftw3 -lgsl -lgslcblas -ljpeg -lSVM -lstdc++ -lpthread -lSLATEC -lg2c -lacml -lacml_mv -lblas -lg2c -lncursesw

int get_independent_GMM_using_CEM
(
    int           initial_num_clusters,
    const Matrix *feature_mp,
    const Vector *initial_a_vp,
    const Matrix *initial_u_mp,
    const Matrix *initial_var_mp,
    Vector      **a_vpp,
    Matrix      **u_mpp,
    Matrix      **var_mpp,
    Matrix      **P_mpp,
    double        beta,
    double        gamma
);

p(x) = sum a-sub-i * g(u-sub-i, v-sub-i, x)

where a-sub-i is the prior probability for mixture component (cluster) i, u-sub-i is the mean vector for component i, v-sub-i is the variance vector for component i, and g(u,v,x) is a Gaussian with diagonal covariance (i.e., the features are assumed to be independent, given the cluster).

The argument initial_num_clusters is the initial number of mixture components (clusters). The data matrix feature_mp is an N by M matrix, where N is the number of data points and M is the number of features. The means, variances, and priors of the initial clusters can be specified through the arguments initial_u_mp, initial_var_mp, and initial_a_vp, respectively. If they are all NULL, then a random initialization method is used for the clusters.

The model parameters are put into *a_vpp, *u_mpp, and *var_mpp. Any of a_vpp, u_mpp, or var_mpp may be NULL if that value is not needed. Both u-sub-i and v-sub-i are vectors, and they are put into the i'th row of *u_mpp and *var_mpp, respectively. The matrices are thus K by M, where K is the final number of clusters.

If P_mpp is not NULL, then the soft clustering (cluster membership) for each data point is returned. In that case, *P_mpp will be N by K.

The parameters gamma and beta control the cluster split-merge operations. Their values need to be determined experimentally, but the following observations may serve as a rough guide:

The higher the value of gamma, the more frequent the split-merge operations, and hence the greater the ability to jump in the parameter space (see reference). A value of the order of 0.1L or L, where L is the adjusted log-likelihood, has been observed to give reasonable results. The order of L can be determined from the logs printed out by the routine.

The higher the value of beta, the greater the chance that merge operations are chosen relative to split operations (see reference) while proceeding through the split-merge sequence. A value in the range 0.1R to 10R, where R = min(KL_divergence_merge^2, KL_divergence_split^2), has been observed to give reasonable results. R has to be determined experimentally by observing the KL divergence values printed out by the routine.