#include "r/r_cluster.h"

Example compile flags (system dependent):
    -DLINUX_X86_64 -DLINUX_X86_64_OPTERON -DGNU_COMPILER
    -I/home/kobus/include
    -L/home/kobus/misc/load/linux_x86_64_opteron -L/usr/lib/x86_64-linux-gnu
    -lKJB -lfftw3 -lgsl -lgslcblas -ljpeg -lSVM -lstdc++ -lpthread -lSLATEC
    -lg2c -lacml -lacml_mv -lblas -lg2c -lncursesw

int get_independent_GMM_with_shift_2
(
    int           max_left_shift,
    int           max_right_shift,
    int           num_clusters,
    const Matrix *feature_mp,
    const Vector *initial_delta_vp,
    const Vector *initial_a_vp,
    const Matrix *initial_u_mp,
    const Matrix *initial_var_mp,
    Vector       **delta_vpp,
    Matrix       **P_shift_mpp,
    Vector       **a_vpp,
    Matrix       **u_mpp,
    Matrix       **var_mpp,
    Matrix       **P_cluster_mpp
);

$$
p(x) = \sum_{i=1}^{K} \sum_{j=1}^{S} a_i \, \delta_j \, g\big(u_i, v_i, x(-s_j)\big)
$$

where a-sub-i is the prior probability for mixture component (cluster) i, u-sub-i is the mean vector for component i, v-sub-i is the variance for component i, and g(u,v,x) is a Gaussian with diagonal covariance (i.e., the features are assumed to be independent, given the cluster). delta-sub-j is the prior probability of shift j, and x(-s-sub-j) denotes a global reverse (negative-sign) shift of x by the amount corresponding to s-sub-j.

max_left_shift and max_right_shift specify the maximum amount of global, discrete, random left and right shift, respectively, that a data point can undergo after being generated from its Gaussian. Unlike in the counterpart routine, both of these parameters must be non-negative. The total number of possible shifts for any data point, including the zero shift, is S = max_left_shift + max_right_shift + 1.

Given max_left_shift and max_right_shift, there is a subspace of the full feature space that is guaranteed to be unaffected by the arbitrary noise that a random shift introduces. Its dimension is T = M - (max_left_shift + max_right_shift), where M is the dimensionality of the full feature space. The EM procedure therefore determines clusters in this subspace rather than in the full space.

The argument num_clusters is the number of requested mixture components (clusters), K. The data matrix feature_mp is an N by M matrix, where N is the number of data points and M is the number of features.

The model parameters are put into *delta_vpp, *a_vpp, *u_mpp, and *var_mpp. Any of delta_vpp, a_vpp, u_mpp, or var_mpp can be NULL if that value is not needed. The vector *delta_vpp, of size S, contains the inferred probability distribution over shifts, computed using all the training data points. The elements of *delta_vpp can be viewed as shift priors.
The assumed order of shifts in this vector, and in any other output pertaining to shifts, is:

    (max_left_shift, max_left_shift-1, ..., 0, ..., max_right_shift-1, max_right_shift)

that is, left shifts from largest to smallest, then the zero shift, then right shifts from smallest to largest.

The vector *a_vpp, of size K, contains the inferred cluster priors. Both u-sub-i and v-sub-i are vectors; they are put into the i'th row of *u_mpp and *var_mpp, respectively. These matrices are thus K by T.

If P_cluster_mpp is not NULL, then the soft clustering (cluster membership) for each data point is returned; in that case, *P_cluster_mpp is N by K. If P_shift_mpp is not NULL, then the posterior probability distribution over the possible discrete shifts for each data point is returned; in that case, *P_shift_mpp is N by S.

Initial values of the parameters, used as the starting point for the EM iterations, can be specified using initial_delta_vp, initial_a_vp, initial_u_mp, and initial_var_mp. If they are all NULL, a random initialization scheme is used. The initial parameters may be given either in the full feature space or in the reduced space in which the final clusters are sought. In the former case, the routine extracts the parameters corresponding to the target subspace.