NAME
get_kmeans_clusters - Clusters data using the k-means method.
SYNOPSIS
#include "lsm/lsm_cluster.h"
Example compile flags (system dependent):
-DLINUX_X86_64 -DLINUX_X86_64_OPTERON -DGNU_COMPILER
-I/home/kobus/include
-L/home/kobus/misc/load/linux_x86_64_opteron -L/usr/lib/x86_64-linux-gnu
-lKJB -lfftw3 -lgsl -lgslcblas -ljpeg -lSVM -lstdc++ -lpthread -lSLATEC -lg2c -lacml -lacml_mv -lblas -lg2c -lncursesw
int get_kmeans_clusters
(
const Matrix *data_mp,
int num_clusters,
int (*distance_fn)(Vector *,Vector *,double *),
Matrix **output_cluster_mpp,
Vector **output_weights_vpp,
Vector **output_classes_vpp
);
DESCRIPTION
Clusters data into a set number of cluster centres using the k-means
algorithm. Returns the cluster centres in a matrix, the number of data
points per cluster centre (normalized over [0, 1.0] range), and the cluster
centre each input data point is assigned to.
The input data to be clustered is contained in "data_mp", an N x D matrix,
where N is the number of rows (data points) and D is the number of columns
(dimensions).
The argument "num_clusters" indicates how many cluster centres to compute from
the input data. This is the "k" in the k-means algorithm.
The "distance_fn" argument allows the user to specify their own distance
metric that operates on vectors. If the "distance_fn" argument is NULL,
then the Euclidean distance will be used. To specify your own distance
function, you must use the following prototype:
| int my_distance(Vector* v1_vp, Vector* v2_vp, double* distance_ptr)
Your distance function should return NO_ERROR on success, and ERROR on
failure.
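As an illustration of the prototype above, the following is a minimal sketch
of a city-block (L1) distance function. The Vector structure and the values of
NO_ERROR and ERROR shown here are placeholders so that the sketch compiles on
its own; they are assumptions, and real code should take them from the library
headers instead. The name "my_manhattan_distance" is hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Placeholder definitions (assumptions). In real code these come from the
 * library headers and must not be redefined. */
#define NO_ERROR 0
#define ERROR    (-1)

typedef struct Vector
{
    int     length;    /* number of elements */
    double *elements;  /* element storage */
} Vector;

/* City-block (L1) distance: the sum of absolute coordinate differences.
 * Returns NO_ERROR on success, and ERROR on NULL or mismatched input. */
int my_manhattan_distance(Vector *v1_vp, Vector *v2_vp, double *distance_ptr)
{
    double sum = 0.0;
    int    i;

    if (v1_vp == NULL || v2_vp == NULL || distance_ptr == NULL) return ERROR;
    if (v1_vp->length != v2_vp->length) return ERROR;

    for (i = 0; i < v1_vp->length; i++)
    {
        sum += fabs(v1_vp->elements[i] - v2_vp->elements[i]);
    }

    *distance_ptr = sum;
    return NO_ERROR;
}
```

A function of this shape would be passed as the "distance_fn" argument in
place of NULL.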
The computed cluster centres are returned in "output_cluster_mpp", which
is a pointer to a Matrix pointer; the Matrix has size "num_clusters" x D. If the
"output_cluster_mpp" matrix does not exist (*output_cluster_mpp == NULL)
or is the wrong size, the matrix will be created or resized as appropriate.
The normalized number of data points assigned to each cluster is
returned in "output_weights_vpp", a Vector of length "num_clusters".
If this argument is NULL it will be ignored. If the vector does not exist
(*output_weights_vpp == NULL), or is the wrong length, the vector will
be created/resized. Note that this vector has been normalized to sum to 1.0.
The index of the cluster centre that each input data point has been assigned
to is returned in "output_classes_vpp", a Vector of length N. Each entry
is the row index into the "output_cluster_mpp" matrix of the centre to which
the data point has been assigned. If this argument is NULL, it will be ignored.
If the vector does not exist or is the wrong size, it will be created/resized.
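The relationship between the class assignments and the normalized weights can
be sketched as follows. This is not the library's code: it uses plain C arrays
rather than Vector objects, assumes zero-based cluster indices, and the name
"compute_cluster_weights" is hypothetical.

```c
#include <stddef.h>

/* Counts how many points fall in each cluster and divides by the number of
 * points, so that the weights sum to 1.0.
 * Returns 0 on success, and -1 on invalid input. */
int compute_cluster_weights(const int *classes, int num_points,
                            int num_clusters, double *weights)
{
    int i;

    if (classes == NULL || weights == NULL) return -1;
    if (num_points <= 0 || num_clusters <= 0) return -1;

    for (i = 0; i < num_clusters; i++)
    {
        weights[i] = 0.0;
    }

    /* Tally the points assigned to each cluster. */
    for (i = 0; i < num_points; i++)
    {
        if (classes[i] < 0 || classes[i] >= num_clusters) return -1;
        weights[classes[i]] += 1.0;
    }

    /* Normalize the counts by the total number of points. */
    for (i = 0; i < num_clusters; i++)
    {
        weights[i] /= (double)num_points;
    }

    return 0;
}
```

For five points assigned to clusters 0, 1, 1, 2, 1, the counts are 1, 3, 1 and
the weights are 1/5, 3/5, 1/5.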
Finding the clusters is an iterative process. The iterations are controlled
by two parameters: the maximum number of iterations allowed, and a tolerance
on the difference between a computed cluster centre and its value at the
previous iteration. See the man pages for "set_kmeans_max_iterations" and
"set_kmeans_epsilon" for more information.
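How an iteration limit and a convergence tolerance can interact is sketched
below for one-dimensional data. This is a generic k-means loop, not this
library's implementation: the function "kmeans_1d" is hypothetical, it works
on plain arrays, and stopping when the largest centre shift falls below
"epsilon" is an assumption about how such a tolerance is typically applied.

```c
#include <math.h>
#include <stdlib.h>

/* Refines "centres" (length k, initialized by the caller) in place.
 * Stops after "max_iterations" passes, or earlier once no centre moves by
 * "epsilon" or more. Returns the number of passes performed, or -1 on error. */
int kmeans_1d(const double *data, int n, double *centres, int k,
              int max_iterations, double epsilon)
{
    double *sums;
    int    *counts;
    int     iter = 0;
    int     i, c;

    if (data == NULL || centres == NULL || n <= 0 || k <= 0) return -1;

    sums   = (double *)malloc((size_t)k * sizeof *sums);
    counts = (int *)malloc((size_t)k * sizeof *counts);
    if (sums == NULL || counts == NULL)
    {
        free(sums);
        free(counts);
        return -1;
    }

    while (iter < max_iterations)
    {
        double max_shift = 0.0;

        for (c = 0; c < k; c++) { sums[c] = 0.0; counts[c] = 0; }

        /* Assignment step: attach each point to its nearest centre. */
        for (i = 0; i < n; i++)
        {
            int best = 0;
            for (c = 1; c < k; c++)
            {
                if (fabs(data[i] - centres[c]) < fabs(data[i] - centres[best]))
                    best = c;
            }
            sums[best] += data[i];
            counts[best]++;
        }

        /* Update step: move each centre to the mean of its points and
         * record the largest move. Empty clusters keep their centre. */
        for (c = 0; c < k; c++)
        {
            double updated, shift;
            if (counts[c] == 0) continue;
            updated = sums[c] / counts[c];
            shift   = fabs(updated - centres[c]);
            if (shift > max_shift) max_shift = shift;
            centres[c] = updated;
        }

        iter++;
        if (max_shift < epsilon) break;  /* converged */
    }

    free(sums);
    free(counts);
    return iter;
}
```

A small tolerance with a generous iteration limit lets the loop run until the
centres settle; a low iteration limit bounds the run time at the cost of
possibly stopping before convergence.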
RETURNS
NO_ERROR on success, or ERROR on failure, with an appropriate error message
being set.
RELATED
set_kmeans_options, set_kmeans_max_iterations, get_kmeans_max_iterations,
set_kmeans_epsilon, get_kmeans_epsilon, free_kmeans_allocated_static_data.
DISCLAIMER
This software is not adequately tested. It is recommended that
results are checked independently where appropriate.
AUTHOR
Lindsay Martin
DOCUMENTER
Lindsay Martin
SEE ALSO
set_clustering_options, set_kmeans_max_iterations, get_kmeans_max_iterations,
set_kmeans_epsilon, get_kmeans_epsilon, set_3D_histogram_num_bins,
get_3D_histogram_num_bins, get_3D_histogram_clusters