mahout-user mailing list archives

From sharath jagannath <>
Subject Re: Clustering with KMeans
Date Wed, 09 Feb 2011 03:36:48 GMT
Yeah, it was not the only cluster that was formed; there were around 200
clusters. I played around with t1 and t2 and now have 30 clusters, which I
am using to cluster the new data points with CanopyDriver.

I get the following exception when CanopyDriver.clusterData tries to
find the closest canopy.

org.apache.mahout.math.CardinalityException: Required cardinality 23 but got
at org.apache.mahout.clustering.canopy.CanopyClusterer.findClosestCanopy(
at org.apache.hadoop.mapred.MapTask.runNewMapper(
at org.apache.hadoop.mapred.LocalJobRunner$

Code that is trying to find the closest canopy:

CanopyDriver.clusterData(conf,
    new Path("test-vectors", "tfidf-vectors"),
    new Path(canopyCentroidsOutputPath, "clusters-0"),
    canopyCentroidsOutputPath, measure, t1, t2, false);

* test-vectors/tfidf-vectors - path to the new test data, created using the
previously mentioned customized data converter and Seq2Sparse.

* canopyCentroidsOutputPath, "clusters-0" - path to the canopy centroids
that were formed during the training phase.

* measure - SquaredEuclideanDistanceMeasure, the same measure I used in the
training phase.

* t1 - 2000, t2 - 1900

* Sequential false/true - in either case the method throws the
CardinalityException.

The first line of the dot method is the cardinality comparison, which throws
the exception. I wanted to use canopy clustering as a quick "online"
clustering of the new data points (though it is less accurate than KMeans). Am
I not supposed to use canopy that way?
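
To make the check I am describing concrete, here is a minimal sketch in plain
Java. It is not Mahout's actual code: the class CardinalityCheckSketch, the
exception CardinalityMismatch and the array-based dot method are hypothetical
stand-ins, and the size 31 is an arbitrary value chosen only for illustration.
It only shows how a dot product that compares sizes first fails as soon as
the two vectors have different cardinalities.

```java
public class CardinalityCheckSketch {

    /** Hypothetical stand-in for the exception raised on a size mismatch. */
    static class CardinalityMismatch extends RuntimeException {
        CardinalityMismatch(int required, int got) {
            super("Required cardinality " + required + " but got " + got);
        }
    }

    /** Dot product that compares the sizes before doing any arithmetic. */
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new CardinalityMismatch(a.length, b.length);
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] centroid = new double[23]; // cardinality taken from the error message
        double[] newPoint = new double[31]; // arbitrary, assumed only for this sketch
        try {
            dot(centroid, newPoint);
        } catch (CardinalityMismatch e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the sketch matches what happens in Mahout, the new vectors and the canopy
centroids would need the same cardinality before the closest canopy can be
computed.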

Thanks everybody, especially Kate. Your responses to the previous emails are
much appreciated.

Thanks and Regards,

