mahout-user mailing list archives

From Eshwaran Vijaya Kumar
Subject Re: Mahout KMeans Output
Date Fri, 12 Aug 2011 19:35:38 GMT
Excellent. NamedVectors would do the job. Thanks.
On Aug 12, 2011, at 12:09 PM, Jeff Eastman wrote:

> KMeans does not use the key in its mapper, only the VectorWritable value.
> But you can create NamedVectors in your upstream processing and put the IDs
> in the name and the Vectors in the delegate. The NamedVectors will flow
> through the clustering step into the clusteredPoints directory. You will
> have to write your own clustering step if you want output other than the
> WeightedVectorWritables.
> -----Original Message-----
> From: Eshwaran Vijaya Kumar
> Sent: Friday, August 12, 2011 11:44 AM
> To:
> Subject: Mahout KMeans Output 
> I am using KMeans as part of a long pipeline. Suppose I give KMeans a
> SequenceFile with IntWritable keys and VectorWritable values, where the keys
> are IDs for the vectors. Is there a utility or an option to get KMeans to
> output the IDs that belong to each cluster rather than the
> WeightedVectorWritable bean?
> Thanks
> Esh
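
Editor's note: the approach Jeff describes leaves one post-processing step, turning the clusteredPoints records into a list of IDs per cluster. The following is a minimal sketch of that grouping step in plain Java, not Mahout code. It assumes the (cluster ID, vector name) pairs have already been read from the clusteredPoints SequenceFile; `ClusterIdGrouping` and `groupIdsByCluster` are hypothetical names introduced here for illustration, and the sample IDs are made up.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; it is not part of Mahout.
class ClusterIdGrouping {

    // Each entry pairs a cluster ID (assumed to be the key of a
    // clusteredPoints record) with the name of a vector assigned to it.
    // With Mahout on the classpath, the name would presumably be obtained
    // by casting the vector held in the WeightedVectorWritable to
    // NamedVector and calling getName(); that part is not shown here.
    static Map<Integer, List<String>> groupIdsByCluster(
            List<Map.Entry<Integer, String>> points) {
        // TreeMap keeps clusters sorted by ID for stable output.
        Map<Integer, List<String>> idsByCluster = new TreeMap<>();
        for (Map.Entry<Integer, String> point : points) {
            idsByCluster
                .computeIfAbsent(point.getKey(), k -> new ArrayList<>())
                .add(point.getValue());
        }
        return idsByCluster;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for records read from clusteredPoints.
        List<Map.Entry<Integer, String>> points = new ArrayList<>();
        points.add(new SimpleEntry<>(0, "doc-1"));
        points.add(new SimpleEntry<>(1, "doc-2"));
        points.add(new SimpleEntry<>(0, "doc-3"));

        // Prints {0=[doc-1, doc-3], 1=[doc-2]}
        System.out.println(groupIdsByCluster(points));
    }
}
```

In a real pipeline the same grouping would more likely run as a small MapReduce job or a sequential pass over the SequenceFile, since the clusteredPoints output may not fit in memory.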
