mahout-user mailing list archives

From "Whitmore, Mattie" <>
Subject RE: Mahout-279/kmeans++
Date Fri, 17 Aug 2012 14:36:12 GMT
Hi Ted,

Yes, this is great! I hope to start working with this algorithm in the next couple of weeks.

I have a question about the 0.7 implementation of kmeans and the clusterClassificationThreshold.
I have this value set to zero, but the output still shows about 1/3 of my data as not
assigned to any cluster. Am I using this value incorrectly? I tried this with both the
0.5 and 0.7 APIs, and the data was pruned despite clusterClassificationThreshold = 0.
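For reference, the threshold is described as an outlier-removal parameter between 0 and 1: vectors whose pdf falls below it are not clustered. Below is a minimal sketch of that rule. It assumes the rule is applied to each point's highest cluster pdf, and `classify_with_threshold` is a hypothetical helper, not Mahout code. Under that rule a threshold of 0 should never prune a point, because a pdf cannot be negative.

```python
# Hypothetical sketch of a pdf-threshold filter; NOT Mahout's implementation.
# Assumption: each point goes to its most probable cluster, and is dropped
# ("pruned") only when that best pdf is below the threshold.
from typing import List, Optional, Sequence


def classify_with_threshold(
    pdfs_per_point: Sequence[Sequence[float]], threshold: float
) -> List[Optional[int]]:
    """Return the best cluster index for each point, or None if it is pruned."""
    assignments: List[Optional[int]] = []
    for pdfs in pdfs_per_point:
        # Index of the highest pdf (first one wins on ties).
        best = max(range(len(pdfs)), key=lambda i: pdfs[i])
        # Keep the point only if its best pdf reaches the threshold.
        assignments.append(best if pdfs[best] >= threshold else None)
    return assignments


# Made-up pdf values for three points over two clusters.
points = [[0.9, 0.1], [0.0, 0.0], [0.4, 0.6]]
print(classify_with_threshold(points, 0.0))  # [0, 0, 1]: every point is kept
print(classify_with_threshold(points, 0.5))  # [0, None, 1]: second point pruned
```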



-----Original Message-----
From: Ted Dunning [] 
Sent: Wednesday, August 15, 2012 5:20 PM
Subject: Re: Mahout-279/kmeans++


Would this help?


On Wed, Aug 15, 2012 at 10:45 AM, Whitmore, Mattie <> wrote:

> Hi!
> I have been using RandomSeedGenerator, and was hoping it had a patch like
> the one described in Mahout-279, since I want only 10 vectors out of a set of
> more than 100,000,000. I have been using canopy clustering for better
> results, but I still need to do a few passes of kmeans to determine my T, and
> random seed generation takes a long time.
> The comments say that you are working on a kmeans++. I searched around but
> couldn't find any more information about it. Is a scalable kmeans++ in
> the works? (I know research on the subject is quite new.)
> Thanks!
> Mattie Whitmore
> Mathematician/IR&D Software Engineer
> HARRIS Corporation - Advanced Information Solutions
> 301.837.5278
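The two seeding strategies discussed above can be sketched in Python as an illustration only. The first is reservoir sampling (Algorithm R), a swapped-in technique that picks k vectors uniformly at random in a single pass. The second is the k-means++ rule, where each new seed is drawn with probability proportional to its squared distance from the nearest seed chosen so far. The function names are hypothetical. This is not RandomSeedGenerator, the Mahout-279 patch, or any Mahout API.

```python
# Hypothetical sketch of two seeding strategies; not Mahout code.
import random
from typing import Iterable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def reservoir_seeds(vectors: Iterable[Vector], k: int, rng: random.Random) -> List[Vector]:
    """Pick k vectors uniformly at random in one pass (Algorithm R)."""
    reservoir: List[Vector] = []
    for n, v in enumerate(vectors):
        if n < k:
            reservoir.append(v)
        else:
            # The (n+1)-th vector replaces a random slot with probability k / (n + 1).
            j = rng.randint(0, n)
            if j < k:
                reservoir[j] = v
    return reservoir


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(vectors: Sequence[Vector], k: int, rng: random.Random) -> List[Vector]:
    """k-means++ seeding: D^2-weighted sampling of each new seed."""
    seeds = [vectors[rng.randrange(len(vectors))]]
    # d2[i] = squared distance from vectors[i] to its nearest seed so far.
    d2 = [sq_dist(v, seeds[0]) for v in vectors]
    while len(seeds) < k:
        if sum(d2) == 0:
            # Every vector coincides with a seed; no distinct seeds remain.
            break
        # Points already chosen (weight 0) are not drawn again.
        new_seed = rng.choices(vectors, weights=d2, k=1)[0]
        seeds.append(new_seed)
        d2 = [min(old, sq_dist(v, new_seed)) for old, v in zip(d2, vectors)]
    return seeds


rng = random.Random(42)
data = [(float(i), float(i % 7)) for i in range(1000)]
print(reservoir_seeds(data, 10, rng))
print(kmeans_pp_seeds(data, 3, rng))
```

The two approaches cost different amounts. Reservoir sampling reads the data once regardless of k. This k-means++ sketch keeps all vectors in memory and makes one full pass per seed. For 100,000,000 vectors it only illustrates the selection rule and is not a scalable implementation.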