mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: k-Means questions
Date Fri, 26 Jun 2009 02:11:19 GMT

On Jun 25, 2009, at 7:00 PM, Ted Dunning wrote:

> On Thu, Jun 25, 2009 at 3:49 PM, Grant Ingersoll <gsingers@apache.org> wrote:
>
>> Do people have recommendations for start clusters (seeds) for k-Means? The
>> synthetic control example uses Canopy, and I often see random selection
>> mentioned, but I'm wondering what's considered best practice for
>> obtaining good overall results.
>>
>
> Just picking a random data element for each centroid should work well.
> Random assignment works much less well because all of the centroids get put
> very close to the mean of the entire data set.

I'm confused by these two sentences.  They seem contradictory, but I'm  
sure the error is on my end.

-Grant
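
One possible reading of the two quoted sentences, sketched in Python: "picking a random data element" is taken to mean using k actual data points as the initial centroids, while "random assignment" is taken to mean giving every point a random cluster label and averaging each cluster. That interpretation is an assumption, and the function names and synthetic data below are hypothetical illustrations, not Mahout's API.

```python
import random

def seed_from_points(data, k, rng):
    """Use k distinct data points, chosen at random, as the initial centroids."""
    return [list(p) for p in rng.sample(data, k)]

def seed_from_random_assignment(data, k, rng):
    """Give every point a random cluster label, then average each cluster."""
    dim = len(data[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in data:
        c = rng.randrange(k)
        counts[c] += 1
        for j, x in enumerate(p):
            sums[c][j] += x
    # max(n, 1) only guards against division by zero for an empty cluster.
    return [[s / max(n, 1) for s in row] for row, n in zip(sums, counts)]

def make_blobs(centers, n_per_center, rng):
    """Synthetic 2-D data: unit-variance Gaussian blobs around each center."""
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            for cx, cy in centers for _ in range(n_per_center)]

def mean_point(data):
    """Coordinate-wise mean of the whole data set."""
    n = len(data)
    return [sum(p[j] for p in data) / n for j in range(len(data[0]))]

rng = random.Random(0)
data = make_blobs([(0, 0), (10, 0), (0, 10)], 1000, rng)
print("data mean:        ", mean_point(data))
print("point seeds:      ", seed_from_points(data, 3, rng))
print("assignment seeds: ", seed_from_random_assignment(data, 3, rng))
```

Under this reading the two sentences are not contradictory: each randomly labelled cluster is a uniform sample of the whole data set, so its average lands near the overall mean, whereas seeds drawn from the data points themselves are scattered the same way the data is.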
