mahout-user mailing list archives

From Ted Dunning <>
Subject Re: k-Means questions
Date Fri, 26 Jun 2009 02:17:31 GMT

Option 1) for each centroid, pick one (or three) random input vectors as its
initial position

Option 2) assign every input vector to some centroid at random and compute
the resulting centroids
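Both options can be sketched in a few lines of Python. This is a minimal illustration that assumes dense vectors stored as plain lists. The function names and the empty-cluster fallback are hypothetical. It is not Mahout's implementation, and it does not show the "(or three)" variant of option 1.

```python
import random
from typing import List, Sequence

Vector = List[float]


def init_random_points(data: Sequence[Vector], k: int, rng: random.Random) -> List[Vector]:
    """Option 1: use k distinct, randomly chosen input vectors as initial centroids."""
    if not 0 < k <= len(data):
        raise ValueError("need 0 < k <= number of input vectors")
    # rng.sample picks k distinct indices without replacement.
    return [list(data[i]) for i in rng.sample(range(len(data)), k)]


def init_random_assignment(data: Sequence[Vector], k: int, rng: random.Random) -> List[Vector]:
    """Option 2: assign every input vector to a random cluster, then average each cluster."""
    if not 0 < k <= len(data):
        raise ValueError("need 0 < k <= number of input vectors")
    dim = len(data[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x in data:
        c = rng.randrange(k)  # uniformly random cluster label
        counts[c] += 1
        for j, v in enumerate(x):
            sums[c][j] += v
    # Hypothetical fallback: an empty cluster has no mean, so reuse a random input vector.
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else list(rng.choice(data))
        for c in range(k)
    ]


if __name__ == "__main__":
    # Two well-separated groups of points: 500 copies each of (0, 0) and (10, 10).
    data = [[0.0, 0.0], [10.0, 10.0]] * 500
    rng = random.Random(42)
    print("option 1:", init_random_points(data, 2, rng))
    print("option 2:", init_random_assignment(data, 2, rng))
```

On this toy data, option 1 always returns actual input points, each (0, 0) or (10, 10). Option 2 typically returns two centroids that both sit near (5, 5), which is the mean of the whole data set.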

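Option (1) is like (2), except that it effectively assigns only k input
vectors (one per centroid), while option (2) assigns every input vector to
some cluster.  Averaging a random assignment of all the data pulls every
centroid toward the mean of the entire data set, which is what I meant by
random assignment working much less well.  Many people use (2), but (1)
generally works better for me.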
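One way to see why (2) collapses is a short variance calculation. It assumes that each of the n input vectors is assigned independently and uniformly to one of k clusters, and it treats each coordinate separately with population variance sigma^2:

```latex
% Global mean, cluster centroid, and expected cluster size under random assignment
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
c_j = \frac{1}{|S_j|}\sum_{i \in S_j} x_i, \qquad
\mathbb{E}\,|S_j| = \frac{n}{k}

% Given |S_j| = m, S_j is a uniformly random m-subset (sampling without replacement)
\mathbb{E}\left[c_j \mid |S_j| = m\right] = \bar{x}, \qquad
\operatorname{Var}\left[c_j \mid |S_j| = m\right]
  = \frac{\sigma^2}{m}\cdot\frac{n-m}{n-1}
  \approx \frac{k\,\sigma^2}{n}\left(1 - \frac{1}{k}\right)
  \quad (m \approx n/k)
```

Every centroid from (2) therefore lies within about sigma * sqrt(k/n) of the global mean. With n = 10^6 and k = 100, that is roughly 1% of a standard deviation. A centroid from (1) is a real input vector, so its per-coordinate variance around the mean is the full sigma^2.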

On Thu, Jun 25, 2009 at 7:11 PM, Grant Ingersoll <> wrote:

>> Just picking a random data element for each centroid should work well.
>> Random assignment works much less well because all of the centroids get
>> put very close to the mean of the entire data set.
> I'm confused by these two sentences.  They seem contradictory, but I'm sure
> the error is on my end.
