mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: kMeans Help
Date Mon, 29 Jun 2009 13:19:30 GMT

On Jun 29, 2009, at 9:04 AM, nfantone wrote:

> I really see no harm (algorithmically and conceptually) in returning
> the center as the centroid if there's only one point added to the
> cluster. If that's what you need to solve your predicament, I say go
> for it. Are there any drawbacks?

Yeah, I don't see any, but I need to run the tests. Obviously,
someone thought otherwise, so it would be good to know the reasoning.
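For readers following along, here is a minimal sketch of the rule being proposed. It is a hypothetical, simplified class (`SketchCluster`, using plain `double[]` instead of Mahout's `Vector`), not Mahout's actual `Cluster` code. It keeps a center, a running sum `pointTotal`, and a count, and computes the centroid as `pointTotal / numPoints`, except that with one point it returns the center as suggested above. Returning the center for an empty cluster is an extra assumption made here only to avoid dividing by zero.

```java
/**
 * Hypothetical, simplified stand-in for a k-means cluster (NOT Mahout's Cluster class).
 * Keeps a fixed center plus a running vector sum of the points assigned to it.
 */
final class SketchCluster {
  private final double[] center;     // center chosen at the start of the iteration
  private final double[] pointTotal; // running vector sum of all points added so far
  private int numPoints;             // number of points added so far

  SketchCluster(double[] center) {
    this.center = center.clone();
    this.pointTotal = new double[center.length];
  }

  /** Adds a point by folding it into the running sum; the point itself is not stored. */
  void addPoint(double[] point) {
    if (point.length != center.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    for (int i = 0; i < point.length; i++) {
      pointTotal[i] += point[i];
    }
    numPoints++;
  }

  /**
   * Centroid = pointTotal / numPoints, except that with zero or one point the
   * center is returned (the proposed single-point rule; zero is an assumption).
   */
  double[] computeCentroid() {
    if (numPoints <= 1) {
      return center.clone();
    }
    double[] centroid = new double[pointTotal.length];
    for (int i = 0; i < centroid.length; i++) {
      centroid[i] = pointTotal[i] / numPoints;
    }
    return centroid;
  }
}
```

Note that if the single point added is the center itself, returning the center and returning `pointTotal / 1` give the same result; they only differ when the lone point is somewhere else.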

>
> What eludes me is how points are actually added. How can I get a
> cluster's full set of points at any given moment? Say I create a
> Cluster with a center, then add some points (addPoint() just stores
> pointTotal, a Vector holding the running vector sum), and then want to
> check which vectors I have added so far, with their original values.
> Is this even possible?

AFAICT, no, but I'm sure that is by design; otherwise you would be
carrying around a lot of vectors, and I doubt it would scale. I think
the final Clustering step takes care of associating the points with
the centroids.
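The two points in the reply above can be sketched as follows, under stated assumptions. This is hypothetical illustration code (`KMeansSketch`, `SumOnlyCluster`, `assignPoints` are invented names, not Mahout's API). `SumOnlyCluster` keeps only a vector sum and a count, so its memory is proportional to the dimension rather than to the number of points, which is why the original points cannot be recovered from it. `assignPoints` shows one simple way a final pass could associate points with centroids: re-read the points and label each with its nearest centroid by squared Euclidean distance (the distance measure is an assumption; Mahout allows pluggable measures).

```java
import java.util.List;

/** Hypothetical sketch (not Mahout code) of sum-only accumulation plus a final assignment pass. */
final class KMeansSketch {

  /** Keeps only a vector sum and a count: O(d) memory per cluster, independent of point count. */
  static final class SumOnlyCluster {
    final double[] pointTotal;
    int numPoints;

    SumOnlyCluster(int dimension) {
      pointTotal = new double[dimension];
    }

    void addPoint(double[] point) {
      for (int i = 0; i < point.length; i++) {
        pointTotal[i] += point[i]; // the individual point is folded in and not retained
      }
      numPoints++;
    }
  }

  /** Squared Euclidean distance between two vectors of equal length. */
  static double squaredDistance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /** Final pass: label each point with the index of its nearest centroid. */
  static int[] assignPoints(List<double[]> points, List<double[]> centroids) {
    int[] labels = new int[points.size()];
    for (int p = 0; p < points.size(); p++) {
      double best = Double.POSITIVE_INFINITY;
      for (int c = 0; c < centroids.size(); c++) {
        double d = squaredDistance(points.get(p), centroids.get(c));
        if (d < best) {
          best = d;
          labels[p] = c;
        }
      }
    }
    return labels;
  }
}
```

Because the assignment pass streams over the input once more, it never needs the clusters to hold their member points, which fits the scaling concern above.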

-Grant
