mahout-user mailing list archives

From Shashikant Kore
Subject Re: Clustering from DB
Date Mon, 27 Jul 2009 16:55:47 GMT
On Mon, Jul 27, 2009 at 10:11 PM, Grant Ingersoll wrote:
> Not following.  The distance calc stuff is irrespective of the type of
> Vector.  I was referring to the centroid length square (I think you called
> it the triangle inequality) stuff that Shashikant added on MAHOUT-121.  We
> use it for testing convergence, but not for other distance calculations.  I
> haven't looked to see if it is applicable yet, but it seems like it should
> be.


Yes, that part of the patch is missing.  In my original patch, I had
modified emitPointToNearestCluster() in kmeans/ to
calculate the distance between a document and the centroids of the
various clusters.  (There is no triangle inequality code, though.)  I
don't see that code in the later patches.
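
For illustration only, the kind of calculation discussed here can be sketched as below. This is not the code from the MAHOUT-121 patch and does not use Mahout's Vector API; the class and method names (NearestCentroidSketch, lengthSquared, distanceSquared, nearest) are hypothetical. It assumes squared Euclidean distance, a sparse document held in a Map, and dense centroids whose squared lengths are computed once and reused, using |p - c|^2 = |p|^2 - 2(p.c) + |c|^2.

```java
import java.util.Map;

// Hypothetical sketch, not Mahout code: assign a sparse point to the
// nearest dense centroid, reusing each centroid's precomputed squared length.
class NearestCentroidSketch {

    // Squared length of a dense vector (its dot product with itself).
    static double lengthSquared(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x * x;
        }
        return sum;
    }

    // Squared Euclidean distance via |p|^2 - 2(p.c) + |c|^2.
    // Only the non-zero entries of the point are visited.
    static double distanceSquared(Map<Integer, Double> point, double pointLenSq,
                                  double[] centroid, double centroidLenSq) {
        double dot = 0.0;
        for (Map.Entry<Integer, Double> e : point.entrySet()) {
            dot += e.getValue() * centroid[e.getKey()];
        }
        return pointLenSq - 2.0 * dot + centroidLenSq;
    }

    // Index of the centroid with the smallest squared distance to the point,
    // or -1 if there are no centroids.
    static int nearest(Map<Integer, Double> point, double[][] centroids,
                       double[] centroidLenSq) {
        double pointLenSq = 0.0;
        for (double x : point.values()) {
            pointLenSq += x * x;
        }
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length; i++) {
            double d = distanceSquared(point, pointLenSq, centroids[i], centroidLenSq[i]);
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }
}
```

With this layout, each point-to-centroid comparison costs time proportional to the number of non-zero terms in the document rather than to the full dimensionality, which is the usual motivation for caching the centroid's squared length.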

I had reviewed the final patch, but I missed this.  I think I only
ran Canopy and not K-means. Incidentally, I am hopelessly out of date
with trunk, as I have not worked on this recently.  BTW, I haven't
really followed this thread in depth, so I might be speaking out of
context here. Apologies.

