On Jul 27, 2009, at 12:55 PM, Shashikant Kore wrote:
> On Mon, Jul 27, 2009 at 10:11 PM, Grant Ingersoll <gsingers@apache.org> wrote:
>>
>> Not following. The distance calc stuff is irrespective of the type of
>> Vector. I was referring to the centroid length square (I think you called
>> it the triangle inequality) stuff that Shashikant added on MAHOUT-121. We
>> use it for testing convergence, but not for other distance calculations. I
>> haven't looked to see if it is applicable yet, but it seems like it should
>> be.
>>
>
> Grant,
>
> Yes, that part of the patch is missing. In my original patch, I had
> modified the emitPointToNearestCluster() in kmeans/Cluster.java to
> calculate distance between document and centroids of various clusters.
> (There is no triangle inequality code, though.) In the later patches
> I don't see that code.
>
> I had reviewed the final patch, but I missed this one. I think I
> only ran Canopy and not K-means. Incidentally, I am hopelessly out
> of date with trunk, as I have not worked on it recently. BTW, I
> haven't really followed this thread in depth, so I might be speaking
> out of context here. Apologies.

I'll be on a plane tomorrow and will see if I can track down the
differences.
-Grant
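
For readers following the thread, here is a minimal sketch of how a precomputed centroid length square could be reused when finding the nearest centroid for a document. It is not the MAHOUT-121 patch and does not use Mahout's Vector or Cluster classes; the class name, method names, the sparse Map representation of a document, and the choice of squared Euclidean distance are all assumptions made for illustration. It relies on the identity ||x - c||^2 = ||c||^2 - 2(x . c) + ||x||^2, so only the non-zero entries of the document are visited per centroid.

```java
import java.util.Map;

// Hypothetical illustration only; none of these names come from Mahout.
final class CentroidDistanceSketch {

    // Squared Euclidean length of a dense centroid.
    // Intended to be computed once per centroid per iteration and then reused.
    static double lengthSquared(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x * x;
        }
        return sum;
    }

    // Squared distance between a sparse point (index -> value) and a dense
    // centroid, using ||x - c||^2 = ||c||^2 - 2(x . c) + ||x||^2.
    // Only the non-zero entries of the point are iterated.
    static double squaredDistance(Map<Integer, Double> point,
                                  double[] centroid,
                                  double centroidLengthSquared) {
        double dot = 0.0;
        double pointLengthSquared = 0.0;
        for (Map.Entry<Integer, Double> e : point.entrySet()) {
            double value = e.getValue();
            dot += value * centroid[e.getKey()];
            pointLengthSquared += value * value;
        }
        return centroidLengthSquared - 2.0 * dot + pointLengthSquared;
    }

    // Index of the centroid with the smallest squared distance to the point,
    // or -1 if there are no centroids.
    static int nearestCentroid(Map<Integer, Double> point,
                               double[][] centroids,
                               double[] centroidLengthsSquared) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length; i++) {
            double d = squaredDistance(point, centroids[i], centroidLengthsSquared[i]);
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }
}
```

The saving, under these assumptions, comes from documents being sparse while centroids are dense: the centroid length square is paid for once, and each document-to-centroid comparison costs only as many multiplications as the document has non-zero terms.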