mahout-dev mailing list archives

From "Jeff Eastman (JIRA)" <>
Subject [jira] [Resolved] (MAHOUT-626) T1 and T2 Values in Canopy (& MeanShift)
Date Wed, 17 Aug 2011 20:49:27 GMT


Jeff Eastman resolved MAHOUT-626.

    Resolution: Fixed

I've stewed about whether or not to try this with MeanShiftCanopy and decided it is not
appropriate to change these values between the mapper and reducer. MeanShift is an iterative
algorithm, and the thresholds would vacillate between mapper and reducer values in a way that
is not reflected in the algorithm as I understand it.

> T1 and T2 Values in Canopy (& MeanShift) 
> -----------------------------------------
>                 Key: MAHOUT-626
>                 URL:
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.4, 0.5
>            Reporter: Jeff Eastman
>            Assignee: Jeff Eastman
>             Fix For: 0.6
>         Attachments: CanopyT3T4.patch
> Users are reporting that the T1 and T2 threshold values that work in sequential mode
> don't work as well in mapreduce mode, because both the mapper and the reducer use the
> same values. The mapper coalesces a number of points into a single centroid, and this
> changes the distances enough that independent threshold values are needed in the reducer.

> Here is a patch that implements optional T3 and T4 threshold values, which are used only
> by the canopy reducer. Convenience methods have been added for API compatibility, and
> defaults are included so that these values default to T1 and T2. A new unit test confirms
> that the thresholds are being set correctly.
> If this works out as a positive improvement, I will make the same changes to MeanShift
> and commit them.
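
To make the issue description above concrete, here is a minimal Python sketch of a two-stage
canopy pass in which the reducer may use its own thresholds. It is an illustration only, not
Mahout's Java implementation and not the contents of CanopyT3T4.patch. The function names
(canopy_centers, resolve_thresholds, mapreduce_canopy) are hypothetical, the canopy loop is
the simplified textbook variant, and the pairing T3 -> T1, T4 -> T2 for the defaults is an
assumption based on the wording of the description.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, loose, tight):
    """Simplified canopy pass (textbook variant, not Mahout's code).

    Points closer than `loose` to a center join its canopy; points closer
    than `tight` are also removed from the candidate list. Returns the
    centroid of each canopy.
    """
    remaining = list(points)
    centroids = []
    while remaining:
        center = remaining.pop(0)
        members = [center]
        survivors = []
        for p in remaining:
            d = euclidean(center, p)
            if d < loose:
                members.append(p)
            if d >= tight:
                survivors.append(p)
        remaining = survivors
        dim = len(center)
        centroids.append(
            tuple(sum(m[i] for m in members) / len(members) for i in range(dim))
        )
    return centroids


def resolve_thresholds(t1, t2, t3=None, t4=None):
    """Assumed default rule: T3 falls back to T1 and T4 falls back to T2."""
    return (t1, t2, t1 if t3 is None else t3, t2 if t4 is None else t4)


def mapreduce_canopy(splits, t1, t2, t3=None, t4=None):
    """Each 'mapper' clusters one split with T1/T2; the 'reducer' then
    clusters the mapper centroids with T3/T4."""
    t1, t2, t3, t4 = resolve_thresholds(t1, t2, t3, t4)
    mapper_centroids = []
    for split in splits:
        mapper_centroids.extend(canopy_centers(split, t1, t2))
    return canopy_centers(mapper_centroids, t3, t4)


# Two input splits; each mapper collapses its two points into one centroid.
splits = [[(0, 0), (1, 0)], [(3, 0), (4, 0)]]
same_thresholds = mapreduce_canopy(splits, t1=3.0, t2=1.5)
reducer_thresholds = mapreduce_canopy(splits, t1=3.0, t2=1.5, t3=4.0, t4=3.5)
```

In this toy input the mapper centroids end up exactly 3.0 apart, so a reducer that reuses
T1 = 3.0 keeps them as two canopies, while a reducer with a looser T3 and T4 merges them
into one. That is the kind of effect the description attributes to coalescing points in
the mapper.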

This message is automatically generated by JIRA.