mahout-dev mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: [jira] Updated: (MAHOUT-15) Investigate Mean Shift Clustering
Date Fri, 14 Mar 2008 20:20:33 GMT

Clustering implementations are notoriously hard to debug, if only because
the algorithms are robust enough that a broken implementation will often
still produce plausible results.
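
The defect in the quoted update below is a case in point: with
done = done && migrate(0.5), Java's short-circuit && stops calling migrate
for the remaining canopies as soon as one reports not done, yet the loop
still converges, only more slowly. A minimal sketch of that effect follows.
The Canopy class, its 1-D position, and the halving step are assumptions
made for illustration, not the code in the patch.

```java
public class DoneTestDemo {
    /** Hypothetical stand-in for a canopy: a 1-D position drifting toward a fixed target. */
    static final class Canopy {
        double position;
        final double target;

        Canopy(double position, double target) {
            this.position = position;
            this.target = target;
        }

        /** Moves halfway to the target; reports true once the step is smaller than delta. */
        boolean migrate(double delta) {
            double step = (target - position) / 2.0;
            position += step;
            return Math.abs(step) < delta;
        }
    }

    /** Runs passes until every canopy reports done; returns the number of passes. */
    static int passesUntilDone(double[] starts, boolean migrateFirst) {
        Canopy[] canopies = new Canopy[starts.length];
        for (int i = 0; i < starts.length; i++) {
            canopies[i] = new Canopy(starts[i], 0.0);
        }
        int passes = 0;
        boolean done = false;
        while (!done && passes < 1000) { // safety cap for the sketch
            done = true;
            for (Canopy c : canopies) {
                if (migrateFirst) {
                    // Reversed order: migrate always runs, then combines with done.
                    done = c.migrate(0.5) && done;
                } else {
                    // Original order: once done is false, migrate is skipped.
                    done = done && c.migrate(0.5);
                }
            }
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        double[] starts = {8.0, 16.0, 32.0};
        System.out.println("done && migrate: " + passesUntilDone(starts, false) + " passes");
        System.out.println("migrate && done: " + passesUntilDone(starts, true) + " passes");
    }
}
```

Both orderings end with every canopy at its target, so the final output
looks the same; only the pass count gives the slower ordering away.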


On 3/14/08 12:58 PM, "Jeff Eastman (JIRA)" <jira@apache.org> wrote:

> 
>      [ https://issues.apache.org/jira/browse/MAHOUT-15?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> 
> Jeff Eastman updated MAHOUT-15:
> -------------------------------
> 
>     Attachment: MAHOUT-15c.patch
> 
> Found another defect in the iteration loop. The order of the done test (done =
> done && migrate(0.5)) was omitting the canopy migrations once the first one
> reported not done. I reversed the elements and now the algorithm converges in
> 4 iterations vs 44. I also tweaked the actual migration routine to merge with
> only the closest canopy vs. the first one encountered. Finally, I added
> another set of values ('/') to the initial image data set and the algorithm
> clustered it correctly too:
> 
> ABBBBBBBBC
> BABBBBBBCB
> BBABBBBCBB
> BBBABBCBBB
> BBBBACBBBB
> BBBBCABBBB
> BBBCBBABBB
> BBCBBBBABB
> BCBBBBBBAB
> CBBBBBBBBA
> 
> Note: The values I added had a z=4 value and were clustered separately (C).
> When I changed their z value to 9, there were only two remaining canopies (A,
> B):
> 
> ABBBBBBBBA
> BABBBBBBAB
> BBABBBBABB
> BBBABBABBB
> BBBBAABBBB
> BBBBAABBBB
> BBBABBABBB
> BBABBBBABB
> BABBBBBBAB
> ABBBBBBBBA
> 
> I still do not know what to call this algorithm, perhaps 'colliding canopies'
> or 'coalescing canopies'? Though it has some similarity to mean shift I'd be
> surprised if the term applies.
> 
>> Investigate Mean Shift Clustering
>> ---------------------------------
>> 
>>                 Key: MAHOUT-15
>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-15
>>             Project: Mahout
>>          Issue Type: New Feature
>>          Components: Clustering
>>            Reporter: Jeff Eastman
>>            Assignee: Jeff Eastman
>>         Attachments: MAHOUT-15a.patch, MAHOUT-15b.patch, MAHOUT-15c.patch
>> 
>> 
>> "The mean shift algorithm is a nonparametric clustering technique which does
>> not require prior knowledge of the number of clusters, and does not constrain
>> the shape of the clusters."
>> http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/TUZEL1/MeanShift.pdf
>> Investigate implementing mean shift clustering using Hadoop

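The quoted update also mentions merging with only the closest canopy rather
than the first one encountered. The sketch below shows how the two choices
can differ. It assumes 1-D centers, an absolute-difference distance, and a
hypothetical merge threshold; the patch's actual distance measure and data
structures are not shown in this thread.

```java
public class ClosestCanopyDemo {
    /** Returns the index of the first center within threshold of the point, or -1 if none. */
    static int firstWithin(double[] centers, double point, double threshold) {
        for (int i = 0; i < centers.length; i++) {
            if (Math.abs(centers[i] - point) <= threshold) {
                return i;
            }
        }
        return -1;
    }

    /** Returns the index of the nearest center within threshold of the point, or -1 if none. */
    static int closestWithin(double[] centers, double point, double threshold) {
        int best = -1;
        double bestDistance = Double.MAX_VALUE;
        for (int i = 0; i < centers.length; i++) {
            double distance = Math.abs(centers[i] - point);
            if (distance <= threshold && distance < bestDistance) {
                best = i;
                bestDistance = distance;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] centers = {0.0, 4.0, 10.0}; // hypothetical canopy centers
        System.out.println("first within threshold:   " + firstWithin(centers, 3.0, 5.0));
        System.out.println("closest within threshold: " + closestWithin(centers, 3.0, 5.0));
    }
}
```

With first-match merging the result depends on iteration order, which is one
way such a routine can give plausible but order-sensitive clusters.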
