From: Ted Dunning
Date: Fri, 14 Mar 2008 13:20:33 -0700
Subject: Re: [jira] Updated: (MAHOUT-15) Investigate Mean Shift Clustering
In-Reply-To: <186671266.1205524704460.JavaMail.jira@brutus>
Reply-To: mahout-dev@lucene.apache.org

Clustering implementations are notoriously hard to
debug, if only because they are relatively robust, so broken implementations
will often produce plausible results.

On 3/14/08 12:58 PM, "Jeff Eastman (JIRA)" wrote:

>
> [ https://issues.apache.org/jira/browse/MAHOUT-15?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Jeff Eastman updated MAHOUT-15:
> -------------------------------
>
> Attachment: MAHOUT-15c.patch
>
> Found another defect in the iteration loop. The order of the done test
> (done = done && migrate(0.5)) was omitting the canopy migrations once the
> first one reported not done. I reversed the elements and now the algorithm
> converges in 4 iterations vs 44. I also tweaked the actual migration routine
> to merge with only the closest canopy vs. the first one encountered. Finally,
> I added another set of values ('/') to the initial image data set and the
> algorithm clustered it correctly too:
>
> ABBBBBBBBC
> BABBBBBBCB
> BBABBBBCBB
> BBBABBCBBB
> BBBBACBBBB
> BBBBCABBBB
> BBBCBBABBB
> BBCBBBBABB
> BCBBBBBBAB
> CBBBBBBBBA
>
> Note: The values I added had a z=4 value and were clustered separately (C).
> When I changed their z value to 9, there were only two remaining canopies
> (A, B):
>
> ABBBBBBBBA
> BABBBBBBAB
> BBABBBBABB
> BBBABBABBB
> BBBBAABBBB
> BBBBAABBBB
> BBBABBABBB
> BBABBBBABB
> BABBBBBBAB
> ABBBBBBBBA
>
> I still do not know what to call this algorithm, perhaps 'colliding canopies'
> or 'coalescing canopies'? Though it has some similarity to mean shift I'd be
> surprised if the term applies.
>
>> Investigate Mean Shift Clustering
>> ---------------------------------
>>
>> Key: MAHOUT-15
>> URL: https://issues.apache.org/jira/browse/MAHOUT-15
>> Project: Mahout
>> Issue Type: New Feature
>> Components: Clustering
>> Reporter: Jeff Eastman
>> Assignee: Jeff Eastman
>> Attachments: MAHOUT-15a.patch, MAHOUT-15b.patch, MAHOUT-15c.patch
>>
>>
>> "The mean shift algorithm is a nonparametric clustering technique which does
>> not require prior knowledge of the number of clusters, and does not constrain
>> the shape of the clusters."
>> http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/TUZEL1/MeanShift.pdf
>> Investigate implementing mean shift clustering using Hadoop
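
The done-test defect described in the quoted comment looks like a short-circuit evaluation problem: in Java, the right operand of && is not evaluated once the left operand is false. The sketch below illustrates that effect only and is not the code from the MAHOUT-15 patches. The Canopy class, its migrate(double) method, and the helper names are hypothetical stand-ins invented for the illustration; the patch itself may structure the loop differently.

```java
import java.util.ArrayList;
import java.util.List;

class DoneTestDemo {
    /** Hypothetical canopy: counts migrate() calls and reports a fixed convergence state. */
    static class Canopy {
        final boolean converged;
        int migrations = 0;

        Canopy(boolean converged) {
            this.converged = converged;
        }

        /** Stand-in for the real migration step; returns true when this canopy is done. */
        boolean migrate(double threshold) {
            migrations++;
            return converged;
        }
    }

    /** Original ordering: once done is false, && skips every later migrate() call. */
    static boolean iterateBuggy(List<Canopy> canopies) {
        boolean done = true;
        for (Canopy c : canopies) {
            done = done && c.migrate(0.5);
        }
        return done;
    }

    /** Reversed ordering: migrate() is always evaluated first, so every canopy moves. */
    static boolean iterateFixed(List<Canopy> canopies) {
        boolean done = true;
        for (Canopy c : canopies) {
            done = c.migrate(0.5) && done;
        }
        return done;
    }

    /** Total number of migrate() calls made across all canopies. */
    static int totalMigrations(List<Canopy> canopies) {
        int total = 0;
        for (Canopy c : canopies) {
            total += c.migrations;
        }
        return total;
    }

    /** Three canopies; the first one reports "not done", the other two report "done". */
    static List<Canopy> sample() {
        List<Canopy> canopies = new ArrayList<>();
        canopies.add(new Canopy(false));
        canopies.add(new Canopy(true));
        canopies.add(new Canopy(true));
        return canopies;
    }

    public static void main(String[] args) {
        List<Canopy> a = sample();
        iterateBuggy(a);
        System.out.println("buggy order migrated " + totalMigrations(a) + " of " + a.size());

        List<Canopy> b = sample();
        iterateFixed(b);
        System.out.println("fixed order migrated " + totalMigrations(b) + " of " + b.size());
    }
}
```

With the first canopy reporting "not done", the original ordering migrates only that one canopy per pass, while the reversed ordering migrates all three. Both variants return the same done flag, which fits the remark above that a broken implementation can still produce plausible results: the defect shows up as slow convergence rather than as a wrong answer.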