From: "Paritosh Ranjan (Updated) (JIRA)"
To: dev@mahout.apache.org
Reply-To: dev@mahout.apache.org
Date: Fri, 11 Nov 2011 04:10:51 +0000 (UTC)
Message-ID: <1849609255.20044.1320984651803.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <21147921.17567.1318706831805.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Updated] (MAHOUT-843) Top Down Clustering

[ https://issues.apache.org/jira/browse/MAHOUT-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paritosh Ranjan updated MAHOUT-843:
-----------------------------------

    Attachment: MAHOUT-843-patch-only-postprocessor-v1

I have taken all incoming changes and created the patch. I also added TopDownClusteringPathConstants. I can't see any other external references.

Both clusterDataMR and clusterDataSeq overwrite the clusteredPoints when the provided input has more than one path, which is the case for the input to bottom-level clustering.

The test case runs top-level clustering and asserts on the cluster output processor, both of which work fine. It then asserts on bottom-level clustering, which shows the problem: only one point is written in a cluster, because the earlier ones are overwritten. This can be seen while debugging clusterDataSeq.

> Top Down Clustering
> -------------------
>
>                 Key: MAHOUT-843
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-843
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>    Affects Versions: 0.6
>            Reporter: Paritosh Ranjan
>              Labels: clustering, patch
>             Fix For: 0.6
>
>         Attachments: MAHOUT-843-patch, MAHOUT-843-patch-only-postprocessor, MAHOUT-843-patch-only-postprocessor-v1, MAHOUT-843-patch-v1, Top-Down-Clustering-patch
>
>
> Top Down Clustering works in multiple steps. The first step is to find comparatively bigger clusters. The second step is to cluster the bigger chunks into meaningful clusters. This can improve performance when clustering large amounts of data. It also removes the dependency on providing input clusters/numbers to the clustering algorithm.
> "Big" is a relative term, as is the smaller "meaningful". So the size of these "bigger" and "smaller/meaningful" clusters will be controlled by the user.
> The user can also select which clustering algorithm to use at the top level and which to use at the bottom level. Initially, this can be done for only one or a few clustering algorithms; later, an option can be provided to use all the algorithms (that suit the case).

--
This message is automatically generated by JIRA.
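
To make the overwrite symptom reported above easier to picture, here is a minimal plain-Java sketch. It does not use Mahout or Hadoop, and it is not the code of clusterDataMR or clusterDataSeq. The class name, the method names, the "part-N" file names and the sample paths are all hypothetical, chosen only for illustration. It assumes the overwrite comes from every input path writing to the same output name, which the message does not confirm. A map stands in for the output directory, and putting the same key twice stands in for re-creating a file of the same name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; none of these names come from Mahout.
class ClusteredPointsOverwriteSketch {

    /** Hypothetical input: two input paths, holding three points in total. */
    static Map<String, List<String>> sampleInputs() {
        Map<String, List<String>> inputs = new LinkedHashMap<>();
        inputs.put("cluster-1/part-0", Arrays.asList("p1", "p2"));
        inputs.put("cluster-2/part-0", Arrays.asList("p3"));
        return inputs;
    }

    /**
     * Assumed cause of the symptom: every input path writes to the same
     * output name, so each write replaces the result of the previous one.
     */
    static Map<String, List<String>> writeWithFixedName(Map<String, List<String>> inputs) {
        Map<String, List<String>> outputDir = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> input : inputs.entrySet()) {
            // put() with an existing key discards the earlier value.
            outputDir.put("part-0", new ArrayList<>(input.getValue()));
        }
        return outputDir;
    }

    /** One possible remedy: derive a distinct output name per input path. */
    static Map<String, List<String>> writeWithPerInputName(Map<String, List<String>> inputs) {
        Map<String, List<String>> outputDir = new LinkedHashMap<>();
        int index = 0;
        for (Map.Entry<String, List<String>> input : inputs.entrySet()) {
            outputDir.put("part-" + index, new ArrayList<>(input.getValue()));
            index++;
        }
        return outputDir;
    }

    /** Counts the points that survive in the simulated output directory. */
    static int countPoints(Map<String, List<String>> outputDir) {
        int total = 0;
        for (List<String> points : outputDir.values()) {
            total += points.size();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, List<String>> inputs = sampleInputs();
        System.out.println("fixed name:     " + countPoints(writeWithFixedName(inputs)) + " point(s) survive");
        System.out.println("per-input name: " + countPoints(writeWithPerInputName(inputs)) + " point(s) survive");
    }
}
```

With the sample input, the fixed-name variant keeps only the single point from the last path, while the per-input variant keeps all three. Whether the actual fix belongs in the output naming or elsewhere would need to be checked against the patch itself.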