From: Paritosh Ranjan
Reply-To: dev@mahout.apache.org
Date: Mon, 3 Oct 2011 00:06:06 +0530
Subject: CanopyDriver : run : clusterFilter : bug
Message-ID: <4E88AF16.6010903@xebia.com>
In-Reply-To: <4E88228C.6040902@xebia.com>
References: <4E876286.5040205@xebia.com> <4E88228C.6040902@xebia.com>

The new parameter, clusterFilter, in CanopyDriver's run method is not
working properly. The cause is the if condition in ClusterMapper's
findClosestCanopy method:

    protected Canopy findClosestCanopy(Vector point, Iterable<Canopy> canopies) {
      ...
      // find closest canopy
      for (Canopy canopy : canopies) {
        double dist = measure.distance(canopy.getCenter().getLengthSquared(), canopy.getCenter(), point);
        if (dist < minDist) {
          ...
        }
      }

The condition should be replaced with:

    if (dist < minDist && dist <= t1)

Otherwise, all records get the same canopy. This fix also needs some
null pointer checks.

I have fixed it and got it working. I will try to provide the patch
with a test case that reproduces the issue.

Thanks and Regards,
Paritosh Ranjan

On 02-10-2011 14:06, Paritosh Ranjan wrote:
> Even run() of CanopyDriver, which takes only T1 and T2, is giving
> different results for sequential and mapreduce.
> This is preventing me from scaling up, as I need to run mapreduce on
> hadoop to scale.
>
> Does anyone have any idea about this problem?
>
> On 02-10-2011 00:27, Paritosh Ranjan wrote:
>> Hi,
>>
>> I am able to cluster correctly sequentially, using CanopyDriver.
>>
>> However, the same dataset, when processed as a MapReduce job, where (
>> t1 = t3 and t2 = t4 and t1>t2) is not working. I am getting errors
>> like Canopies are empty.
>>
>> I also tried to reduce the values of t3 and t4. But reducing them
>> either has no effect or gives meaningless results.
>>
>> Am I doing something wrong, or is there a bug somewhere?
>>
>> I feel that both sequential and MapReduce should give similar
>> results. But it is not happening.
>>
>> Thanks and Regards,
>> Paritosh
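For readers of the archive, the condition proposed in this message (dist < minDist && dist <= t1) can be illustrated with a minimal standalone sketch. It is not the Mahout code and not the promised patch: the class ClosestCanopySketch, the plain double[] points, the Euclidean distance helper, and the -1 return value are all assumptions made for illustration. The -1 stands in for the case where no canopy lies within t1, which is presumably where the null pointer checks mentioned above would come in.

```java
import java.util.Arrays;
import java.util.List;

/** Standalone sketch; does not use Mahout's Canopy, Vector, or DistanceMeasure. */
final class ClosestCanopySketch {

  /** Euclidean distance between two points of equal dimension (assumed measure). */
  static double distance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /**
   * Returns the index of the closest center that lies within t1 of the point,
   * or -1 when no center is that close.
   */
  static int findClosestCanopy(double[] point, List<double[]> centers, double t1) {
    int closest = -1;
    double minDist = Double.MAX_VALUE;
    for (int i = 0; i < centers.size(); i++) {
      double dist = distance(centers.get(i), point);
      // The extra "dist <= t1" guard is the change proposed in the message.
      if (dist < minDist && dist <= t1) {
        minDist = dist;
        closest = i;
      }
    }
    return closest;
  }

  public static void main(String[] args) {
    List<double[]> centers =
        Arrays.asList(new double[] {0.0, 0.0}, new double[] {10.0, 10.0});
    double t1 = 3.0;
    System.out.println(findClosestCanopy(new double[] {1.0, 1.0}, centers, t1));  // prints 0
    System.out.println(findClosestCanopy(new double[] {9.0, 10.0}, centers, t1)); // prints 1
    System.out.println(findClosestCanopy(new double[] {5.0, 5.0}, centers, t1));  // prints -1
  }
}
```

With the guard in place, a point farther than t1 from every center is left unassigned instead of being attached to whichever center happens to be nearest, so callers have to handle the "no canopy" result explicitly.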