Date: Tue, 7 Jan 2014 00:26:55 +0000 (UTC)
From: "Martin Illecker (JIRA)"
To: dev@hama.apache.org
Subject: [jira] [Resolved] (HAMA-834) Fix KMeans example


     [ https://issues.apache.org/jira/browse/HAMA-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Illecker resolved HAMA-834.
----------------------------------
    Resolution: Fixed

> Fix KMeans example
> ------------------
>
>                 Key: HAMA-834
>                 URL: https://issues.apache.org/jira/browse/HAMA-834
>             Project: Hama
>          Issue Type: Bug
>          Components: examples, machine learning
>    Affects Versions: 0.6.3
>            Reporter: Martin Illecker
>            Assignee: Martin Illecker
>              Labels: example
>             Fix For: 0.7.0
>
>         Attachments: HAMA-834.patch, HAMA-834_v02.patch, HAMA-834_v03.patch
>
>
> Fix problems in KMeans example and revise test case.
> 1) Typo \[1] and input path issue
> 2) Wrong *summationCount* in assignCentersInternal
> *summationCount* should also be incremented in the following branch \[2]:
> {code}
> if (clusterCenter == null) {
>   newCenterArray[lowestDistantCenter] = key;
> }
> {code}
> Otherwise *summationCount* may stay zero when only one value is assigned. This zero is then propagated to *incrementSum* \[3] and might cause a division by zero in \[4].
> Likewise, if three vectors are added but *summationCount* is only two, the result is wrong, because the summed vector is later divided by the number of increments.
> 3) Results depend on the number of BSP tasks, *numBspTask*
> (results vary if *numBspTask* is changed)
>
> \[1] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L518-519
> \[2] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L249
> \[3] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L161
> \[4] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L172



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
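
For illustration only, the counting problem in item 2 can be reproduced with a minimal Java sketch. This is not the KMeansBSP code: the class CenterSumSketch, its methods, and the countFirstAssignment flag are hypothetical names introduced here. Only the idea comes from the report, namely that the first vector stored for a center must also increment the count that later divides the sum.

```java
import java.util.Arrays;

// Hypothetical sketch of a per-center running sum; not the Hama source.
final class CenterSumSketch {
    // false mimics the reported behaviour: the first vector is stored
    // without incrementing the counter.
    private final boolean countFirstAssignment;
    private double[] sum;        // null until the first vector is assigned
    private int summationCount;  // number of vectors counted into sum

    CenterSumSketch(boolean countFirstAssignment) {
        this.countFirstAssignment = countFirstAssignment;
    }

    void assign(double[] vector) {
        if (sum == null) {
            // Corresponds to the "clusterCenter == null" branch in the report.
            sum = vector.clone();
            if (countFirstAssignment) {
                summationCount++;
            }
            return;
        }
        for (int i = 0; i < sum.length; i++) {
            sum[i] += vector[i];
        }
        summationCount++;
    }

    int count() {
        return summationCount;
    }

    double[] mean() {
        // Guard explicitly: in Java, dividing a double by zero yields
        // Infinity or NaN rather than throwing, so the error would be silent.
        if (summationCount == 0) {
            throw new IllegalStateException("no counted assignments");
        }
        double[] mean = new double[sum.length];
        for (int i = 0; i < sum.length; i++) {
            mean[i] = sum[i] / summationCount;
        }
        return mean;
    }

    public static void main(String[] args) {
        double[][] points = {{1.0, 1.0}, {2.0, 2.0}, {3.0, 3.0}};
        CenterSumSketch buggy = new CenterSumSketch(false);
        CenterSumSketch fixed = new CenterSumSketch(true);
        for (double[] p : points) {
            buggy.assign(p);
            fixed.assign(p);
        }
        System.out.println(Arrays.toString(buggy.mean())); // [3.0, 3.0]
        System.out.println(Arrays.toString(fixed.mean())); // [2.0, 2.0]
    }
}
```

With three vectors summed but only two counted, the sketch divides by 2 instead of 3 and reports a mean of 3.0 per component rather than 2.0. With a single assigned vector the uncounted variant keeps a count of zero, which is the case the report says can lead to a division by zero.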