Date: Sun, 13 Apr 2014 11:25:20 +0000 (UTC)
From: "Sebastian Schelter (JIRA)"
To: dev@mahout.apache.org
Reply-To: dev@mahout.apache.org
Subject: [jira] [Commented] (MAHOUT-1431) Comparison of Mahout 0.8 vs mahout 0.9 in EMR

    [ https://issues.apache.org/jira/browse/MAHOUT-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967797#comment-13967797 ]

Sebastian Schelter commented on MAHOUT-1431:
--------------------------------------------

Any progress here? Otherwise I'll close the ticket soon.
> Comparison of Mahout 0.8 vs mahout 0.9 in EMR
> ---------------------------------------------
>
>                 Key: MAHOUT-1431
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1431
>             Project: Mahout
>          Issue Type: Question
>          Components: Clustering
>    Affects Versions: 0.8, 0.9
>            Reporter: yannis ats
>              Labels: performance
>
> Hi all,
> I tested Mahout 0.8 and 0.9 on Amazon EMR with a large dataset as input,
> running k-means experiments with both versions.
> What I found is that Mahout 0.8 is faster than Mahout 0.9.
> In particular, I observed that Mahout 0.8 performs fewer iterations, and that every iteration of k-means is faster than in Mahout 0.9. Every iteration in Mahout 0.8 is twice as fast as in 0.9.
> The Hadoop version was 1.0.x, and the input was roughly 2 million data points with a dimensionality of 1800.
> The input parameters in both experiments were exactly the same, modulo the initialization, which was random in both cases. I can understand that this may affect convergence (the number of iterations), but I am baffled by the fact that every iteration takes almost twice as long in 0.9 as in 0.8.
> Is this normal? Is this expected?
> Thank you in advance for your time.

--
This message was sent by Atlassian JIRA
(v6.2#6252)