Date: Tue, 4 Mar 2014 11:06:21 +0000 (UTC)
From: "yannis ats (JIRA)"
To: dev@mahout.apache.org
Reply-To: dev@mahout.apache.org
Subject: [jira] [Updated] (MAHOUT-1431) Comparison of Mahout 0.8 vs mahout 0.9 in EMR

[ https://issues.apache.org/jira/browse/MAHOUT-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yannis ats updated MAHOUT-1431:
-------------------------------
    Description:
Hi all,
I tested Mahout 0.8 and 0.9 on Amazon EMR with a large dataset as input, and I ran k-means experiments with both versions.
What I found is that Mahout 0.8 is faster than Mahout 0.9.
In particular, I observed that Mahout 0.8 performs fewer iterations and that every k-means iteration is faster than in Mahout 0.9. Every iteration in Mahout 0.8 is twice as fast as in 0.9.
The Hadoop version was 1.0.x, and the input was roughly 2 million data points with a dimensionality of 1800.
The input parameters in both experiments were exactly the same, modulo the initialization, which was random in both cases. I understand that this may affect convergence (the number of iterations), but I am baffled by the fact that every iteration takes almost twice as long in 0.9 as in 0.8.
Is this normal? Is this expected?
Thank you in advance for your time.

  was:
Hi all,
I tested Mahout 0.8 and 0.9 on Amazon EMR with a large dataset as input, and I ran k-means experiments with both versions.
What I found is that Mahout 0.8 is faster than Mahout 0.9.
In particular, I observed that Mahout 0.8 performs fewer iterations and that every k-means iteration is faster than in Mahout 0.9. Every iteration in Mahout 0.8 is twice as fast as in 0.9.
The Hadoop version was 1.0.x, and the input was roughly 2 million data points with a dimensionality of 1800.
The input parameters in both experiments were exactly the same, modulo the initialization, which was random in both cases. I understand that this may affect convergence (the number of iterations), but I am buffled by the fact that every iteration takes almost twice as long in 0.9 as in 0.8.
Is this normal? Is this expected?
Thank you in advance for your time.

> Comparison of Mahout 0.8 vs mahout 0.9 in EMR
> ---------------------------------------------
>
>                 Key: MAHOUT-1431
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1431
>             Project: Mahout
>          Issue Type: Question
>          Components: Clustering
>    Affects Versions: 0.8, 0.9
>            Reporter: yannis ats
>              Labels: performance
>
> Hi all,
> I tested Mahout 0.8 and 0.9 on Amazon EMR with a large dataset as input, and
> I ran k-means experiments with both versions.
> What I found is that Mahout 0.8 is faster than Mahout 0.9.
> In particular, I observed that Mahout 0.8 performs fewer iterations and that every k-means iteration is faster than in Mahout 0.9. Every iteration in Mahout 0.8 is twice as fast as in 0.9.
> The Hadoop version was 1.0.x, and the input was roughly 2 million data points with a dimensionality of 1800.
> The input parameters in both experiments were exactly the same, modulo the initialization, which was random in both cases. I understand that this may affect convergence (the number of iterations), but I am baffled by the fact that every iteration takes almost twice as long in 0.9 as in 0.8.
> Is this normal? Is this expected?
> Thank you in advance for your time.

--
This message was sent by Atlassian JIRA
(v6.2#6252)