mahout-dev mailing list archives

From "Suneel Marthi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-1431) Comparison of Mahout 0.8 vs mahout 0.9 in EMR
Date Tue, 04 Mar 2014 11:22:24 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919267#comment-13919267 ]

Suneel Marthi commented on MAHOUT-1431:
---------------------------------------

Could you provide code snapshots of where you believe the iterations are taking longer?

The only change made to address MAHOUT-1030 was to convert all vectors to Named Vectors
to store the vector IDs and the vector distances from cluster centers. The code changes for that
are in ClusterClassificationDriver (for sequential mode) and ClusterClassificationMapper (for
MR mode), which are post-processing steps that run after clustering is done.
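
For illustration only, here is a minimal, self-contained sketch of the kind of post-processing step described above: each vector is tagged with an ID, its nearest cluster, and its distance to that cluster's center. This is not the code in ClusterClassificationDriver or ClusterClassificationMapper. It uses plain Java arrays rather than Mahout's vector classes, it assumes Euclidean distance, and every name in it (NamedVectorSketch, Tagged, classify) is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; NOT Mahout's actual classification code. */
public class NamedVectorSketch {

    /** A vector tagged with an id, its nearest cluster index, and the distance to that center. */
    static final class Tagged {
        final String id;
        final double[] values;
        final int cluster;
        final double distance;

        Tagged(String id, double[] values, int cluster, double distance) {
            this.id = id;
            this.values = values;
            this.cluster = cluster;
            this.distance = distance;
        }
    }

    /** Euclidean distance (an assumption; the distance measure is configurable in practice). */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Finds the nearest center and returns the vector tagged with its id and that distance. */
    static Tagged classify(String id, double[] vector, List<double[]> centers) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.size(); c++) {
            double d = euclidean(vector, centers.get(c));
            if (d < bestDistance) {
                bestDistance = d;
                best = c;
            }
        }
        return new Tagged(id, vector, best, bestDistance);
    }

    public static void main(String[] args) {
        List<double[]> centers = Arrays.asList(new double[] {0, 0}, new double[] {10, 10});
        Tagged t = classify("doc-1", new double[] {3, 4}, centers);
        // prints: doc-1 -> cluster 0, distance 5.0
        System.out.println(t.id + " -> cluster " + t.cluster + ", distance " + t.distance);
    }
}
```

In this sketch the tagging makes a single pass over already-clustered data, so it runs once after the k-means iterations rather than inside them.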



> Comparison of Mahout 0.8 vs mahout 0.9 in EMR
> ---------------------------------------------
>
>                 Key: MAHOUT-1431
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1431
>             Project: Mahout
>          Issue Type: Question
>          Components: Clustering
>    Affects Versions: 0.8, 0.9
>            Reporter: yannis ats
>              Labels: performance
>
> Hi all,
> I tested Mahout 0.8 and 0.9 with a large dataset as input, running k-means experiments with both versions on Amazon EMR.
> What I found is that Mahout 0.8 is faster than Mahout 0.9.
> In particular, I observed that Mahout 0.8 performs fewer iterations, and each k-means iteration is faster than in Mahout 0.9: every iteration in 0.8 is twice as fast as in 0.9.
> The Hadoop version was 1.0.x and the input was roughly 2 million data points with a dimensionality of 1800.
> The input parameters in both experiments were exactly the same, modulo the initialization, which was random in both cases. I can understand that this may affect convergence (the number of iterations), but I am baffled by the fact that every iteration takes almost twice as long in 0.9 as in 0.8.
> Is this normal? Is this expected?
> Thank you in advance for your time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
