mahout-dev mailing list archives

From "Suneel Marthi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-1328) CLI-invoked K-means final step (Cluster Classification Driver) ignores job-specific -D MR parameters
Date Mon, 18 Nov 2013 21:03:21 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825775#comment-13825775 ]

Suneel Marthi commented on MAHOUT-1328:
---------------------------------------

[~stewh-uk]  Could you post the stack trace for this?  From the code, it looks like the
user-specified configuration is ignored when the cluster policy is read (at cluster
classification time).  The stack trace would help troubleshoot this.
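
The suspected failure mode can be sketched without Hadoop on the classpath. In the minimal sketch below, a plain Map<String, String> stands in for Hadoop's Configuration, and every name (ConfigPropagationSketch, runIteration, classifyBuggy, classifyFixed) is hypothetical and chosen for illustration; none of it is Mahout's actual code. It only shows how a step that builds a fresh configuration, instead of reusing the one that carries the -D overrides, would fall back to the cluster defaults.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: a Map stands in for Hadoop's Configuration.
public class ConfigPropagationSketch {

    // Stand-in for the cluster-wide defaults (the value is illustrative).
    static Map<String, String> clusterDefaults() {
        Map<String, String> conf = new HashMap<>();
        conf.put("mapred.child.java.opts", "-Xmx200m");
        return conf;
    }

    // Stand-in for the configuration the CLI builds: defaults plus -D overrides.
    static Map<String, String> withOverrides(Map<String, String> overrides) {
        Map<String, String> conf = clusterDefaults();
        conf.putAll(overrides);
        return conf;
    }

    // Iteration step: reads settings from the configuration it was handed.
    static String runIteration(Map<String, String> userConf) {
        return userConf.get("mapred.child.java.opts");
    }

    // Suspected pattern: the step creates a fresh configuration and never
    // consults the one passed in, so the -D overrides are lost.
    static String classifyBuggy(Map<String, String> userConf) {
        Map<String, String> fresh = clusterDefaults();
        return fresh.get("mapred.child.java.opts");
    }

    // Possible remedy: reuse the caller's configuration.
    static String classifyFixed(Map<String, String> userConf) {
        return userConf.get("mapred.child.java.opts");
    }

    public static void main(String[] args) {
        Map<String, String> overrides = new HashMap<>();
        overrides.put("mapred.child.java.opts", "-Xmx7096m");
        Map<String, String> userConf = withOverrides(overrides);

        System.out.println("iteration:        " + runIteration(userConf));
        System.out.println("classify (buggy): " + classifyBuggy(userConf));
        System.out.println("classify (fixed): " + classifyFixed(userConf));
    }
}
```

Whether the classification step really behaves like classifyBuggy is exactly what the stack trace and the driver code would need to confirm.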

> CLI-invoked K-means final step (Cluster Classification Driver) ignores job-specific -D MR parameters
> ----------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1328
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1328
>             Project: Mahout
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Stewart Whiting
>            Assignee: Suneel Marthi
>             Fix For: 0.9
>
>
> I believe this is an issue - someone please correct me if not!
> I am running a large k-means clustering task. Our cluster's default map/reduce slots per node, JVM memory parameters, etc. are not appropriate for the memory requirements of this task.
> So, I invoke K-means clustering from the CLI using, for example:
> mahout kmeans -i /mahout-input -o /mahout-output -c clusters -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 12 -ow -k 50 -cl -Dmapred.child.java.opts=-Xmx7096m -Dmapred.tasktracker.reduce.tasks.maximum=1 -Dmapred.tasktracker.map.tasks.maximum=1 -Dmapred.job.map.memory.mb=7000 -Dmapred.cluster.max.map.memory.mb=7000 -Dmapred.cluster.reduce.memory.mb=7000 -Dmapred.cluster.max.reduce.memory.mb=7000
> The initial MR tasks for each clustering iteration run successfully. Inspecting the Hadoop config for each task after completion shows that the job ran with the MR configuration explicitly provided through the -D parameters.
> However, when the final cluster classification task is run (i.e. to generate the clusteredPoints/ directory), it usually fails with out-of-memory errors. Inspecting the MR task logs for it shows that it ran with the default cluster settings, not those provided by my -D CLI parameters.
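
The manual check described above (comparing the -D parameters on the command line with the settings a task actually ran with) can be sketched as a small standalone program. This is only a sketch under stated assumptions: DOverrideCheck, parseOverrides and notApplied are hypothetical names, the parsing is a simplified stand-in rather than Hadoop's own option handling, and the "effective" map would have to be filled in by hand from the job configuration shown for the task.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: compares requested -D overrides with effective settings.
public class DOverrideCheck {

    // Collects -Dkey=value tokens from a command line.
    // Simplified stand-in; it does not handle the "-D key=value" (spaced) form.
    static Map<String, String> parseOverrides(String[] args) {
        Map<String, String> overrides = new LinkedHashMap<>();
        for (String arg : args) {
            if (arg.startsWith("-D")) {
                int eq = arg.indexOf('=');
                if (eq > 2) {
                    overrides.put(arg.substring(2, eq), arg.substring(eq + 1));
                }
            }
        }
        return overrides;
    }

    // Returns each requested key whose effective value differs or is absent,
    // mapped to the value the task actually ran with.
    static Map<String, String> notApplied(Map<String, String> requested,
                                          Map<String, String> effective) {
        Map<String, String> diff = new TreeMap<>();
        for (Map.Entry<String, String> e : requested.entrySet()) {
            String actual = effective.get(e.getKey());
            if (!e.getValue().equals(actual)) {
                diff.put(e.getKey(), actual == null ? "<unset>" : actual);
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        // A subset of the arguments from the command in this report.
        String[] cli = {
            "kmeans", "-x", "12", "-ow", "-k", "50", "-cl",
            "-Dmapred.child.java.opts=-Xmx7096m",
            "-Dmapred.job.map.memory.mb=7000"
        };
        // Illustrative values, as if copied by hand from a task's job configuration.
        Map<String, String> effective = new LinkedHashMap<>();
        effective.put("mapred.child.java.opts", "-Xmx200m");

        Map<String, String> diff = notApplied(parseOverrides(cli), effective);
        for (Map.Entry<String, String> e : diff.entrySet()) {
            System.out.println(e.getKey() + " ran with " + e.getValue());
        }
    }
}
```

Running the same comparison against an iteration task and against the classification task would show which overrides, if any, were dropped in the final step.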



--
This message was sent by Atlassian JIRA
(v6.1#6144)
