mahout-user mailing list archives

From james q <james.quacine...@gmail.com>
Subject Re: Problem running twenty newsgroup example in a hadoop cluster
Date Fri, 04 Feb 2011 04:26:11 GMT
Hey,

Did you ever figure this issue out?

From my experience with Hadoop, you can tune how memory is divided up in your
cluster. According to
http://getsatisfaction.com/cloudera/topics/how_much_ram_datanode_should_take,
HADOOP_HEAP_SIZE sets the heap size of the Hadoop daemons (datanode,
tasktracker), while mapred.child.java.opts controls the heap size of the
child JVMs (the map and reduce tasks themselves).

So maybe you could set HADOOP_HEAP_SIZE to 1 GB and
mapred.child.java.opts=-Xmx3072M (3 GB). That way your map tasks have more
memory to work with?
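
For reference, here is a minimal sketch of where those two settings live in a
stock 0.20-style install (values are illustrative, not a recommendation; note
that in the stock hadoop-env.sh the variable is spelled HADOOP_HEAPSIZE,
without the second underscore):

  # hadoop-env.sh -- heap for the Hadoop daemons, in MB
  export HADOOP_HEAPSIZE=1024

  <!-- mapred-site.xml -- heap for each map/reduce child JVM -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx3072M</value>
  </property>

The daemons need to be restarted for the hadoop-env.sh change to take effect,
and the child-JVM setting applies to newly launched tasks.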

-- james


On Mon, Jan 24, 2011 at 9:54 PM, Jia Rao <rickenrao@gmail.com> wrote:

> Hi all,
>
> I am having a problem running the 20 newsgroups example on a Hadoop cluster.
> The trainclassifier step worked fine, but I got an "out of memory: Java heap
> space" error in the testclassifier step.
>
> The following is the configuration of the hadoop cluster.
>
> Physical machines: 4 nodes, each with 6GB memory.
>
> Hadoop: 0.20.2, HADOOP_HEAP_SIZE=3200 in hadoop-env.sh,
> mapred.child.java.opts=-Xmx1024M in mapred-site.xml.
>
> Mahout: tried release 0.4 and the latest source; same problem with both.
>
> Command line arguments used:
>
> $MAHOUT_HOME/bin/mahout testclassifier \
>  -m newsmodel \
>  -d 20news-input \
>  -type bayes \
>  -ng 3 \
>  -source hdfs \
>  -method mapreduce
>
>
> Any ideas ?
> Thanks !
>
