mahout-user mailing list archives

From Frank Wang <wangfan...@gmail.com>
Subject Re: Naive Bayes testclassifier Java heap space exception
Date Thu, 09 Dec 2010 01:27:32 GMT
Hi David,

Thanks for your reply.

I just checked: the docs are 39MB and the models are 301MB. I'm running on a
single node with a pseudo-distributed cluster setup. I think giving the Hadoop
worker 1GB of memory should be more than enough.
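
As a rough sanity check on those numbers, here is a minimal sketch of the
arithmetic. It assumes, per your description, that each map task loads the
whole model and keeps every document it has processed. The `expansion`
factor (how much larger the data is on the heap than on disk) is purely a
hypothetical parameter of mine, not something I measured in Mahout, and the
function names are made up for illustration.

```python
# Back-of-the-envelope heap estimate for one map task.
# ASSUMPTION: `expansion` (in-heap size / on-disk size) is a guess,
# not a value measured from Mahout.

def estimated_heap_mb(model_mb, docs_mb, num_map_tasks, expansion):
    """Estimate the heap (in MB) one map task might need.

    Assumes every task loads the full model and retains its share of
    the documents, all inflated by `expansion` relative to disk size.
    """
    docs_per_task_mb = docs_mb / num_map_tasks
    return (model_mb + docs_per_task_mb) * expansion


def fits(heap_mb, model_mb, docs_mb, num_map_tasks, expansion):
    """Return True if the estimate stays within the configured heap."""
    return estimated_heap_mb(model_mb, docs_mb, num_map_tasks, expansion) <= heap_mb


if __name__ == "__main__":
    # 301MB model, 39MB docs, one map task, 1GB heap (-Xmx1G).
    for expansion in (1.0, 2.0, 4.0):
        need = estimated_heap_mb(301, 39, 1, expansion)
        ok = fits(1024, 301, 39, 1, expansion)
        print(f"expansion {expansion}x: ~{need:.0f}MB needed, fits in 1GB: {ok}")
```

If the in-heap footprint is close to the on-disk size, 1GB looks sufficient;
it only stops fitting under this sketch once the expansion factor gets large,
which I have no data on.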

Am I missing something here?


On Wed, Dec 8, 2010 at 3:24 AM, David Hagar <david@occamlaw.com> wrote:

> Hi Frank --
>
> One major caveat to the below: I've hacked the 0.4 distribution of
> Mahout quite a bit to get Naive Bayes running smoothly on Amazon's s3
> and elastic mapreduce services, thus my experience may not be typical
> and the memory problems I ran into might well be of my own making.
>
> That said, I had to allocate 2-3GB per map task to run Naive
> Bayes classification. The classification job loads pretty much every
> file in the training model into memory, so you can get some estimate
> of size by looking at the size of your model directory. Also, it did
> seem to me that each map task was holding onto each document it
> processed. So, each 100-150KB doc stays in memory after it has been
> classified.
>
> I temporarily resolved this by increasing the number of map tasks so
> that each task handled fewer documents, thus fewer documents stayed
> around in memory. Obviously a better fix would be to figure out why
> they are being held onto in the first place (or if the steady increase
> in memory was being introduced by something else).
>
> As I said, the problem may be with my local version of mahout, but
> that was my experience.
>
> -David
>
>
> On Wed, Dec 8, 2010 at 3:02 AM, Frank Wang <wangfanjie@gmail.com> wrote:
> > Hi, I was trying out Naive Bayes with a setup similar to the 20NewsGroup
> > setup. There are 5 categories, each with 150 articles, and each article
> > is about 50~150KB in size.
> >
> > Training was successful:
> > $MAHOUT_HOME/bin/mahout trainclassifier   -i news-input   -o news-model
> > -type bayes   -ng 3   -source hdfs
> >
> > However, testing the classifier always generates this exception:
> > $MAHOUT_HOME/bin/mahout testclassifier   -m news-model   -d news-input
> > -type bayes   -ng 3   -source hdfs   -method mapreduce
> > http://pastie.org/1358465
> >
> > I tried giving more memory to the map-reduce workers in
> > conf/mapred-site.xml (256m, 512m and 1G), but no luck.
> >  <property>
> >    <name>mapred.child.java.opts</name>
> >    <value>-Xmx1G</value>
> >  </property>
> >
> > In 'top', memory usage for 2 Java processes would rise up to 1.0GB and
> > then TestClassifier crashes.
> >
> > Are my articles too large in size?
> > Has anyone experienced this?
> >
>
