mahout-user mailing list archives

From "Sean Owen"
Subject Re: Memory problems with KMeans
Date Thu, 13 Nov 2008 17:29:05 GMT
Your options don't set a max heap size, e.g. -Xmx1024m (unless
somehow one of these other options implies that). Try adding that?

In general I'd say don't bother messing too much with the GC
parameters unless you're sure they're necessary, especially in Java 6.
For instance, I don't think you want 4 GC threads?

(I also throw on -da and -dsa to disable assertions when I care about speed.)
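
As a rough sketch of the suggestion above: keep -server, add an explicit
max heap, and drop the GC tuning. The 1024m value is only the example
figure, and this assumes the JVM running the job actually picks up
HADOOP_OPTS, which depends on how Hadoop is configured.

```shell
# Sketch only: explicit max heap, no GC tuning flags.
# -da / -dsa are optional and just disable (system) assertions.
export HADOOP_OPTS="-server -Xmx1024m -da -dsa"
echo "$HADOOP_OPTS"
```

If the error persists with a larger heap, the GC flags can be added back
one at a time to see whether any of them matters.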

On Thu, Nov 13, 2008 at 4:53 PM, Philippe Lamarche wrote:
> Hi,
> I am using KMeans to do some text clustering and I am running into memory
> problems. So far I have only tried it on a laptop in pseudo-distributed
> master/slave mode.
> This is on Hadoop branch-0.19. The "texttovector.jar" contains a hacked
> version of the syntheticcontrol KMeans example, the only difference is in
> the first input phase.
> Is this memory error "normal"? I am running with export HADOOP_OPTS="-server
> -XX:+UseParallelGC -XX:ParallelGCThreads=4 -XX:NewSize=1G -XX:MaxNewSize=1G
> -XX:-UseGCOverheadLimit"
> In my understanding, the "-XX:-UseGCOverheadLimit" should remove the
> GCOverhead "feature".
> Any ideas?
