mahout-user mailing list archives

From Marty Kube
Subject Re: OutOfMemoryError with BreimanExample on 4GB of data?
Date Fri, 21 Dec 2012 01:59:15 GMT
Hi Adam,

This is an interesting problem.  Increasing the heap size is not 
necessarily going to solve the issue.  The error you have:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

is due to too much CPU time being spent in GC, as opposed to not enough 
heap allocation.  Decreasing your heap allocation may in fact help, as GC 
is more efficient on a smaller heap.  You may have to consider GC tuning.
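
Before tuning anything, it may help to confirm how much time the JVM is 
actually spending in GC.  Below is a minimal sketch, not part of Mahout, 
that reads the standard java.lang.management beans.  The class name 
GcOverhead and its two helper methods are my own hypothetical names; 
only the ManagementFactory and GarbageCollectorMXBean calls come from 
the JDK.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not a Mahout class): reports the share of JVM
// uptime that has been spent in garbage collection so far.
public class GcOverhead {

    /** Total milliseconds spent in GC, summed across all collectors. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Fraction of elapsed time spent in GC; 0.0 when no time has elapsed. */
    public static double gcFraction(long gcMillis, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 0.0;
        }
        return (double) gcMillis / elapsedMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gc = totalGcMillis();
        System.out.printf("GC time: %d ms of %d ms uptime (%.1f%%)%n",
                gc, uptime, 100.0 * gcFraction(gc, uptime));
    }
}
```

Calling totalGcMillis() periodically from the loading code and comparing 
it against elapsed time would show whether the GC share climbs as the 
data is parsed.  Running the JVM with -verbose:gc gives similar 
information without any code changes.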

On 12/20/2012 08:32 PM, Adam Baron wrote:
> I'm trying to run the org.apache.mahout.classifier.df.BreimanExample on a
> custom set of data that is ~4GB which has 500 Numerical Columns, 1
> Categorical Column with two possible label values and ~4 million rows.  I
> already ran the to generate
> the dataset *.info file.  However, despite bumping
> my up to -Xmx12288m, I still get this memory error
> below:
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit
> exceeded
>          at
> sun.misc.FloatingDecimal.readJavaFormatString(
>          at java.lang.Double.parseDouble(
>          at
>          at
>          at
>          at
>          at
> org.apache.mahout.classifier.df.BreimanExample.main(
>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>          at
> sun.reflect.NativeMethodAccessorImpl.invoke(
>          at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(
>          at java.lang.reflect.Method.invoke(
>          at org.apache.hadoop.util.RunJar.main(
> I'm running on a pretty significant Hadoop cluster which has no problem
> running other sizable Mahout jobs such as K-Means Clustering on 100s GB
> n-gram TF/IDF files, so I'm thinking this is more of a configuration/code
> issue than a hardware issue.  The small example from the website
> ( worked flawlessly.
> I realize that if I decide to pursue Random Forest classification further,
> I'll need to write my own code to classify through a DecisionForest on a go
> forward basis (after the training set) since the BreimanExample is an
> example, not a tool.  However, for this initial foray I merely want to see
> what type of Test Error numbers my custom set of data would yield,
> preferably without writing any custom code.
> Any suggestions?
> Thanks,
>            Adam
