mahout-user mailing list archives

From Marty Kube <martyk...@beavercreekconsulting.com>
Subject Re: OutOfMemoryError with BreimanExample on 4GB of data?
Date Fri, 21 Dec 2012 01:59:15 GMT
Hi Adam,

This is an interesting problem.  Increasing the heap size is not 
necessarily going to solve the issue.  The error you have:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit
exceeded

This is due to too much CPU time being spent in GC while very little heap 
is reclaimed, as opposed to the heap allocation simply being too small.  
Decreasing your heap allocation may in fact help, as GC can be more 
efficient on a smaller heap.  You may have to consider GC tuning.
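
One way to confirm that GC time is the problem before tuning anything is to 
ask the JVM how long its collectors have been running.  The following is 
only a minimal sketch using the standard java.lang.management API; the class 
name GcOverhead and its helper methods are made up for illustration and are 
not part of Mahout or Hadoop.  It reports cumulative GC time over the whole 
JVM uptime, which is a coarser measure than the recent-collection window the 
JVM uses for the overhead limit, so treat the number as a rough indicator.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not part of Mahout or Hadoop): reports what share of
// JVM uptime has been spent in garbage collection so far.
class GcOverhead {

    // Sum the accumulated collection time, in milliseconds, over all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Fraction of uptime spent in GC; returns 0.0 when uptime is not positive.
    static double gcFraction(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return (double) gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gc = totalGcMillis();
        System.out.printf("GC time: %d ms of %d ms uptime (%.1f%%)%n",
                gc, uptime, 100.0 * gcFraction(gc, uptime));
    }
}
```

Calling totalGcMillis() periodically from the loading code, or simply 
running the JVM with -verbose:gc, should show whether the collectors are 
busy almost all of the time while the data is being parsed.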


On 12/20/2012 08:32 PM, Adam Baron wrote:
> I'm trying to run the org.apache.mahout.classifier.df.BreimanExample on a
> custom set of data that is ~4GB which has 500 Numerical Columns, 1
> Categorical Column with two possible label values and ~4 million rows.  I
> already ran the org.apache.mahout.classifier.df.tools.Describe to generate
> the dataset *.info file.  However, despite bumping
> my mapred.child.java.opts up to -Xmx12288m, I still get this memory error
> below:
>
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit
> exceeded
>          at
> sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1222)
>          at java.lang.Double.parseDouble(Double.java:510)
>          at
> org.apache.mahout.classifier.df.data.DataConverter.convert(DataConverter.java:64)
>          at
> org.apache.mahout.classifier.df.data.DataLoader.loadData(DataLoader.java:130)
>          at
> org.apache.mahout.classifier.df.BreimanExample.run(BreimanExample.java:187)
>          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>          at
> org.apache.mahout.classifier.df.BreimanExample.main(BreimanExample.java:125)
>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>          at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>          at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>          at java.lang.reflect.Method.invoke(Method.java:597)
>          at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>
> I'm running on a pretty significant Hadoop cluster which has no problem
> running other sizable Mahout jobs such as K-Means Clustering on 100s GB
> n-gram TF/IDF files, so I'm thinking this is more of a configuration/code
> issue than a hardware issue.  The small glass.data example from the website
> (https://cwiki.apache.org/MAHOUT/breiman-example.html) worked flawlessly.
>
> I realize that if I decide to pursue Random Forest classification further,
> I'll need to write my own code to classify through a DecisionForest on a go
> forward basis (after the training set) since the BreimanExample is an
> example, not a tool.  However, for this initial foray I merely want to see
> what type of Test Error numbers my custom set of data would yield,
> preferably without writing any custom code.
>
> Any suggestions?
>
> Thanks,
>            Adam
>

