mahout-user mailing list archives

From Paritosh Ranjan <pran...@xebia.com>
Subject Re: Canopy Job failed processing, Error: Java heap space
Date Thu, 15 Mar 2012 02:18:55 GMT
Some reasons I can think of:

a) The vector dimension is really large.
b) There are too many clusters, i.e. the cluster size is very small.
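
To see how the two factors combine, here is a rough back-of-envelope sketch. It is not Mahout code, and every figure in it is an assumption chosen for illustration: the per-slot cost of 13 bytes (an int key, a double value and a one-byte state flag in an open-addressing map), the load factor, the two accumulator vectors per canopy, and the 5000 non-zero entries per vector. Only the 11000 records and the 1024M heap come from the original message. The worst case assumed here is that every record ends up as its own canopy.

```java
public class CanopyHeapEstimate {
    // Assumption: approximate cost of one slot in an open-addressing
    // int -> double hash map (4-byte key + 8-byte value + 1-byte state).
    // This is an illustrative figure, not a measured one.
    static final long BYTES_PER_SLOT = 4 + 8 + 1;

    /**
     * Rough estimate of the heap held by sparse accumulator vectors.
     *
     * @param canopies          number of canopies kept in memory
     * @param vectorsPerCanopy  accumulator vectors per canopy (hypothetical)
     * @param nonZerosPerVector non-zero entries in each accumulator vector
     * @param loadFactor        assumed fill ratio of the backing hash table
     */
    static long estimateBytes(long canopies, long vectorsPerCanopy,
                              long nonZerosPerVector, double loadFactor) {
        // A table filled to loadFactor needs nonZeros / loadFactor slots.
        long slotsPerVector = (long) Math.ceil(nonZerosPerVector / loadFactor);
        return canopies * vectorsPerCanopy * slotsPerVector * BYTES_PER_SLOT;
    }

    public static void main(String[] args) {
        long heapBytes = 1024L * 1024 * 1024; // 1024M child heap from the question

        // Worst case: each of the 11000 records becomes its own canopy.
        long worstCase = estimateBytes(11000, 2, 5000, 0.5);
        // Milder case: the same vectors grouped into 100 canopies.
        long milderCase = estimateBytes(100, 2, 5000, 0.5);

        System.out.println("heap bytes:        " + heapBytes);
        System.out.println("worst case bytes:  " + worstCase);
        System.out.println("milder case bytes: " + milderCase);
        System.out.println("worst case fits:   " + (worstCase < heapBytes));
        System.out.println("milder case fits:  " + (milderCase < heapBytes));
    }
}
```

Under these assumed numbers the estimate grows linearly with both the number of canopies and the number of non-zero entries per vector, so either factor alone can push a reducer past its heap limit.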

On 15-03-2012 07:39, WangRamon wrote:
> Here is the detailed stack trace:
> 2012-03-15 09:51:40,817 INFO org.apache.hadoop.mapred.ReduceTask: Merged 9 segments, 136745366 bytes to disk to satisfy reduce memory limit
> 2012-03-15 09:51:40,818 INFO org.apache.hadoop.mapred.ReduceTask: Merging 1 files, 136745354 bytes from disk
> 2012-03-15 09:51:40,819 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
> 2012-03-15 09:51:40,819 INFO org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
> 2012-03-15 09:51:40,822 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 136745350 bytes
> 2012-03-15 10:03:25,273 FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.OutOfMemoryError: Java heap space
> 	at org.apache.mahout.math.map.OpenIntDoubleHashMap.rehash(OpenIntDoubleHashMap.java:434)
> 	at org.apache.mahout.math.map.OpenIntDoubleHashMap.put(OpenIntDoubleHashMap.java:387)
> 	at org.apache.mahout.math.RandomAccessSparseVector.setQuick(RandomAccessSparseVector.java:139)
> 	at org.apache.mahout.math.RandomAccessSparseVector.assign(RandomAccessSparseVector.java:107)
> 	at org.apache.mahout.math.AbstractVector.times(AbstractVector.java:478)
> 	at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:198)
> 	at org.apache.mahout.clustering.canopy.CanopyClusterer.addPointToCanopies(CanopyClusterer.java:158)
> 	at org.apache.mahout.clustering.canopy.CanopyReducer.reduce(CanopyReducer.java:46)
> 	at org.apache.mahout.clustering.canopy.CanopyReducer.reduce(CanopyReducer.java:29)
> 	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
> 	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
>> From: ramon_wang@hotmail.com
>> To: user@mahout.apache.org
>> Subject: Canopy Job failed processing, Error: Java heap space
>> Date: Thu, 15 Mar 2012 09:49:57 +0800
>>
>> Hi All,
>>
>> I'm using the Canopy driver to find the cluster center points. The mapred.child.java.opts parameter for Hadoop is set to 1024M, and I'm processing 11000 records. I was surprised to get a Java heap space error during clustering. Did I miss something? Thanks.
>>
>> BTW, I did succeed in some tests with the same data set and configuration.
>>
>> Cheers,
>> Ramon

