mahout-user mailing list archives

From: Jeff Eastman <j...@windwardsolutions.com>
Subject: Re: Canopy Job failed processing, Error: Java heap space
Date: Thu, 15 Mar 2012 03:49:59 GMT
With Canopy this is a symptom of T2 being too small: a point only fails
to start a new canopy when it falls within T2 of an existing one, so a
small T2 causes an explosion of clusters - in the limit, one per input
vector - and if the vector dimension is also large, no amount of memory
can hold them all for large datasets. Increase T2 until you get a
tractable number of clusters, then adjust T1, which only affects the
canopies' sensitivity to each other.
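To put rough numbers on it: each canopy accumulates running sums whose
size is proportional to the vector dimension, so reducer memory grows
roughly as (number of canopies) x (dimension) x 8 bytes. With 11000
canopies (one per input record) over, say, 100,000 dimensions - an
illustrative figure, not Ramon's actual dimensionality - that is already
on the order of 8 GB, far beyond a 1024M child heap.

Below is a minimal sketch of sweeping T2 upward until the canopy count
becomes tractable, assuming the Mahout 0.6-era
CanopyDriver.run(conf, input, output, measure, t1, t2, runClustering,
runSequential) signature; check the API of your Mahout version, and note
the paths and threshold values are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.clustering.canopy.CanopyDriver;
import org.apache.mahout.common.distance.EuclideanDistanceMeasure;

public class CanopyT2Sweep {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path input = new Path("vectors");  // hypothetical input vector path
    double t1 = 4.0;                   // illustrative value only
    // T2 must stay below T1; double it each run until the number of
    // canopies is small enough to fit comfortably in the heap.
    for (double t2 = 0.5; t2 < t1; t2 *= 2) {
      Path output = new Path("canopies-t2-" + t2);
      // runClustering=false builds only the canopy centers and skips
      // assigning points, which is enough to count clusters cheaply.
      CanopyDriver.run(conf, input, output, new EuclideanDistanceMeasure(),
          t1, t2, false, false);
      // Inspect the canopy count under output/clusters-0 after each run.
    }
  }
}

The same sweep can be done from the command line with the canopy job's
-t1 and -t2 options if you'd rather not write a driver.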

On 3/14/12 8:18 PM, Paritosh Ranjan wrote:
> Some reasons I can think of:
>
> a) The vector dimension is really large.
> b) Too many clusters, i.e., the cluster size is very small.
>
> On 15-03-2012 07:39, WangRamon wrote:
>> Here is the detailed stack trace:
>>
>> 2012-03-15 09:51:40,817 INFO org.apache.hadoop.mapred.ReduceTask: Merged 9 segments, 136745366 bytes to disk to satisfy reduce memory limit
>> 2012-03-15 09:51:40,818 INFO org.apache.hadoop.mapred.ReduceTask: Merging 1 files, 136745354 bytes from disk
>> 2012-03-15 09:51:40,819 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
>> 2012-03-15 09:51:40,819 INFO org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
>> 2012-03-15 09:51:40,822 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 136745350 bytes
>> 2012-03-15 10:03:25,273 FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.OutOfMemoryError: Java heap space
>> 	at org.apache.mahout.math.map.OpenIntDoubleHashMap.rehash(OpenIntDoubleHashMap.java:434)
>> 	at org.apache.mahout.math.map.OpenIntDoubleHashMap.put(OpenIntDoubleHashMap.java:387)
>> 	at org.apache.mahout.math.RandomAccessSparseVector.setQuick(RandomAccessSparseVector.java:139)
>> 	at org.apache.mahout.math.RandomAccessSparseVector.assign(RandomAccessSparseVector.java:107)
>> 	at org.apache.mahout.math.AbstractVector.times(AbstractVector.java:478)
>> 	at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:198)
>> 	at org.apache.mahout.clustering.canopy.CanopyClusterer.addPointToCanopies(CanopyClusterer.java:158)
>> 	at org.apache.mahout.clustering.canopy.CanopyReducer.reduce(CanopyReducer.java:46)
>> 	at org.apache.mahout.clustering.canopy.CanopyReducer.reduce(CanopyReducer.java:29)
>> 	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>> 	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
>> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>>> From: ramon_wang@hotmail.com
>>> To: user@mahout.apache.org
>>> Subject: Canopy Job failed processing, Error: Java heap space
>>> Date: Thu, 15 Mar 2012 09:49:57 +0800
>>>
>>> Hi All, I'm using the Canopy driver to find the cluster center points. The
>>> mapred.child.java.opts parameter for Hadoop is set to 1024M, and I'm
>>> processing 11000 records. I was surprised to get a Java heap space error
>>> during clustering; did I miss something? Thanks.
>>>
>>> BTW, I did succeed in some tests with the same data set and configuration.
>>>
>>> Cheers,
>>> Ramon
>>
>
>
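As a stopgap while tuning T2, you can also give the child JVMs more
headroom than the 1024M Ramon mentions. A sketch, assuming the 0.20-era
mapred property he is already using (set it in mapred-site.xml or on the
job's Configuration); 2048m is an illustrative value, not a
recommendation:

import org.apache.hadoop.conf.Configuration;

// Raise the per-task heap before submitting the job.
Configuration conf = new Configuration();
conf.set("mapred.child.java.opts", "-Xmx2048m");

But if the canopy count is genuinely exploding, no heap setting will
save you; fixing T2 is the real cure.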

