hadoop-common-user mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: OutOfMemory Error
Date Fri, 19 Sep 2008 05:04:45 GMT
> The key is of the form "ID :DenseVector Representation in mahout with

I guess the vector size is too large, so it will need a distributed vector
architecture (or 2D partitioning strategies) for large-scale matrix
operations. The Hama team is investigating these problem areas, so this
should improve if Hama can be used for Mahout in the future.
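
To make the size concern concrete, here is a rough sketch in plain Java. It
is not Hama or Mahout code; the class and method names are hypothetical and
only for illustration. It reproduces the key-size estimate from the quoted
message (160K entries at about 6 bytes each, plus 5 bytes for "C1:[]") and
shows one simple way a dense vector could be cut into fixed-size blocks so
that no single record has to carry the whole vector. The block size of
10,000 is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Hadoop, Mahout, or Hama.
public class VectorKeySketch {

    // Rough size in bytes of a text key such as "C1:[v1,v2,...]".
    static long estimateKeyBytes(long dimensions, int bytesPerEntry, int wrapperBytes) {
        return dimensions * bytesPerEntry + wrapperBytes;
    }

    // Split a dense vector into consecutive blocks of at most blockSize entries.
    static List<double[]> splitIntoBlocks(double[] vector, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<double[]> blocks = new ArrayList<>();
        for (int start = 0; start < vector.length; start += blockSize) {
            int end = Math.min(start + blockSize, vector.length);
            blocks.add(Arrays.copyOfRange(vector, start, end));
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Numbers taken from the quoted message: 160K entries, ~6 bytes each, 5 wrapper bytes.
        long keyBytes = estimateKeyBytes(160_000, 6, 5);
        System.out.println("Estimated key size in bytes: " + keyBytes);

        // Assumed block size of 10,000 entries; each block could become its own record.
        double[] vector = new double[160_000];
        List<double[]> blocks = splitIntoBlocks(vector, 10_000);
        System.out.println("Number of blocks: " + blocks.size());
    }
}
```

With blocks like these, each record would be roughly 1/16 of the original
key size, at the cost of having to reassemble or process the vector
block by block.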

/Edward

On Thu, Sep 18, 2008 at 12:28 PM, Pallavi Palleti <pallavip.05@gmail.com> wrote:
>
> Hadoop Version - 17.1
> io.sort.factor =10
> The key is of the form "ID :DenseVector Representation in mahout with
> dimensionality size = 160k"
> For example: C1:[,0.00111111, 3.002, ...... 1.001,....]
> So, the typical size of a mapper output key can be 160K*6 bytes (assuming
> a double as a string is represented in 5 bytes) + 5 bytes (for "C1:[]") +
> the size required to record that the object is of type Text
>
> Thanks
> Pallavi
>
>
>
> Devaraj Das wrote:
>>
>>
>>
>>
>> On 9/17/08 6:06 PM, "Pallavi Palleti" <pallavip.05@gmail.com> wrote:
>>
>>>
>>> Hi all,
>>>
>>>    I am getting an OutOfMemoryError, as shown below, when I run a
>>> map-reduce job on a huge amount of data:
>>> java.lang.OutOfMemoryError: Java heap space
>>> at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:52)
>>> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
>>> at org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1974)
>>> at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:3002)
>>> at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2802)
>>> at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2511)
>>> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1040)
>>> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:220)
>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
>>> The above error occurs almost at the end of the map job. I have set the
>>> heap size to 1GB, but the problem still persists. Can someone please help
>>> me avoid this error?
>> What is the typical size of your key? What is the value of io.sort.factor?
>> Hadoop version?
>>
>>
>>
>>
>
> --
> View this message in context: http://www.nabble.com/OutOfMemory-Error-tp19531174p19545298.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>



-- 
Best regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org
