hbase-user mailing list archives

From Asaf Mesika <asaf.mes...@gmail.com>
Subject Re: OutOfMemoryError in MapReduce Job
Date Sat, 02 Nov 2013 14:37:10 GMT
I would try to compress this bit set.
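
A minimal sketch of what that could look like, assuming the filter is a
java.util.BitSet and Java 7+ (for BitSet.toByteArray()); the helper below is
hypothetical, not code from this thread:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.BitSet;
import java.util.zip.GZIPOutputStream;

public class BitSetCompressor {

    // Serialize the BitSet, then GZIP the bytes before they go into the Put.
    // A sparse 120 MB bit set usually compresses very well.
    public static byte[] toCompressedByteArray(BitSet bits) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream gzip = new GZIPOutputStream(bos);
        gzip.write(bits.toByteArray()); // BitSet.toByteArray() needs Java 7+
        gzip.close();                   // close() flushes the GZIP trailer
        return bos.toByteArray();
    }
}

In the mapper quoted below, the row.add(...) call would then pass
toCompressedByteArray(bitvector) instead of toByteArray(bitvector), and the
reading side would need to gunzip before deserializing.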

On Nov 2, 2013, at 2:43 PM, John <johnnyenglish739@gmail.com> wrote:

> Hi,
> 
> thanks for your answer! I increased the "Map Task Maximum Heap Size" to 2 GB
> and it seems to work: the OutOfMemoryError is gone. But the HBase region
> servers are now crashing all the time :-/ I am trying to store the bit
> vector (120 MB in size) for some rows. This seems to be very memory
> intensive; the usedHeapMB metric increases very fast (up to 2 GB). I'm not
> sure whether it is the reading or the writing task that causes this, but I
> think it's the writing. Any idea how to minimize the memory usage? My
> mapper looks like this:
> 
> public class MyMapper extends TableMapper<ImmutableBytesWritable, Put> {
> 
>     private void storeBitvectorToHBase(Context context, byte[] name,
>             byte[] cf, BitSet bitvector)
>             throws IOException, InterruptedException {
>         Put row = new Put(name);
>         row.setWriteToWAL(false); // skip the WAL to speed up bulk writes
>         row.add(cf, Bytes.toBytes("columname"), toByteArray(bitvector));
>         ImmutableBytesWritable key = new ImmutableBytesWritable(name);
>         context.write(key, row);
>     }
> }
> 
> 
> kind regards
> 
> 
> 2013/11/1 Jean-Marc Spaggiari <jean-marc@spaggiari.org>
> 
>> Hi John,
>> 
>> You might be better off asking this on the CDH mailing list, since it's
>> more related to Cloudera Manager than to HBase.
>> 
>> In the meantime, can you try to update the "Map Task Maximum Heap Size"
>> parameter too?
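>> 
>> For reference, a minimal sketch of the raw MR1 equivalent outside Cloudera
>> Manager; the property name follows Hadoop 1.x's mapred-default.xml and is
>> an assumption for CDH4 MR1:
>> 
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.mapreduce.Job;
>> 
>> // Raise only the map-task heap; reduce tasks keep their own setting.
>> Configuration conf = new Configuration();
>> conf.set("mapred.map.child.java.opts", "-Xmx2048m");
>> Job job = new Job(conf, "bitvector-job"); // job name is a placeholder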
>> 
>> JM
>> 
>> 
>> 2013/11/1 John <johnnyenglish739@gmail.com>
>> 
>>> Hi,
>>> 
>>> I have a problem with the memory. My use case is the following: I've
>>> created a MapReduce job that iterates over every row. If a row has more
>>> than, for example, 10k columns, I create a Bloom filter (a BitSet) for
>>> the row and store it in the HBase table. This has worked fine so far.
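>>> 
>>> (As a minimal sketch of the kind of per-row Bloom filter construction I
>>> mean; the hash scheme and sizes here are illustrative assumptions, not my
>>> actual code:)
>>> 
>>> import java.util.Arrays;
>>> import java.util.BitSet;
>>> 
>>> // One insert per column qualifier: k probes derived from the qualifier's
>>> // hash, each setting one bit in the row's filter.
>>> public class RowBloomFilter {
>>>     private static final int NUM_BITS = 1000000000; // ~120 MB of bits
>>>     private static final int NUM_HASHES = 3;
>>> 
>>>     private final BitSet bits = new BitSet(NUM_BITS);
>>> 
>>>     public void add(byte[] qualifier) {
>>>         int h = Arrays.hashCode(qualifier);
>>>         for (int i = 0; i < NUM_HASHES; i++) {
>>>             // Simple double hashing; a production filter would use
>>>             // stronger, independent hash functions.
>>>             int idx = Math.abs((h + i * 0x9E3779B9) % NUM_BITS);
>>>             bits.set(idx);
>>>         }
>>>     }
>>> }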
>>> 
>>> BUT now I am trying to store a BitSet with 1,000,000,000 elements, which
>>> is ~120 MB in size. In every map() invocation there are two BitSets. When
>>> I execute the MR job I get this error: http://pastebin.com/DxFYNuBG
>>> 
>>> Obviously, the TaskTracker does not have enough memory. I tried to adjust
>>> the memory configuration, but I am not sure which setting is the right
>>> one. I changed the "MapReduce Child Java Maximum Heap Size" value from
>>> 1 GB to 2 GB, but still got the same error.
>>> 
>>> Which parameters do I have to adjust? BTW, I'm using CDH 4.4.0 with
>>> Cloudera Manager.
>>> 
>>> kind regards
>>> 
>> 

