hbase-user mailing list archives

From John <johnnyenglish...@gmail.com>
Subject Re: OutOfMemoryError in MapReduce Job
Date Sat, 02 Nov 2013 18:01:59 GMT
@Ted: okay, thanks for the information

@Asaf: It seems to work if I compress the bytes myself. I use Snappy
for that ( https://code.google.com/p/snappy/ ). The 120 MB BitSet is
compressed to a 5 MB byte array. So far the HBase server has not crashed.
Thanks!
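
For reference, a rough sketch of that client-side compression step, assuming
the snappy-java binding (org.xerial.snappy) and Java 7's BitSet.toByteArray()
/ BitSet.valueOf; the BitSetCodec class name is illustrative only:

    import java.io.IOException;
    import java.util.BitSet;

    import org.xerial.snappy.Snappy;

    public final class BitSetCodec {

        // Compress the raw bits before handing them to the Put, so only the
        // few-MB compressed array travels over the wire and sits in memory.
        public static byte[] compress(BitSet bits) throws IOException {
            return Snappy.compress(bits.toByteArray()); // toByteArray() needs Java 7
        }

        // Reverse step when reading the cell back out of HBase.
        public static BitSet decompress(byte[] stored) throws IOException {
            return BitSet.valueOf(Snappy.uncompress(stored));
        }
    }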

kind regards


2013/11/2 Ted Yu <yuzhihong@gmail.com>

> Compression happens on the server.
> See src/main/java/org/apache/hadoop/hbase/io/hfile/Compression.java (0.94)
>
> In 0.96 and beyond, see http://hbase.apache.org/book.html#rpc.configs
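
As a rough illustration of the 0.96+ RPC option referenced above, the book's
rpc.configs section describes a client-side compressor property; a hedged
sketch, assuming that property name and using GzipCodec purely as an example
codec:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class RpcCompressionExample {
        public static Configuration clientConf() {
            Configuration conf = HBaseConfiguration.create();
            // Request compressed client/server RPC (0.96+ only);
            // see the rpc.configs section of the reference guide.
            conf.set("hbase.client.rpc.compressor",
                     "org.apache.hadoop.io.compress.GzipCodec");
            return conf;
        }
    }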
>
> Cheers
>
> On Sat, Nov 2, 2013 at 9:46 AM, John <johnnyenglish739@gmail.com> wrote:
>
> > You mean I should use the BitSet, transform it into bytes and then
> > compress it on my own in the map function? Hmmm ... I could try it.
> > What is the best way to compress it in Java?
> >
> > BTW, I'm not sure how exactly the HBase compression works. As I
> > mentioned, I have already enabled LZO compression for the column
> > family. The question is: where are the bytes compressed? Directly in
> > the map function (if not, is it possible to compress it there with
> > LZO?!) or in the region server?
> >
> > kind regards
> >
> >
> > 2013/11/2 Asaf Mesika <asaf.mesika@gmail.com>
> >
> > > I mean, if you take all those bytes of the bit set and zip them,
> > > wouldn't you reduce it significantly? Less traffic on the wire,
> > > memory in HBase, etc.
> > >
> > > On Saturday, November 2, 2013, John wrote:
> > >
> > > > I already use LZO compression in HBase. Or do you mean a
> > > > compressed Java object? Do you know an implementation?
> > > >
> > > > kind regards
> > > >
> > > >
> > > > 2013/11/2 Asaf Mesika <asaf.mesika@gmail.com>
> > > >
> > > > > I would try to compress this bit set.
> > > > >
> > > > > On Nov 2, 2013, at 2:43 PM, John <johnnyenglish739@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > thanks for your answer! I increased the "Map Task Maximum Heap
> > > > > > Size" to 2 GB and it seems to work. The OutOfMemoryError is
> > > > > > gone. But the HBase region servers are now crashing all the
> > > > > > time :-/ I try to store the bitvector (120 MB in size) for
> > > > > > some rows. This seems to be very memory intensive; the
> > > > > > usedHeapMB increases very fast (up to 2 GB). I'm not sure if
> > > > > > it is the reading or the writing task which causes this, but I
> > > > > > think it's the writing task. Any idea how to minimize the
> > > > > > memory usage? My mapper looks like this:
> > > > > >
> > > > > > public class MyMapper extends TableMapper<ImmutableBytesWritable, Put> {
> > > > > >
> > > > > > private void storeBitvectorToHBase(
> > > > > >        Put row = new Put(name);
> > > > > >        row.setWriteToWAL(false);
> > > > > >        row.add(cf, Bytes.toBytes("columname"), toByteArray(bitvector));
> > > > > >        ImmutableBytesWritable key = new ImmutableBytesWritable(name);
> > > > > >        context.write(key, row);
> > > > > > }
> > > > > > }
> > > > > >
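
Pulling the thread's suggestion together, a hedged sketch of how the write
path might look once the bitvector is Snappy-compressed inside the mapper.
The storeBitvectorToHBase parameter list, the cf value and the use of
Java 7's BitSet.toByteArray() are assumptions, since the fragment above
omits them:

    import java.io.IOException;
    import java.util.BitSet;

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.xerial.snappy.Snappy;

    public class MyMapper extends TableMapper<ImmutableBytesWritable, Put> {

        private final byte[] cf = Bytes.toBytes("cf"); // assumed column family name

        // Parameters are assumed; the original fragment does not show them.
        private void storeBitvectorToHBase(byte[] name, BitSet bitvector,
                Context context) throws IOException, InterruptedException {
            Put row = new Put(name);
            row.setWriteToWAL(false);
            // Compress the ~120 MB bit set before it leaves the mapper, so the
            // region server only has to hold the few-MB compressed cell value.
            row.add(cf, Bytes.toBytes("columname"),
                    Snappy.compress(bitvector.toByteArray()));
            context.write(new ImmutableBytesWritable(name), row);
        }
    }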
> > > > > >
> > > > > > kind regards
> > > > > >
> > > > > >
> > > > > > 2013/11/1 Jean-Marc Spaggiari <jean-marc@spaggiari.org>
> > > > > >
> > > > > >> Hi John,
> > > > > >>
> > > > > >> You might be better off asking this on the CDH mailing list,
> > > > > >> since it's more related to Cloudera Manager than HBase.
> > > > > >>
> > > > > >> In the meantime, can you try to update the "Map Task Maximum
> > > > > >> Heap Size" parameter too?
> > > > > >>
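
For completeness, outside of Cloudera Manager the equivalent MRv1 knob can be
set on the job configuration; a hedged sketch, where the assumption that CM's
"Map Task Maximum Heap Size" maps onto mapred.map.child.java.opts should be
verified against your CM version:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class MapHeapConfig {
        public static Configuration withBiggerMapHeap() {
            Configuration conf = HBaseConfiguration.create();
            // Per-map-task child JVM options (MRv1); 2 GB heap as discussed above.
            conf.set("mapred.map.child.java.opts", "-Xmx2048m");
            return conf;
        }
    }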
> > > > > >> JM
> > > > > >>
> > > > > >>
> > > > > >> 2013/11/1 John <johnnyenglish739@gmail.com>
> > > > > >>
> > > > > >>> Hi,
> > > > > >>>
> > > > > >>> I have a problem with the memory. My use case is the
> > > > > >>> following: I've created a MapReduce job and iterate in it
> > > > > >>> over every row. If the row has more than, for example, 10k
> > > > > >>> columns, I create a Bloom filter (a BitSet) for this row and
> > > > > >>> store it in the HBase structure. This worked fine so far.
> > > > > >>>
> > > > > >>> BUT, now I try to store a BitSet with 1000000000 elements =
> > > > > >>> ~120 MB in size. In every map() function there exist two
> > > > > >>> BitSets. If I try to execute the MR job I get this error:
> > > > > >>> http://pastebin.com/DxFYNuBG
> > > > > >>>
> > > > > >>> Obviously, the TaskTracker does not have enough memory. I
> > > > > >>> tried to adjust the memory configuration, but I'm not sure
> > > > > >>> which setting is the right one. I tried to change the
> > > > > >>> "MapReduce Child Java Maximum Heap Size" value from 1 GB to
> > > > > >>> 2 GB, but still got the same error.
> > > > > >>>
> > > > > >>> Which parameters do I have to adjust? BTW, I'm using CDH
> > > > > >>> 4.4.0 with Cloudera Manager.
> > > > > >>>
> > > > > >>> kind regards
> > > > > >>>
> > > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>
