hbase-dev mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject direct buffer considered harmful
Date Fri, 27 Mar 2009 20:37:43 GMT
Hi all,

I ran into this on my TRUNK hbase setup:
java.io.IOException: java.lang.OutOfMemoryError: Direct buffer memory

The pertinent details of the stack trace are:
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
        at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.<init>(ZlibDecompressor.java:110)
        at org.apache.hadoop.io.compress.GzipCodec.createDecompressor(GzipCodec.java:188)
        at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:120)
        at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getDecompressor(Compression.java:267)
        at org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:871)

Ok, so what is this mysterious direct buffer and why am I dying?

This might be because I have 800 regions and 300+ GB of compressed HFiles.

So I looked at the ZlibDecompressor in Hadoop, and it looks like there is
_no_ reason whatsoever to be using direct buffers.

A little background:
ByteBuffer offers two types of allocation: normal (backed by a byte[] on the
Java heap) and 'direct'.  Direct buffers live outside the normal heap and can
be handed via nio straight to the underlying OS, which can avoid a copy.  But
direct buffer space is capped (by -XX:MaxDirectMemorySize), so you should only
use it if you are _sure_ you need it.  Furthermore there appear to be GC bugs
that don't let the JVM reclaim these buffers as quickly as they should - you
can go OOME on direct memory without the heap itself being anywhere near full.
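
To make the distinction concrete, here is a tiny illustration (class name is
mine; the 128 KB size is just the per-codec figure discussed below):

import java.nio.ByteBuffer;

public class BufferDemo {
    public static void main(String[] args) {
        // Heap buffer: backed by a byte[], allocated and collected like any
        // other object.
        ByteBuffer heap = ByteBuffer.allocate(128 * 1024);
        System.out.println("heap:   hasArray=" + heap.hasArray()
                + " isDirect=" + heap.isDirect());

        // Direct buffer: native memory outside the Java heap, capped by
        // -XX:MaxDirectMemorySize and only released when the ByteBuffer
        // object itself is eventually garbage collected.
        ByteBuffer direct = ByteBuffer.allocateDirect(128 * 1024);
        System.out.println("direct: hasArray=" + direct.hasArray()
                + " isDirect=" + direct.isDirect());
    }
}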

The Hadoop compression library attempts to keep things under control by
pooling and reusing codecs, and therefore their direct buffers.  But each
codec instance holds roughly 128 KB of direct buffer, and once enough of
them are live at the same time, you go OOME.
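
This is not Hadoop's actual pool, but a toy that reproduces the failure mode:
pin one 128 KB direct allocation per hypothetical codec and watch the direct
quota run out long before the heap does.  Run it with an artificially low
limit, e.g. java -XX:MaxDirectMemorySize=64m DirectOome:

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class DirectOome {
    public static void main(String[] args) {
        // Hold references so nothing can be reclaimed, mimicking a pool
        // that keeps every decompressor it ever created.
        List<ByteBuffer> pinned = new ArrayList<ByteBuffer>();
        int count = 0;
        try {
            while (true) {
                pinned.add(ByteBuffer.allocateDirect(128 * 1024));
                count++;
            }
        } catch (OutOfMemoryError e) {
            // "java.lang.OutOfMemoryError: Direct buffer memory" -- the heap
            // is nearly empty, but the direct quota is exhausted.
            System.out.println("died after " + count + " buffers (~"
                    + (count * 128L / 1024) + " MB direct): " + e);
        }
    }
}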

I am not sure why the lib uses direct buffers in the first place.  We might
be able to switch it to plain heap-backed buffers...

However, I think we should look at procuring our own fast zlib-like
compression library outside of Hadoop.
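
For what it's worth, the JDK's own java.util.zip wraps the same native zlib
and exposes a heap-byte[] API - no allocateDirect() anywhere in the path.  A
minimal sketch (class and method names are mine, and real code would loop
instead of assuming one inflate() call suffices):

import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class HeapZlib {
    // Decompress zlib data using only heap arrays.
    static byte[] inflate(byte[] compressed, int uncompressedLen)
            throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] out = new byte[uncompressedLen];
            inflater.inflate(out);  // real code: loop until finished()
            return out;
        } finally {
            inflater.end();  // release the native zlib stream
        }
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] original = "hello hbase hello hbase".getBytes();

        // Compress with the matching heap-only deflate path.
        Deflater deflater = new Deflater();
        deflater.setInput(original);
        deflater.finish();
        byte[] buf = new byte[256];
        int clen = deflater.deflate(buf);
        deflater.end();

        byte[] compressed = new byte[clen];
        System.arraycopy(buf, 0, compressed, 0, clen);
        System.out.println(new String(inflate(compressed, original.length)));
    }
}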
