hbase-dev mailing list archives

From Stack <st...@duboce.net>
Subject Interesting note on hbase client from asynchbase list --> Fwd: standard hbase client, asynchbase client, netty and direct memory buffers
Date Sat, 22 Oct 2011 08:14:21 GMT
Below is an interesting finding on our hbase client by Jonathan Payne.
He posted it to the asynchbase list. I'm forwarding it here (with his
permission).

St.Ack


---------- Forwarded message ----------
From: Jonathan Payne <jpayne@flipboard.com>
Date: Fri, Oct 21, 2011 at 6:30 PM
Subject: standard hbase client, asynchbase client, netty and direct
memory buffers
To: AsyncHBase <asynchbase@googlegroups.com>


I thought I'd take a moment to explain what I discovered trying to
track down serious problems with the regular (non-async) hbase client
and Java's nio implementation.
We were having issues running out of direct memory and here's a stack
trace which says it all:
        java.nio.Buffer.<init>(Buffer.java:172)
        java.nio.ByteBuffer.<init>(ByteBuffer.java:259)
        java.nio.ByteBuffer.<init>(ByteBuffer.java:267)
        java.nio.MappedByteBuffer.<init>(MappedByteBuffer.java:64)
        java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:97)
        java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
        sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:155)
        sun.nio.ch.IOUtil.write(IOUtil.java:37)
        sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
        org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
        org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
        java.io.DataOutputStream.flush(DataOutputStream.java:106)
        org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:518)
        org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:751)
        org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
        $Proxy11.getProtocolVersion(<Unknown Source>:Unknown line)
Here you can see that an HBaseClient request is flushing a stream
which has a socket channel at the other end of it. HBase has decided
not to use direct memory for its byte buffers, which I thought was
smart, since they are difficult to manage. Unfortunately, behind the
scenes the JDK notices that the buffer passed to the socket channel
write call is not a direct buffer, and it allocates a direct memory
buffer on your behalf! The size of that direct memory buffer depends
on the amount of data you want to write at that time, so if you are
writing 1 MB of data, the JDK will allocate 1 MB of direct memory.
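You can watch this happen from plain Java. The sketch below is mine (class
and method names are not from any HBase code): it does a single non-blocking
write of a heap buffer over a loopback socket and measures the JVM's direct
buffer pool before and after. The pool grows by at least the size of the
write, even though we never call allocateDirect ourselves:

```java
import java.io.IOException;
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class HiddenDirectWrite {
    /** Bytes currently held by the JVM's direct buffer pool. */
    static long directPoolBytes() {
        for (BufferPoolMXBean p : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(p.getName())) return p.getMemoryUsed();
        }
        throw new IllegalStateException("no direct buffer pool");
    }

    /** Growth of the direct pool caused by one heap-buffer write of `size` bytes. */
    static long growthAfterHeapWrite(int size) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel peer = server.accept()) {
                client.configureBlocking(false); // a partial write is fine for the demo
                long before = directPoolBytes();
                client.write(ByteBuffer.allocate(size)); // heap buffer, not direct
                return directPoolBytes() - before;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // The JDK silently allocates a direct buffer sized to the whole write.
        System.out.println("direct pool grew by " + growthAfterHeapWrite(1 << 20) + " bytes");
    }
}
```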
The same is done on the reading side as well. If you perform channel
I/O with a non-direct memory buffer, the JDK will allocate a direct
memory buffer for you. In the reading case it allocates a size that
equals the amount of room remaining in the (non-direct) buffer you
passed in to the read call. WTF!? That can be a very large value.
To make matters worse, the JDK caches these direct memory buffers in
thread local storage and it caches not one, but three of these
arbitrarily sized buffers. (Look in
sun.nio.ch.Util.getTemporaryDirectBuffer and let me know if I have
interpreted the code incorrectly.) So if you have a large number of
threads talking to hbase you can find yourself overflowing with direct
memory buffers that you have not allocated and didn't even know about.
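The thread-local caching can be seen directly too (sketch, my naming): on
one thread, the first large heap-buffer write grows the direct pool, while
a second write of the same size grows it by nothing, because
sun.nio.ch.Util hands back the cached temporary buffer instead of
allocating a new one:

```java
import java.io.IOException;
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class TempBufferCache {
    /** Bytes currently held by the JVM's direct buffer pool. */
    static long directPoolBytes() {
        for (BufferPoolMXBean p : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(p.getName())) return p.getMemoryUsed();
        }
        throw new IllegalStateException("no direct buffer pool");
    }

    /** Pool growth of two consecutive heap-buffer writes of `size` bytes on one thread. */
    static long[] growthOfTwoWrites(int size) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel peer = server.accept()) {
                client.configureBlocking(false);
                long before = directPoolBytes();
                client.write(ByteBuffer.allocate(size));
                long first = directPoolBytes() - before;
                before = directPoolBytes();
                client.write(ByteBuffer.allocate(size));
                long second = directPoolBytes() - before;
                return new long[] { first, second };
            }
        }
    }

    public static void main(String[] args) throws IOException {
        long[] g = growthOfTwoWrites(1 << 20);
        // The first write allocates; the second reuses the thread-local cached buffer.
        System.out.println("first: " + g[0] + " bytes, second: " + g[1] + " bytes");
    }
}
```

Multiply that cached, arbitrarily sized buffer by your thread count and you
get the overflow described above.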
This issue is what caused us to check out the asynchbase client, which
happily didn't have any of these problems. The reason is that
asynchbase uses netty and netty knows the proper way of using direct
memory buffers for I/O. The correct way is to use direct memory
buffers in manageable sizes, 16k to 64k or something like that, for
the purpose of invoking a read or write system call. Netty has
algorithms for calculating the best size given a particular socket
connection, based on the amount of data it seems to be able to read at
once, etc. Netty reads the data from the OS using direct memory and
copies that data into Java byte buffers.
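A minimal sketch of that idea (this is not Netty's actual code; the class
name and the fixed 64 KB chunk are mine, whereas Netty adapts the size per
channel): copy the heap data through one bounded, reusable direct buffer
instead of letting the JDK conjure an arbitrarily large one per write:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

public class ChunkedDirectWriter {
    // One bounded direct scratch buffer, reused for every write (size is illustrative).
    private final ByteBuffer scratch = ByteBuffer.allocateDirect(64 * 1024);

    /** Writes all of src through the bounded direct buffer; returns bytes written. */
    public long write(WritableByteChannel ch, ByteBuffer src) throws IOException {
        long total = 0;
        while (src.hasRemaining()) {
            scratch.clear();
            // Copy at most one chunk from the heap buffer into direct memory.
            int chunk = Math.min(scratch.remaining(), src.remaining());
            ByteBuffer slice = src.duplicate();
            slice.limit(slice.position() + chunk);
            scratch.put(slice);
            src.position(src.position() + chunk);
            scratch.flip();
            // Drain the chunk to the channel before refilling.
            while (scratch.hasRemaining()) {
                total += ch.write(scratch);
            }
        }
        return total;
    }
}
```

The direct memory footprint is now a fixed 64 KB per writer, no matter how
large the messages are.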
Now you might be wondering why you don't just pass a regular Java byte
array into the read/write calls, to avoid the copy from direct memory
to java heap memory, and here's the story about that. Let's assume
you're doing a file or socket read. There are two cases:

If the amount being read is 8k or less, it uses a native char array on
the C stack for the read system call, and then copies the result into
your Java buffer.
If the amount being read is more than 8k, the JDK calls malloc, does
the read system call with that buffer, copies the result into your
Java buffer, and then calls free.

The reason for this is that the compacting Java garbage collector
might move your Java buffer while you're blocked in the read system
call and clearly that will not do. But if you are not aware of the
malloc/free being called every time you perform a read larger than 8k,
you might be surprised by the performance of that. Direct memory
buffers were created to avoid the malloc/free every time you read. You
still need to copy but you don't need to malloc/free every time.
People get into trouble with direct memory buffers because you cannot
free them when you know you are done. Instead you need to wait for the
garbage collector to run and THEN for the finalizers to be executed.
You can never tell when the GC will run or when your finalizers will
be run, so it's really a crapshoot. That's why the JDK caches these buffers
(that it shouldn't be creating in the first place). And the larger
your heap size, the less frequent the GCs. And actually, I saw some
code in the JDK which called System.gc() manually when a direct memory
buffer allocation failed, which is an absolute no-no. That might work
with small heap sizes but with large heap sizes, a full System.gc()
can take 15 or 20 seconds. We were trying to use the G1 collector
which allows for very large heap sizes without long GC delays, but
those delays were occurring because some piece of code was manually
running GC. When I disabled System.gc() with a command line option, we
ran out of direct memory instead.
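For reference, the relevant HotSpot flags (the size shown here is
illustrative, not a recommendation):

```
# Ignore explicit System.gc() calls. This is the option that bit us:
# with it, direct memory loses its only reclamation trigger and you
# can run out instead.
-XX:+DisableExplicitGC

# Cap how much direct memory the JVM may allocate.
-XX:MaxDirectMemorySize=256m
```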
This is long but I hope informative.
