hbase-user mailing list archives

From Abhishek Singh Chouhan <abhishekchouhan...@gmail.com>
Subject Re: OutOfMemoryError: Direct buffer memory on PUT
Date Wed, 08 Nov 2017 11:31:07 GMT
I faced the same issue and have been debugging this for some time now (the
logging is not very helpful, as Daniel mentions :)).
Looking deeper into this, I realized that the side effects also include large
incorrect byte buffer allocations on the server side, apart from the call
timeouts on the client side.
I have filed HBASE-19215 <https://issues.apache.org/jira/browse/HBASE-19215>
for this.

On Wed, Nov 8, 2017 at 4:05 PM, Daniel Jeliński <djelinski1@gmail.com>
wrote:

> 2017-11-07 18:22 GMT+01:00 Stack <stack@duboce.net>:
>
> > On Mon, Nov 6, 2017 at 6:33 AM, Daniel Jeliński <djelinski1@gmail.com>
> > wrote:
> >
> > > For others that run into a similar issue, it turned out that the
> > > OutOfMemoryError was thrown (and subsequently hidden) on the client
> > > side. The error was caused by excessive direct memory usage in Java
> > > NIO's bytebuffer caching (described here:
> > > http://www.evanjones.ca/java-bytebuffer-leak.html), and setting
> > > -Djdk.nio.maxCachedBufferSize=262144
> > > allowed the application to complete.
> > >
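[Editor's note: the per-thread caching Daniel links to can be observed directly. The following standalone sketch is not HBase code; the class name is illustrative. It writes a large heap buffer through a channel, which makes NIO allocate (and cache, per thread) a temporary direct buffer of the same size, visible in the JVM's "direct" buffer pool. This is the growth that `jdk.nio.maxCachedBufferSize` bounds.]

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DirectBufferCacheDemo {

    // Bytes currently held by the JVM's "direct" buffer pool.
    static long directBytes() {
        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        long before = directBytes();
        Path tmp = Files.createTempFile("nio-cache-demo", ".bin");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            // Writing a large HEAP buffer makes NIO copy it through a
            // temporary DIRECT buffer of the same size, which is then
            // cached in a thread-local for reuse.
            ch.write(ByteBuffer.wrap(new byte[10 * 1024 * 1024]));
        }
        long after = directBytes();
        System.out.println("direct pool before=" + before + " after=" + after);
        Files.delete(tmp);
    }
}
```

Running this with and without `-Djdk.nio.maxCachedBufferSize=262144` (available since 8u102) shows the cached buffer being capped.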
> > >
> > Suggestions for how to expose the client-side OOME, Daniel? We should add
> > a note to the thrown exception about "-Djdk.nio.maxCachedBufferSize" (and
> > make sure the exception makes it out!)
> >
>
> Well, I found the problem by adding printStackTrace to the
> AsyncProcess.createLog function, which was responsible for logging the
> original OOME. This is not very elegant, and I wouldn't recommend adding it
> to the official codebase, but the stack trace offers some hints:
>
> java.io.IOException: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: Direct buffer memory
>     at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:329)
>     at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:130)
>     at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:53)
>     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
>     at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:727)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>     at java.util.concurrent.FutureTask.run(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> Caused by: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: Direct buffer memory
>     at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:240)
>     at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.multi(ClientProtos.java:34142)
>     at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:128)
>     ... 8 more
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory
>     at java.nio.Bits.reserveMemory(Unknown Source)
>     at java.nio.DirectByteBuffer.<init>(Unknown Source)
>     at java.nio.ByteBuffer.allocateDirect(Unknown Source)
>     at sun.nio.ch.Util.getTemporaryDirectBuffer(Unknown Source)
>     at sun.nio.ch.IOUtil.write(Unknown Source)
>     at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
>     at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>     at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
>     at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
>     at org.apache.hadoop.security.SaslOutputStream.write(SaslOutputStream.java:169)
>     at java.io.BufferedOutputStream.write(Unknown Source)
>     at java.io.DataOutputStream.write(Unknown Source)
>     at org.apache.hadoop.hbase.ipc.IPCUtil.write(IPCUtil.java:277)
>     at org.apache.hadoop.hbase.ipc.IPCUtil.write(IPCUtil.java:266)
>     at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:921)
>     at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:874)
>     at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1243)
>     at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
>     ... 11 more
> This stack trace comes from the cdh5.10.2 version, but the master branch is
> sufficiently similar. So, depending on what we want to achieve, we could:
> - just replace catch(Throwable e) in AbstractRpcClient.callBlockingMethod
> with something more fine-grained and fail the application;
> - or forward the OOME in callBlockingMethod, but add information about
> maxCachedBufferSize, also failing the application but suggesting a possible
> corrective action to the user;
> - or pass the error to the user, allowing the application to intercept it.
> Not sure yet how to do that, but we would need to do something about the
> connection becoming unusable after an OOME, in case the user decides to
> keep going.
> What's your take?
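[Editor's note: the second option Daniel lists could look roughly like the sketch below. This is not the actual AbstractRpcClient code; the class and method names are hypothetical, and only the hint text comes from this thread.]

```java
// Sketch of "forward the OOME but add information about maxCachedBufferSize".
// Names (DirectOomeHint, wrapWithHint) are illustrative, not HBase APIs.
public final class DirectOomeHint {

    static java.io.IOException wrapWithHint(OutOfMemoryError e) {
        // Surface the OOME instead of swallowing it, and point the user at
        // the workarounds discussed in this thread.
        return new java.io.IOException(
            "Direct buffer OOME during RPC write; consider setting "
                + "-Djdk.nio.maxCachedBufferSize=262144 and/or raising "
                + "-XX:MaxDirectMemorySize",
            e);
    }

    public static void main(String[] args) {
        System.out.println(
            wrapWithHint(new OutOfMemoryError("Direct buffer memory"))
                .getMessage());
    }
}
```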
>
>
>
>
> > Thanks for updating the list,
> > S
> >
> >
> >
> > > Yet another proof that correct handling of OOME is hard.
> > > Thanks,
> > > Daniel
> > >
> > > 2017-10-11 11:33 GMT+02:00 Daniel Jeliński <djelinski1@gmail.com>:
> > >
> > > > Thanks for the hints. I'll see if we can explicitly set
> > > > MaxDirectMemorySize to a safe number.
> > > > Thanks,
> > > > Daniel
> > > >
> > > > 2017-10-10 21:10 GMT+02:00 Esteban Gutierrez <esteban@cloudera.com>:
> > > >
> > > >> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/
> > > >> classes/sun/misc/VM.java#l184
> > > >>
> > > >>     // The initial value of this field is arbitrary; during JRE
> > > >>     // initialization it will be reset to the value specified on the
> > > >>     // command line, if any, otherwise to Runtime.getRuntime().maxMemory().
> > > >>
> > > >> which goes all the way down to memory/heap.cpp to whatever was left of
> > > >> the reserved memory, depending on the flags and the platform used, as
> > > >> Vladimir says.
> > > >>
> > > >> Also, depending on which distribution and features are used, there are
> > > >> specific guidelines about setting that parameter, so mileage may vary.
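[Editor's note: the effective value can be inspected at runtime on a HotSpot JVM. The snippet below uses the HotSpot-specific com.sun.management API, so it is a sketch for HotSpot only; the flag name is the real one discussed above.]

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class MaxDirectDefault {
    public static void main(String[] args) {
        HotSpotDiagnosticMXBean hs =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // On HotSpot the flag itself defaults to "0"; VM initialization then
        // substitutes Runtime.getRuntime().maxMemory(), per the VM.java
        // comment quoted above.
        System.out.println("MaxDirectMemorySize flag = "
            + hs.getVMOption("MaxDirectMemorySize").getValue());
        System.out.println("Runtime.maxMemory()      = "
            + Runtime.getRuntime().maxMemory());
    }
}
```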
> > > >>
> > > >> thanks,
> > > >> esteban.
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Cloudera, Inc.
> > > >>
> > > >>
> > > >> On Tue, Oct 10, 2017 at 1:35 PM, Vladimir Rodionov <
> > > >> vladrodionov@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > >> The default value is zero, which means the maximum direct memory
> > > >> > >> is unbounded.
> > > >> >
> > > >> > That is not correct. If you do not specify MaxDirectMemorySize, the
> > > >> > default is platform-specific.
> > > >> >
> > > >> > The link above is for the JRockit JVM, I presume?
> > > >> >
> > > >> > On Tue, Oct 10, 2017 at 11:19 AM, Esteban Gutierrez <esteban@cloudera.com>
> > > >> > wrote:
> > > >> >
> > > >> > > I don't think it is truly unbounded; IIRC it's limited to the
> > > >> > > maximum allocated heap.
> > > >> > >
> > > >> > > thanks,
> > > >> > > esteban.
> > > >> > >
> > > >> > > --
> > > >> > > Cloudera, Inc.
> > > >> > >
> > > >> > >
> > > >> > > On Tue, Oct 10, 2017 at 1:11 PM, Ted Yu <yuzhihong@gmail.com>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > From https://docs.oracle.com/cd/E15289_01/doc.40/e15062/optionxx.htm :
> > > >> > > >
> > > >> > > > java -XX:MaxDirectMemorySize=2g myApp
> > > >> > > >
> > > >> > > > Default Value
> > > >> > > >
> > > >> > > > The default value is zero, which means the maximum direct memory
> > > >> > > > is unbounded.
> > > >> > > >
> > > >> > > > On Tue, Oct 10, 2017 at 11:04 AM, Vladimir Rodionov <vladrodionov@gmail.com>
> > > >> > > > wrote:
> > > >> > > >
> > > >> > > > > >> -XX:MaxDirectMemorySize is set to the default 0, which
> > > >> > > > > >> means unlimited as far as I can tell.
> > > >> > > > >
> > > >> > > > > Not sure if this is true. The only confirming link I found was
> > > >> > > > > for the JRockit JVM.
> > > >> > > > >
> > > >> > > > > On Mon, Oct 9, 2017 at 11:29 PM, Daniel Jeliński <djelinski1@gmail.com>
> > > >> > > > > wrote:
> > > >> > > > >
> > > >> > > > > > Vladimir,
> > > >> > > > > > -XX:MaxDirectMemorySize is set to the default 0, which means
> > > >> > > > > > unlimited as far as I can tell.
> > > >> > > > > > Thanks,
> > > >> > > > > > Daniel
> > > >> > > > > >
> > > >> > > > > > 2017-10-09 19:30 GMT+02:00 Vladimir Rodionov <vladrodionov@gmail.com>:
> > > >> > > > > >
> > > >> > > > > > > Have you tried increasing the direct memory size for the
> > > >> > > > > > > server process? -XX:MaxDirectMemorySize=?
> > > >> > > > > > >
> > > >> > > > > > > On Mon, Oct 9, 2017 at 2:12 AM, Daniel Jeliński <djelinski1@gmail.com>
> > > >> > > > > > > wrote:
> > > >> > > > > > >
> > > >> > > > > > > > Hello,
> > > >> > > > > > > > I'm running an application doing a lot of Puts (size
> > > >> > > > > > > > anywhere between 0 and 10MB, one cell at a time);
> > > >> > > > > > > > occasionally I'm getting an error like the below:
> > > >> > > > > > > > 2017-10-09 04:29:29,811 WARN  [AsyncProcess] - #13368,
> > > >> > > > > > > > table=researchplatform:repo_stripe, attempt=1/1 failed=1ops,
> > > >> > > > > > > > last exception: java.io.IOException:
> > > >> > > > > > > > com.google.protobuf.ServiceException:
> > > >> > > > > > > > java.lang.OutOfMemoryError: Direct buffer memory on
> > > >> > > > > > > > c169dzv.int.westgroup.com,60020,1506476748534, tracking
> > > >> > > > > > > > started Mon Oct 09 04:29:29 EDT 2017; not retrying 1 -
> > > >> > > > > > > > final failure
> > > >> > > > > > > >
> > > >> > > > > > > > After that, the connection to the RegionServer becomes
> > > >> > > > > > > > unusable. Every subsequent attempt to execute a Put on that
> > > >> > > > > > > > connection results in CallTimeoutException. I only found
> > > >> > > > > > > > the OutOfMemory by reducing the number of tries to 1.
> > > >> > > > > > > >
> > > >> > > > > > > > The host running HBase appears to have at least a few GB of
> > > >> > > > > > > > free memory available. Server logs do not mention anything
> > > >> > > > > > > > about this error. The cluster is running HBase
> > > >> > > > > > > > 1.2.0-cdh5.10.2.
> > > >> > > > > > > >
> > > >> > > > > > > > Is this a known problem? Are there workarounds available?
> > > >> > > > > > > > Thanks,
> > > >> > > > > > > > Daniel
> > > >> > > > > > > >
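[Editor's note: the retry reduction Daniel used to surface the real exception can be set in the client configuration. A minimal hbase-site.xml fragment; the property name is the standard HBase client setting, and the value 1 mirrors what Daniel describes:]

```xml
<!-- Client-side hbase-site.xml: fail after a single attempt so the real
     exception (here the OOME) is reported instead of CallTimeoutException. -->
<property>
  <name>hbase.client.retries.number</name>
  <value>1</value>
</property>
```

The same setting can be applied programmatically via `Configuration.set("hbase.client.retries.number", "1")` before creating the connection.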
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>
