From: Abhishek Singh Chouhan
Date: Wed, 8 Nov 2017 17:01:07 +0530
Subject: Re: OutOfMemoryError: Direct buffer memory on PUT
To: Hbase-User

I faced the same issue and have been debugging it for some time now (the
logging is not very helpful, as Daniel mentions :)). Looking deeper into this,
I realized that apart from call timeouts on the client side, the side effects
also include large, incorrect byte buffer allocations on the server side.
I have filed HBASE-19215 for this.

On Wed, Nov 8, 2017 at 4:05 PM, Daniel Jeliński wrote:

> 2017-11-07 18:22 GMT+01:00 Stack:
>
>> On Mon, Nov 6, 2017 at 6:33 AM, Daniel Jeliński wrote:
>>
>>> For others who run into a similar issue: it turned out that the
>>> OutOfMemoryError was thrown (and subsequently hidden) on the client side.
>>> The error was caused by excessive direct memory usage in Java NIO's
>>> ByteBuffer caching (described here:
>>> http://www.evanjones.ca/java-bytebuffer-leak.html), and setting
>>> -Djdk.nio.maxCachedBufferSize=262144 allowed the application to complete.
>>
>> Any suggestions for how to expose the client-side OOME, Daniel? We should
>> add a note about "-Djdk.nio.maxCachedBufferSize" to the thrown exception
>> (and make sure the exception makes it out!)
>
> Well, I found the problem by adding printStackTrace to the
> AsyncProcess.createLog function, which was responsible for logging the
> original OOME.
> This is not very elegant, and I wouldn't recommend adding it to the
> official codebase, but the stack trace offers some hints:
>
> java.io.IOException: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: Direct buffer memory
>     at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:329)
>     at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:130)
>     at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:53)
>     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
>     at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:727)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>     at java.util.concurrent.FutureTask.run(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> Caused by: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: Direct buffer memory
>     at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:240)
>     at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.multi(ClientProtos.java:34142)
>     at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:128)
>     ... 8 more
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory
>     at java.nio.Bits.reserveMemory(Unknown Source)
>     at java.nio.DirectByteBuffer.(Unknown Source)
>     at java.nio.ByteBuffer.allocateDirect(Unknown Source)
>     at sun.nio.ch.Util.getTemporaryDirectBuffer(Unknown Source)
>     at sun.nio.ch.IOUtil.write(Unknown Source)
>     at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
>     at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>     at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
>     at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
>     at org.apache.hadoop.security.SaslOutputStream.write(SaslOutputStream.java:169)
>     at java.io.BufferedOutputStream.write(Unknown Source)
>     at java.io.DataOutputStream.write(Unknown Source)
>     at org.apache.hadoop.hbase.ipc.IPCUtil.write(IPCUtil.java:277)
>     at org.apache.hadoop.hbase.ipc.IPCUtil.write(IPCUtil.java:266)
>     at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:921)
>     at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:874)
>     at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1243)
>     at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
>     ... 11 more
>
> This stack trace comes from the cdh5.10.2 version, but the master branch is
> sufficiently similar.
>
> So, depending on what we want to achieve, we could:
> - just replace catch(Throwable e) in AbstractRpcClient.callBlockingMethod
>   with something more fine-grained and fail the application;
> - or forward the OOME in callBlockingMethod, but add information about
>   maxCachedBufferSize, also failing the application but suggesting a
>   possible corrective action to the user;
> - or pass the error to the user, allowing the application to intercept it.
>   I'm not sure yet how to do that, but we would need to do something about
>   the connection becoming unusable after the OOME, in case the user decides
>   to keep going.
>
> What's your take?
>
>>
>> Thanks for updating the list,
>> S
>>
>>> Yet another proof that correct handling of OOME is hard.
>>> Thanks,
>>> Daniel
>>>
>>> 2017-10-11 11:33 GMT+02:00 Daniel Jeliński:
>>>
>>>> Thanks for the hints. I'll see if we can explicitly set
>>>> MaxDirectMemorySize to a safe number.
>>>> Thanks,
>>>> Daniel
>>>>
>>>> 2017-10-10 21:10 GMT+02:00 Esteban Gutierrez:
>>>>
>>>>> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/sun/misc/VM.java#l184
>>>>>
>>>>>     // The initial value of this field is arbitrary; during JRE initialization
>>>>>     // it will be reset to the value specified on the command line, if any,
>>>>>     // otherwise to Runtime.getRuntime().maxMemory().
>>>>>
>>>>> which goes all the way down to memory/heap.cpp, to whatever was left to
>>>>> the reserved memory depending on the flags and the platform used, as
>>>>> Vladimir says.
>>>>>
>>>>> Also, depending on which distribution and features are used, there are
>>>>> specific guidelines about setting that parameter, so your mileage may vary.
>>>>>
>>>>> thanks,
>>>>> esteban.
>>>>>
>>>>> --
>>>>> Cloudera, Inc.
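A minimal sketch of the default Esteban describes, assuming HotSpot on JDK 8. DirectMemoryLimit is a hypothetical helper, not a JDK or HBase class. It reports the last -XX:MaxDirectMemorySize found in RuntimeMXBean.getInputArguments() and otherwise falls back to Runtime.getRuntime().maxMemory(), as the sun.misc.VM comment says. It is only an estimate: it cannot see flags the MXBean does not report, and other JVMs (such as JRockit) may use a different default. It also prints jdk.nio.maxCachedBufferSize, the system property used as the workaround in this thread.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper (not part of the JDK or HBase) that estimates the NIO
 * direct-memory limit as the JDK 8 sun.misc.VM comment describes it:
 * the -XX:MaxDirectMemorySize value if one was given, otherwise
 * Runtime.getRuntime().maxMemory().
 */
public class DirectMemoryLimit {
    private static final String FLAG = "-XX:MaxDirectMemorySize=";

    /** Parses a size such as "262144", "512m" or "2g" into bytes (k/m/g/t suffixes). */
    static long parseSize(String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        long multiplier;
        switch (v.charAt(v.length() - 1)) {
            case 'k': multiplier = 1L << 10; break;
            case 'm': multiplier = 1L << 20; break;
            case 'g': multiplier = 1L << 30; break;
            case 't': multiplier = 1L << 40; break;
            default:  multiplier = 1L;
        }
        // Strip the suffix only when one was recognized.
        String digits = multiplier == 1L ? v : v.substring(0, v.length() - 1);
        return Long.parseLong(digits) * multiplier;
    }

    /** Uses the last explicit -XX:MaxDirectMemorySize in jvmArgs, else falls back to maxHeap. */
    static long effectiveLimit(List<String> jvmArgs, long maxHeap) {
        long limit = maxHeap;
        for (String arg : jvmArgs) {
            if (arg.startsWith(FLAG)) {
                limit = parseSize(arg.substring(FLAG.length()));
            }
        }
        return limit;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        long limit = effectiveLimit(jvmArgs, Runtime.getRuntime().maxMemory());
        System.out.println("Estimated direct memory limit: " + limit + " bytes");
        // The workaround from this thread is a -D system property, so it is visible here.
        System.out.println("jdk.nio.maxCachedBufferSize = "
                + System.getProperty("jdk.nio.maxCachedBufferSize", "(not set)"));
    }
}
```

Running it with, for example, `java -Xmx1g -Djdk.nio.maxCachedBufferSize=262144 DirectMemoryLimit` should show a heap-derived limit and the property value; the exact number depends on how the JVM computes maxMemory().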
>>>>>
>>>>> On Tue, Oct 10, 2017 at 1:35 PM, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
>>>>>
>>>>>> >> The default value is zero, which means the maximum direct memory is
>>>>>> >> unbounded.
>>>>>>
>>>>>> That is not correct. If you do not specify MaxDirectMemorySize, the
>>>>>> default is platform-specific.
>>>>>>
>>>>>> The link above is for the JRockit JVM, I presume?
>>>>>>
>>>>>> On Tue, Oct 10, 2017 at 11:19 AM, Esteban Gutierrez <esteban@cloudera.com> wrote:
>>>>>>
>>>>>>> I don't think it is truly unbounded; IIRC it is limited to the maximum
>>>>>>> allocated heap.
>>>>>>>
>>>>>>> thanks,
>>>>>>> esteban.
>>>>>>>
>>>>>>> --
>>>>>>> Cloudera, Inc.
>>>>>>>
>>>>>>> On Tue, Oct 10, 2017 at 1:11 PM, Ted Yu wrote:
>>>>>>>
>>>>>>>> From https://docs.oracle.com/cd/E15289_01/doc.40/e15062/optionxx.htm :
>>>>>>>>
>>>>>>>> java -XX:MaxDirectMemorySize=2g myApp
>>>>>>>>
>>>>>>>> Default Value
>>>>>>>>
>>>>>>>> The default value is zero, which means the maximum direct memory is
>>>>>>>> unbounded.
>>>>>>>>
>>>>>>>> On Tue, Oct 10, 2017 at 11:04 AM, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> >> -XX:MaxDirectMemorySize is set to the default 0, which means
>>>>>>>>> >> unlimited as far as I can tell.
>>>>>>>>>
>>>>>>>>> Not sure if this is true. The only confirming link I found was for
>>>>>>>>> the JRockit JVM.
>>>>>>>>>
>>>>>>>>> On Mon, Oct 9, 2017 at 11:29 PM, Daniel Jeliński <djelinski1@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Vladimir,
>>>>>>>>>> -XX:MaxDirectMemorySize is set to the default 0, which means
>>>>>>>>>> unlimited as far as I can tell.
>>>>>>>>>> Thanks,
>>>>>>>>>> Daniel
>>>>>>>>>>
>>>>>>>>>> 2017-10-09 19:30 GMT+02:00 Vladimir Rodionov <vladrodionov@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> Have you tried increasing the direct memory size for the server
>>>>>>>>>>> process? -XX:MaxDirectMemorySize=?
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 9, 2017 at 2:12 AM, Daniel Jeliński <djelinski1@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello,
>>>>>>>>>>>> I'm running an application doing a lot of Puts (size anywhere
>>>>>>>>>>>> between 0 and 10MB, one cell at a time); occasionally I'm getting
>>>>>>>>>>>> an error like the one below:
>>>>>>>>>>>>
>>>>>>>>>>>> 2017-10-09 04:29:29,811 WARN [AsyncProcess] - #13368, table=researchplatform:repo_stripe, attempt=1/1 failed=1ops, last exception: java.io.IOException: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: Direct buffer memory on c169dzv.int.westgroup.com,60020,1506476748534, tracking started Mon Oct 09 04:29:29 EDT 2017; not retrying 1 - final failure
>>>>>>>>>>>>
>>>>>>>>>>>> After that, the connection to the RegionServer becomes unusable.
>>>>>>>>>>>> Every subsequent attempt to execute a Put on that connection
>>>>>>>>>>>> results in a CallTimeoutException. I only found the
>>>>>>>>>>>> OutOfMemoryError by reducing the number of tries to 1.
>>>>>>>>>>>>
>>>>>>>>>>>> The host running HBase appears to have at least a few GB of free
>>>>>>>>>>>> memory available. Server logs do not mention anything about this
>>>>>>>>>>>> error. The cluster is running HBase 1.2.0-cdh5.10.2.
>>>>>>>>>>>>
>>>>>>>>>>>> Is this a known problem? Are there workarounds available?
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Daniel
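A minimal sketch of the first two options Daniel lists in his Nov 8 reply: a narrower catch than catch(Throwable e) in AbstractRpcClient.callBlockingMethod, and forwarding the OOME with a note about maxCachedBufferSize, as Stack suggested. Every name here (DirectMemoryOomeHint, RpcFailure, Call, callBlocking) is hypothetical; RpcFailure only stands in for protobuf's ServiceException, which the cdh5.10.2 stack trace shows at this layer. The sketch does not attempt the third option, since keeping the connection usable after the error is the hard part.

```java
/**
 * Hypothetical sketch (not HBase code): a catch narrower than catch(Throwable),
 * plus forwarding a "Direct buffer memory" OOME with a hint about the flags
 * discussed in this thread.
 */
public class DirectMemoryOomeHint {
    // Message of the OutOfMemoryError thrown from java.nio.Bits.reserveMemory in the trace.
    static final String DIRECT_OOME_MESSAGE = "Direct buffer memory";

    /** Stand-in for the RPC layer's checked exception (protobuf's ServiceException). */
    static class RpcFailure extends Exception {
        RpcFailure(Throwable cause) { super(cause); }
    }

    /** Stand-in for the blocking RPC call being wrapped. */
    interface Call<T> {
        T call() throws Exception;
    }

    /** Returns the first OutOfMemoryError in t's cause chain (depth-capped), or null. */
    static OutOfMemoryError findOome(Throwable t) {
        int depth = 0;
        for (Throwable cur = t; cur != null && depth < 32; cur = cur.getCause(), depth++) {
            if (cur instanceof OutOfMemoryError) {
                return (OutOfMemoryError) cur;
            }
        }
        return null;
    }

    /**
     * If t is or wraps a "Direct buffer memory" OOME, returns a new OOME whose message
     * names the relevant flags and whose cause is the original error; otherwise null.
     */
    static OutOfMemoryError withHint(Throwable t) {
        OutOfMemoryError oome = findOome(t);
        if (oome == null || !DIRECT_OOME_MESSAGE.equals(oome.getMessage())) {
            return null;
        }
        OutOfMemoryError hinted = new OutOfMemoryError(DIRECT_OOME_MESSAGE
                + " (NIO may copy large heap buffers into cached temporary direct buffers;"
                + " consider -Djdk.nio.maxCachedBufferSize or -XX:MaxDirectMemorySize)");
        hinted.initCause(oome);
        return hinted;
    }

    /**
     * Ordinary exceptions are wrapped, a direct-memory OOME is rethrown with a hint,
     * and any other Error propagates unchanged, so the failure is not hidden.
     */
    static <T> T callBlocking(Call<T> call) throws RpcFailure {
        try {
            return call.call();
        } catch (OutOfMemoryError e) {
            OutOfMemoryError hinted = withHint(e);
            throw hinted != null ? hinted : e;
        } catch (Exception e) {
            throw new RpcFailure(e);
        }
    }

    public static void main(String[] args) {
        try {
            callBlocking(() -> { throw new OutOfMemoryError(DIRECT_OOME_MESSAGE); });
        } catch (OutOfMemoryError e) {
            System.out.println("Forwarded: " + e.getMessage());
        } catch (RpcFailure e) {
            System.out.println("Wrapped: " + e.getCause());
        }
    }
}
```

Matching on the exact "Direct buffer memory" message is a shortcut: it is the message seen in this thread, but it is a JDK implementation detail rather than a stable API.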