Mailing-List: user@hbase.apache.org
From: Geovanie Marquez
To: user@hbase.apache.org
Date: Thu, 15 May 2014 09:46:51 -0400
Subject: Re: RPC Client OutOfMemoryError Java Heap Space

Thanks for the suggestion - I'll try to get that out this weekend sometime.

On Tue, May 13, 2014 at 3:55 PM, Stack wrote:

> A patch for the refguide would be great, perhaps in the troubleshooting
> mapreduce section here http://hbase.apache.org/book.html#trouble.mapreduce ?
> St.Ack
>
> On Tue, May 13, 2014 at 7:07 AM, Geovanie Marquez
> <geovanie.marquez@gmail.com> wrote:
>
> > The following property does exactly what I wanted our environment to do.
> > I had a 4GiB heap and ran the job, and no jobs failed. Then I dropped our
> > cluster heap to 1GiB and reran the same resource-intensive task.
> >
> > This property must be added to the "HBase Service Advanced Configuration
> > Snippet (Safety Valve) for hbase-site.xml":
> >
> >   <property>
> >     <name>hbase.client.scanner.max.result.size</name>
> >     <value>67108864</value>
> >   </property>
> >
> > We noted that 64MiB would be enough, but we also experimented with 128MiB.
> > I may do a write-up and elaborate some more on this.
> >
> > On Mon, May 12, 2014 at 1:38 PM, Vladimir Rodionov wrote:
> >
> > > All your OOMEs are on the client side (map task). Your map tasks need
> > > more heap.
> > > Reduce the number of map tasks and increase the max heap size per map task.
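For archive readers: the 67108864 value above caps the bytes, not the rows, returned by a single scanner RPC. This job's scans request 10,000 rows per next() call (visible in the OutOfOrderScannerNextException further down the thread), so one uncapped response can exceed a small map-task heap. The sketch below is illustrative only; the 32 KiB average row size and the class/method names are assumptions for illustration, not figures from this cluster:

```java
// Back-of-the-envelope sketch: why capping hbase.client.scanner.max.result.size
// bounds client heap where a rows-per-call limit does not.
// The 32 KiB average row size below is an ASSUMED figure.
public class ScannerMemorySketch {
    static long responseBytes(long rowsPerNext, long avgRowBytes, long maxResultSize) {
        long uncapped = rowsPerNext * avgRowBytes;
        // With a result-size cap, the server stops filling the response once
        // the cap is reached, so the client buffers at most the cap
        // (plus one row of overshoot, ignored here).
        return Math.min(uncapped, maxResultSize);
    }

    public static void main(String[] args) {
        long rows = 10_000;                  // number_of_rows from the exception below
        long rowBytes = 32 * 1024;           // assumed average row size
        long cap = 64L * 1024 * 1024;        // 67108864, the value from this thread
        System.out.println("uncapped: " + (rows * rowBytes));               // ~312 MiB
        System.out.println("capped:   " + responseBytes(rows, rowBytes, cap));
    }
}
```

This also fits Vladimir's point that the OOME is client-side: the buffer is allocated in RpcClient.readResponse inside the map task, so server-side settings like block cache or memstore sizing cannot change it.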
> > > Best regards,
> > > Vladimir Rodionov
> > > Principal Platform Engineer
> > > Carrier IQ, www.carrieriq.com
> > > e-mail: vrodionov@carrieriq.com
> > >
> > > ________________________________________
> > > From: Geovanie Marquez [geovanie.marquez@gmail.com]
> > > Sent: Thursday, May 08, 2014 2:35 PM
> > > To: user@hbase.apache.org
> > > Subject: Re: RPC Client OutOfMemoryError Java Heap Space
> > >
> > > Sorry, I didn't include the version:
> > >
> > > CDH5 version - CDH-5.0.0-1.cdh5.0.0.p0.47
> > >
> > > On Thu, May 8, 2014 at 5:32 PM, Geovanie Marquez
> > > <geovanie.marquez@gmail.com> wrote:
> > >
> > > > Hey group,
> > > >
> > > > There is one job that scans HBase contents and is really resource
> > > > intensive, using all resources available to YARN (under the Resource
> > > > Manager) - in my case, that is 8GB. My expectation here is that a
> > > > properly configured cluster would kill the application or degrade its
> > > > performance, but never take a region server down. This is intended to
> > > > be a multi-tenant environment where developers may submit jobs at
> > > > will, and I want a configuration where the cluster services never
> > > > exit this way because of memory.
> > > >
> > > > The simple solution here is to change the way the job consumes
> > > > resources so that it is not so greedy when run. But I want to
> > > > understand how I can mitigate this situation in general.
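Vladimir's advice earlier in the thread - fewer map tasks, more heap per task - typically translates, on a YARN/CDH5 cluster like this one, into mapred-site.xml settings along these lines. The values below are illustrative assumptions, not taken from this cluster:

```xml
<!-- Illustrative values only - size these for your own cluster. -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>        <!-- YARN container size for each map task -->
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1638m</value>   <!-- JVM heap, roughly 80% of the container -->
</property>
```

A larger per-task `-Xmx` gives each map task more headroom to buffer scan responses, at the cost of fewer concurrent containers per node.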
> > > > **It FAILS with the following config:**
> > > > The RPC client has 30 handlers
> > > > Write buffer of 2MiB
> > > > The RegionServer heap is 4GiB
> > > > Max size of all memstores is 0.40 of total heap
> > > > HFile block cache size is 0.40
> > > > Low watermark for memstore flush is 0.38
> > > > HBase memstore flush size is 128MiB
> > > >
> > > > **Job still FAILS with the following config:**
> > > > Everything else the same, except:
> > > > The RPC client has 10 handlers
> > > >
> > > > **Job still FAILS with the following config:**
> > > > Everything else the same, except:
> > > > HFile block cache size is 0.10
> > > >
> > > > When this runs I get the following error stacktrace:
> > > >
> > > > # How do I avoid this via configuration?
> > > >
> > > > java.lang.OutOfMemoryError: Java heap space
> > > >     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1100)
> > > >     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:721)
> > > > 2014-05-08 16:23:54,705 WARN [IPC Client (1242056950) connection to
> > > > c1d001.in.wellcentive.com/10.2.4.21:60020 from hbase]
> > > > org.apache.hadoop.ipc.RpcClient: IPC Client (1242056950) connection to
> > > > c1d001.in.wellcentive.com/10.2.4.21:60020 from hbase: unexpected
> > > > exception receiving call responses
> > > >
> > > > # Yes, there was an RPC timeout; this is what is killing the server,
> > > > # because the timeout is eventually reached (1 minute later).
> > > > java.lang.OutOfMemoryError: Java heap space
> > > >     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1100)
> > > >     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:721)
> > > > 2014-05-08 16:23:55,319 INFO [main]
> > > > org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from
> > > > org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of
> > > > OutOfOrderScannerNextException: was there a rpc timeout?
> > > >     at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:384)
> > > >     at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:194)
> > > >     at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138)
> > > >     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
> > > >     at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
> > > >     at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
> > > >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > > >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> > > >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> > > >     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> > > >     at java.security.AccessController.doPrivileged(Native Method)
> > > >     at javax.security.auth.Subject.doAs(Subject.java:415)
> > > >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> > > >     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> > > >
> > > > # Probably caused by the OOME above:
> > > >
> > > > Caused by: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException:
> > > > org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected
> > > > nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id:
> > > > 5612205039322936440 number_of_rows: 10000 close_scanner: false next_call_seq: 0
> > > >     at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3018)
> > > >     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
> > > >     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
> > > >     at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
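A closing note for archive readers: the OutOfOrderScannerNextException above is a symptom, not the root cause. When the client's RPC reader thread dies with the OOME, the response to a next() call is lost; the client retries the same call, and its sequence number no longer matches the server's. A minimal sketch of that bookkeeping follows - the class and method names are hypothetical, for illustration only (the real logic lives in HRegionServer.scan and ClientScanner):

```java
// Minimal sketch of the scanner call-sequence check that produces
// "Expected nextCallSeq: 1 But the nextCallSeq got from client: 0".
// Class and method names here are HYPOTHETICAL, for illustration only.
public class ScannerSeqSketch {
    /** Server side: one counter per open scanner. */
    static class ServerScanner {
        long nextCallSeq = 0;

        String next(long clientSeq) {
            if (clientSeq != nextCallSeq) {
                throw new IllegalStateException(
                    "Expected nextCallSeq: " + nextCallSeq
                    + " But the nextCallSeq got from client: " + clientSeq);
            }
            nextCallSeq++;   // advance only on an accepted call
            return "batch-" + clientSeq;
        }
    }

    public static void main(String[] args) {
        ServerScanner scanner = new ServerScanner();

        scanner.next(0);     // call 0 succeeds; server now expects seq 1

        // The client's RPC reader thread OOMEs while reading the response,
        // so the client never learns call 0 was accepted. On retry it
        // re-sends seq 0, and the server, expecting 1, rejects it.
        try {
            scanner.next(0);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This is why the exception message asks "was there a rpc timeout?": any failure that loses a response mid-scan - here the client OOME - leaves client and server sequence numbers out of step.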