From: Esteban Gutierrez
Date: Thu, 14 Aug 2014 15:30:39 -0700
Subject: Re: HBase client hangs after client-side OOM
To: "user@hbase.apache.org"
Cc: "dev@zookeeper.apache.org"

Hello Ted,

ZooKeeper 3.4.5 is the recommended release to use with HBase 0.94.x. Regarding compatibility across ZooKeeper releases, I don't think there is any issue, but the ZK devs might be able to confirm.

cheers,
esteban.

--
Cloudera, Inc.

On Thu, Aug 14, 2014 at 3:19 PM, Ted Tuttle wrote:

> Hello All-
>
> It sounds like upgrading our zookeeper client would be a good idea. Can
> anyone provide some guidelines on compatibility of HBase 0.94.16 with ZK
> 3.4.X? How about compatibility of ZK client 3.4.X w/ ZK server 3.3.4? I've
> read a few contradictory things about ZK client/server compatibility across
> 3.3/3.4 releases.
>
> Thanks,
> Ted
>
> -----Original Message-----
> From: Ted Tuttle [mailto:ted@mentacapital.com]
> Sent: Thursday, August 14, 2014 12:43 PM
> To: user@hbase.apache.org
> Cc: dev@zookeeper.apache.org
> Subject: RE: HBase client hangs after client-side OOM
>
> Hello Esteban-
>
> At the time of the ZK connection problems the client had an OOM event.
> However, the client machine overall was in fine shape according to the
> ganglia reports; it certainly wasn't swapping or spending significant
> cycles on I/O wait.
>
> Similarly, our zookeeper server was really chilled, as it always is.
>
> Regarding client configuration:
>
> <property>
>   <name>hbase.client.pause</name>
>   <value>1000</value>
> </property>
>
> Thanks,
> Ted
>
> -----Original Message-----
> From: Esteban Gutierrez [mailto:esteban@cloudera.com]
> Sent: Thursday, August 14, 2014 10:47 AM
> To: user@hbase.apache.org
> Cc: dev@zookeeper.apache.org
> Subject: Re: HBase client hangs after client-side OOM
>
> Hi Ted,
>
> I've seen this kind of client "hang" a few times when the underlying
> environment is under heavy swapping and with older versions of ZK, as Rakesh
> mentioned, and also when hbase.client.pause is set to 0. Do you know if your
> environment is experiencing similar behavior with heavy IO due to swapping?
> Can you also share your client configuration?
>
> cheers,
> esteban.
>
> --
> Cloudera, Inc.
>
>
> On Thu, Aug 14, 2014 at 9:56 AM, Ted Tuttle wrote:
>
> > The client-side thread dump is here:
> >
> > http://pastebin.com/xU4MSq9k
> >
> > SendThread appears to be active.
> >
> > -----Original Message-----
> > From: Rakesh R [mailto:rakeshr@huawei.com]
> > Sent: Thursday, August 14, 2014 7:01 AM
> > To: dev@zookeeper.apache.org; user@hbase.apache.org
> > Subject: RE: HBase client hangs after client-side OOM
> >
> > Hi,
> >
> > >> We are running ZK 3.3.4, Cloudera cdh3u3, HBase 0.94.16.
> >
> > The ZK version is quite old. I can see that ClientCnxn only catches
> > IOException, so when there is an OOME, SendThread exits.
> > I think that's the reason for the client hanging. A client-side thread
> > dump will help us see whether SendThread is still alive.
> >
> > Client-side exception handling has been modified in the 3.4 & 3.5 branches.
> > Can you check the possibility of upgrading to 3.4.6, the latest release?
> >
> > Regards,
> > Rakesh
> >
> > -----Original Message-----
> > From: Qiang Tian [mailto:tianq01@gmail.com]
> > Sent: 14 August 2014 11:03
> > To: user@hbase.apache.org; dev@zookeeper.apache.org
> > Subject: Re: HBase client hangs after client-side OOM
> >
> > The SendThread stack trace doesn't look right. Do you have the client log
> > (in case the ZK client code logged something there)? From the ZK code, it
> > looks like ClientCnxn$SendThread.run should have caught it (Throwable) and
> > done the cleanup work, e.g. notified the main thread so that it can wake
> > up from ClientCnxn.submitRequest.
> >
> > Sending to ZooKeeper for help.
> > Thanks.
> >
> >
> > On Thu, Aug 14, 2014 at 11:19 AM, Ted Tuttle wrote:
> >
> > > Hi Lars-
> > >
> > > We are running ZK 3.3.4, Cloudera cdh3u3, HBase 0.94.16.
> > >
> > > Thanks,
> > > Ted
> > >
> > > > On Aug 13, 2014, at 5:36 PM, "lars hofhansl" wrote:
> > > >
> > > > Hey Ted,
> > > >
> > > > So this is a problem with the ZK client; it seems not to clean itself
> > > > up correctly upon receiving an exception at the wrong moment.
> > > > Which version of ZK are you using?
> > > >
> > > >
> > > > -- Lars
> > > >
> > > >
> > > > ----- Original Message -----
> > > > From: Ted Tuttle
> > > > To: "user@hbase.apache.org"
> > > > Cc: Development
> > > > Sent: Wednesday, August 13, 2014 4:38 PM
> > > > Subject: HBase client hangs after client-side OOM
> > > >
> > > > Hello-
> > > >
> > > > We are running HBase v0.94.16 on an 8 node cluster.
> > > >
> > > > We have a recurring problem w/ HBase clients hanging.
> > > > In the latest occurrence, I observed the following sequence of events:
> > > >
> > > > 0) client plays w/ HBase for a long time w/o issue
> > > > 1) client runs out of memory during HBase operation:
> > > >
> > > > http://pastebin.com/b5x44Lx7
> > > >
> > > > 2) Exception is thrown, memory is released
> > > > 3) In some shutdown logic the client tries to access HBase again and
> > > > hangs:
> > > >
> > > > http://pastebin.com/xU4MSq9k
> > > >
> > > > Clearly I need to fix the OOM. However, the fact that the client hangs
> > > > is not nice. Any ideas why?
> > > >
> > > > BTW- I started by looking at the zookeeper log. Not much there, but
> > > > here you go:
> > > >
> > > > http://pastebin.com/wZvE0Fbv
> > > >
> > > > Thanks,
> > > > Ted
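The mechanism discussed in the thread: Rakesh reads the 3.3.4 ClientCnxn as catching only IOException, so an OOME ends SendThread, while Qiang expected SendThread.run to catch the Throwable and notify the caller waiting in ClientCnxn.submitRequest. The Java sketch below is a toy model of that pattern under those assumptions, not ZooKeeper's code. Packet, SendThread, submit and callersWokeUp are hypothetical names, the OOM is simulated with `throw new OutOfMemoryError(...)`, and 500 ms timeouts stand in for an indefinite wait so that the hang becomes observable.

```java
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Toy model of the failure discussed in this thread; NOT ZooKeeper's ClientCnxn.
// Callers enqueue a Packet and block until a background send thread completes it.
public class SendThreadHangDemo {
    static final class Packet {
        private boolean finished;

        // Marks the packet done (sent or failed) and wakes the caller blocked in await().
        synchronized void complete() {
            finished = true;
            notifyAll();
        }

        // Returns true if the packet completed within timeoutMs, false if the caller gave up.
        synchronized boolean await(long timeoutMs) throws InterruptedException {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (!finished) {
                long left = deadline - System.currentTimeMillis();
                if (left <= 0) return false;
                wait(left);
            }
            return true;
        }
    }

    static final class SendThread extends Thread {
        private final BlockingQueue<Packet> queue = new LinkedBlockingQueue<>();
        private final boolean catchThrowable;
        private boolean closed;   // guarded by synchronized (queue)
        private Packet inFlight;  // only touched by this thread

        SendThread(boolean catchThrowable) {
            this.catchThrowable = catchThrowable;
            setDaemon(true);
            setUncaughtExceptionHandler((t, e) -> System.err.println(t.getName() + " died: " + e));
        }

        Packet submit() {
            Packet p = new Packet();
            synchronized (queue) {
                if (closed) p.complete(); // fail fast once the thread has cleaned up
                else queue.add(p);        // otherwise wait for the send thread
            }
            return p;
        }

        // Stand-in for the real network write: always hits a simulated OOM.
        private void send(Packet p) throws IOException {
            throw new OutOfMemoryError("simulated");
        }

        private void loop() {
            try {
                while (true) {
                    inFlight = queue.take();
                    try {
                        send(inFlight);
                    } catch (IOException e) {
                        // only I/O errors are handled here; an Error propagates out of loop()
                    }
                    inFlight.complete();
                    inFlight = null;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit quietly on interrupt
            }
        }

        @Override
        public void run() {
            if (!catchThrowable) {
                loop(); // an Error escapes and ends the thread; waiters are never woken
                return;
            }
            try {
                loop();
            } catch (Throwable t) { // cleanup path: refuse new work, then wake every waiter
                synchronized (queue) {
                    closed = true;
                }
                if (inFlight != null) inFlight.complete();
                Packet p;
                while ((p = queue.poll()) != null) p.complete();
            }
        }
    }

    // True if both the failing request and a later one (e.g. from shutdown logic) return.
    static boolean callersWokeUp(boolean catchThrowable) throws InterruptedException {
        SendThread st = new SendThread(catchThrowable);
        st.start();
        boolean first = st.submit().await(500);
        boolean second = st.submit().await(500);
        return first && second;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("catch IOException only -> callers woke up: " + callersWokeUp(false));
        System.out.println("catch Throwable        -> callers woke up: " + callersWokeUp(true));
    }
}
```

In the IOException-only variant, the request that hits the OOM and a later request (comparable to Ted's shutdown call) both time out, because nothing is left to complete them. In the catch-Throwable variant, the cleanup path completes the in-flight packet and marks the thread closed, so the later submit fails fast instead of waiting.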
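Esteban mentions having seen these hangs when hbase.client.pause is set to 0, but the thread does not say why. One plausible factor, offered here only as an assumption, is that a client which sleeps pause times a backoff multiplier between retries never sleeps at all when the pause is 0. The Java sketch below shows that arithmetic with a hypothetical multiplier table and hypothetical helpers sleepMs and totalSleepMs. It does not model HBase's actual retry code.

```java
// Hypothetical illustration only: assumes a client sleeps pause * BACKOFF[attempt]
// between retries. The BACKOFF table is made up and is NOT HBase's actual table.
public class RetryPauseDemo {
    static final int[] BACKOFF = {1, 2, 4, 8, 16};

    // Sleep before retry number `attempt` (0-based); the last multiplier repeats.
    static long sleepMs(long pauseMs, int attempt) {
        return pauseMs * BACKOFF[Math.min(attempt, BACKOFF.length - 1)];
    }

    // Total time spent sleeping across `attempts` retries.
    static long totalSleepMs(long pauseMs, int attempts) {
        long total = 0;
        for (int a = 0; a < attempts; a++) total += sleepMs(pauseMs, a);
        return total;
    }

    public static void main(String[] args) {
        System.out.println("pause=1000 ms -> total sleep " + totalSleepMs(1000, 10) + " ms");
        System.out.println("pause=0 ms    -> total sleep " + totalSleepMs(0, 10) + " ms");
    }
}
```

With Ted's setting of 1000, the ten hypothetical retries sleep 111,000 ms in total. With a pause of 0, they sleep 0 ms, so under this assumption every retry would fire back-to-back.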