Return-Path: X-Original-To: apmail-hbase-user-archive@www.apache.org Delivered-To: apmail-hbase-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 6CB04F6BC for ; Tue, 30 Apr 2013 03:48:57 +0000 (UTC) Received: (qmail 99123 invoked by uid 500); 30 Apr 2013 03:48:55 -0000 Delivered-To: apmail-hbase-user-archive@hbase.apache.org Received: (qmail 98531 invoked by uid 500); 30 Apr 2013 03:48:52 -0000 Mailing-List: contact user-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hbase.apache.org Delivered-To: mailing list user@hbase.apache.org Received: (qmail 98479 invoked by uid 99); 30 Apr 2013 03:48:50 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 30 Apr 2013 03:48:50 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,NORMAL_HTTP_TO_IP,RCVD_IN_DNSWL_LOW,SPF_PASS,WEIRD_PORT X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of viral.bajaria@gmail.com designates 209.85.219.46 as permitted sender) Received: from [209.85.219.46] (HELO mail-oa0-f46.google.com) (209.85.219.46) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 30 Apr 2013 03:48:45 +0000 Received: by mail-oa0-f46.google.com with SMTP id k3so80605oag.5 for ; Mon, 29 Apr 2013 20:48:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:content-type; bh=wbMsymJyiecZBCHcHFOyBQPW34UeDOE91i1I2MRy8yY=; b=KRdZAQamiCeh3DQS+E5x4AAJsKxtpxAkvSGZ6WgGxvmiPT+W/jBbqhJYJ8+h6XIuVN 5C7e14n3cgKS4gMOv6EJkD0nTft5rG4F2xmwUpmigorrKthCGbMdjfmBo+zMrVBzWEPr AqgJDCnuNQ8I5Hz0L/F8FFSmZuHay3F67QM9//2Zxx5+Ke6FolVGDXXzbd0zNZSJPZ7n 406V6DwpByLjJ7RISG3mX6pVeJ/NVJtbjgMtzdYweLGKo0PwD86hopNbl9IUCDLXKq7g 5IX3Ruz2ysOb2oXIQMJ78WXniDHLUbH+bX+lUQO3bBls5FOpV0zHiXWod815WPZleoNm BXng== MIME-Version: 1.0 X-Received: by 10.60.133.13 with SMTP id oy13mr15792823oeb.28.1367293705082; Mon, 29 Apr 2013 20:48:25 -0700 (PDT) Received: by 10.182.76.74 with HTTP; Mon, 29 Apr 2013 20:48:24 -0700 (PDT) In-Reply-To: References: <71A49179-C671-4290-AAB0-979AB7FCC2FA@gmail.com> <026442B4-7AED-4110-BC67-668B51C2BCF5@gmail.com> Date: Mon, 29 Apr 2013 20:48:24 -0700 Message-ID: Subject: Re: max regionserver handler count From: Viral Bajaria To: "user@hbase.apache.org" Content-Type: multipart/alternative; boundary=047d7b471dfe99824a04db8bdff4 X-Virus-Checked: Checked by ClamAV on apache.org --047d7b471dfe99824a04db8bdff4 Content-Type: text/plain; charset=ISO-8859-1 Still unclear how and where the same session id is being re-used. Any thoughts ? The ROOT region was just bouncing around state's since the RS would be marked as dead whenever these timeouts occur and thus the ROOT region will be moved to a different server. Once the ROOT gets assigned to a RS which had less handlers (< 15K), it stopped bouncing around. I am surprised bumping up handlers and having 0 traffic on the cluster can cause this issue. -Viral On Mon, Apr 29, 2013 at 1:23 PM, Viral Bajaria wrote: > On Mon, Apr 29, 2013 at 2:49 AM, Ted Yu wrote: > >> After each zookeeper reconnect, I saw same session Id (0x703e...) >> >> What version of zookeeper are you using ? Can you search zookeeper log >> for this session Id to see what happened ? >> >> Thanks >> > > Zookeeper version is 3.4.5, following are the logs from 2 zookeeper > servers. The first one was the first time ever the regionserver connected > to ZK and after that all of them are retries. I doubt the issue is on the > ZK side since I have 3 other services running which run fine with the same > quorum and have no issues. > > I feel I am hitting some other bug with HBase when the number of handlers > is increased by a lot. Anyone seen a high number of handlers in any > production deployment out there ? > > Thanks, > Viral > > ===server with first session=== > 2013-04-29 07:40:55,574 [myid:1072011376] - INFO > [CommitProcessor:1072011376:ZooKeeperServer@595] - Established session > 0x703e48a8cfd81be6 with negotiated timeout 40000 for client / > 10.155.234.3:53814 > EndOfStreamException: Unable to read additional data from client sessionid > 0x703e48a8cfd81be6, likely client has closed socket > 2013-04-29 07:43:47,737 [myid:1072011376] - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for > client /10.155.234.3:53814 which had sessionid 0x703e48a8cfd81be6 > 2013-04-29 07:46:29,679 [myid:1072011376] - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client attempting to renew > session 0x703e48a8cfd81be6 at /10.155.234.3:53831 > 2013-04-29 07:46:29,679 [myid:1072011376] - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:Learner@107] - Revalidating client: > 0x703e48a8cfd81be6 > 2013-04-29 07:46:29,680 [myid:1072011376] - INFO > [QuorumPeer[myid=1072011376]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@595] - > Established session 0x703e48a8cfd81be6 with negotiated timeout 40000 for > client /10.155.234.3:53831 > EndOfStreamException: Unable to read additional data from client sessionid > 0x703e48a8cfd81be6, likely client has closed socket > 2013-04-29 07:49:10,401 [myid:1072011376] - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for > client /10.155.234.3:53831 which had sessionid 0x703e48a8cfd81be6 > 2013-04-29 07:55:53,441 [myid:1072011376] - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client attempting to renew > session 0x703e48a8cfd81be6 at /10.155.234.3:53854 > 2013-04-29 07:55:53,441 [myid:1072011376] - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:Learner@107] - Revalidating client: > 0x703e48a8cfd81be6 > 2013-04-29 07:55:53,442 [myid:1072011376] - INFO > [QuorumPeer[myid=1072011376]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@595] - > Established session 0x703e48a8cfd81be6 with negotiated timeout 40000 for > client /10.155.234.3:53854 > EndOfStreamException: Unable to read additional data from client sessionid > 0x703e48a8cfd81be6, likely client has closed socket > 2013-04-29 07:58:33,947 [myid:1072011376] - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for > client /10.155.234.3:53854 which had sessionid 0x703e48a8cfd81be6 > > ===server with subsequent sessions=== > 2013-04-29 07:44:28,733 [myid:54242244232] - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client attempting to renew > session 0x703e48a8cfd81be6 at /10.155.234.3:51353 > 2013-04-29 07:44:28,734 [myid:54242244232] - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@595] - Established session > 0x703e48a8cfd81be6 with negotiated timeout 40000 for client / > 10.155.234.3:51353 > EndOfStreamException: Unable to read additional data from client sessionid > 0x703e48a8cfd81be6, likely client has closed socket > 2013-04-29 07:46:29,206 [myid:54242244232] - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for > client /10.155.234.3:51353 which had sessionid 0x703e48a8cfd81be6 > 2013-04-29 07:53:52,553 [myid:54242244232] - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client attempting to renew > session 0x703e48a8cfd81be6 at /10.155.234.3:51376 > 2013-04-29 07:53:52,553 [myid:54242244232] - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@595] - Established session > 0x703e48a8cfd81be6 with negotiated timeout 40000 for client / > 10.155.234.3:51376 > EndOfStreamException: Unable to read additional data from client sessionid > 0x703e48a8cfd81be6, likely client has closed socket > 2013-04-29 07:55:53,049 [myid:54242244232] - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for > client /10.155.234.3:51376 which had sessionid 0x703e48a8cfd81be6 > 2013-04-29 08:02:36,000 [myid:54242244232] - INFO > [SessionTracker:ZooKeeperServer@325] - Expiring session > 0x703e48a8cfd81be6, timeout of 40000ms exceeded > 2013-04-29 08:02:36,001 [myid:54242244232] - INFO > [ProcessThread(sid:54242244232 cport:-1)::PrepRequestProcessor@476] - > Processed session termination for sessionid: 0x703e48a8cfd81be6 > 2013-04-29 08:02:36,283 [myid:54242244232] - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client attempting to renew > session 0x703e48a8cfd81be6 at /10.155.234.3:51398 > 2013-04-29 08:02:36,284 [myid:54242244232] - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@588] - Invalid session > 0x703e48a8cfd81be6 for client /10.155.234.3:51398, probably expired > 2013-04-29 08:02:36,284 [myid:54242244232] - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for > client /10.155.234.3:51398 which had sessionid 0x703e48a8cfd81be6 > > --047d7b471dfe99824a04db8bdff4--