Date: Fri, 14 Feb 2014 19:00:13 -0600
Subject: Re: uneven region distribution
From: Rohit Kelkar
To: "user@hbase.apache.org"

Thanks for your inputs. I am sharing the master log -
http://pastebin.com/Xi9P6Ykr and the region server log of the failed region
server - http://pastebin.com/1munghDv

- R


On Fri, Feb 14, 2014 at 6:24 PM, Ted Yu wrote:

> Looking at bug fixes since 0.94.2, I wonder if you are experiencing the
> following, which went into 0.94.10:
> HBASE-8432 a table with unbalanced regions will balance indefinitely
>
> Master log would tell us more.
>
>
> On Fri, Feb 14, 2014 at 4:18 PM, Rohit Kelkar wrote:
>
> > Sorry, misstated the version, it's 0.94.2
> >
> > - R
> >
> >
> > On Fri, Feb 14, 2014 at 5:59 PM, Ted Yu wrote:
> >
> > > bq. it does not change the status of the assignments.
> > >
> > > Can you check / pastebin master log to see what caused the balancing to
> > > stop?
> > >
> > > bq. attributing the region server crash to the disproportionately high
> > > number of regions on that server?
> > >
> > > Checking region server log on server5 should give us more clues.
> > >
> > > bq. 0.92.4
> > >
> > > please consider upgrading :-)
> > >
> > >
> > > On Fri, Feb 14, 2014 at 3:52 PM, Rohit Kelkar wrote:
> > >
> > > > I am using hbase version 0.92.4 on a 5 node cluster. I am seeing that a
> > > > particular region server often crashes. A status 'simple' on hbase shell
> > > > gives the following stats
> > > >
> > > > HBase Shell; enter 'help' for list of supported commands.
> > > > Type "exit" to leave the HBase Shell
> > > > Version 0.94.2, r1395367, Sun Oct 7 19:11:01 UTC 2012
> > > >
> > > > status 'simple'
> > > > 4 live servers
> > > >     server7:60020 1392017875910 requestsPerSecond=0, numberOfOnlineRegions=419, usedHeapMB=3315, maxHeapMB=6127
> > > >     server4:60020 1392300859332 requestsPerSecond=843, numberOfOnlineRegions=379, usedHeapMB=2070, maxHeapMB=6127
> > > >     server3:60020 1391583646998 requestsPerSecond=429, numberOfOnlineRegions=653, usedHeapMB=3198, maxHeapMB=6127
> > > >     server6:60020 1391583647588 requestsPerSecond=0, numberOfOnlineRegions=966, usedHeapMB=2975, maxHeapMB=6127
> > > > 1 dead servers
> > > >     server5,60020,1392108515637
> > > > Aggregate load: 1272, regions: 2417
> > > >
> > > > The dead region server has 2417 regions as opposed to 419, 379, 653, 966
> > > > regions on other servers. Am I right in attributing the region server crash
> > > > to the disproportionately high number of regions on that server?
> > > >
> > > > If I invoke the balancer on hbase shell using the "balancer" command it
> > > > returns true. But it does not change the status of the assignments.
> > > >
> > > > - R
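
A quick sanity check on the numbers quoted above may help when reading the status output. The following is a minimal sketch in plain Python; it does not talk to HBase at all and only parses the four live-server lines copied from the thread. The helper names parse_status and summarize are hypothetical, introduced here for illustration. It sums the per-server region counts and request rates and reports how far the busiest server sits above the mean.

```python
import re

# Live-server lines copied from the `status 'simple'` output quoted in the thread.
STATUS_LINES = """\
server7:60020 1392017875910 requestsPerSecond=0, numberOfOnlineRegions=419, usedHeapMB=3315, maxHeapMB=6127
server4:60020 1392300859332 requestsPerSecond=843, numberOfOnlineRegions=379, usedHeapMB=2070, maxHeapMB=6127
server3:60020 1391583646998 requestsPerSecond=429, numberOfOnlineRegions=653, usedHeapMB=3198, maxHeapMB=6127
server6:60020 1391583647588 requestsPerSecond=0, numberOfOnlineRegions=966, usedHeapMB=2975, maxHeapMB=6127
"""


def parse_status(text):
    """Hypothetical helper: return {server: {metric: int}} per live-server line."""
    servers = {}
    for line in text.splitlines():
        # Skip anything that is not a per-server metrics line.
        if "numberOfOnlineRegions=" not in line:
            continue
        name = line.split()[0]
        servers[name] = {k: int(v) for k, v in re.findall(r"(\w+)=(\d+)", line)}
    return servers


def summarize(servers):
    """Hypothetical helper: totals plus a simple skew ratio (max / mean)."""
    regions = {name: m["numberOfOnlineRegions"] for name, m in servers.items()}
    total = sum(regions.values())
    mean = total / len(regions)
    return {
        "total_regions": total,
        "total_requests": sum(m["requestsPerSecond"] for m in servers.values()),
        "mean_regions": mean,
        "max_over_mean": max(regions.values()) / mean,
    }


if __name__ == "__main__":
    print(summarize(parse_status(STATUS_LINES)))
```

With the lines above as input, the region counts of the four live servers sum to 2417 and their request rates sum to 1272, the same values shown on the "Aggregate load" line. That suggests the 2417 figure may be the total across live servers and not server5's own count, though this reading is an inference from the arithmetic and is not confirmed anywhere in the thread. The live servers are still uneven: server6 holds 966 regions against a mean of about 604.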