Date: Tue, 24 May 2011 16:37:04 -0700
Subject: Re: 0.90.3
From: Jack Levin
To: user@hbase.apache.org

Figured it out. The /etc/hosts file maps IP to name, and the name it
gave, the one ZooKeeper used, was *.prod.imageshack.com, while the
hostname, used by the RegionServer and Master, was imgXX.imageshack.us.
Ideally, all three components should source hostnames from the same
place, whether that is hostname, /etc/hosts, or DNS. It has to be
consistent; otherwise aliases end up screwing things up and people are
left guessing why things don't work.

-Jack

On Tue, May 24, 2011 at 4:04 PM, Jack Levin wrote:
> img645.prod.imageshack.us and img645.imageshack.us both point to
> the same IP.
>
> -Jack
>
> On Tue, May 24, 2011 at 3:50 PM, Jack Levin wrote:
>> Looks like our balancer is on:
>>
>> hbase(main):001:0> balance_switch true
>> true
>> 0 row(s) in 0.3700 seconds
>>
>> I simply kill the PID for the RS, and it stays on the list with regions
>> assigned, and the master does not know about it.
>>
>> So it still does not work.
>>
>> -Jack
>>
>> On Tue, May 24, 2011 at 3:43 PM, Dave Latham wrote:
>>> Are you using the graceful_stop script?
>>>
>>> In 0.90.3 the bin/graceful_stop.sh script was updated to disable the
>>> master's balancer. However, it doesn't seem that anything re-enables it, so
>>> if you're using it you need to re-enable it on your own.
>>> See the book for
>>> more details:
>>> http://hbase.apache.org/book.html#decommission
>>>
>>> Dave
>>>
>>> On Tue, May 24, 2011 at 3:33 PM, Jack Levin wrote:
>>>
>>>> Just put the new HBase version on our test cluster and have been testing it.
>>>> So far, if I shut down an RS, the master does not reassign its regions, and
>>>> we remain inconsistent forever. Likewise, when a new RS is up, it does
>>>> not get regions assigned to it. This is the master log:
>>>>
>>>>
>>>> 2011-05-24 15:30:57,724 DEBUG
>>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
>>>> master:60000-0x1302094818900a4-0x1302094818900a4 Received ZooKeeper
>>>> Event, type=NodeDeleted, state=SyncConnected,
>>>> path=/hbase/rs/img645.prod.imageshack.com,60020,1306276075768
>>>> 2011-05-24 15:30:57,724 INFO
>>>> org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer
>>>> ephemeral node deleted, processing expiration
>>>> [img645.prod.imageshack.com,60020,1306276075768]
>>>> 2011-05-24 15:30:57,724 INFO
>>>> org.apache.hadoop.hbase.zookeeper.RegionServerTracker: No HServerInfo
>>>> found for img645.prod.imageshack.com,60020,1306276075768
>>>> 2011-05-24 15:30:57,726 DEBUG
>>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
>>>> master:60000-0x1302094818900a4-0x1302094818900a4 Received ZooKeeper
>>>> Event, type=NodeChildrenChanged, state=SyncConnected, path=/hbase/rs
>>>> 2011-05-24 15:31:03,330 DEBUG
>>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
>>>> master:60000-0x1302094818900a4-0x1302094818900a4 Received ZooKeeper
>>>> Event, type=NodeChildrenChanged, state=SyncConnected, path=/hbase/rs
>>>> 2011-05-24 15:31:03,338 DEBUG
>>>> org.apache.hadoop.hbase.zookeeper.ZKUtil:
>>>> master:60000-0x1302094818900a4-0x1302094818900a4 Retrieved 32 byte(s)
>>>> of data from znode
>>>> /hbase/rs/img645.prod.imageshack.com,60020,1306276262774 and set
>>>> watcher; img645.prod.imageshack.com:60020
>>>> 2011-05-24 15:31:03,350 INFO
>>>> org.apache.hadoop.hbase.master.ServerManager: Server start rejected;
>>>> we already have img645.imageshack.us:60020 registered;
>>>> existingServer=serverName=img645.imageshack.us,60020,1306276075768,
>>>> load=(requests=0, regions=0, usedHeap=40, maxHeap=3995),
>>>> newServer=serverName=img645.imageshack.us,60020,1306276262774,
>>>> load=(requests=0, regions=0, usedHeap=23, maxHeap=3995)
>>>> 2011-05-24 15:31:03,350 INFO
>>>> org.apache.hadoop.hbase.master.ServerManager: Triggering server
>>>> recovery; existingServer img645.imageshack.us,60020,1306276075768
>>>> looks stale
>>>> 2011-05-24 15:31:03,353 DEBUG
>>>> org.apache.hadoop.hbase.master.ServerManager:
>>>> Added=img645.imageshack.us,60020,1306276075768 to dead servers,
>>>> submitted shutdown handler to be executed, root=false, meta=false
>>>> 2011-05-24 15:31:03,353 INFO
>>>> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler:
>>>> Splitting logs for img645.imageshack.us,60020,1306276075768
>>>> 2011-05-24 15:31:04,348 INFO
>>>> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler:
>>>> Reassigning 0 region(s) that img645.imageshack.us,60020,1306276075768
>>>> was carrying (skipping 0 regions(s) that are already in transition)
>>>> 2011-05-24 15:31:04,348 INFO
>>>> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished
>>>> processing of shutdown of img645.imageshack.us,60020,1306276075768
>>>> 2011-05-24 15:31:06,333 DEBUG
>>>> org.apache.hadoop.hbase.master.ServerManager: Server
>>>> img645.imageshack.us,60020,1306276262774 came back up, removed it from
>>>> the dead servers list
>>>> 2011-05-24 15:31:06,333 INFO
>>>> org.apache.hadoop.hbase.master.ServerManager: Registering
>>>> server=img645.imageshack.us,60020,1306276262774, regionCount=0,
>>>> userLoad=false
>>>> 2011-05-24 15:31:49,890 DEBUG
>>>> org.apache.hadoop.hbase.zookeeper.ZKUtil: hconnection opening
>>>> connection to ZooKeeper with ensemble (img648:2181)
>>>> 2011-05-24 15:31:49,890 INFO org.apache.zookeeper.ZooKeeper:
>>>> Initiating client connection, connectString=img648:2181
>>>> sessionTimeout=180000 watcher=hconnection
>>>> 2011-05-24 15:31:49,891 INFO org.apache.zookeeper.ClientCnxn: Opening
>>>> socket connection to server img648/38.99.76.205:2181
>>>> 2011-05-24 15:31:49,892 INFO org.apache.zookeeper.ClientCnxn: Socket
>>>> connection established to img648/38.99.76.205:2181, initiating session
>>>> 2011-05-24 15:31:49,893 INFO org.apache.zookeeper.ClientCnxn: Session
>>>> establishment complete on server img648/38.99.76.205:2181, sessionid =
>>>> 0x13024216e690004, negotiated timeout = 180000
>>>> 2011-05-24 15:31:49,894 DEBUG
>>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: hconnection
>>>> Received ZooKeeper Event, type=None, state=SyncConnected, path=null
>>>> 2011-05-24 15:31:49,895 DEBUG
>>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
>>>> hconnection-0x13024216e690004 connected
>>>> 2011-05-24 15:31:49,896 DEBUG
>>>> org.apache.hadoop.hbase.zookeeper.ZKUtil:
>>>> hconnection-0x13024216e690004 Set watcher on existing znode
>>>> /hbase/master
>>>> 2011-05-24 15:31:49,896 DEBUG
>>>> org.apache.hadoop.hbase.zookeeper.ZKUtil:
>>>> hconnection-0x13024216e690004 Retrieved 32 byte(s) of data from znode
>>>> /hbase/master and set watcher; img648.prod.imageshack.com:60000
>>>> 2011-05-24 15:31:49,897 DEBUG
>>>> org.apache.hadoop.hbase.zookeeper.ZKUtil:
>>>> hconnection-0x13024216e690004 Set watcher on existing znode
>>>> /hbase/root-region-server
>>>> 2011-05-24 15:31:49,897 DEBUG
>>>> org.apache.hadoop.hbase.zookeeper.ZKUtil:
>>>> hconnection-0x13024216e690004 Retrieved 26 byte(s) of data from znode
>>>> /hbase/root-region-server and set watcher; img731.imageshack.us:60020
>>>> 2011-05-24 15:31:49,900 DEBUG
>>>> org.apache.hadoop.hbase.client.MetaScanner: Scanning .META.
>>>> starting at row= for max=2147483647 rows
>>>> 2011-05-24 15:31:49,900 DEBUG
>>>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
>>>> Lookedup root region location,
>>>> connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@26f50154;
>>>> hsa=img731.imageshack.us:60020
>>>> 2011-05-24 15:31:49,913 DEBUG
>>>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
>>>> Cached location for .META.,,1.1028785192 is img654.imageshack.us:60020
>>>> 2011-05-24 15:31:50,061 INFO
>>>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
>>>> Closed zookeeper sessionid=0x13024216e690004
>>>> 2011-05-24 15:31:50,063 INFO org.apache.zookeeper.ZooKeeper: Session:
>>>> 0x13024216e690004 closed
>>>> 2011-05-24 15:31:50,063 INFO org.apache.zookeeper.ClientCnxn:
>>>> EventThread shut down
>>>>
>>>> Please help :)
>>>>
>>>> -Jack
>>>>
>>>
>>
>
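
For anyone hitting the same thing, here is a minimal sketch in Python of the kind of check that would have flagged the mismatch. It is not HBase code and does not reproduce how the master compares servers. The helper names (parse_server_name, names_disagree, local_name_views) are made up for illustration, and the only thing assumed about HBase is the host,port,startcode server-name form that appears in the master log quoted above.

```python
import socket


def parse_server_name(server_name):
    """Split a 'host,port,startcode' server name (as seen in the log) into parts."""
    host, port, startcode = server_name.rsplit(",", 2)
    return host, int(port), int(startcode)


def names_disagree(zk_server_name, master_server_name):
    """Return True when both names share port and startcode but differ in hostname.

    That pattern suggests the same process is known under two aliases.
    """
    zk_host, zk_port, zk_code = parse_server_name(zk_server_name)
    m_host, m_port, m_code = parse_server_name(master_server_name)
    same_process = (zk_port, zk_code) == (m_port, m_code)
    return same_process and zk_host.lower() != m_host.lower()


def local_name_views():
    """Collect the names this machine reports for itself through different lookups."""
    hostname = socket.gethostname()
    views = {"hostname": hostname, "fqdn": socket.getfqdn(), "reverse": None}
    try:
        ip = socket.gethostbyname(hostname)
        # Reverse lookup goes through the resolver, which may consult /etc/hosts or DNS.
        views["reverse"] = socket.gethostbyaddr(ip)[0]
    except OSError:
        # Resolution failed; leave "reverse" as None rather than guess.
        pass
    return views


if __name__ == "__main__":
    # The two names below are copied from the master log in this thread.
    print(names_disagree(
        "img645.prod.imageshack.com,60020,1306276075768",
        "img645.imageshack.us,60020,1306276075768",
    ))
    print(local_name_views())
```

Running local_name_views() on each node and comparing the three values is a quick way to see whether hostname, the FQDN, and the reverse lookup agree before starting ZooKeeper, the Master, and the RegionServers.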