From: Jamie Cockrill
Date: Sat, 10 Jul 2010 11:15:51 +0100
Subject: Re: hbase regions reporting multiple times
To: user@hbase.apache.org

Arun,

I had a very similar issue with my cluster when the regionserver with the .META. table on it crashed. It crippled the cluster for a while, but after shutting various things down and restarting them again, it seemed to work itself out eventually. I had to do this a few times and unfortunately I didn't keep a record of the order in which I shut things down and restarted them.

The problem seemed to stem from the master thinking that META was stored on a node and that node having no knowledge of ever having held it. I tried a few major_compact of META, hoping that would fix it, but each failed with the same exception as below. The weird thing was that I could see (through the web UI on master) that META was now being held on a different regionserver.

I wouldn't necessarily follow my lead in randomly shutting things down and hoping for the best as it may well have been something entirely different that fixed the issue in the end. If all else fails, try restarting the master and the regionservers a few times and see if that works out the kinks.

thanks

Jamie

On 10 July 2010 04:48, Ryan Rawson wrote:
> Others will have to chime in for details, but typically this means you
> are having DNS issues.
> That is the hostname is resolving to an ip and
> not resolving back to the same name or vice versa or any other combo
> of non-roundtripping involving ip and dns names.
>
> -ryan
>
> On Fri, Jul 9, 2010 at 6:41 PM, Arun Ramakrishnan wrote:
>> I shutdown hbase. Added some new nodes to hdfs, rebalanced. Also added those nodes to hbase regionservers.
>> Then started hbase.
>>
>> I am having this strange problem where the new nodes let's say host1 thru host4 gets repeatedly reported/added to the regionservers list.
>>
>> Initially when I did a "report 'simple'" from the shell, it showed me 10 unique hosts. Then within a matter of minutes it grew to 17 ( with the newly added hosts repeating multiple times).
>>
>> Also, the web UI failed with the following error.
>>
>> ##############
>> HTTP ERROR: 500
>> Trying to contact region server 192.168.130.63:60020 for region .META.,,1, row '', but failed after 3 attempts.
>> Exceptions:
>> org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: .META.,,1
>>        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2266)
>>        at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1845)
>>        at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
>>        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>> ###############
>>
>>
>> Any insight into why the regions get repeated multiple times. I did a hadoop fsck / and it reports that all the blocks have been replicated 3 times ( the configured value ).
>>
>>
>> Thanks
>> Arun
>>
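
For anyone wanting to test the DNS round-trip that Ryan describes, here is a rough sketch in Python of how it might be checked on each node. It is not something HBase ships: the function name dns_round_trip and its injectable forward/reverse parameters are made up for illustration, and the comparison it uses (case-insensitive exact match against the reverse name and its aliases) is an assumption. A short hostname that reverses to a fully qualified name will be reported as a mismatch, which may or may not matter for your setup.

```python
import socket


def dns_round_trip(hostname,
                   forward=socket.gethostbyname,
                   reverse=socket.gethostbyaddr):
    """Check that hostname -> IP -> name lands back on the same hostname.

    Returns (ok, ip, reverse_name). The forward and reverse parameters are
    hypothetical hooks so the check can be exercised without real DNS.
    """
    try:
        # Forward lookup: name to IPv4 address.
        ip = forward(hostname)
    except OSError:
        # Covers socket.gaierror: the name does not resolve at all.
        return False, None, None

    try:
        # Reverse lookup: address back to (primary name, aliases, addresses).
        reverse_name, aliases, _addresses = reverse(ip)
    except OSError:
        # Covers socket.herror: no reverse (PTR) record for the address.
        return False, ip, None

    # Assumption: a match on the primary name or any alias counts as a
    # clean round trip; comparison ignores case.
    known_names = {reverse_name.lower()}
    known_names.update(alias.lower() for alias in aliases)
    return hostname.lower() in known_names, ip, reverse_name
```

Calling dns_round_trip("host1") on the master and on each regionserver, for every hostname in the cluster, should show whether any of the newly added nodes fail to resolve back to the name they were looked up by.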