hbase-user mailing list archives

From Sujee Maniyam <su...@sujee.net>
Subject Re: region servers not shutting down (v 0.90.1-cdh3u0, r)
Date Fri, 10 Jun 2011 23:05:22 GMT
Jean,
The DNS mismatch was the cause! Good eyes!
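
The mismatch shows up as two server names that differ only in the host part. Here is a minimal sketch (plain Python, not part of HBase) for spotting such pairs in a list of server names. It assumes each name has the form host,port,startcode, as in the log lines quoted further down; `find_mismatches` is a made-up helper name.

```python
from collections import defaultdict


def find_mismatches(server_names):
    """Group names by (short host, port, startcode) and return the groups
    that contain more than one spelling of the host."""
    groups = defaultdict(set)
    for entry in server_names:
        # Assumed layout: host,port,startcode
        host, port, startcode = entry.strip().rsplit(",", 2)
        short = host.split(".", 1)[0]
        groups[(short, port, startcode)].add(host)
    return {key: sorted(hosts) for key, hosts in groups.items() if len(hosts) > 1}


names = [
    "devperf-sn10,60020,1307732557915",
    "devperf-sn10.pcs.hds.com,60020,1307732557915",
]
for key, hosts in find_mismatches(names).items():
    print(key, "is known as", hosts)
```

Any group printed here means the same region server instance is registered under both a short and a fully qualified name.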

Here is what I had to do:

1) changed hostnames to fully qualified ones, on all machines
file : /etc/sysconfig/network
before: devperf-sn6
now : devperf-sn6.pcs.hds.com


2) used fully qualified hostnames (FQHN) in 'hbase-site.xml'
before : devperf-sn6
now : devperf-sn6.pcs.hds.com

Even after a restart, ZooKeeper was still looking up the old
hostnames and erroring out.

3) I had some shorthand aliases in /etc/hosts (on the master node)
   ip_address1  hmaster
   ip_address2  rs1
I deleted these (and restarted the machine just to be sure)

4) deleted the ZooKeeper dir on the ZK machines (this one was not very obvious!)
        rm -rf /tmp/hbase-hadoop

Only then did things start working!
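
To confirm the fix on each node, a small check of forward and reverse resolution can help. The following is a rough sketch using only Python's standard `socket` module. `check_host` is a hypothetical helper, not an HBase or ZooKeeper tool, and its notion of "consistent" (the reverse lookup returns exactly the name that was looked up) is an assumption about what the cluster needs.

```python
import socket


def check_host(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return a list of problems found when resolving `name` both ways.

    `forward` maps a hostname to an IP string; `reverse` maps an IP to a
    (hostname, aliases, addresses) tuple, as the socket functions do.
    Both can be swapped out for testing.
    """
    problems = []
    if "." not in name:
        problems.append(f"{name}: not fully qualified")
    try:
        ip = forward(name)
    except OSError as exc:
        return problems + [f"{name}: forward lookup failed ({exc})"]
    try:
        canonical = reverse(ip)[0]
    except OSError as exc:
        return problems + [f"{name}: reverse lookup of {ip} failed ({exc})"]
    if canonical.lower() != name.lower():
        problems.append(f"{name}: {ip} reverse-resolves to {canonical}")
    return problems
```

Calling `check_host(socket.gethostname())` on every machine, and `check_host(...)` for each hostname listed in hbase-site.xml, should return an empty list everywhere if all nodes agree on the names.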

I am happy to document this on the wiki somewhere if it might help others.

A) Are there any other 'best practices' to keep DNS / host lookups straight?
    A2)  Would it be safer if I used the IP addresses?  Or is reverse DNS
required even then?

B) I do miss the shorthand aliases in /etc/hosts.  Is there a way to have
these aliases without interfering with HBase / ZooKeeper?
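
One possible direction, untested against HBase 0.90: /etc/hosts treats the first name on a line as the canonical one and the rest as aliases, so an alias that stands alone on its line becomes the canonical name for that address. The sketch below flags such lines. `short_first_entries` is a hypothetical helper, and the addresses and example.com names in the sample are made up for illustration.

```python
def short_first_entries(hosts_text):
    """Return (ip, names) for hosts-file lines whose first name has no dot.

    Loopback and IPv6 lines are skipped, since names such as 'localhost'
    are expected to be short.
    """
    flagged = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line, or an address with no names
        ip, names = fields[0], fields[1:]
        if ip.startswith("127.") or ":" in ip:
            continue
        if "." not in names[0]:
            flagged.append((ip, names))
    return flagged


# Made-up sample: the second line is flagged, the third is not.
SAMPLE = """\
127.0.0.1   localhost
10.0.0.1    hmaster
10.0.0.2    rs1.example.com rs1
"""
print(short_first_entries(SAMPLE))
```

Whether keeping the fully qualified name first and the alias after it is enough for HBase and ZooKeeper is exactly the open question above; the check only shows which lines would change what a reverse lookup returns.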

thanks for your help!
http://sujee.net


On Fri, Jun 10, 2011 at 2:38 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> There's a DNS mismatch:
>
> devperf-sn10,60020,1307732557915
> devperf-sn10.pcs.hds.com,60020,1307732557915
>
> And 0.90 has a big regression with that (0.92 already has the fixes,
> but it's not released yet). Make sure your nodes all resolve the same
> hostnames per http://hbase.apache.org/book.html#dns
>
> BTW the clue comes from those kinda lines:
>
> 2011-06-10 12:03:50,975 INFO
> org.apache.hadoop.hbase.zookeeper.RegionServerTracker: No HServerInfo
> found for devperf-sn10.pcs.hds.com,60020,1307732557915
>
> J-D
>
> On Fri, Jun 10, 2011 at 9:26 PM, Sujee Maniyam <sujee@sujee.net> wrote:
> > looks like this RS has the ROOT region.  The shutdown was initiated by a
> > kill <pid>  command by me.
> > any thing specific I should look for in logs / config?
> >
> > thanks
> > http://sujee.net
> >
> >
> > On Fri, Jun 10, 2011 at 2:09 PM, Stack <stack@duboce.net> wrote:
> >
> >> That looks like we're waiting on the shutdown of the -ROOT- region?
> >> Is that so.  Anything on why it won't go down earlier in the log?
> >> St.Ack
> >>
> >>
> >> On Fri, Jun 10, 2011 at 12:23 PM, Sujee Maniyam <sujee@sujee.net> wrote:
> >> > Hi all
> >> > I am running  Hbase on a 6 node cluster.   HBase comes up fine, I can create
> >> > a test table and put rows and scan.  But I can't cleanly shut it down.  the
> >> > stop-hbase command goes on for ever printing dots.  And I can see a couple
> >> > of RegionServers are not terminating.
> >> >
> >> > here are the details:
> >> >
> >> > 5 RS , 1 Master
> >> > 3 zookeepers
> >> >
> >> > hbase : 0.90.1-cdh3u0, r  (both hadoop & hbase are Cloudera cdh 3
> >> > distributions)
> >> > hadoop : 0.20.2-cdh3u0
> >> >
> >> > master-log : http://pastebin.com/tBvJDPHc
> >> > rserver log : http://pastebin.com/EsWYAuUk
> >> > hbase_site.xml : http://pastebin.com/sU7EM2QK
> >> >
> >> >
> >> > During the shutdown, I see this in the region server logs:
> >> >
> >> > 2011-06-10 12:03:55,940 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 70236052
> >> > 2011-06-10 12:03:58,942 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 70236052
> >> > ....
> >> >
> >> >
> >> > thanks very much for your help!
> >> > Sujee Maniyam
> >> > http://sujee.net
> >> >
> >>
> >
>
