hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: cold restart/region servers issue
Date Fri, 22 Oct 2010 17:56:53 GMT
Hmm... does it emit that message once or continuously?  In the log we
emit the ensemble we're trying to contact.  Does it look correct?  The
next time a machine has this issue, try running the zk command-line
client and see whether you can see a znode at /hbase/master:

$ ./bin/hbase org.apache.zookeeper.ZooKeeperMain -server HOST:PORT

Where HOST:PORT are what the RS is reporting for zk ensemble.
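If you want to compare the ensemble a failing RS reports against what a healthy one reports, the rough sketch below may help. EnsembleCheck and its methods are hypothetical helpers, not part of HBase or ZooKeeper. It assumes the ensemble is written as a standard ZooKeeper connect string (comma-separated host[:port], optional /chroot suffix) and treats a missing port as ZooKeeper's default client port, 2181. It only compares strings. It does not check that the RSs use the same zookeeper.znode.parent (default /hbase), which also has to match.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper: normalizes ZooKeeper connect strings so that two
 * strings naming the same ensemble compare equal regardless of host order,
 * host-name case, or an omitted default port.
 */
public class EnsembleCheck {
    // ZooKeeper's default client port, assumed when an entry omits one.
    static final int DEFAULT_ZK_PORT = 2181;

    static String normalize(String connectString) {
        String hostsPart = connectString.trim();
        String chroot = "";
        int slash = hostsPart.indexOf('/');
        if (slash >= 0) {                      // optional chroot suffix, e.g. "/apps"
            chroot = hostsPart.substring(slash);
            hostsPart = hostsPart.substring(0, slash);
        }
        List<String> hosts = new ArrayList<>();
        for (String entry : hostsPart.split(",")) {
            String e = entry.trim();
            if (e.isEmpty()) continue;
            int colon = e.lastIndexOf(':');
            String host = colon >= 0 ? e.substring(0, colon) : e;
            int port = colon >= 0 ? Integer.parseInt(e.substring(colon + 1).trim()) : DEFAULT_ZK_PORT;
            hosts.add(host.toLowerCase(Locale.ROOT) + ":" + port);
        }
        Collections.sort(hosts);               // host order is irrelevant to the ensemble
        return String.join(",", hosts) + chroot;
    }

    static boolean sameEnsemble(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: EnsembleCheck <healthy-rs-ensemble> <failing-rs-ensemble>");
            return;
        }
        System.out.println(sameEnsemble(args[0], args[1]) ? "same ensemble" : "DIFFERENT ensemble");
    }
}
```

Pass the two ensemble strings copied from the healthy and failing RS logs as arguments.  If it prints DIFFERENT, that points at config rather than at ZK itself.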

Once you have the zk cmdline client up, do something like

ls /hbase


....
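To answer the once-or-continuously question from a saved regionserver log, a small scan like the one below can count the NoNode warnings and report when they started and stopped.  MasterWarningScan is a hypothetical helper.  It assumes each log record begins with a timestamp in the "yyyy-MM-dd HH:mm:ss,SSS" form shown in the pasted log, and that the warning text sits on that first line with the stack trace on the following lines.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds every "can't read master address" warning in an RS log. */
public class MasterWarningScan {
    static final String NEEDLE = "Unable to read master address from ZooKeeper";
    static final int TIMESTAMP_LEN = "2010-10-22 08:32:42,035".length(); // 23 chars

    /** Returns the timestamp prefix of each line that carries the warning. */
    static List<String> warningTimestamps(Iterable<String> lines) {
        List<String> stamps = new ArrayList<>();
        for (String line : lines) {
            // Stack-trace continuation lines don't contain the message, so they are skipped.
            if (line.contains(NEEDLE) && line.length() >= TIMESTAMP_LEN) {
                stamps.add(line.substring(0, TIMESTAMP_LEN));
            }
        }
        return stamps;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: MasterWarningScan <regionserver-log>");
            return;
        }
        List<String> stamps = warningTimestamps(Files.readAllLines(Paths.get(args[0])));
        if (stamps.isEmpty()) {
            System.out.println("warning not found");
        } else {
            System.out.println(stamps.size() + " occurrence(s), first " + stamps.get(0)
                    + ", last " + stamps.get(stamps.size() - 1));
        }
    }
}
```

One occurrence means the RS found the master on retry.  A steady stream up to the time of the restart means it never saw /hbase/master in the ensemble it was talking to.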


St.Ack

On Fri, Oct 22, 2010 at 10:42 AM, Jack Levin <magnito@gmail.com> wrote:
> Same ZK all the time; restarting the regionserver clears the issue.  I
> even see them talking to ZK via tcpdump.  Is there a way to enable
> debug log output on ZK to see what might be going on?
>
> -Jack
>
> On Fri, Oct 22, 2010 at 10:28 AM, Stack <stack@duboce.net> wrote:
>> Are they pointed at the same zk ensemble as the other 22 servers? That
>> is, are they running with the same config?  The complaint below is
>> that the regionserver is not seeing the master register, perhaps because
>> they are homed at the wrong location in zk or because they are going
>> to a different zk?
>> St.Ack
>>
>> On Fri, Oct 22, 2010 at 8:34 AM, Jack Levin <magnito@gmail.com> wrote:
>>> I have 30 region servers.  After a cold restart (master, ZooKeepers, and
>>> all regionservers), 22 regionservers start, but the other 8 show the
>>> following errors.
>>> Any idea how to debug this?  Is ZooKeeper giving the RS the wrong msg?
>>> Can I log it via tcpdump maybe?
>>>
>>> 2010-10-22 08:32:42,035 WARN
>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to read
>>> master address from ZooKeeper. Retrying. Error was:
>>> java.io.IOException:
>>> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode
>>> = NoNode for /hbase/master
>>>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:481)
>>>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readMasterAddressOrThrow(ZooKeeperWrapper.java:377)
>>>        at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:1289)
>>>        at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1320)
>>>        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:519)
>>>        at java.lang.Thread.run(Thread.java:619)
>>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
>>> KeeperErrorCode = NoNode for /hbase/master
>>>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>>>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:921)
>>>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:477)
>>>        ... 5 more
>>>
>>
>
