hbase-user mailing list archives

From Trevor Antczak <tantc...@operasolutions.com>
Subject RE: Hbase keeps dying (Zookeeper)
Date Fri, 09 Aug 2013 15:06:05 GMT
So I've done some more research into this, and it appears that my Zookeeper has no /hbase/master node.
From zkCli:

[zk: localhost:2181(CONNECTED) 3] ls /hbase
[splitlog, unassigned, root-region-server, rs, table, shutdown]
[zk: localhost:2181(CONNECTED) 4] get /hbase/master
Node does not exist: /hbase/master
[zk: localhost:2181(CONNECTED) 5]

I have no idea how this could have happened, but is there a way to regenerate the node in
zookeeper?  All of the other expected nodes are there.  It seems from the logs that everything
was fine with hbase until 12:01 AM on August 1st, at which point it just stopped working.
I can't find any reason that any of this has happened either.  It's all very strange.
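
The zkCli check above can also be scripted. What follows is only a minimal Python sketch, not an HBase or ZooKeeper tool: it parses the bracketed child list that zkCli prints for "ls /hbase" and reports which expected children are absent. The function names and the contents of the expected set are hypothetical; which znodes a given HBase version keeps under /hbase is an assumption to adjust for your install.

```python
from __future__ import annotations


def parse_zkcli_children(ls_output: str) -> list[str]:
    """Parse the bracketed child list printed by zkCli's `ls` command."""
    inner = ls_output.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError(f"not a zkCli child list: {ls_output!r}")
    # Split on commas and drop empty entries (an empty znode prints "[]").
    return [name.strip() for name in inner[1:-1].split(",") if name.strip()]


def missing_children(ls_output: str, expected: set[str]) -> set[str]:
    """Return the expected child names that do not appear in the listing."""
    return expected - set(parse_zkcli_children(ls_output))


if __name__ == "__main__":
    # Listing copied from the zkCli session above.
    listing = "[splitlog, unassigned, root-region-server, rs, table, shutdown]"
    # Hypothetical expectation: adjust to the znodes your HBase version uses.
    expected = {"master", "rs", "root-region-server"}
    print(sorted(missing_children(listing, expected)))  # prints ['master']
```

Run against the listing from this thread, the sketch reports only master as missing, which matches the "Node does not exist" reply from zkCli.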


-----Original Message-----
From: Trevor Antczak [mailto:tantczak@operasolutions.com] 
Sent: Monday, August 05, 2013 2:40 PM
To: user@hbase.apache.org
Subject: RE: Hbase keeps dying (Zookeeper)


Yes, hbase is managing the Quorum.

-----Original Message-----
From: Ted Yu [mailto:yuzhihong@gmail.com]
Sent: Monday, August 05, 2013 12:39 PM
To: user@hbase.apache.org
Subject: Re: Hbase keeps dying (Zookeeper)

bq. there wasn't a copy of hdfs-site.xml

Can you tell us the versions you're using?

Did you let HBase manage your ZooKeeper quorum?

On Mon, Aug 5, 2013 at 9:15 AM, Trevor Antczak wrote:

> Hi all,
> I have an hbase system that has worked fine for quite a long time, but 
> now it is quite suddenly developing errors.  First it was dying 
> immediately on startup because there wasn't a copy of hdfs-site.xml in 
> the hbase conf directory (which doesn't seem like it should be 
> necessary, and I'm not sure how it got moved if it had been there in 
> the first place).  I copied the hdfs-site.xml from /etc/hadoops/conf
> into /etc/hbase/conf.  Now hbase starts up, but it can never connect 
> to Zookeeper and dies after a few minutes of trying.  The weird thing
> is that, according to Zookeeper, the connection is happening.  From the
> hbase logs I get a ton of messages like:
> 2013-08-05 11:57:19,019 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> master:60000-0x4403f9ef5b20026 Creating (or updating) unassigned node 
> for
> 0f3ca79375768472af70765ff231ee32 with OFFLINE state
> 2013-08-05 11:57:19,020 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=M_ZK_REGION_OFFLINE, server=hmaster:60000,
> region=0f3ca79375768472af70765ff231ee32
> Eventually followed by:
> 2013-08-05 11:57:19,105 WARN org.apache.zookeeper.ClientCnxn: Session
> 0x4403f9ef5b20026 for server hslave14/, unexpected 
> error, closing socket connection and attempting reconnect
> java.io.IOException: Packet len4935980 is out of range!
>         at
> org.apache.zookeeper.ClientCnxn$SendThread.readLength(ClientCnxn.java:708)
>         at
> org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:867)
>         at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1154)
> And then a bunch more Java errors as the process dies.  From the 
> Zookeeper logs I see the hbase server connect:
> 13/08/05 11:40:27 INFO server.NIOServerCnxn: Accepted socket 
> connection from /xxx.xxx.xxx.xxx:34879
> 13/08/05 11:40:27 INFO server.NIOServerCnxn: Client attempting to 
> establish new session at /xxx.xxx.xxx.xxx:34879
> 13/08/05 11:40:27 INFO server.NIOServerCnxn: Established session 
> 0x1404ee40a8d000c with negotiated timeout 40000 for client
> /xxx.xxx.xxx.xxx:34879
> Then disconnect, but only after it shuts down:
> 13/08/05 11:45:52 INFO server.NIOServerCnxn: Closed socket connection 
> for client /xxx.xxx.xxx.xxx:34879 which had sessionid 
> 0x1404ee40a8d000c
> Does anyone have any clever ideas of places I can look for this error?  
> Or why I'm suddenly having this problem when I haven't changed anything?
>  Thanks in advance for any help provided.
> Trevor
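
The "Packet len4935980 is out of range!" exception in the quoted message is raised by the ZooKeeper client when a response is larger than its buffer limit, which is controlled by the jute.maxbuffer system property. The Python sketch below pulls the length out of such a log line and compares it with a limit. The helper names are hypothetical, and the default of 4096 * 1024 bytes is an assumption about the client of that era; check the value that applies to your ZooKeeper version and JVM flags before relying on it.

```python
from __future__ import annotations

import re

# Assumption: client-side default when jute.maxbuffer is not set.
ASSUMED_CLIENT_LIMIT = 4096 * 1024

# Matches the exception text shown in the quoted hbase log.
_PACKET_LEN = re.compile(r"Packet len\s*(\d+) is out of range")


def packet_len(log_line: str) -> int | None:
    """Return the packet length named in the exception, or None if absent."""
    match = _PACKET_LEN.search(log_line)
    return int(match.group(1)) if match else None


def exceeds_limit(log_line: str, limit: int = ASSUMED_CLIENT_LIMIT) -> bool:
    """True if the line reports a packet at or above the given limit."""
    length = packet_len(log_line)
    return length is not None and length >= limit


if __name__ == "__main__":
    line = "java.io.IOException: Packet len4935980 is out of range!"
    length = packet_len(line)
    print(length, exceeds_limit(line), length - ASSUMED_CLIENT_LIMIT)
    # prints: 4935980 True 741676
```

Under that assumed limit, the reported packet is about 724 KiB too large, so the thing to look for is whatever the client is reading from ZooKeeper that could be several megabytes in size.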
