hbase-user mailing list archives

From Trevor Antczak <tantc...@operasolutions.com>
Subject Hbase keeps dying (Zookeeper)
Date Mon, 05 Aug 2013 16:15:23 GMT
Hi all,

I have an hbase system that worked fine for a long time, but it has suddenly started
developing errors.  First it died immediately on startup because there wasn't a copy
of hdfs-site.xml in the hbase conf directory (which doesn't seem like it should be necessary,
and I'm not sure how it got moved if it had been there in the first place).  I copied
hdfs-site.xml from /etc/hadoops/conf into /etc/hbase/conf.  Now hbase starts up, but it can
never connect to Zookeeper and dies after a few minutes of trying.  The weird thing is that,
according to Zookeeper, the connection is happening.  In the hbase logs I get a ton of messages
like:

2013-08-05 11:57:19,019 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x4403f9ef5b20026 Creating (or updating) unassigned node for 0f3ca79375768472af70765ff231ee32 with OFFLINE state
2013-08-05 11:57:19,020 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=hmaster:60000, region=0f3ca79375768472af70765ff231ee32

Eventually followed by:

2013-08-05 11:57:19,105 WARN org.apache.zookeeper.ClientCnxn: Session 0x4403f9ef5b20026 for server hslave14/172.20.7.124:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Packet len4935980 is out of range!
        at org.apache.zookeeper.ClientCnxn$SendThread.readLength(ClientCnxn.java:708)
        at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:867)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1154)

And then a bunch more Java errors as the process dies.  From the Zookeeper logs I see the
hbase server connect:

13/08/05 11:40:27 INFO server.NIOServerCnxn: Accepted socket connection from /xxx.xxx.xxx.xxx:34879
13/08/05 11:40:27 INFO server.NIOServerCnxn: Client attempting to establish new session at /xxx.xxx.xxx.xxx:34879
13/08/05 11:40:27 INFO server.NIOServerCnxn: Established session 0x1404ee40a8d000c with negotiated timeout 40000 for client /xxx.xxx.xxx.xxx:34879

Then disconnect, but only after it shuts down:

13/08/05 11:45:52 INFO server.NIOServerCnxn: Closed socket connection for client /xxx.xxx.xxx.xxx:34879 which had sessionid 0x1404ee40a8d000c

Does anyone have any clever ideas about where I can look for the cause of this error?  Or why I'm
suddenly having this problem when I haven't changed anything?  Thanks in advance for any help provided.

Trevor
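
A note on the "Packet len4935980 is out of range!" line in the trace above. What follows is a minimal sketch, not a diagnosis. It assumes the ZooKeeper client rejects any response whose length is at or above the limit taken from the jute.maxbuffer system property, and that the limit falls back to 4096 * 1024 bytes when the property is unset. The helper oversized_packets and the constant ASSUMED_CLIENT_LIMIT are hypothetical names made up for this illustration. The sketch pulls the reported length out of a log and shows how far past the assumed limit it is.

```python
import re

# Assumption: the client-side packet limit comes from jute.maxbuffer and
# defaults to 4096 * 1024 bytes when that property is not set.
ASSUMED_CLIENT_LIMIT = 4096 * 1024

# Matches the message text exactly as it appears in the hbase log above.
PACKET_LEN = re.compile(r"Packet len(\d+) is out of range")


def oversized_packets(log_lines, limit=ASSUMED_CLIENT_LIMIT):
    """Return (length, bytes_over_limit) for each 'Packet len' error found."""
    found = []
    for line in log_lines:
        match = PACKET_LEN.search(line)
        if match:
            length = int(match.group(1))
            found.append((length, length - limit))
    return found


sample = ["java.io.IOException: Packet len4935980 is out of range!"]
print(oversized_packets(sample))  # [(4935980, 741676)]
```

If the assumed default holds, the packet in the trace is roughly 724 KiB over the limit. That would be consistent with a single ZooKeeper response (for example, a large znode or a long child listing) having grown too big, rather than with a network problem, but that reading is a guess from one log line.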
