hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: HBase cluster crashed on-the-hour
Date Thu, 16 Jul 2015 22:05:09 GMT
How many servers are there in the ZooKeeper quorum?

Have you checked the log of the ZooKeeper leader around the time the master crashed?
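
A rough sketch of one way to answer both questions, assuming the ensemble string from the quoted master log below is accurate and that each server answers ZooKeeper's `srvr` four-letter command on its client port. The helper names (`parse_ensemble`, `parse_mode`, `query_mode`) are made up for illustration and are not part of HBase or ZooKeeper.

```python
import socket

# Ensemble string copied from the quoted master log (assumed to be complete).
ENSEMBLE = ("10.240.131.17:2200,10.240.131.16:2200,10.240.131.15:2200,"
            "10.240.131.14:2200,10.240.131.18:2200")


def parse_ensemble(ensemble):
    """Split a 'host:port,host:port' connect string into (host, port) pairs."""
    servers = []
    for entry in ensemble.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.rpartition(":")
        servers.append((host, int(port)))
    return servers


def parse_mode(srvr_output):
    """Return the value of the 'Mode:' line in 'srvr' output, or None."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def query_mode(host, port, timeout=5.0):
    """Send 'srvr' to one server and return its mode (e.g. leader, follower)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return parse_mode(b"".join(chunks).decode("utf-8", "replace"))
```

Calling `query_mode(host, port)` for each pair returned by `parse_ensemble(ENSEMBLE)` should show which server reports itself as the leader; that is the server whose log is worth reading first.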

Cheers

On Wed, Jul 15, 2015 at 7:14 PM, Jo Young Zhang <joyoungzhang@gmail.com>
wrote:

> I found that our HBase cluster crashes on the hour.
> The HBase master log is as follows:
>
> "2015-07-14 14:41:49,832 DEBUG
> [master:10.240.131.18:60000.oldLogCleaner]
> master.ReplicationLogCleaner:
> Didn't find this log in ZK, deleting:
> 10-241-125-46%2C60020%2C1436841063572.1436851865226
> 2015-07-14 14:45:49,822 DEBUG
> [master:10.240.131.18:60000.oldLogCleaner]
> master.ReplicationLogCleaner:
> Didn't find this log in ZK, deleting:
> 10-241-85-137%2C60020%2C1436841341086.1436852143141
> 2015-07-14 15:00:03,481 INFO [main] util.VersionInfo: HBase 0.96.2-hadoop2
> 2015-07-14 15:00:03,481 INFO [main] util.VersionInfo: Subversion
> https://svn.apache.org/repos/asf/hbase/tags/0.96.2RC2 -r 1581096
> 2015-07-14 15:00:03,481 INFO [main] util.VersionInfo: Compiled by stack on
> Mon Mar 24 16:03:18 PDT 2014
> 2015-07-14 15:00:03,729 INFO [main] zookeeper.ZooKeeper: Client
> environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
> 2015-07-14 15:00:03,730 INFO [main] zookeeper.ZooKeeper: Client
> environment:
> host.name=10-240-131-18
> 2015-07-14 15:00:03,730 INFO [main] zookeeper.ZooKeeper: Client
> environment:java.version=1.7.0_72
>
> ...
>
> 2015-07-14 15:00:03,749 INFO [main] zookeeper.RecoverableZooKeeper: Process
> identifier=clean znode for master connecting to ZooKeeper ensemble=
> 10.240.131.17:2200,10.240.131.16:2200,10.240.131.15:2200,
> 10.240.131.14:2200,
> 10.240.131.18:2200
> 2015-07-14 15:00:03,751 INFO [main-SendThread(10-240-131-18:2200)]
> zookeeper.ClientCnxn:
> Opening socket connection to server 10-240-131-18/10.240.131.18:2200. Will
> not attempt to authenticate using SASL (unknown error)
> 2015-07-14 15:00:03,757 INFO [main-SendThread(10-240-131-18:2200)]
> zookeeper.ClientCnxn:
> Socket connection established to 10-240-131-18/10.240.131.18:2200,
> initiating session
> 2015-07-14 15:00:03,764 INFO [main-SendThread(10-240-131-18:2200)]
> zookeeper.ClientCnxn:
> Session establishment complete on server 10-240-131-18/10.240.131.18:2200,
> sessionid = 0x34e8a64b453024a, negotiated timeout = 40000
> 2015-07-14 15:00:04,835 INFO [main] zookeeper.ZooKeeper: Session:
> 0x34e8a64b453024a closed
> 2015-07-14 15:00:04,835 INFO [main-EventThread] zookeeper.ClientCnxn:
> EventThread shut down"
>
> Every hour, after printing "Didn't find this log in ZK...",
> the master dies.
>
> The ZooKeeper log is as follows:
>
> "2015-07-14 15:00:03,756 [myid:3] - INFO [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2200:NIOServerCnxnFactory@197] - Accepted socket
> connection
> from /10.240.131.18:52733
> 2015-07-14 15:00:03,761 [myid:3] - INFO [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2200:ZooKeeperServer@868] - Client attempting to establish
> new session at /10.240.131.18:52733
> 2015-07-14 15:00:03,762 [myid:3] - INFO
> [CommitProcessor:3:ZooKeeperServer@617] - Established session
> 0x34e8a64b453024a with negotiated timeout 40000 for client /
> 10.240.131.18:52733
> 2015-07-14 15:00:04,836 [myid:3] - INFO [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2200:NIOServerCnxn@1007] - Closed socket connection for
> client /10.240.131.18:52733 which had sessionid 0x34e8a64b453024a"
>
