Date: Thu, 16 Jul 2015 10:14:02 +0800
Subject: HBase cluster crashed on-the-hour
From: Jo Young Zhang
To: user

I found that our HBase cluster crashes on the hour.

The HBase master log is as follows:

2015-07-14 14:41:49,832 DEBUG [master:10.240.131.18:60000.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: 10-241-125-46%2C60020%2C1436841063572.1436851865226
2015-07-14 14:45:49,822 DEBUG [master:10.240.131.18:60000.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: 10-241-85-137%2C60020%2C1436841341086.1436852143141
2015-07-14 15:00:03,481 INFO [main] util.VersionInfo: HBase 0.96.2-hadoop2
2015-07-14 15:00:03,481 INFO [main] util.VersionInfo: Subversion https://svn.apache.org/repos/asf/hbase/tags/0.96.2RC2 -r 1581096
2015-07-14 15:00:03,481 INFO [main] util.VersionInfo: Compiled by stack on Mon Mar 24 16:03:18 PDT 2014
2015-07-14 15:00:03,729 INFO [main] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
2015-07-14 15:00:03,730 INFO [main] zookeeper.ZooKeeper: Client environment:host.name=10-240-131-18
2015-07-14 15:00:03,730 INFO [main] zookeeper.ZooKeeper: Client environment:java.version=1.7.0_72
...
2015-07-14 15:00:03,749 INFO [main] zookeeper.RecoverableZooKeeper: Process identifier=clean znode for master connecting to ZooKeeper ensemble=10.240.131.17:2200,10.240.131.16:2200,10.240.131.15:2200,10.240.131.14:2200,10.240.131.18:2200
2015-07-14 15:00:03,751 INFO [main-SendThread(10-240-131-18:2200)] zookeeper.ClientCnxn: Opening socket connection to server 10-240-131-18/10.240.131.18:2200. Will not attempt to authenticate using SASL (unknown error)
2015-07-14 15:00:03,757 INFO [main-SendThread(10-240-131-18:2200)] zookeeper.ClientCnxn: Socket connection established to 10-240-131-18/10.240.131.18:2200, initiating session
2015-07-14 15:00:03,764 INFO [main-SendThread(10-240-131-18:2200)] zookeeper.ClientCnxn: Session establishment complete on server 10-240-131-18/10.240.131.18:2200, sessionid = 0x34e8a64b453024a, negotiated timeout = 40000
2015-07-14 15:00:04,835 INFO [main] zookeeper.ZooKeeper: Session: 0x34e8a64b453024a closed
2015-07-14 15:00:04,835 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down

Every hour, after the master prints "Didn't find this log in ZK...", the master dies.

The ZooKeeper log is as follows:

2015-07-14 15:00:03,756 [myid:3] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2200:NIOServerCnxnFactory@197] - Accepted socket connection from /10.240.131.18:52733
2015-07-14 15:00:03,761 [myid:3] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2200:ZooKeeperServer@868] - Client attempting to establish new session at /10.240.131.18:52733
2015-07-14 15:00:03,762 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer@617] - Established session 0x34e8a64b453024a with negotiated timeout 40000 for client /10.240.131.18:52733
2015-07-14 15:00:04,836 [myid:3] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2200:NIOServerCnxn@1007] - Closed socket connection for client /10.240.131.18:52733 which had sessionid 0x34e8a64b453024a
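
To check whether this really happens on the hour every time, a small script along these lines could scan the master log for the points where a new process starts writing to it. This is only a rough sketch: it assumes that the "util.VersionInfo: HBase" banner marks a fresh JVM start and that every line carries the timestamp prefix seen in the log above. The function name find_restarts and the sample list are made up for illustration.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the master log lines quoted above,
# e.g. "2015-07-14 15:00:03,481 INFO ...".
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) ")


def find_restarts(lines, marker="util.VersionInfo: HBase "):
    """Return (timestamp, seconds_past_hour) for each line containing marker.

    Assumption: the VersionInfo banner is printed once per JVM start, so each
    hit is treated as the moment a new process began writing to the log.
    """
    hits = []
    for line in lines:
        m = TS.match(line)
        if m and marker in line:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            hits.append((ts, ts.minute * 60 + ts.second))
    return hits


# Three lines copied from the master log above, used as sample input.
sample = [
    "2015-07-14 14:45:49,822 DEBUG [master:10.240.131.18:60000.oldLogCleaner] "
    "master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: "
    "10-241-85-137%2C60020%2C1436841341086.1436852143141",
    "2015-07-14 15:00:03,481 INFO [main] util.VersionInfo: HBase 0.96.2-hadoop2",
    "2015-07-14 15:00:03,481 INFO [main] util.VersionInfo: Subversion "
    "https://svn.apache.org/repos/asf/hbase/tags/0.96.2RC2 -r 1581096",
]

restarts = find_restarts(sample)
for ts, offset in restarts:
    print(ts, "is", offset, "seconds past the hour")
```

Running the same function over the full log file (for example with open(path) as f: find_restarts(f)) should show whether every such start falls within a few seconds of a full hour, which would point toward something scheduled rather than a random failure.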