Return-Path: X-Original-To: apmail-hbase-issues-archive@www.apache.org Delivered-To: apmail-hbase-issues-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id C56447F61 for ; Tue, 29 Nov 2011 07:28:06 +0000 (UTC) Received: (qmail 92720 invoked by uid 500); 29 Nov 2011 07:28:06 -0000 Delivered-To: apmail-hbase-issues-archive@hbase.apache.org Received: (qmail 92649 invoked by uid 500); 29 Nov 2011 07:28:06 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 92640 invoked by uid 99); 29 Nov 2011 07:28:04 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 29 Nov 2011 07:28:04 +0000 X-ASF-Spam-Status: No, hits=-2001.2 required=5.0 tests=ALL_TRUSTED,RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 29 Nov 2011 07:28:01 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id 0A243A526C for ; Tue, 29 Nov 2011 07:27:40 +0000 (UTC) Date: Tue, 29 Nov 2011 07:27:40 +0000 (UTC) From: "stack (Updated) (JIRA)" To: issues@hbase.apache.org Message-ID: <1690606472.21272.1322551660042.JavaMail.tomcat@hel.zones.apache.org> In-Reply-To: <759594170.51259.1320255452261.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Updated] (HBASE-4729) Race between online altering and splitting kills the master MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4729: ------------------------- Attachment: 4729-v4.txt Address Ted comments (I like the one where we need to verify NoNodeException). > Race between online altering and splitting kills the master > ----------------------------------------------------------- > > Key: HBASE-4729 > URL: https://issues.apache.org/jira/browse/HBASE-4729 > Project: HBase > Issue Type: Bug > Affects Versions: 0.92.0 > Reporter: Jean-Daniel Cryans > Assignee: ramkrishna.s.vasudevan > Priority: Critical > Fix For: 0.92.0, 0.94.0 > > Attachments: 4729-v2.txt, 4729-v3.txt, 4729-v4.txt, 4729.txt > > > I was running an online alter while regions were splitting, and suddenly the master died and left my table half-altered (haven't restarted the master yet). > What killed the master: > {quote} > 2011-11-02 17:06:44,428 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected ZK exception creating node CLOSING > org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101 > at org.apache.zookeeper.KeeperException.create(KeeperException.java:110) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) > at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637) > at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:459) > at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:441) > at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:769) > at org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:568) > at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1722) > at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1661) > at org.apache.hadoop.hbase.master.BulkReOpen$1.run(BulkReOpen.java:69) > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {quote} > A znode was created because the region server was splitting the region 4 seconds before: > {quote} > 2011-11-02 17:06:40,704 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region TestTable,0012469153,1320253135043.f7e1783e65ea8d621a4bc96ad310f101. > 2011-11-02 17:06:40,704 DEBUG org.apache.hadoop.hbase.regionserver.SplitTransaction: regionserver:62023-0x132f043bbde0710 Creating ephemeral node for f7e1783e65ea8d621a4bc96ad310f101 in SPLITTING state > 2011-11-02 17:06:40,751 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Attempting to transition node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING > ... > 2011-11-02 17:06:44,061 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Successfully transitioned node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLIT > 2011-11-02 17:06:44,061 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the master to process the split for f7e1783e65ea8d621a4bc96ad310f101 > {quote} > Now that the master is dead the region server is spewing those last two lines like mad. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira