Date: Tue, 3 Apr 2012 22:06:27 +0000 (UTC)
From: "Zhihong Yu (Commented) (JIRA)"
To: issues@hbase.apache.org
Message-ID: <1366777846.9085.1333490787269.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <128360593.40139.1332304140996.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost

[
https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245804#comment-13245804 ]

Zhihong Yu commented on HBASE-5606:
-----------------------------------

Integrated to 0.92, 0.94 and trunk.

Thanks for the patch, Prakash.

Thanks for the review, Stack, Jimmy and Chinna.

> SplitLogManger async delete node hangs log splitting when ZK connection is lost
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-5606
>                 URL: https://issues.apache.org/jira/browse/HBASE-5606
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 0.92.0
>            Reporter: Gopinathan A
>            Assignee: Prakash Khemani
>            Priority: Critical
>             Fix For: 0.92.2
>
>         Attachments: 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch, 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch
>
>
> 1. One region server (RS) died; the ServerShutdownHandler detected this and started distributed log splitting.
> 2. All tasks failed because the ZK connection was lost, so all the tasks were deleted asynchronously.
> 3. The ServerShutdownHandler retried the log splitting.
> 4. The asynchronous deletion from step 2 finally happened, but it was applied to the new task.
> 5. This left the SplitLogManager in a hanging state.
> As a result, the .META. region was not assigned for a long time.
> {noformat}
> hbase-root-master-HOST-192-168-47-204.log.2012-03-14"(55413,79):2012-03-14 19:28:47,932 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up splitlog task at znode /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
> hbase-root-master-HOST-192-168-47-204.log.2012-03-14"(89303,79):2012-03-14 19:34:32,387 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up splitlog task at znode /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
> {noformat}
> {noformat}
> hbase-root-master-HOST-192-168-47-204.log.2012-03-14"(80417,99):2012-03-14 19:34:31,196 DEBUG org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
> hbase-root-master-HOST-192-168-47-204.log.2012-03-14"(89456,99):2012-03-14 19:34:32,497 DEBUG org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
> {noformat}
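
The sequence in steps 1 to 5 of the issue description can be modelled with a small toy simulation. This is only a sketch of the race as the report describes it, not HBase code and not the committed patch: the class `ToySplitLogManager`, its methods, and the znode path are all hypothetical, and it assumes (as an illustration) that a queued asynchronous delete carries only the znode path and not the attempt that requested it.

```python
from collections import deque


class ToySplitLogManager:
    """Toy model of the reported race. Hypothetical; not the HBase class."""

    def __init__(self):
        self.znodes = {}                # path -> attempt number that created the node
        self.pending_deletes = deque()  # async deletes, queued by path only (assumption)
        self.attempt = 0

    def put_up_task(self, path):
        # Create (or re-create) the task znode for a new splitting attempt.
        self.attempt += 1
        self.znodes[path] = self.attempt
        return self.attempt

    def fail_task_and_delete_async(self, path):
        # The delete is only queued here; it runs later, e.g. once the
        # connection is usable again.
        self.pending_deletes.append(path)

    def run_pending_deletes(self):
        # Apply every queued delete to whatever node currently sits at the path.
        deleted = []
        while self.pending_deletes:
            path = self.pending_deletes.popleft()
            if path in self.znodes:
                deleted.append((path, self.znodes.pop(path)))
        return deleted

    def task_is_visible(self, path):
        # A worker can only pick up a task whose znode still exists.
        return path in self.znodes


def simulate():
    mgr = ToySplitLogManager()
    path = "/hbase/splitlog/example-wal"  # hypothetical znode path
    first = mgr.put_up_task(path)         # step 1: first splitting attempt
    mgr.fail_task_and_delete_async(path)  # step 2: task fails, delete is delayed
    second = mgr.put_up_task(path)        # step 3: retry re-creates the task
    deleted = mgr.run_pending_deletes()   # step 4: stale delete finally runs
    return first, second, deleted, mgr.task_is_visible(path)
```

In this model the stale delete removes the node created by the second attempt, so the retried task is no longer visible to any worker while the manager keeps waiting for it, which corresponds to the hang in step 5.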