Date: Mon, 30 Apr 2012 19:47:49 +0000 (UTC)
From: "Nicolas Spiegelberg (JIRA)"
To: issues@hbase.apache.org
Message-ID: <2102123233.10520.1335815269490.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <134015701.5404.1335200258694.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-5860) splitlogmanager should not unnecessarily resubmit tasks when zk unavailable

[ 
https://issues.apache.org/jira/browse/HBASE-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265150#comment-13265150 ]

Nicolas Spiegelberg commented on HBASE-5860:
--------------------------------------------

@Prakash: this code wouldn't pick up that the RESCAN znode was created, because that path uses createRescanNode() instead of createNode(). Should we not also increment tot_mgr_node_create_queued in createRescanNode() and increment tot_mgr_node_create_result in CreateRescanAsyncCallback.processResult?

> splitlogmanager should not unnecessarily resubmit tasks when zk unavailable
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-5860
>                 URL: https://issues.apache.org/jira/browse/HBASE-5860
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Prakash Khemani
>            Assignee: Prakash Khemani
>         Attachments: 0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch
>
>
> (Doesn't really impact the run time or correctness of log splitting.)
> Say the master has lost its connection to zk. splitlogmanager's timeoutmanager will see that all the tasks that were submitted are still unassigned, and it will resubmit those tasks (i.e. create dummy znodes).
> splitlogmanager should realize that the tasks are unassigned only because their znodes have not been created yet.
> 2012-04-20 13:11:20,516 INFO org.apache.hadoop.hbase.master.SplitLogManager: dead splitlog worker msgstore295.snc4.facebook.com,60020,1334948757026
> 2012-04-20 13:11:20,517 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: Scheduling batch of logs to split
> 2012-04-20 13:11:20,517 INFO org.apache.hadoop.hbase.master.SplitLogManager: started splitting logs in [hdfs://msgstore215.snc4.facebook.com:9000/MSGSTORE215-SNC4-HBASE/.logs/msgstore295.snc4.facebook.com,60020,1334948757026-splitting]
> 2012-04-20 13:11:20,565 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server msgstore235.snc4.facebook.com/10.30.222.186:2181
> 2012-04-20 13:11:20,566 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to msgstore235.snc4.facebook.com/10.30.222.186:2181, initiating session
> 2012-04-20 13:11:20,575 INFO org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 4 unassigned = 4
> 2012-04-20 13:11:20,576 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: resubmitting unassigned task(s) after timeout
> 2012-04-20 13:11:21,577 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: resubmitting unassigned task(s) after timeout
> 2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x36ccb0f8010002, likely server has closed socket, closing socket connection and attempting reconnect
> 2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x136ccb0f4890000, likely server has closed socket, closing socket connection and attempting reconnect
> 2012-04-20 13:11:21,786 WARN org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc =CONNECTIONLOSS for /hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951586677 retry=3
> 2012-04-20 13:11:21,786 WARN org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc =CONNECTIONLOSS for /hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951920332 retry=3
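
The idea raised in the comment (counting queued and completed znode creates, RESCAN nodes included, so the timeout check can tell "unassigned" apart from "znode not created yet") could be sketched roughly as below. This is an illustration only, not the code in the attached patch: the class PendingCreateTracker and its methods are hypothetical, and the two counters merely borrow their roles from tot_mgr_node_create_queued and tot_mgr_node_create_result.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch; not the actual SplitLogManager code.
 * Tracks async znode creates so a timeout check can skip resubmission
 * while creates are still outstanding (e.g. during a ZooKeeper outage).
 */
class PendingCreateTracker {
    // Stand-ins for tot_mgr_node_create_queued / tot_mgr_node_create_result.
    private final AtomicLong nodeCreateQueued = new AtomicLong();
    private final AtomicLong nodeCreateResult = new AtomicLong();

    /** Assumed to be called whenever an async create is issued, for task and RESCAN znodes alike. */
    void onCreateQueued() {
        nodeCreateQueued.incrementAndGet();
    }

    /** Assumed to be called from the create callback once a final result is known. */
    void onCreateResult() {
        nodeCreateResult.incrementAndGet();
    }

    /** Number of creates issued but not yet completed. */
    long pendingCreates() {
        // Read the result counter first: a concurrent update can then only
        // make the estimate too high, which errs toward not resubmitting.
        long done = nodeCreateResult.get();
        long queued = nodeCreateQueued.get();
        return queued - done;
    }

    /** Resubmit only if every task is unassigned and no create is still in flight. */
    boolean shouldResubmit(int totalTasks, int unassignedTasks) {
        return totalTasks > 0
            && unassignedTasks == totalTasks
            && pendingCreates() == 0;
    }

    public static void main(String[] args) {
        PendingCreateTracker tracker = new PendingCreateTracker();
        // Two creates are queued while ZooKeeper is unreachable.
        tracker.onCreateQueued();
        tracker.onCreateQueued();
        System.out.println("resubmit while creates pending: " + tracker.shouldResubmit(4, 4));
        // Both callbacks eventually report a final result.
        tracker.onCreateResult();
        tracker.onCreateResult();
        System.out.println("resubmit after creates finish: " + tracker.shouldResubmit(4, 4));
    }
}
```

Under this scheme a create that is retried after CONNECTIONLOSS would stay counted as pending until its final result arrives, so the "total tasks = 4 unassigned = 4" state in the log above would not by itself trigger a resubmit.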