Date: Fri, 28 Sep 2012 04:00:07 +1100 (NCT)
From: "Prakash Khemani (JIRA)"
To: issues@hbase.apache.org
Message-ID: <1250480683.134463.1348765207625.JavaMail.jiratomcat@arcas>
In-Reply-To: <1499547991.121998.1348581616273.JavaMail.jiratomcat@arcas>
Subject: [jira] [Commented] (HBASE-6878) DistributerLogSplit can fail to resubmit a task done if there is an exception during the log archiving

    [ https://issues.apache.org/jira/browse/HBASE-6878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464874#comment-13464874 ]

Prakash Khemani commented on HBASE-6878:
----------------------------------------

The logic to indefinitely retry a failing log-splitting task is not inside SplitLogManager. SplitLogManager will retry a task only a finite number of times. If the task still fails, it is the outer Master layers that retry indefinitely. The reason for this behavior is to allow tools to be built around distributed log splitting.
If distributed log splitting were being used by a tool then you wouldn't want it to indefinitely retry. So the behavior outlined in this bug report is correct. But this behavior shouldn't lead to any bug.

(There are only a few places in SplitLogManager where it resubmits the task forcefully, disregarding the retry limit. I think the only two cases are when a region server (splitlogworker) dies and when a splitlogworker "resigns" from the task (i.e. gives up the task even though there were no failures))

> DistributerLogSplit can fail to resubmit a task done if there is an exception during the log archiving
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6878
>                 URL: https://issues.apache.org/jira/browse/HBASE-6878
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>            Reporter: nkeywal
>            Priority: Minor
>
> The code in SplitLogManager#getDataSetWatchSuccess is:
> {code}
>     if (slt.isDone()) {
>       LOG.info("task " + path + " entered state: " + slt.toString());
>       if (taskFinisher != null && !ZKSplitLog.isRescanNode(watcher, path)) {
>         if (taskFinisher.finish(slt.getServerName(), ZKSplitLog.getFileName(path)) == Status.DONE) {
>           setDone(path, SUCCESS);
>         } else {
>           resubmitOrFail(path, CHECK);
>         }
>       } else {
>         setDone(path, SUCCESS);
>       }
> {code}
>   resubmitOrFail(path, CHECK);
> should be
>   resubmitOrFail(path, FORCE);
> Without it, the task won't be resubmitted if the delay is not reached, and the task will be marked as failed.
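
For readers following the thread, the difference between a checked and a forced resubmit discussed above can be illustrated with a small standalone sketch. This is not the SplitLogManager code: the class ResubmitSketch, its fields, and the threshold and timeout values are hypothetical and exist only for illustration. It assumes, as the thread describes, that a CHECK resubmit is refused when the delay has not been reached or the retry limit is exhausted, while a FORCE resubmit disregards both.

```java
// Illustrative sketch only; all names here are hypothetical, not HBase APIs.
final class ResubmitSketch {

    /** How strictly a resubmit request is vetted. */
    enum Directive { CHECK, FORCE }

    /** Minimal per-task bookkeeping assumed by this sketch. */
    static final class Task {
        int unforcedResubmits = 0;
        long lastUpdateMillis;

        Task(long lastUpdateMillis) {
            this.lastUpdateMillis = lastUpdateMillis;
        }
    }

    private final int resubmitThreshold; // assumed finite retry limit
    private final long timeoutMillis;    // assumed delay before a checked resubmit

    ResubmitSketch(int resubmitThreshold, long timeoutMillis) {
        this.resubmitThreshold = resubmitThreshold;
        this.timeoutMillis = timeoutMillis;
    }

    /** Returns true if the task is resubmitted, false if the request is refused. */
    boolean resubmit(Task task, Directive directive, long nowMillis) {
        if (directive != Directive.FORCE) {
            // CHECK: refuse while the delay since the last update has not elapsed.
            if (nowMillis - task.lastUpdateMillis < timeoutMillis) {
                return false;
            }
            // CHECK: refuse once the finite retry limit is used up.
            if (task.unforcedResubmits >= resubmitThreshold) {
                return false;
            }
            // Only checked resubmits count against the limit.
            task.unforcedResubmits++;
        }
        task.lastUpdateMillis = nowMillis;
        return true;
    }

    public static void main(String[] args) {
        ResubmitSketch policy = new ResubmitSketch(1, 1000L);
        Task task = new Task(0L);
        // Delay not reached: CHECK is refused, FORCE goes through.
        System.out.println("CHECK at 500ms:  " + policy.resubmit(task, Directive.CHECK, 500L));
        System.out.println("FORCE at 500ms:  " + policy.resubmit(task, Directive.FORCE, 500L));
        // Delay reached and limit not yet used: CHECK goes through once.
        System.out.println("CHECK at 2000ms: " + policy.resubmit(task, Directive.CHECK, 2000L));
        // Limit exhausted: a further CHECK is refused, FORCE still goes through.
        System.out.println("CHECK at 4000ms: " + policy.resubmit(task, Directive.CHECK, 4000L));
        System.out.println("FORCE at 4000ms: " + policy.resubmit(task, Directive.FORCE, 4000L));
    }
}
```

In this sketch a caller that receives false from a CHECK request would mark the task as failed and leave further retries to an outer layer, which mirrors the split of responsibilities described in the comment above.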