hbase-issues mailing list archives

From "Prakash Khemani (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6878) DistributerLogSplit can fail to resubmit a task done if there is an exception during the log archiving
Date Thu, 27 Sep 2012 17:00:07 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464874#comment-13464874 ]

Prakash Khemani commented on HBASE-6878:

The logic that indefinitely retries a failing log-splitting task is not inside SplitLogManager.
SplitLogManager retries a task only a finite number of times. If the task still fails, it is the
outer Master layers that retry indefinitely. The reason for this behavior is to make it possible
to build tools around distributed log splitting: a tool that uses distributed log splitting
would not want it to retry indefinitely.

So the behavior outlined in this bug report is correct. But this behavior shouldn't lead to
any bug.
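
A minimal sketch of the layering described above, assuming a task that simply reports success or failure. The class and method names (LayeredRetrySketch, runWithLimit, runUntilSuccess) are hypothetical and are not HBase APIs; the inner method stands in for SplitLogManager's bounded retries and the outer one for the master's indefinite retry.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical illustration only; none of these names exist in HBase. */
class LayeredRetrySketch {

    /**
     * Inner layer: tries the task at most maxAttempts times, then gives up
     * and reports failure so that the caller can decide what to do next.
     */
    static boolean runWithLimit(BooleanSupplier task, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (task.getAsBoolean()) {
                return true;
            }
        }
        return false;
    }

    /**
     * Outer layer: keeps invoking the bounded inner layer until it succeeds.
     * Returns how many inner rounds were needed.
     */
    static int runUntilSuccess(BooleanSupplier task, int maxAttempts) {
        int rounds = 1;
        while (!runWithLimit(task, maxAttempts)) {
            rounds++;
        }
        return rounds;
    }
}
```

A tool would call only the bounded method and handle a failure itself, while a master-like caller would use the unbounded wrapper.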

(There are only a few places in SplitLogManager where it resubmits the task forcefully, disregarding
the retry limit. I think the only two cases are when a region server (splitlogworker) dies
and when a splitlogworker "resigns" from the task, i.e. gives up the task even though there
were no failures.)
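
To make the difference between a checked and a forced resubmit concrete, here is a simplified sketch. It assumes that a checked resubmit is refused while the task's timeout has not yet elapsed or once the retry limit is reached, and that a forced resubmit skips both checks. The names and parameters (ResubmitSketch, shouldResubmit, and so on) are hypothetical and are not the actual SplitLogManager code.

```java
/** Hypothetical illustration only; not the actual SplitLogManager logic. */
class ResubmitSketch {

    enum Directive { CHECK, FORCE }

    /**
     * Decides whether a task may be resubmitted.
     * FORCE always resubmits; CHECK resubmits only when the timeout has
     * elapsed and the number of unforced resubmits is below the limit.
     */
    static boolean shouldResubmit(Directive directive,
                                  long millisSinceLastUpdate,
                                  long timeoutMillis,
                                  int unforcedResubmits,
                                  int resubmitLimit) {
        if (directive == Directive.FORCE) {
            return true; // disregards both the delay and the retry limit
        }
        if (millisSinceLastUpdate < timeoutMillis) {
            return false; // delay not reached yet
        }
        return unforcedResubmits < resubmitLimit;
    }
}
```

With these assumptions, a caller that passes CHECK right after a task update gets a refusal, which corresponds to the task being marked as failed in the report quoted below.
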
> DistributerLogSplit can fail to resubmit a task done if there is an exception during the log archiving
> ------------------------------------------------------------------------------------------------------
>                 Key: HBASE-6878
>                 URL: https://issues.apache.org/jira/browse/HBASE-6878
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>            Reporter: nkeywal
>            Priority: Minor
> The code in SplitLogManager#getDataSetWatchSuccess is:
> {code}
> if (slt.isDone()) {
>       LOG.info("task " + path + " entered state: " + slt.toString());
>       if (taskFinisher != null && !ZKSplitLog.isRescanNode(watcher, path)) {
>         if (taskFinisher.finish(slt.getServerName(), ZKSplitLog.getFileName(path)) == Status.DONE) {
>           setDone(path, SUCCESS);
>         } else {
>           resubmitOrFail(path, CHECK);
>         }
>       } else {
>         setDone(path, SUCCESS);
>       }
> {code}
>           resubmitOrFail(path, CHECK);
> should be 
>           resubmitOrFail(path, FORCE);
> Without it, the task won't be resubmitted if the delay is not reached, and the task will be marked as failed.
