hbase-issues mailing list archives

From "Sean Busbey (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13602) Add an option to fail wal recovery when lease recovery fails
Date Thu, 30 Apr 2015 21:05:07 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522270#comment-14522270 ]

Sean Busbey commented on HBASE-13602:

Having no timeout has the same kind of problem: folks who previously had slow-to-recover
problems would suddenly have hanging-forever problems.

For example, the cluster I saw this on definitely wouldn't have had data loss, because I
manually ssh'd to each node and verified there were no old RS processes. My FileSystem
instance was failing all lease recovery, so without the timeout it would never have recovered.

> Add an option to fail wal recovery when lease recovery fails
> ------------------------------------------------------------
>                 Key: HBASE-13602
>                 URL: https://issues.apache.org/jira/browse/HBASE-13602
>             Project: HBase
>          Issue Type: Improvement
>          Components: wal
>            Reporter: Sean Busbey
>              Labels: operability
>             Fix For: 2.0.0, 1.2.0
> Currently, if lease recovery doesn't succeed within an extended timeout (default 15 minutes),
> we log a message about possible data loss and continue recovering the edits in that file.
> In some deployments this potential for data loss might be unacceptable. In those situations
> it would be good to have a configurable setting that marks the recovery as failed instead.
> It should default to off (at least in branch-1).
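
To make the proposed behavior concrete, here is a minimal sketch of a lease-recovery loop with a timeout and an opt-in "fail instead of continue" flag. It is only an illustration under stated assumptions, not HBase's actual code: the class `LeaseRecoverySketch`, the `Outcome` enum, the `failOnTimeout` parameter, and the `tryRecoverLease` callback are all hypothetical names, and the callback stands in for whatever call asks the filesystem to recover the lease.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch; none of these names come from HBase itself.
final class LeaseRecoverySketch {

    enum Outcome {
        RECOVERED,                          // lease recovered before the timeout
        PROCEEDED_WITH_POSSIBLE_DATA_LOSS,  // timed out, continued anyway (current behavior)
        FAILED                              // timed out, recovery marked failed (proposed option)
    }

    /**
     * Retries tryRecoverLease until it returns true or timeoutMs elapses.
     * On timeout, either reports failure (failOnTimeout == true) or logs a
     * warning and lets the caller continue with the file.
     */
    static Outcome recover(BooleanSupplier tryRecoverLease,
                           long timeoutMs,
                           long pauseMs,
                           boolean failOnTimeout) {
        final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (tryRecoverLease.getAsBoolean()) {
                return Outcome.RECOVERED;
            }
            // Overflow-safe comparison against the deadline.
            if (System.nanoTime() - deadline >= 0) {
                break;
            }
            try {
                Thread.sleep(pauseMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted during lease recovery", e);
            }
        }
        if (failOnTimeout) {
            return Outcome.FAILED;
        }
        System.err.println("WARN: lease recovery timed out; continuing, possible data loss");
        return Outcome.PROCEEDED_WITH_POSSIBLE_DATA_LOSS;
    }
}
```

With `failOnTimeout` set to false the sketch keeps the behavior described in the issue (warn and continue), which matches the suggestion that the new setting default to off. The timeout is kept in both modes, so a filesystem that fails every lease-recovery attempt still ends in a definite outcome rather than hanging.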

This message was sent by Atlassian JIRA