hadoop-hdfs-issues mailing list archives

From "Tsz Wo (Nicholas), SZE (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3540) Further improvement on recovery mode and edit log toleration in branch-1
Date Thu, 30 Aug 2012 14:23:07 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444975#comment-13444975 ]

Tsz Wo (Nicholas), SZE commented on HDFS-3540:

Recovery mode will always prompt before doing anything which could lead to data loss. So no,
stray OP_INVALID bytes will not lead to silent data loss.

Actually, looking at change 1349086, which was introduced by HDFS-3521, I see that it broke
end-of-file checking by default. Since dfs.namenode.edits.toleration.length is -1 by default,
FSEditLog#checkEndOfLog is never invoked. However, this is not a problem with Recovery Mode;
it's a problem with change 1349086.

Before HDFS-3521, there was an UNCHECKED_REGION_LENGTH in Recovery Mode.  If a stray OP_INVALID
byte fell within the unchecked region, it would cause silent data loss.
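
To make the unchecked-region concern concrete, here is a minimal sketch of a simplified model. It is
not the branch-1 code: the class name, the method firstReportedGarbage, and the marker value 0xFF are
assumptions made for illustration. The model assumes the reader stops at the first OP_INVALID byte and
that an end-of-log check then verifies that the remaining bytes are padding, except for the last
uncheckedLength bytes of the file.

```java
public class UncheckedRegionSketch {
    // Assumed marker/padding value for this sketch only.
    static final byte OP_INVALID = (byte) 0xFF;

    /**
     * Simplified end-of-log check. The reader is assumed to have stopped at
     * offset `end` (the first OP_INVALID byte). Returns the offset of the first
     * non-padding byte that the check would report, or -1 if nothing is reported.
     * The final `uncheckedLength` bytes of the log are never inspected.
     */
    static long firstReportedGarbage(byte[] log, int end, int uncheckedLength) {
        int checkedLimit = Math.max(end, log.length - uncheckedLength);
        for (int i = end; i < checkedLimit; i++) {
            if (log[i] != OP_INVALID) {
                return i; // real data after the supposed end of log
            }
        }
        return -1; // nothing reported, whether or not data follows
    }

    public static void main(String[] args) {
        byte[] log = new byte[16];
        java.util.Arrays.fill(log, (byte) 1); // stand-in for real operations
        log[10] = OP_INVALID;                 // a stray OP_INVALID byte

        // With an unchecked region covering the tail, bytes 11..15 are dropped silently.
        System.out.println(firstReportedGarbage(log, 10, 8));
        // With no unchecked region, the data after the stray byte is detected.
        System.out.println(firstReportedGarbage(log, 10, 0));
    }
}
```

In this model, whether the operations after the stray byte are noticed depends only on whether they
fall inside the unchecked tail, which is the case described above.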

Recovery Mode does consider the corruption length. The location at which the problem occurred
is printed out. This is the message "Failed to parse edit log (<file name>) at position
<position>, edit log length is <length>..." This information is provided to allow
the system administrator to make an informed decision.

You still do not know the corruption length, since there may be padding at the end.  System
admins won't know the padding length, so they cannot determine the corruption length.
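
The arithmetic behind this can be sketched as follows. This is an illustration only, with
hypothetical class and method names; the position and length are the two values printed in the
"Failed to parse edit log" message, and the padding length is the value that message does not report.

```java
public class CorruptionLengthSketch {
    /** Upper bound on the corrupt span: every byte after the failure position. */
    static long maxCorruptionLength(long position, long fileLength) {
        if (position < 0 || position > fileLength) {
            throw new IllegalArgumentException("position must be within the file");
        }
        return fileLength - position;
    }

    /** Exact corrupt span, computable only when the padding length is known. */
    static long corruptionLength(long position, long fileLength, long paddingLength) {
        long max = maxCorruptionLength(position, fileLength);
        if (paddingLength < 0 || paddingLength > max) {
            throw new IllegalArgumentException("padding must fit after the position");
        }
        return max - paddingLength;
    }

    public static void main(String[] args) {
        long position = 4096L;       // hypothetical value from the log message
        long fileLength = 1052672L;  // hypothetical value from the log message

        // All the administrator can compute from the message alone:
        System.out.println("at most " + maxCorruptionLength(position, fileLength) + " bytes");

        // The same message is consistent with very different corruption lengths,
        // depending on a padding length that is not reported:
        System.out.println(corruptionLength(position, fileLength, 0L));
        System.out.println(corruptionLength(position, fileLength, 1048000L));
    }
}
```

From position and length alone, only the upper bound is available; the actual corruption could be
anywhere between zero bytes and that bound.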

> Further improvement on recovery mode and edit log toleration in branch-1
> ------------------------------------------------------------------------
>                 Key: HDFS-3540
>                 URL: https://issues.apache.org/jira/browse/HDFS-3540
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 1.2.0
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
> *Recovery Mode*: HDFS-3479 backported HDFS-3335 to branch-1.  However, the recovery mode
> feature in branch-1 is dramatically different from the recovery mode in trunk since the edit
> log implementations in these two branches are different.  For example, there is UNCHECKED_REGION_LENGTH
> in branch-1 but not in trunk.
> *Edit Log Toleration*: HDFS-3521 added this feature to branch-1 to remedy UNCHECKED_REGION_LENGTH
> and to tolerate edit log corruption.
> There are overlaps between these two features.  We study potential further improvement
> in this issue.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira