hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2093) 1073: Handle case where an entirely empty log is left during NN crash
Date Wed, 22 Jun 2011 00:26:47 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052970#comment-13052970 ]

Todd Lipcon commented on HDFS-2093:

bq. doTestCrashRecoveryEmptyLog assumes the cluster should not start, even if just one of
the dirs has a corrupted log. Shouldn't the cluster start as long as only one of the in-progress
logs was truncated?

The two different variants of this test are:
a) inBothDirs=false:
- one dir has edits_1-2 and edits_inprogress_3 truncated
- the other dir just has edits_1-2

b) inBothDirs=true:
- both dirs have edits_1-2 and edits_inprogress_3 truncated

In the first case, startup should fail because the NN can tell that the previous shutdown
was unclean: there is a log starting at txid 3, even though it's corrupt.
In the second case, startup fails because the NN has two logs starting at txid 3, both truncated.
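
To make the two variants concrete, here is a minimal Java sketch of the rule described above. It is not the code in the patch: the class, the Segment type, and the method names are all hypothetical, and the rule it encodes (refuse startup when some log start txid has no readable copy in any dir) is only my reading of this comment.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CrashRecoverySketch {
    /** Hypothetical model of one edit log file found in a storage dir. */
    static final class Segment {
        final long firstTxId;   // txid the log starts at
        final boolean readable; // false for a truncated/corrupt log

        Segment(long firstTxId, boolean readable) {
            this.firstTxId = firstTxId;
            this.readable = readable;
        }
    }

    /**
     * Assumed rule: startup is refused if a log starting at some txid exists
     * but no storage dir holds a readable copy of it.
     */
    static boolean shouldRefuseStartup(List<List<Segment>> dirs) {
        Map<Long, Boolean> readableByStart = new TreeMap<>();
        for (List<Segment> dir : dirs) {
            for (Segment s : dir) {
                // A start txid counts as readable if any dir has a good copy.
                readableByStart.merge(s.firstTxId, s.readable, (a, b) -> a || b);
            }
        }
        return readableByStart.containsValue(false);
    }

    /** Builds the storage dirs for the two test variants in this comment. */
    static List<List<Segment>> variant(boolean inBothDirs) {
        Segment edits1to2 = new Segment(1, true);            // edits_1-2
        Segment truncatedInProgress3 = new Segment(3, false); // edits_inprogress_3, truncated
        List<Segment> damagedDir = Arrays.asList(edits1to2, truncatedInProgress3);
        List<Segment> otherDir = inBothDirs
                ? Arrays.asList(edits1to2, truncatedInProgress3)
                : Arrays.asList(edits1to2);
        return Arrays.asList(damagedDir, otherDir);
    }

    public static void main(String[] args) {
        System.out.println("inBothDirs=false refuses startup: " + shouldRefuseStartup(variant(false)));
        System.out.println("inBothDirs=true refuses startup: " + shouldRefuseStartup(variant(true)));
    }
}
```

Under this assumed rule both variants refuse to start, which is what the test expects. Two dirs that each hold only edits_1-2 would start normally.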

I guess the comments on the test cases aren't clear. I'll improve those, and also address
the nit, and upload a new patch.

> 1073: Handle case where an entirely empty log is left during NN crash
> ---------------------------------------------------------------------
>                 Key: HDFS-2093
>                 URL: https://issues.apache.org/jira/browse/HDFS-2093
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>    Affects Versions: Edit log branch (HDFS-1073)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: Edit log branch (HDFS-1073)
>         Attachments: hdfs-2093.txt, hdfs-2093.txt, hdfs-2093.txt
> In fault-testing the HDFS-1073 branch, I saw the following situation:
> - NN has two storage directories, but one is in failed state
> - NN starts to roll edits logs to edits_inprogress_5160285
> - NN then crashes
> - on restart, it detects the truncated log, but since it has 0 txns, it finalizes it to the nonsense log name edits_5160285-5160284.
> - It then starts logs again at edits_inprogress_5160285.
> - After this point, no checkpoints or future NN startups succeed since there are two logs starting with the same txid
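
The nonsense name in the description follows from simple arithmetic. The Java sketch below assumes the finalized name is edits_<first txid>-<last txid>, which is inferred only from the file names quoted in this message; the class and method names are hypothetical and this is not the NN's actual finalization code.

```java
public class FinalizedNameSketch {
    /**
     * Assumed naming scheme: edits_<firstTxId>-<lastTxId>, where
     * lastTxId = firstTxId + numTxns - 1.
     */
    static String finalizedName(long firstTxId, long numTxns) {
        long lastTxId = firstTxId + numTxns - 1;
        return "edits_" + firstTxId + "-" + lastTxId;
    }

    /** True when finalizing would yield a range whose end precedes its start. */
    static boolean isEmptySegment(long numTxns) {
        return numTxns == 0;
    }

    public static void main(String[] args) {
        // A log with txids 1 and 2 finalizes to a sensible name.
        System.out.println(finalizedName(1L, 2L));
        // A log with 0 txns ends one txid before it starts.
        System.out.println(finalizedName(5160285L, 0L));
    }
}
```

With 0 txns the last txid comes out as 5160284, one less than the first, which gives the edits_5160285-5160284 name above. The new edits_inprogress_5160285 then starts at the same txid as that finalized file.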

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira