hadoop-hdfs-issues mailing list archives

From "Hari Mankude (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2874) HA: edit log should log to shared dirs before local dirs
Date Sat, 04 Feb 2012 00:59:54 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200226#comment-13200226 ]

Hari Mankude commented on HDFS-2874:
------------------------------------

Can you run a test for the following scenario?

1. The NN is writing to both the shared edits dir and the local edits dir. (With this fix, the shared edits dir is written first.)
2. Let's say the transaction with txid 100 was written to the shared edits dir and then the NN died before txid 100 was written to the local edits dir. So 99 is the last txid recorded in the local edits dir.
3. The same NN is restarted.
4. Does it resume at txid 101 or at txid 100? If it resumes at 101, the local edits dir will have a gap. If it resumes at 100, the last transaction in the shared edits dir will be invalid.
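
To make the two outcomes in step 4 concrete, here is a toy model of the scenario. It is only a sketch and does not use any Hadoop code: EditDir, write_shared_first and simulate are hypothetical names, and each edits dir is modeled as a plain list of synced txids. Which branch the real NN takes on restart is exactly the open question above.

```python
from dataclasses import dataclass, field


@dataclass
class EditDir:
    """Toy stand-in for one edits directory: just the list of synced txids."""
    name: str
    txids: list = field(default_factory=list)

    def last_txid(self):
        return self.txids[-1] if self.txids else 0

    def gaps(self):
        """Return the txids missing between consecutive recorded entries."""
        missing = []
        for prev, cur in zip(self.txids, self.txids[1:]):
            missing.extend(range(prev + 1, cur))
        return missing


def write_shared_first(shared, local, txid, crash_before_local=False):
    """Write to the shared dir first, then the local dir (the ordering in the fix)."""
    shared.txids.append(txid)
    if crash_before_local:
        return False  # NN died after the shared write, before the local write
    local.txids.append(txid)
    return True


def simulate(restart_from):
    """restart_from is 'local' (resume after the local dir's last txid)
    or 'shared' (resume after the highest txid seen in either dir)."""
    shared, local = EditDir("shared"), EditDir("local")
    for txid in range(1, 100):
        write_shared_first(shared, local, txid)
    # Step 2: txid 100 reaches the shared dir only, then the NN dies.
    write_shared_first(shared, local, 100, crash_before_local=True)

    # Steps 3 and 4: the same NN restarts and picks its next txid.
    if restart_from == "local":
        next_txid = local.last_txid() + 1
    else:
        next_txid = max(shared.last_txid(), local.last_txid()) + 1
    write_shared_first(shared, local, next_txid)

    duplicates = sorted({t for t in shared.txids if shared.txids.count(t) > 1})
    return {
        "next_txid": next_txid,
        "local_gaps": local.gaps(),
        "shared_duplicates": duplicates,
    }
```

In this model, simulate("local") resumes at txid 100 and leaves txid 100 recorded twice in the shared dir, while simulate("shared") resumes at txid 101 and leaves the local dir missing txid 100.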

                
> HA: edit log should log to shared dirs before local dirs
> --------------------------------------------------------
>
>                 Key: HDFS-2874
>                 URL: https://issues.apache.org/jira/browse/HDFS-2874
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, name-node
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>         Attachments: hdfs-2874.txt, hdfs-2874.txt
>
>
> Currently, the NN logs its edits to each of its edits directories in sequence. This can produce the following bad sequence:
> - NN accumulates 100 edits (tx 1-100) in the buffer. Writes and syncs to local drive, then crashes.
> - Failover occurs. SBN takes over at txid=1, since txid 1 never got written.
> - First NN restarts. It reads up to txid 100 from its local directories. It is now "ahead" of the active NN with inconsistent state.
> The solution is to write to the shared edits dir, and sync that, before writing to any local drives.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
