hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2874) HA: edit log should log to shared dirs before local dirs
Date Sat, 04 Feb 2012 01:23:54 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200246#comment-13200246 ]

Aaron T. Myers commented on HDFS-2874:
--------------------------------------

Patch largely looks good, Todd. A few comments/questions:

# Maybe add a comment mentioning that we use LinkedHashSet since it provides a predictable-order
implementation of the Set interface?
# "we need to make sure all edits are on place" - s/on/in/g
# Why the call to Lists.newArrayList in getNamespaceEditsDirs?
# Looks like you retained the functionality wherein required journals are operated on first,
which should no longer be necessary, right? It should be OK as you have it, though, since
the shared edits dir is automatically marked required, and therefore will necessarily be operated
on before all others (required or non-required).
# I don't follow why the changes in GenericTestUtils were necessary.
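A minimal Java sketch of the ordering points in items 1, 3 and 4, under a simplified model: the class, the methods orderedEditsDirs and requiredFirst, the Journal type and the paths are all hypothetical and are not the patch's getNamespaceEditsDirs or the real journal classes. It shows that a LinkedHashSet filled with the shared dirs first keeps the shared dir first and silently drops a duplicate local entry; the result is returned as an ArrayList copy of the set, which is roughly what a Lists.newArrayList call produces. It also shows that a stable "required first" pass leaves a required shared dir that already sits first where it is, even when other dirs are required too.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not the HDFS FSNamesystem/JournalSet code.
public class EditsDirOrderSketch {

  /** Hypothetical journal descriptor: a directory plus its "required" flag. */
  static final class Journal {
    final String dir;
    final boolean required;

    Journal(String dir, boolean required) {
      this.dir = dir;
      this.required = required;
    }

    @Override
    public String toString() {
      return dir + (required ? " (required)" : "");
    }
  }

  /**
   * Shared dirs first, then local dirs. LinkedHashSet iterates in insertion
   * order and ignores duplicates, so a dir listed in both collections keeps
   * its earlier (shared) position.
   */
  static List<String> orderedEditsDirs(Collection<String> sharedDirs,
                                       Collection<String> localDirs) {
    Set<String> dirs = new LinkedHashSet<>(sharedDirs);
    dirs.addAll(localDirs);
    // Return a mutable List copy of the ordered set.
    return new ArrayList<>(dirs);
  }

  /** Stable partition: required journals first, each group in its original order. */
  static List<Journal> requiredFirst(List<Journal> journals) {
    List<Journal> out = new ArrayList<>();
    for (Journal j : journals) {
      if (j.required) out.add(j);
    }
    for (Journal j : journals) {
      if (!j.required) out.add(j);
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> dirs = orderedEditsDirs(
        List.of("/mnt/shared/edits"),
        List.of("/data/1/edits", "/mnt/shared/edits", "/data/2/edits"));
    System.out.println(dirs);

    // The shared dir is marked required and already sits first, so the
    // required-first pass leaves the order unchanged.
    List<Journal> journals = new ArrayList<>();
    for (String d : dirs) {
      journals.add(new Journal(d, d.equals("/mnt/shared/edits")));
    }
    System.out.println(requiredFirst(journals));
  }
}
```

With a predictable-order set, the shared dir's position is fixed by insertion order alone, so the separate required-first pass only matters if some local dir were listed ahead of it.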
                
> HA: edit log should log to shared dirs before local dirs
> --------------------------------------------------------
>
>                 Key: HDFS-2874
>                 URL: https://issues.apache.org/jira/browse/HDFS-2874
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, name-node
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>         Attachments: hdfs-2874.txt, hdfs-2874.txt
>
>
> Currently, the NN logs its edits to each of its edits directories in sequence. This can produce the following bad sequence:
> - NN accumulates 100 edits (tx 1-100) in the buffer, writes and syncs them to its local drive, then crashes.
> - Failover occurs. The SBN takes over at txid=1, since txid 1 never got written to the shared edits dir.
> - The first NN restarts. It reads up to txid 100 from its local directories. It is now "ahead" of the active NN, with inconsistent state.
> The solution is to write to the shared edits dir, and sync it, before writing to any local drives.
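A toy Java model of the proposed fix, under stated assumptions: Journal, flush, CrashException and the crashAfter hook are hypothetical stand-ins rather than FSEditLog/JournalSet code, and "sync" here just records the last txid. Because shared journals are synced before local ones, a crash mid-flush can leave the local dirs behind the shared dir but never ahead of it, which avoids the "ahead" state in the sequence above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of shared-before-local flushing; not the real HDFS API.
public class SharedFirstSketch {

  /** Hypothetical journal: records the highest txid it has durably synced. */
  static class Journal {
    final String name;
    final boolean shared;
    long lastSyncedTxId = 0;

    Journal(String name, boolean shared) {
      this.name = name;
      this.shared = shared;
    }

    void writeAndSync(long lastTxId) {
      lastSyncedTxId = lastTxId;
    }
  }

  /** Thrown by the hypothetical crash hook to simulate the NN dying mid-flush. */
  static class CrashException extends RuntimeException {}

  /**
   * Flushes buffered edits up to lastTxId: shared journals first, then local
   * ones. crashAfter is the number of journals that finish before a simulated
   * crash; pass -1 for no crash.
   */
  static void flush(List<Journal> journals, long lastTxId, int crashAfter) {
    List<Journal> ordered = new ArrayList<>();
    for (Journal j : journals) {
      if (j.shared) ordered.add(j);
    }
    for (Journal j : journals) {
      if (!j.shared) ordered.add(j);
    }
    int done = 0;
    for (Journal j : ordered) {
      if (done == crashAfter) throw new CrashException();
      j.writeAndSync(lastTxId);
      done++;
    }
  }

  public static void main(String[] args) {
    Journal local = new Journal("local", false);
    Journal shared = new Journal("shared", true);
    // Listed local-first on purpose; flush() still syncs the shared dir first.
    List<Journal> journals = List.of(local, shared);
    try {
      flush(journals, 100, 1); // crash after the first journal syncs
    } catch (CrashException e) {
      System.out.println("crashed: shared=" + shared.lastSyncedTxId
          + " local=" + local.lastSyncedTxId);
    }
  }
}
```

Running main prints `crashed: shared=100 local=0`: the standby can read txids 1-100 from the shared dir, and the restarted NN's local dirs are behind rather than ahead.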

--
This message is automatically generated by JIRA.
