hadoop-hdfs-issues mailing list archives

From "Jian Fang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3743) QJM: improve formatting behavior for JNs
Date Tue, 05 Apr 2016 18:49:25 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15226909#comment-15226909
] 

Jian Fang commented on HDFS-3743:
---------------------------------

I added a formatNonFileJournalsIfNecessary() method to FSEditLog that checks for unformatted
shared edits and formats them. It worked only intermittently, because the method was called
only from startActiveServices(). If the new journal node did not come up quickly enough, the
new active name node could fail to format the unformatted journal: when some existing journal
nodes are down, it cannot wait forever for the RPC responses in QuorumCall.
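
To illustrate the timing problem, here is a minimal, self-contained sketch of a bounded
retry. It is not Hadoop code: BoundedFormatRetry, tryWithin() and the simulated journal node
are hypothetical names used only for illustration. It shows that an attempt budget which is
too small gives up before a slow journal node becomes reachable.

```java
import java.util.function.BooleanSupplier;

public class BoundedFormatRetry {
    /**
     * Hypothetical helper (not a Hadoop API): runs the attempt up to maxAttempts times,
     * pausing pauseMillis between tries, and reports whether any attempt succeeded.
     */
    public static boolean tryWithin(BooleanSupplier attempt, int maxAttempts, long pauseMillis) {
        for (int i = 0; i < maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            if (i + 1 < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt status and stop retrying.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated journal node that only becomes reachable on the third attempt.
        int[] shortBudgetCalls = {0};
        boolean shortBudget = tryWithin(() -> ++shortBudgetCalls[0] >= 3, 2, 10L);

        int[] longBudgetCalls = {0};
        boolean longBudget = tryWithin(() -> ++longBudgetCalls[0] >= 3, 5, 10L);

        System.out.println("budget of 2 attempts succeeded: " + shortBudget);
        System.out.println("budget of 5 attempts succeeded: " + longBudget);
    }
}
```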
 
If there were a convenient way to store the previously configured/formatted journal nodes, we
could derive which journal nodes are new. Things would then be much easier: we could wait
longer for the new journal nodes, and nothing more would be needed once all of them were
formatted successfully. Unfortunately, this is not easy to do without bringing in extra
dependencies, which is undesirable.
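
If such a record of previously formatted journal nodes existed, deriving the new ones would
be a plain set difference. The sketch below assumes the record is available as a set of
address strings; NewJournalNodes, findNew() and the example addresses are hypothetical and
not part of Hadoop.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class NewJournalNodes {
    /**
     * Returns the configured addresses that are absent from the previously formatted set,
     * keeping the configured order. Hypothetical helper, not a Hadoop API.
     */
    public static Set<String> findNew(List<String> configured, Set<String> previouslyFormatted) {
        Set<String> fresh = new LinkedHashSet<>(configured);
        fresh.removeAll(previouslyFormatted);
        return fresh;
    }

    public static void main(String[] args) {
        // Example addresses are made up: jn3 was replaced by jn4 in the configuration.
        List<String> configured = List.of("jn1:8485", "jn2:8485", "jn4:8485");
        Set<String> previous = Set.of("jn1:8485", "jn2:8485", "jn3:8485");
        System.out.println(findNew(configured, previous)); // prints [jn4:8485]
    }
}
```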
 
Another alternative is to modify the behavior of "initializeSharedEdits", for example by adding
an option "-newEditsOnly" so that it formats only new journal nodes and leaves existing
journal nodes intact. This requires that we stop the name node first, call "hdfs namenode
-initializeSharedEdits -newEditsOnly" to format the new journal nodes, and then start the name
node again. The disadvantage is that we would need to find yet another solution if we later
want reconfiguration without a restart.
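
The intended semantics of the proposed "-newEditsOnly" option can be sketched as follows.
This is only an illustration of the idea: the option does not exist in Hadoop, and the
Journal interface, InMemoryJournal and formatNewOnly() are hypothetical stand-ins rather than
Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;

public class NewEditsOnlySketch {
    /** Hypothetical stand-in for a journal; not a Hadoop interface. */
    public interface Journal {
        String name();
        boolean isFormatted();
        void format();
    }

    /** Formats only journals that report themselves unformatted; returns their names. */
    public static List<String> formatNewOnly(List<? extends Journal> journals) {
        List<String> newlyFormatted = new ArrayList<>();
        for (Journal j : journals) {
            if (!j.isFormatted()) {
                j.format();
                newlyFormatted.add(j.name());
            }
        }
        return newlyFormatted;
    }

    /** In-memory journal used only to exercise the sketch. */
    public static final class InMemoryJournal implements Journal {
        private final String name;
        private boolean formatted;
        private int formatCalls;

        public InMemoryJournal(String name, boolean formatted) {
            this.name = name;
            this.formatted = formatted;
        }

        @Override public String name() { return name; }
        @Override public boolean isFormatted() { return formatted; }
        @Override public void format() { formatted = true; formatCalls++; }
        public int formatCalls() { return formatCalls; }
    }

    public static void main(String[] args) {
        InMemoryJournal existing = new InMemoryJournal("jn1", true);
        InMemoryJournal added = new InMemoryJournal("jn4", false);
        System.out.println(formatNewOnly(List.of(existing, added))); // prints [jn4]
    }
}
```

Existing journals are never touched in this sketch, which is the property that distinguishes
the proposal from a full re-initialization of the shared edits.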


> QJM: improve formatting behavior for JNs
> ----------------------------------------
>
>                 Key: HDFS-3743
>                 URL: https://issues.apache.org/jira/browse/HDFS-3743
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>
> Currently, the JournalNodes automatically format themselves when a new writer takes over,
> if they don't have any data for that namespace. However, this has a few problems:
> 1) if the administrator accidentally points a new NN at the wrong quorum (eg corresponding
> to another cluster), it will auto-format a directory on those nodes. This doesn't cause any
> data loss, but would be better to bail out with an error indicating that they need to be
> formatted.
> 2) if a journal node crashes and needs to be reformatted, it should be able to re-join
> the cluster and start storing new segments without having to fail over to a new NN.
> 3) if 2/3 JNs get accidentally reformatted (eg the mount point becomes undone), and the
> user starts the NN, it should fail to start, because it may end up missing edits. If it
> auto-formats in this case, the user might have silent "rollback" of the most recent edits.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
