hadoop-hdfs-issues mailing list archives

From "Jian Fang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3743) QJM: improve formatting behavior for JNs
Date Wed, 29 Jul 2015 17:06:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646422#comment-14646422 ]

Jian Fang commented on HDFS-3743:

I meant that we could not run the "initializeSharedEdits" command to format a new replacement
JN (or any JN at all) while the name node was running, because the storage directory was
locked. We saw the following exception:

ERROR namenode.NameNode: Could not initialize shared edits dir
java.io.IOException: Cannot lock storage /var/lib/hadoop/dfs-name. The directory is already
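To illustrate why a second process fails here, below is a minimal sketch of an exclusive,
non-blocking directory lock. It is not Hadoop's code: it is Python rather than Java, it assumes
a Unix-like system (it relies on fcntl.flock), and the function name try_lock_storage and the
lock file name are chosen only for illustration.

```python
import fcntl
import os
import tempfile


def try_lock_storage(storage_dir):
    """Try to take an exclusive, non-blocking lock on storage_dir.

    Returns the open lock file on success, or None if some other holder
    already has the lock. The lock file name is illustrative only.
    """
    lock_path = os.path.join(storage_dir, "in_use.lock")
    lock_file = open(lock_path, "a+")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None
    return lock_file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        running_nn = try_lock_storage(d)  # stands in for the running name node
        second = try_lock_storage(d)      # stands in for a second tool on the same dir
        print(running_nn is not None, second is None)
        running_nn.close()
```

As long as the first holder keeps its lock file open, every later attempt on the same directory
is rejected, which matches the situation described above.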

As a result, it should be the QJM's responsibility to detect configuration changes at run time,
using HADOOP-7001, and to format the new JNs properly. If this really works, perhaps a rolling
restart of the JNs is no longer needed, provided they do not need to communicate with each
other to make decisions the way ZooKeeper instances do. If I understand correctly, the Quorum
Journal protocol implements only the log replication part of Paxos, right?
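A rough sketch of the detection step, assuming the JN list is given in the usual
qjournal://host:port;host:port/journalId form: compare the old and new values and report which
JNs were added (and so would need formatting) and which were removed. The function names are
hypothetical, and the HADOOP-7001 reconfiguration wiring itself is not shown.

```python
def journal_nodes(uri):
    """Split a qjournal URI into (set of host:port strings, journal ID)."""
    prefix = "qjournal://"
    if not uri.startswith(prefix):
        raise ValueError("not a qjournal URI: %r" % uri)
    authority, _, journal_id = uri[len(prefix):].partition("/")
    return set(authority.split(";")), journal_id


def diff_journal_nodes(old_uri, new_uri):
    """Return (added, removed) JN addresses between two configurations."""
    old_nodes, old_id = journal_nodes(old_uri)
    new_nodes, new_id = journal_nodes(new_uri)
    if old_id != new_id:
        # A different journal ID is a different namespace, not a JN swap.
        raise ValueError("journal ID changed")
    return sorted(new_nodes - old_nodes), sorted(old_nodes - new_nodes)


if __name__ == "__main__":
    old = "qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster"
    new = "qjournal://jn1:8485;jn2:8485;jn4:8485/mycluster"
    added, removed = diff_journal_nodes(old, new)
    print("format these new JNs:", added)
    print("stop writing to:", removed)
```

The host names and journal ID in the example are made up.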

> QJM: improve formatting behavior for JNs
> ----------------------------------------
>                 Key: HDFS-3743
>                 URL: https://issues.apache.org/jira/browse/HDFS-3743
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
> Currently, the JournalNodes automatically format themselves when a new writer takes over,
> if they don't have any data for that namespace. However, this has a few problems:
> 1) if the administrator accidentally points a new NN at the wrong quorum (eg corresponding
> to another cluster), it will auto-format a directory on those nodes. This doesn't cause any
> data loss, but would be better to bail out with an error indicating that they need to be formatted.
> 2) if a journal node crashes and needs to be reformatted, it should be able to re-join
> the cluster and start storing new segments without having to fail over to a new NN.
> 3) if 2/3 JNs get accidentally reformatted (eg the mount point becomes undone), and the
> user starts the NN, it should fail to start, because it may end up missing edits. If it auto-formats
> in this case, the user might have silent "rollback" of the most recent edits.
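One way to read the three cases in the quoted description is as a startup policy keyed on how
many JNs already hold data for the namespace. The sketch below is only that reading, not the
behavior of any actual patch; the function name and the returned strings are invented for
illustration.

```python
def startup_decision(formatted):
    """Decide what a starting NN should do.

    formatted: one boolean per JN, True if that JN already has data
    for this namespace.
    """
    total = len(formatted)
    if total == 0:
        raise ValueError("no journal nodes configured")
    majority = total // 2 + 1
    have = sum(formatted)
    if have == total:
        return "proceed"
    if have >= majority:
        # Case 2: a minority of JNs lost their data; they can be
        # formatted and re-join without risking missing edits.
        return "format-missing"
    # Cases 1 and 3: a majority (or all) of the JNs have no data, so
    # auto-formatting could hide a wrong quorum or roll back edits.
    return "refuse"


if __name__ == "__main__":
    for state in ([True, True, True], [True, True, False],
                  [True, False, False], [False, False, False]):
        print(state, "->", startup_decision(state))
```

Under this reading a brand-new quorum is also refused, so the administrator has to format it
explicitly.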

This message was sent by Atlassian JIRA