hadoop-hdfs-issues mailing list archives

From "Jesse Yates (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
Date Wed, 10 Dec 2014 01:10:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240452#comment-14240452 ]

Jesse Yates commented on HDFS-6440:

bq. What is the procedure for adding or replacing NNs?
Nothing beyond what is currently supported. The problem is that all the nodes currently
have the NNs hard-coded in their configuration. What you could do is roll the NNs with the
new NN config, then, once the new NN is up to date, roll the rest of the clients with the
new config as well. I don't know that you would do anything differently from the current
two-NN procedure.
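For reference, extending the existing HA configuration scheme to a third NN would look roughly like the fragment below. The nameservice name, NN id, and hostname are hypothetical; the property keys follow the standard hdfs-site.xml HA layout.

```xml
<!-- hdfs-site.xml: hypothetical third NameNode added to nameservice 'mycluster' -->
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2,nn3</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn3</name>
  <value>nn3-host.example.com:8020</value>
</property>
```

Since every NN and client reads this list from its local config, adding nn3 means rolling this change out everywhere, which is why there's no zero-touch procedure.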

bq. Could it support dynamically adding NNs without downtime?
Not really. You would have to push the downtime question up a level and rely on something
like ZK to maintain the list of NNs (in the simple approach). It reduces to a group-membership
problem.
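To sketch what "reduces to a group membership problem" means: the cluster would track live NNs in a shared registry instead of static config. The class below is purely illustrative; a real implementation would use ZooKeeper ephemeral znodes rather than an in-memory dict.

```python
# Hypothetical sketch of dynamic NameNode membership. In a ZK-backed
# version, join() would create an ephemeral znode under a parent like
# /namenodes, leave() would happen automatically on session expiry, and
# clients would watch the parent znode for changes.
class NameNodeRegistry:
    def __init__(self):
        self._members = {}  # nn_id -> rpc address

    def join(self, nn_id, address):
        # Register a live NN; no config file changes or rolling restarts.
        self._members[nn_id] = address

    def leave(self, nn_id):
        # Deregister a NN (in ZK: the ephemeral znode disappears).
        self._members.pop(nn_id, None)

    def members(self):
        # Clients re-read this list on each membership change.
        return sorted(self._members)

registry = NameNodeRegistry()
registry.join("nn1", "host1:8020")
registry.join("nn2", "host2:8020")
registry.join("nn3", "host3:8020")  # added without touching any config file
print(registry.members())  # → ['nn1', 'nn2', 'nn3']
```

The hard parts this sketch hides (session timeouts, watch re-registration, fencing the old active) are exactly why the patch keeps the static-config approach.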

bq. Would it be possible to avoid multiple SNNs uploading fsimages with trivial deltas in a short time?
Sure. This was the idea behind adding the 'primary checkpointer' logic: if you are not the
primary, you back off for 2x the usual wait period, on the assumption that the primary is
up and doing the checkpoints, but you check again every so often to make sure it hasn't fallen
too far behind. Obviously there is a possibility for the 'primary checkpointer' role to ping-pong
back and forth between SNNs, but generally one node gets the lead and keeps it.
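The backoff decision described above can be sketched as a small function. The parameter names are illustrative, not actual HDFS configuration keys, and the thresholds are made up for the example.

```python
# Sketch of the 'primary checkpointer' backoff: the primary checkpoints on
# its normal schedule; a non-primary SNN waits 2x as long, unless the
# primary has fallen too far behind, in which case it checkpoints now.
def next_checkpoint_delay(is_primary, base_period_s, txns_behind, max_lag_txns):
    """Return how long (seconds) this standby should wait before checkpointing."""
    if is_primary:
        return base_period_s              # primary: checkpoint on schedule
    if txns_behind > max_lag_txns:
        return 0                          # primary lagging too far: take over now
    return 2 * base_period_s              # otherwise back off, re-check later

print(next_checkpoint_delay(True, 3600, 0, 1_000_000))               # 3600
print(next_checkpoint_delay(False, 3600, 500, 1_000_000))            # 7200
print(next_checkpoint_delay(False, 3600, 2_000_000, 1_000_000))      # 0
```

The ping-pong risk mentioned above shows up here: two SNNs near the lag threshold could alternately see each other as "too far behind", though in practice one tends to win the race and keep the lead.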

bq. Would it be possible that this behavior makes other SNNs miss the edit logs?
It's possible, but that would be a fairly rare occurrence, as you can generally bring the NN
back up quickly. If it's really far behind, you can then bootstrap it up to the current NN's
state and run it from there. In practice, we haven't seen any problems with this.

bq. Does this work support rolling upgrade?
I'm not aware that it would change it.

bq. Would it make client failover more complicated?
Now, instead of failing over between two servers, the client can fail over between N. I believe
the client code supports this as-is.
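Failing over between N servers is essentially the two-NN loop generalized: try each configured NN in turn until one responds as active. This is a simplified stand-in for what HDFS's configured failover proxy does, with made-up names, not the actual client API.

```python
# Illustrative client-side failover across N NameNodes: cycle through the
# configured list and return the first one that reports itself active.
def find_active(namenodes, is_active):
    """namenodes: ordered list of NN ids; is_active: probe callback."""
    for nn in namenodes:
        if is_active(nn):       # in HDFS this would be an RPC that can
            return nn           # fail with StandbyException on a standby
    raise RuntimeError("no active NameNode found")

nns = ["nn1", "nn2", "nn3"]
print(find_active(nns, lambda nn: nn == "nn3"))  # → nn3
```

With two NNs the worst case is one extra probe; with N it is N-1 probes, which is why failover gets slower but not structurally more complicated.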

bq. What would be the impact on the DN side?
Basically, just sending block reports to more than 2 NNs. This could start to cause some
bandwidth congestion at some point, but I don't think it would be a problem with up to at
least 5 or 7 NameNodes.
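The bandwidth concern is simple to quantify: each DN sends its full block report to every NN, so report traffic per DN grows linearly with the NN count. The numbers below are illustrative, not measured.

```python
# Back-of-the-envelope block report traffic per DataNode per report cycle.
# Each DN reports every block to every NN, so traffic scales linearly
# with the number of NameNodes. bytes_per_block_entry is a rough guess.
def report_traffic_mb(blocks_per_dn, bytes_per_block_entry, num_namenodes):
    return blocks_per_dn * bytes_per_block_entry * num_namenodes / 1e6

# e.g. a DN with 1M blocks at ~40 bytes/entry:
print(report_traffic_mb(1_000_000, 40, 2))  # 80.0 MB with 2 NNs
print(report_traffic_mb(1_000_000, 40, 5))  # 200.0 MB with 5 NNs
```

Going from 2 to 5 NNs is only a 2.5x increase in this traffic, which is consistent with the comment that a handful of NNs should be manageable.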

bq. What are the changes on the test resources files (hadoop-*-reserved.tgz) ?
The mini-cluster was designed to support only two NNs, down to the files it writes to maintain
the directory layout. Unfortunately, it doesn't manage the directories in any easily updated
way, so I had to rip out the existing directory structure it uses and replace it with something
a little more flexible. The changes to the tgz files are just to support this updated structure
for the mini-cluster.

> Support more than 2 NameNodes
> -----------------------------
>                 Key: HDFS-6440
>                 URL: https://issues.apache.org/jira/browse/HDFS-6440
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: auto-failover, ha, namenode
>    Affects Versions: 2.4.0
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>         Attachments: Multiple-Standby-NameNodes_V1.pdf, hdfs-6440-cdh-4.5-full.patch,
> Most of the work is already done to support more than 2 NameNodes (one active, one standby).
This would be the last bit to support running multiple _standby_ NameNodes; one of the standbys
should be available for fail-over.
> Mostly, this is a matter of updating how we parse configurations, handling some complexity
around managing the checkpointing, and updating a whole lot of tests.

This message was sent by Atlassian JIRA
