hadoop-hdfs-issues mailing list archives

From "Jesse Yates (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
Date Fri, 12 Dec 2014 02:52:14 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243633#comment-14243633 ]

Jesse Yates commented on HDFS-6440:

bq. Does this mean that there might be multiple SNNs marking themselves as 'primary checkpointer'
during the same time period, since it is determined by SNN itself

Yes, that is a possibility, which I was getting at with my comment about the primary checkpointer
"ping-ponging". The images would have small deltas, but the ANN would be kept up to date.
As the updates slow down, one of the checkpointers would eventually win. However, we either (a)
haven't seen this show up on any of our clusters or (b) have never noticed any service
issues because of it.

bq. Would it be reasonable to also let ANN to reject fsimage upload request?

Sure, it's possible. My concern was around ensuring that the ANN had the most up-to-date checkpoint
and letting the SNNs sort themselves out. Rejecting uploads seems a bit more intrusive in the code, since you
also need to differentiate the source: you don't want to reject an update from the primary
checkpointer if it occurs just because of the time elapsed. I'd say it's worth looking into in
a follow-up JIRA though, since this is already a pretty large change.
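
To make the "differentiate the source" point concrete, here is a rough, hypothetical sketch of what an ANN-side check could look like. It is not code from the patch or from HDFS: the class name FsImageUploadGate, the offer() method, and the minIntervalMs parameter are all assumptions made up for illustration. The idea is that an upload from the same checkpointer as last time is accepted whenever it is newer, while an upload from a different SNN is accepted only once enough time has passed.

```java
import java.util.Objects;

/**
 * Hypothetical illustration only; not part of HDFS or the HDFS-6440 patch.
 * Decides whether the active NameNode would accept an fsimage upload.
 */
public class FsImageUploadGate {
    // Assumed knob: minimum quiet time before a different SNN may take over.
    private final long minIntervalMs;

    private String lastSource;      // id of the SNN whose upload was last accepted
    private long lastTxId = -1;     // transaction id of the last accepted image
    private long lastAcceptedMs;    // time the last image was accepted

    public FsImageUploadGate(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    /** Returns true if the upload is accepted and recorded, false if rejected. */
    public synchronized boolean offer(String source, long txId, long nowMs) {
        // Never accept an image that is not newer than what we already have.
        if (txId <= lastTxId) {
            return false;
        }
        boolean sameCheckpointer = Objects.equals(source, lastSource);
        boolean quietLongEnough =
            lastSource == null || nowMs - lastAcceptedMs >= minIntervalMs;
        // A different SNN uploading too soon is the "ping-pong" case: reject it.
        if (!sameCheckpointer && !quietLongEnough) {
            return false;
        }
        lastSource = source;
        lastTxId = txId;
        lastAcceptedMs = nowMs;
        return true;
    }
}
```

With a 1000 ms interval, an upload from "nn2" followed 500 ms later by one from "nn3" would see the second rejected, while a further upload from "nn2" would still go through. The trade-off is the one noted above: the ANN may briefly hold a slightly older checkpoint in exchange for less churn.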

> Support more than 2 NameNodes
> -----------------------------
>                 Key: HDFS-6440
>                 URL: https://issues.apache.org/jira/browse/HDFS-6440
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: auto-failover, ha, namenode
>    Affects Versions: 2.4.0
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>         Attachments: Multiple-Standby-NameNodes_V1.pdf, hdfs-6440-cdh-4.5-full.patch,
> Most of the work is already done to support more than 2 NameNodes (one active, one standby).
This would be the last bit to support running multiple _standby_ NameNodes; one of the standbys
should be available for fail-over.
> Mostly, this is a matter of updating how we parse configurations, some complexity around
managing the checkpointing, and updating a whole lot of tests.

This message was sent by Atlassian JIRA
