hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1975) HA: Support for sharing the namenode state from active to standby.
Date Sun, 27 Nov 2011 22:58:40 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158069#comment-13158069 ]

Todd Lipcon commented on HDFS-1975:

I got this patch in conjunction with HDFS-1108 and HDFS-1971 to properly replicate the creation
of a new file, but then moved on to working on {{setReplication}} and ran into issues there.
The issue I'm seeing is this:

1) Active NN receives setReplication to drop some file's replication from 3 to 1
2) It writes OP_SET_REPLICATION to its log, invalidates two replicas, and returns
3) The DNs report BLOCK_INVALIDATED back to both the ActiveNN and SBNN.
4) The SBNN hasn't received the OP_SET_REPLICATION yet, so it marks the block as under-replicated.

In the case of raising replication (e.g. from 1 to 3) we get the opposite problem: the SBNN
marks the block as over-replicated and adds two of the replicas to its invalidation list.
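
To make the race concrete, here is a toy model of the two cases above. It is only an illustration: {{classify}} is a made-up helper, not NameNode code, and it assumes the only inputs are the number of live replicas and the replication factor that a given NN currently believes in.

```python
def classify(live_replicas, expected_replication):
    """Label a block by comparing its live replicas with the expected replication."""
    if live_replicas < expected_replication:
        return "under-replicated"
    if live_replicas > expected_replication:
        return "over-replicated"
    return "ok"


# Lowering replication 3 -> 1: the active NN has applied OP_SET_REPLICATION,
# but the standby has not read it yet and still expects 3 replicas.
live_after_invalidation = 1  # the DNs have already deleted two replicas
print(classify(live_after_invalidation, expected_replication=1))  # active NN: ok
print(classify(live_after_invalidation, expected_replication=3))  # standby: under-replicated

# Raising replication 1 -> 3: the standby still expects 1 replica.
live_after_replication = 3
print(classify(live_after_replication, expected_replication=1))   # standby: over-replicated
```

In both directions the standby reaches the wrong conclusion only because it is judging fresh DN reports against a stale replication factor.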

Generation stamps don't help here, because changing replication level of a block doesn't change
its gen-stamp (and it shouldn't).

The solution I'm thinking of is that we have to track the transaction ID when we send commands
to DNs. So, if a setReplication command at txid=123 causes invalidation of two blocks, we'd
send the INVALIDATE command with "txid=123". Then, when the DN does delete these blocks, it
would ack back with that txid to both NNs. The SBNN wouldn't process this message until it
had loaded that txid.
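
A minimal sketch of the standby side of this idea, assuming the DN ack simply carries the txid of the command that caused it. The class and method names ({{StandbyNN}}, {{on_dn_ack}}, {{on_edits_applied}}) are hypothetical and do not correspond to existing HDFS code.

```python
from collections import deque


class StandbyNN:
    """Toy standby: holds DN acks until the edit log has been replayed far enough."""

    def __init__(self):
        self.applied_txid = 0    # highest txid loaded from the shared edits so far
        self.pending = deque()   # (txid, message) acks that are not yet safe to process
        self.processed = []      # messages that have been applied to the block map

    def on_dn_ack(self, txid, message):
        """Process the ack now if its txid is already loaded, otherwise queue it."""
        if txid <= self.applied_txid:
            self.processed.append(message)
        else:
            self.pending.append((txid, message))

    def on_edits_applied(self, txid):
        """Advance the applied txid and release any queued acks it covers."""
        self.applied_txid = max(self.applied_txid, txid)
        still_waiting = deque()
        for ack_txid, message in self.pending:
            if ack_txid <= self.applied_txid:
                self.processed.append(message)
            else:
                still_waiting.append((ack_txid, message))
        self.pending = still_waiting


sbnn = StandbyNN()
sbnn.on_dn_ack(123, "BLOCK_INVALIDATED blk_1")  # arrives before txid 123 is loaded
sbnn.on_edits_applied(122)                      # still behind, the ack stays queued
sbnn.on_edits_applied(123)                      # OP_SET_REPLICATION loaded, ack released
```

With this ordering the standby sees the invalidation only after it knows the replication factor was lowered, so it does not mark the block as under-replicated.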

A simplification of this would be for every command sent by an NN to include that NN's txid,
which the DN records in BPOfferService as "latestCommandTxId". Then, any call from the DN to
the NNs would include this txid. This is a bit more conservative than tracking the txid with
each block command, but probably less prone to bugs.
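
Roughly, the DN side of this simpler scheme might look like the sketch below. It is an illustration under assumptions: {{DataNodeActor}}, {{on_command}}, {{build_report}} and {{standby_can_process}} are invented names, and keeping the maximum txid seen so far is my guess at how "latestCommandTxId" would be maintained.

```python
class DataNodeActor:
    """Toy stand-in for the per-block-pool DN state; not the real BPOfferService."""

    def __init__(self):
        self.latest_command_txid = 0

    def on_command(self, command, nn_txid):
        """Remember the highest txid attached to any command received from an NN."""
        # Assumption: txids only move forward, so an older txid never lowers the value.
        self.latest_command_txid = max(self.latest_command_txid, nn_txid)

    def build_report(self, payload):
        """Attach the latest command txid to every call made to the NNs."""
        return {"payload": payload, "txid": self.latest_command_txid}


def standby_can_process(report, applied_txid):
    """The standby handles a report only once it has loaded the report's txid."""
    return report["txid"] <= applied_txid


dn = DataNodeActor()
dn.on_command("INVALIDATE", nn_txid=123)
report = dn.build_report("BLOCK_INVALIDATED")
print(standby_can_process(report, applied_txid=122))  # False
print(standby_can_process(report, applied_txid=123))  # True
```

The conservative part shows up in {{build_report}}: a report that has nothing to do with the txid=123 command still carries 123, so the standby delays it as well.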

I plan to take a pass at implementing this latter approach.
> HA: Support for sharing the namenode state from active to standby.
> ------------------------------------------------------------------
>                 Key: HDFS-1975
>                 URL: https://issues.apache.org/jira/browse/HDFS-1975
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>            Reporter: Suresh Srinivas
>            Assignee: Jitendra Nath Pandey
>         Attachments: HDFS-1975-HA.2.patch, HDFS-1975-HA.patch, HDFS-1975-HDFS-1623.patch,
>                      HDFS-1975-HDFS-1623.patch, hdfs-1975.txt, hdfs-1975.txt
>
> To enable hot standby namenode, the standby node must have current information for
> namenode state (image + edits) and block location information. This jira addresses keeping
> the namenode state current in the standby node. To do this, the proposed solution in this
> jira is to use a shared storage to store the namenode state.
> Note one could also build an alternative solution by augmenting the backup node. A separate
> jira could explore this.
