hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2782) HA: Support multiple shared edits dirs
Date Mon, 06 Feb 2012 21:06:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201583#comment-13201583

Todd Lipcon commented on HDFS-2782:

Been thinking about this a bit... I don't think it's so trivial as to just use our existing
JournalSet implementation across multiple remote directories. The issue is the following case:
- HA cluster configured with two NNs: NN1 and NN2, and two shared storage directories (SD1
and SD2)
- NN1 is active and writing to SD1 and SD2 when a network issue occurs which partitions NN1
from SD2 and NN2 from SD1 (this isn't that unlikely, for example if NN1 and SD1 share a rack
while NN2 and SD2 share a rack).
- If a failover occurs at this point, NN2 could take over without reading the most recent
edits from SD1, resulting in a divergent namespace.

I have two possible solutions in mind:

*Solution 1: use a traditional quorum system*

When a NN writes, require that all writes must be synced to _W_ shared storage directories.
When the SBN reads, require that it read edits from _R_ shared storage directories. The quorum
requirement is that _R + W > N_ where _N_ is the total number of shared dirs.

In the error case described above, we could set W = 1 and R = 2. Thus the active NN could
continue to operate even if one of the SDs is down -- but no failovers could occur during
that window of time. Once the directory is restored, the system would be fully operational
again and ready for failover.

The usual quorum requirement that W > N/2 would not be necessary here, since we already
ensure a designed single writer by fencing operations.

*Solution 2: use an external source of record to agree on storage directory state.*

In this solution, an external system (likely zookeeper) is used to agree upon the state of
the storage directories. ZK would contain a znode which lists the active shared directories.
When the active NN writes, it writes to all of these active directories. If any of the writes
fail, it must update the znode to mark the failed directory as out-of-date before acking the

When a directory is restored, it will be re-added to the znode listing active directories.

When the SBN processes a failover, it considers only those directories listed in the znode
as active.

Any other solutions I'm not thinking of here?
> HA: Support multiple shared edits dirs
> --------------------------------------
>                 Key: HDFS-2782
>                 URL: https://issues.apache.org/jira/browse/HDFS-2782
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Eli Collins
>            Assignee: Eli Collins
> Supporting multiple shared dirs will improve availability (eg see HDFS-2769). You may
want to use multiple shared dirs on a single filer (eg for better fault isolation) or because
you want to use multiple filers/mounts. Per HDFS-2752 (and HDFS-2735) we need to do things
like use the JournalSet in EditLogTailer and add tests.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message