hadoop-hdfs-issues mailing list archives

From "Flavio Junqueira (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3092) Enable journal protocol based editlog streaming for standby namenode
Date Tue, 03 Apr 2012 10:32:24 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245112#comment-13245112 ]

Flavio Junqueira commented on HDFS-3092:

Hi Suresh, Thanks for sharing a design document. I have a few comments and questions if you
don't mind:

# I find this design to be very close to bookkeeper, with a few important differences. One
noticeable difference that has been mentioned elsewhere is that bookies implement mechanisms
to enable high performance when there are multiple concurrent ledgers being written to. Your
design does not seem to consider the possibility of multiple concurrent logs, which you may
want to have for federation. Federation will be useful for large deployments, but not for
small deployments. It sounds like a good idea to have a solution that covers both cases. 

# There have been comments about comparing the different approaches discussed, and I was wondering
what criteria you have been thinking of using to compare them. I guess it can't be performance
because as the numbers Ivan has generated show, the current bottleneck is the namenode code,
not the logging. Until the existing bottlenecks in the namenode code are removed, having a
fast logging mechanism won't make much difference with respect to throughput. 
# I was wondering about how reads to the log are executed if writes only have to reach a majority
quorum. Once it is time to read, how does the reader get a consistent view of the log? One
JD alone may not have all entries, so I suppose the reader may need to read from multiple
JDs to get a consistent view? Do the transaction identifiers establish the order of entries
in the log? One quick note is that I don't see why a majority is required; bk does not require
a majority.
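
To make the read question in the last item concrete, here is a minimal sketch of one way a reader could assemble a log from several JDs. It is not taken from the design document. It assumes that transaction identifiers are contiguous integers that define the log order (which is exactly what the comment asks about), that a given identifier carries the same payload on every JD, and that the reader stops at the first gap. The function name `merge_journal_views` and the sample entries are hypothetical.

```python
from typing import Dict, List


def merge_journal_views(views: List[Dict[int, bytes]], first_txid: int = 1) -> List[bytes]:
    """Union the entries held by several JDs and return the contiguous prefix.

    Assumption (not from the design document): txids are contiguous integers
    that define the log order, and each txid has one payload on all JDs.
    """
    merged: Dict[int, bytes] = {}
    for view in views:
        for txid, payload in view.items():
            # setdefault stores the payload the first time a txid is seen and
            # returns the stored one afterwards, so a mismatch means two JDs
            # disagree about the same transaction.
            if merged.setdefault(txid, payload) != payload:
                raise ValueError(f"conflicting payloads for txid {txid}")

    log: List[bytes] = []
    txid = first_txid
    # Stop at the first missing txid: entries after a gap cannot be ordered safely.
    while txid in merged:
        log.append(merged[txid])
        txid += 1
    return log


# Hypothetical contents of three JDs, none of which holds every entry.
jd_a = {1: b"mkdir /a", 2: b"mkdir /b"}
jd_b = {1: b"mkdir /a", 3: b"rename /a /c"}
jd_c = {2: b"mkdir /b", 3: b"rename /a /c", 5: b"delete /b"}

recovered = merge_journal_views([jd_a, jd_b, jd_c])
```

In this example no single JD holds entries 1 through 3, but their union does. Entry 5 is left out because entry 4 is missing from every JD that was read.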

Here are some notes I took comparing the bk approach with the one in this jira, in case
you're interested:

# *Rolling*: The notion of rolling here is equivalent to closing a ledger and creating a new
one. As ledgers are identified with numbers that are monotonically increasing, the ledger
identifiers can be used to order the sequence of logs created over time.
# *Single writer*: Only one client can add new entries to a ledger. We have the notion of
a recovery client, which is essentially a reader that makes sure that there is agreement on
the end of the ledger. Such a recovery client is also able to write entries, but these writes
are simply to make sure that there is enough replication.
# *Fencing*: We fence ledgers individually, so that we guarantee that all operations a ledger
writer returns successfully are persisted on enough bookies. This is different from the approach
proposed here, which essentially fences logging as a whole.   
# *Split brain*: In a split-brain situation, bk can have two writers each writing to a different
ledger. However, my understanding is that a namenode that is failing over cannot make progress
without reading the previous log (ledger), consequently this situation cannot occur with bk
and we don’t require writes to a majority for correctness. 
# *Adding JDs*: The mechanism described here mentions explicitly adding a new JD. My understanding
is that a new JD is brought up and it is told somehow to connect to the namenode and to another
JD in the JournalList to sync up. bk currently only picks bookies from a pool of available
bookies through zookeeper. It shouldn’t be a problem to allow a fixed list of bookies to
be passed upon creating a ledger.
# *Striping*: bk implements striping, although that’s an optional feature. It is possible
to use a configuration like 2-2 or 3-3 (Q-N, Q=quorum size and N=ensemble size).  
# *Failure detection*: bk uses zookeeper ephemeral nodes to track bookies that are available.
A client also changes its ensemble view if it loses a bookie by adding a new bookie. I’m
not exactly sure how you monitor crashes here. Is it the namenode that keeps track of which
JDs in the JournalList are available?
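
The Q-N notation in the striping note can be illustrated with a short sketch. It assumes a round-robin placement in which entry `e` goes to `Q` consecutive bookies of the `N`-bookie ensemble, starting at index `e mod N`. That placement rule and the function name `write_set` are assumptions made for illustration, not something stated in the comment above.

```python
from typing import List


def write_set(entry_id: int, quorum_size: int, ensemble_size: int) -> List[int]:
    """Return the ensemble indexes that store a given entry.

    Assumed rule (illustrative only): round-robin placement, where entry e is
    written to quorum_size consecutive bookies starting at e % ensemble_size.
    """
    if not 1 <= quorum_size <= ensemble_size:
        raise ValueError("need 1 <= quorum_size <= ensemble_size")
    return [(entry_id + i) % ensemble_size for i in range(quorum_size)]


# 2-3 configuration: each entry lands on 2 of the 3 bookies, so writes are striped.
placement_2_3 = [write_set(e, 2, 3) for e in range(4)]

# 3-3 configuration: Q equals N, so every bookie stores every entry (no striping).
placement_3_3 = [sorted(write_set(e, 3, 3)) for e in range(2)]
```

Under this rule, a configuration with Q equal to N, such as 2-2 or 3-3, is plain full replication. Striping only takes effect once Q is smaller than N.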

> Enable journal protocol based editlog streaming for standby namenode
> --------------------------------------------------------------------
>                 Key: HDFS-3092
>                 URL: https://issues.apache.org/jira/browse/HDFS-3092
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: ha, name-node
>    Affects Versions: 0.24.0, 0.23.3
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>         Attachments: MultipleSharedJournals.pdf
> Currently standby namenode relies on reading shared editlogs to stay current with the
> active namenode, for namespace changes. BackupNode used streaming edits from active namenode
> for doing the same. This jira is to explore using journal protocol based editlog streams for
> the standby namenode. A daemon in standby will get the editlogs from the active and write
> it to local edits. To begin with, the existing standby mechanism of reading from a file, will
> continue to be used, instead of from shared edits, from the local edits.
