hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7964) Add support for async edit logging
Date Sat, 17 Oct 2015 00:40:06 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961593#comment-14961593 ]

Jing Zhao commented on HDFS-7964:
---------------------------------

Thanks for rebasing the patch, Daryn. The patch looks good to me. Some minor comments:
# The following code uses whether the current thread holds the monitor to decide whether the
edit should be async or sync. This is not straightforward to follow, and it also makes it hard
to guarantee the correctness of future code. Can we simply make the decision based on the op itself?
{code}
    // only rpc calls not explicitly sync'ed on the log will be async.
    if (rpcCall != null && !Thread.holdsLock(this)) {
      edit = new AsyncEdit(this, op, rpcCall);
    } else {
      edit = new SyncEdit(this, op);
    }
{code}
# If requests keep coming but the traffic is slow, the sync will happen only when the buffer
is full, which means the response may be delayed? This may be a rare case in practice, but
maybe we should avoid it here. Can we make each iteration of the loop either fill the buffer
or drain the pending queue?
{code}
        if (edit != null) {
          // sync if requested by edit log.
          doSync = edit.logEdit();
          syncWaitQ.add(edit);
        } else {
          // sync when editq runs dry, but have edits pending a sync.
          doSync = !syncWaitQ.isEmpty();
        }
{code}
# The class {{InvalidOp}} is not used anywhere. We can either remove it or use it for {{OP_INVALID}}.
# Maybe we can do some further cleanup for {{RollingUpgradeOp}}. E.g., after adding classes
like {{RollingUpgradeStartOp}} and {{RollingUpgradeFinalizeOp}}, we can put {{getInstance}}
methods there and remove {{getStartInstance}} and {{getFinalizeInstance}}.
# Is the main reason for having {{OpInstanceCache#get}} to minimize the code change?
# It would be helpful to add a comment explaining how {{editsBatchedInSync}} is calculated.
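
To make points 1 and 2 concrete, here is a rough, single-threaded sketch of what I have in mind. It is not code from the patch: {{EditLoopSketch}}, {{syncRequired}}, {{bufferLimit}} and {{runOnce}} are hypothetical names, and the "sync" is only simulated by recording a batch. The assumption is that each op declares up front whether it needs an immediate sync, and that one loop iteration logs edits until the buffer fills, an op asks for a sync, or the queue runs dry, and then always syncs whatever is pending.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: none of these names come from the HDFS-7964 patch.
class EditLoopSketch {
  /** An edit whose sync behaviour is fixed by the op, not by lock state. */
  static final class Edit {
    final String name;
    final boolean syncRequired;
    Edit(String name, boolean syncRequired) {
      this.name = name;
      this.syncRequired = syncRequired;
    }
  }

  private final Queue<Edit> editQ = new ArrayDeque<>();      // edits not yet logged
  private final List<String> syncWaitQ = new ArrayList<>();  // logged, awaiting sync
  final List<List<String>> syncedBatches = new ArrayList<>(); // simulated syncs
  private final int bufferLimit; // stand-in for "the buffer is full"

  EditLoopSketch(int bufferLimit) {
    this.bufferLimit = bufferLimit;
  }

  void submit(String name, boolean syncRequired) {
    editQ.add(new Edit(name, syncRequired));
  }

  /**
   * One loop iteration: log edits until the buffer fills, an op asks for a
   * sync, or the queue runs dry, then sync whatever is pending.
   * Returns true if a sync happened.
   */
  boolean runOnce() {
    boolean doSync = false;
    Edit edit;
    while (!doSync && (edit = editQ.poll()) != null) {
      syncWaitQ.add(edit.name);
      doSync = edit.syncRequired || syncWaitQ.size() >= bufferLimit;
    }
    if (syncWaitQ.isEmpty()) {
      return false; // nothing was logged, so there is nothing to sync
    }
    // Buffer full, sync requested by an op, or queue drained: in every
    // case the pending edits are synced before the iteration ends.
    syncedBatches.add(new ArrayList<>(syncWaitQ));
    syncWaitQ.clear();
    return true;
  }

  public static void main(String[] args) {
    EditLoopSketch log = new EditLoopSketch(2);
    log.submit("mkdir", false);
    log.submit("rename", false);
    log.submit("delete", false);
    while (log.runOnce()) {
      // keep iterating until there is nothing left to sync
    }
    System.out.println(log.syncedBatches);
  }
}
```

With a buffer limit of 2, the three edits in {{main}} end up in two batches, {{[[mkdir, rename], [delete]]}}: the last edit is synced as soon as the queue is drained instead of waiting for the buffer to fill. The real loop would of course block on the queue and do the actual I/O.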

> Add support for async edit logging
> ----------------------------------
>
>                 Key: HDFS-7964
>                 URL: https://issues.apache.org/jira/browse/HDFS-7964
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 2.0.2-alpha
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HDFS-7964.patch, HDFS-7964.patch
>
>
> Edit logging is a major source of contention within the NN.  logEdit is called within
the namespace write lock, while logSync is called outside of the lock to allow greater concurrency.
 The handler thread remains busy until logSync returns to provide the client with a durability
guarantee for the response.
> Write heavy RPC load and/or slow IO causes handlers to stall in logSync.  Although the
write lock is not held, readers are limited/starved and the call queue fills.  Combining an
edit log thread with postponed RPC responses from HADOOP-10300 will provide the same durability
guarantee but immediately free up the handlers.
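
The idea in the description above (a dedicated edit log thread plus postponed responses) can be sketched roughly as follows. This is only an illustration under stated assumptions: {{PostponedResponseSketch}}, {{logEditAsync}} and {{durable}} are hypothetical names, not Hadoop or HADOOP-10300 APIs, and an in-memory list stands in for the synced edit log.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: these names are illustrative, not Hadoop APIs.
class PostponedResponseSketch {
  // A single thread plays the role of the dedicated edit log thread.
  private final ExecutorService editLogThread = Executors.newSingleThreadExecutor();
  // Stand-in for the on-disk edit log; an op is "durable" once it is here.
  final List<String> durable = new CopyOnWriteArrayList<>();

  /**
   * Called by a handler: queues the edit and returns at once. The future
   * completes only after the edit log thread has made the op durable.
   */
  CompletableFuture<String> logEditAsync(String op) {
    return CompletableFuture.supplyAsync(() -> {
      durable.add(op);      // stands in for writing and syncing the edit
      return "ack:" + op;   // the postponed response to the client
    }, editLogThread);
  }

  void shutdown() {
    editLogThread.shutdown();
  }

  public static void main(String[] args) {
    PostponedResponseSketch nn = new PostponedResponseSketch();
    CompletableFuture<String> response = nn.logEditAsync("mkdir /a");
    // The handler is free at this point; the ack exists only after the sync.
    System.out.println(response.join());
    nn.shutdown();
  }
}
```

The handler never blocks on the sync itself, yet the client still cannot observe a response before its edit is durable, which is the guarantee the description asks for.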



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
