hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7964) Add support for async edit logging
Date Tue, 20 Oct 2015 13:39:28 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965106#comment-14965106 ]

Daryn Sharp commented on HDFS-7964:

# It ensures correctness by preventing a deadlock with the background thread.  IIRC, there
is also a call synchronized on the edit log that must know the current txid (rolling?),
which isn't possible when logging is async.
# Do you mean draining only as many edits from the pending queue as were present at the beginning
of the cycle?  I considered that, but if the NN is falling that far behind on edits, bigger
batches and fewer fsyncs are the only way to catch up.  Unless there is a disk issue, the
thread has never fallen behind for us.  It actually batches less than we'd like because ops
can't log fast enough.
# I'd rather not remove code unrelated to the change.  Can it be a separate JIRA?
# I was minimizing the change, but OK.
# Mostly yes, because if I had to touch all the code calling it, I would have been inclined
to remove all the static methods, which would turn into a much larger patch.
# OK.  In short, it does what it did before: the number batched is the count of calls that
piggybacked on another call's sync (hence the -1).
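The batching policy described in items 2 and 6 can be modeled with a minimal Python sketch. It is not the patch's code: the class `AsyncEditLogSketch`, the `sync_fn` callback and the `_STOP` sentinel are hypothetical, and it assumes "batched" counts every edit in a sync except the one that triggered it. The background thread keeps draining until the queue is momentarily empty (rather than taking only what was present at the start of the cycle), so a thread that falls behind issues fewer, larger syncs.

```python
import queue
import threading

_STOP = object()  # hypothetical shutdown sentinel


class AsyncEditLogSketch:
    """Toy model of a background edit-log thread (hypothetical, not FSEditLog).

    Handlers only enqueue; one background thread drains every pending edit,
    performs a single sync for the whole batch, and keeps simple counters.
    """

    def __init__(self, sync_fn):
        self._pending = queue.Queue()
        self._sync_fn = sync_fn  # stands in for "write batch, then fsync"
        self.num_syncs = 0       # number of syncs performed
        self.num_batched = 0     # edits that piggybacked on another edit's sync
        self._thread = threading.Thread(target=self._run, daemon=True)

    def log_edit(self, edit):
        """Called by a handler: enqueue and return without waiting for a sync."""
        self._pending.put(edit)

    def start(self):
        self._thread.start()

    def stop(self):
        """Ask the thread to exit after syncing everything queued before this call."""
        self._pending.put(_STOP)
        self._thread.join()

    def _run(self):
        while True:
            batch = [self._pending.get()]  # block until there is work
            # Keep draining until the queue is momentarily empty instead of
            # taking a fixed count: when behind, bigger batches = fewer fsyncs.
            while True:
                try:
                    batch.append(self._pending.get_nowait())
                except queue.Empty:
                    break
            edits = [e for e in batch if e is not _STOP]
            if edits:
                self._sync_fn(edits)  # one sync covers the whole batch
                self.num_syncs += 1
                # The first edit "paid" for the sync; the others piggybacked
                # on it, hence len(edits) - 1.
                self.num_batched += len(edits) - 1
            if len(edits) < len(batch):  # the sentinel was in this batch
                return
```

For example, queuing five edits before calling `start()` and then `stop()` yields a single sync covering all five, with `num_batched` equal to 4.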

> Add support for async edit logging
> ----------------------------------
>                 Key: HDFS-7964
>                 URL: https://issues.apache.org/jira/browse/HDFS-7964
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 2.0.2-alpha
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HDFS-7964.patch, HDFS-7964.patch
> Edit logging is a major source of contention within the NN.  logEdit is called within
> the namespace write lock, while logSync is called outside of the lock to allow greater
> concurrency.  The handler thread remains busy until logSync returns so that the response
> carries a durability guarantee to the client.
> Write-heavy RPC load and/or slow IO causes handlers to stall in logSync.  Although the
> write lock is not held, readers are limited/starved and the call queue fills.  Combining
> an edit log thread with postponed RPC responses from HADOOP-10300 will provide the same
> durability guarantee but immediately free up the handlers.
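The proposed combination of an edit log thread with postponed responses can be illustrated with a minimal Python sketch. Everything here is hypothetical: `DeferredResponseEditLog`, `handle_write_rpc`, `on_durable` and `send_response` are illustrative names, and the plain callback only stands in for the postponed-response mechanism that HADOOP-10300 provides; it is not that JIRA's API. The point it shows is that the handler returns right after enqueuing, while the response is sent only after the sync (or its failure), preserving the durability guarantee.

```python
import queue
import threading


class DeferredResponseEditLog:
    """Hypothetical sketch: only the edit-log thread waits on the sync; each
    handler registers a callback that sends its (postponed) response."""

    def __init__(self, sync_fn):
        self._pending = queue.Queue()
        self._sync_fn = sync_fn  # stands in for "write batch, then fsync"
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def log_edit(self, edit, on_durable):
        """Enqueue an edit; on_durable(error) runs on the edit-log thread after the sync."""
        self._pending.put((edit, on_durable))

    def close(self):
        self._pending.put(None)  # shutdown sentinel
        self._thread.join()

    def _run(self):
        while True:
            item = self._pending.get()
            if item is None:
                return
            batch, stopping = [item], False
            while True:  # drain whatever else is already queued
                try:
                    nxt = self._pending.get_nowait()
                except queue.Empty:
                    break
                if nxt is None:
                    stopping = True
                    break
                batch.append(nxt)
            error = None
            try:
                self._sync_fn([edit for edit, _ in batch])
            except Exception as exc:
                error = exc  # a failed sync must fail every call in the batch
            # Responses go out only now, so clients keep the durability guarantee.
            for _, on_durable in batch:
                on_durable(error)
            if stopping:
                return


def handle_write_rpc(edit_log, edit, send_response):
    """Hypothetical handler body. In the NN, the namespace change and logEdit
    happen under the write lock; the handler then returns instead of blocking
    in logSync, and the edit-log thread sends the response later."""
    edit_log.log_edit(edit, lambda error: send_response(edit, error))
```

The design choice is that the handler thread never waits on IO: only the single edit-log thread blocks on the sync, and a failed sync is reported to every caller in that batch instead of being silently acknowledged.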

This message was sent by Atlassian JIRA
