hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-7964) Add support for async edit logging
Date Fri, 20 Mar 2015 22:12:40 GMT

     [ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daryn Sharp updated HDFS-7964:
    Attachment: HDFS-7964.patch

The change is simpler than it appears.  We've been running with this patch on 2.6 at production
load since early this year.

{{FSEditLog}} required minor changes: a few methods were split so that subclasses can override
them.  There are no functional changes, so this part carries no risk.

{{FSEditLogAsync}} manages a queue and a thread for syncing.  For RPC requests, logEdit adds
the edit to the queue, and logSync is a no-op.  The handler thread may immediately service
another call.  However, the prior call's response is postponed, so the IPC machinery will not
send the response when the handler thread completes.  The sync thread triggers the response
after syncing.
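
To make the handoff concrete, here is a minimal sketch of the idea, not the code in the patch.
The class {{AsyncEditLogSketch}}, its {{Edit}} holder, and the use of a {{CompletableFuture}}
as the "postponed response" are all assumptions made for illustration; the real patch relies on
the postponed RPC responses from HADOOP-10300 instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical stand-in for FSEditLogAsync; none of these names come from the patch. */
class AsyncEditLogSketch {
  /** One queued edit plus the postponed response of the call that produced it. */
  private static final class Edit {
    final String op;
    final CompletableFuture<Void> response = new CompletableFuture<>();
    Edit(String op) { this.op = op; }
  }

  private final BlockingQueue<Edit> queue = new LinkedBlockingQueue<>();
  // Stands in for the journal; guarded by "this".
  private final List<String> durable = new ArrayList<>();
  private final Thread syncThread = new Thread(this::run, "edit-sync");

  AsyncEditLogSketch() {
    syncThread.setDaemon(true);
    syncThread.start();
  }

  /** Handler side: enqueue and return at once; the future is the postponed response. */
  CompletableFuture<Void> logEdit(String op) {
    Edit e = new Edit(op);
    queue.add(e);
    return e.response;
  }

  /** Handler side: a no-op, because the sync thread owns durability. */
  void logSync() { }

  /** Sync thread: persist each edit, then release the response that was held back. */
  private void run() {
    try {
      while (true) {
        Edit e = queue.take();
        synchronized (this) { durable.add(e.op); } // pretend this is write + sync
        e.response.complete(null);                 // only now may the client be answered
      }
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
    }
  }

  /** Snapshot of what has been made durable so far. */
  synchronized List<String> durableOps() { return new ArrayList<>(durable); }
}
```

A handler would call logEdit, then logSync, and return without sending anything; the client sees
the response only once the returned future completes, so the durability guarantee is unchanged.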

The thread-local edit log op cache must be disabled for async behavior.  The cache has been
altered so that, when disabled, it returns a new instance every time.  This is done by adding
the edit op's class to {{FSEditLogOpCodes}} so the class can be instantiated.  As a side effect,
the enabled cache's enum map is now trivial to build.
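
A rough sketch of that cache arrangement follows.  It is an illustration under assumptions, not
the patch: {{OpCacheSketch}}, its two op classes, and the two-value {{OpCode}} enum are
hypothetical stand-ins for the real op classes and {{FSEditLogOpCodes}}.  The cache presumably
has to be off for async logging because a queued op would otherwise be overwritten when the same
handler thread reuses its cached instance.

```java
import java.util.EnumMap;

/** Hypothetical sketch; the real FSEditLogOpCodes and op classes look different. */
class OpCacheSketch {
  static class Op { String path; }
  static final class MkdirOp extends Op { }
  static final class DeleteOp extends Op { }

  /** Each opcode carries its op class so an instance can be created on demand. */
  enum OpCode {
    OP_MKDIR(MkdirOp.class),
    OP_DELETE(DeleteOp.class);

    private final Class<? extends Op> opClass;
    OpCode(Class<? extends Op> opClass) { this.opClass = opClass; }

    Op newInstance() {
      try {
        return opClass.getDeclaredConstructor().newInstance();
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException("cannot instantiate " + opClass, e);
      }
    }
  }

  private final boolean useCache;

  // With the class on the enum, building the per-thread map is a single loop.
  private final ThreadLocal<EnumMap<OpCode, Op>> cache = ThreadLocal.withInitial(() -> {
    EnumMap<OpCode, Op> m = new EnumMap<>(OpCode.class);
    for (OpCode c : OpCode.values()) {
      m.put(c, c.newInstance());
    }
    return m;
  });

  OpCacheSketch(boolean useCache) { this.useCache = useCache; }

  /** Cache on: reuse this thread's instance. Cache off: a fresh instance per call. */
  Op get(OpCode code) {
    return useCache ? cache.get().get(code) : code.newInstance();
  }
}
```

With the cache on, two calls to get for the same opcode on one thread return the same object;
with it off, each call returns a distinct object that is safe to hand to another thread.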

The sync thread is designed to maximize the transactions per sync.  It consumes queued edits
and calls logEdit, but not logSync, until the queue runs dry or the edit log stream requires
a sync (which happens when the rate of edits is so high, or IO is so slow, that maximizing the
batches is desirable).
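
The batching rule can be sketched as a single-threaded drain loop.  This is an assumption-laden
illustration rather than the patch's code: {{BatchSyncSketch}}, its {{Stream}} class, and the
fixed edit-count buffer are hypothetical, and a real stream would decide when it needs a sync
based on its buffer state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical sketch of "write many, sync once"; not the patch's code. */
class BatchSyncSketch {
  /** Stand-in journal stream whose buffer holds a fixed number of edits. */
  static final class Stream {
    private final int bufferCapacity;
    private int buffered = 0;
    int syncs = 0;

    Stream(int bufferCapacity) {
      if (bufferCapacity < 1) {
        throw new IllegalArgumentException("bufferCapacity must be >= 1");
      }
      this.bufferCapacity = bufferCapacity;
    }

    void write(String op) { buffered++; }
    boolean shouldForceSync() { return buffered >= bufferCapacity; }
    void sync() { buffered = 0; syncs++; }
  }

  /** Writes queued edits, syncing once per batch; returns the size of each batch. */
  static List<Integer> drain(Queue<String> queue, Stream out) {
    List<Integer> batches = new ArrayList<>();
    while (!queue.isEmpty()) {
      int n = 0;
      String op;
      // Keep writing until the queue runs dry or the stream demands a sync.
      while (!out.shouldForceSync() && (op = queue.poll()) != null) {
        out.write(op);
        n++;
      }
      out.sync(); // one sync covers every edit written since the last one
      batches.add(n);
    }
    return batches;
  }
}
```

For example, five queued edits against a stream with a capacity of two produce batches of 2, 2,
and 1 with three syncs, while a capacity of ten lets all five edits share a single sync.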

Many tests involving edit logs have been parameterized to run with async edit logging both off
and on.  I've run all tests with async on.

> Add support for async edit logging
> ----------------------------------
>                 Key: HDFS-7964
>                 URL: https://issues.apache.org/jira/browse/HDFS-7964
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 2.0.2-alpha
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HDFS-7964.patch
> Edit logging is a major source of contention within the NN.  logEdit is called within
the namespace write lock, while logSync is called outside of the lock to allow greater concurrency.
The handler thread remains busy until logSync returns to provide the client with a durability
guarantee for the response.
> Write-heavy RPC load and/or slow IO causes handlers to stall in logSync.  Although the
write lock is not held, readers are limited or starved and the call queue fills.  Combining an
edit log thread with postponed RPC responses from HADOOP-10300 will provide the same durability
guarantee but immediately free up the handlers.

This message was sent by Atlassian JIRA
