hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7964) Add support for async edit logging
Date Fri, 20 Mar 2015 23:46:38 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372334#comment-14372334 ]

Daryn Sharp commented on HDFS-7964:
-----------------------------------

As background, this problem was tackled after recurring slow IO issues caused some handlers
to block on a small batch of edits.  The remaining handlers filled the other side of the edit
log double-buffer.  In the worst case, logEdit triggered an auto-sync while the write lock
was held.  The call queue overflowed, and the situation was further exacerbated by the
resulting TCP listen queue overflows, TCP SYN cookies, and client timeouts.  When the IPC
machinery recovered, the process would repeat in an oscillating manner until the IO issues
dissipated.  Even without an auto-sync, the high rate of read operations caused writes to be
batched in small groups.

> Add support for async edit logging
> ----------------------------------
>
>                 Key: HDFS-7964
>                 URL: https://issues.apache.org/jira/browse/HDFS-7964
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 2.0.2-alpha
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HDFS-7964.patch
>
>
> Edit logging is a major source of contention within the NN.  logEdit is called within
> the namespace write lock, while logSync is called outside of the lock to allow greater
> concurrency.  The handler thread remains busy until logSync returns to provide the client
> with a durability guarantee for the response.
> Write-heavy RPC load and/or slow IO causes handlers to stall in logSync.  Although the
> write lock is not held, readers are limited/starved and the call queue fills.  Combining an
> edit log thread with postponed RPC responses from HADOOP-10300 will provide the same
> durability guarantee but immediately free up the handlers.
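
The idea in the description (a dedicated edit log thread plus responses that are held
back until the edit is durable) can be sketched roughly as follows.  This is only an
illustration and not the attached patch: the class AsyncEditLogSketch, its methods, the
in-memory "journal", and the use of CompletableFuture as a stand-in for a postponed RPC
response are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch of async edit logging; none of these names come from HDFS. */
public class AsyncEditLogSketch implements AutoCloseable {

    /** An edit paired with a future that stands in for the postponed RPC response. */
    private static final class PendingEdit {
        final String op;
        final CompletableFuture<Void> durable = new CompletableFuture<>();
        PendingEdit(String op) { this.op = op; }
    }

    private final BlockingQueue<PendingEdit> queue = new LinkedBlockingQueue<>();
    // Stand-in for the on-disk edit log (assumption: a real log would write and fsync).
    private final List<String> journal = Collections.synchronizedList(new ArrayList<>());
    private final Thread syncer = new Thread(this::run, "edit-log-syncer");

    public AsyncEditLogSketch() {
        syncer.setDaemon(true);
        syncer.start();
    }

    /** Called by a handler: enqueue the edit and return immediately, without syncing. */
    public CompletableFuture<Void> logEdit(String op) {
        PendingEdit edit = new PendingEdit(op);
        queue.add(edit);
        return edit.durable;
    }

    /** Single background thread: batches queued edits, "syncs" them, then releases responses. */
    private void run() {
        List<PendingEdit> batch = new ArrayList<>();
        try {
            while (true) {
                batch.add(queue.take());   // block until at least one edit is available
                queue.drainTo(batch);      // pick up everything else already queued
                for (PendingEdit edit : batch) {
                    journal.add(edit.op);  // stand-in for writing the edit
                }
                // A real implementation would flush/fsync once per batch here.
                for (PendingEdit edit : batch) {
                    edit.durable.complete(null);  // now the response may be sent
                }
                batch.clear();
            }
        } catch (InterruptedException e) {
            // Shutdown requested; edits still queued are never completed in this sketch.
        }
    }

    /** Returns a copy of the edits made durable so far. */
    public List<String> journal() {
        synchronized (journal) {
            return new ArrayList<>(journal);
        }
    }

    @Override
    public void close() throws InterruptedException {
        syncer.interrupt();
        syncer.join();
    }

    public static void main(String[] args) throws Exception {
        try (AsyncEditLogSketch log = new AsyncEditLogSketch()) {
            CompletableFuture<Void> a = log.logEdit("mkdir /a");
            CompletableFuture<Void> b = log.logEdit("mkdir /b");
            // The "handler" is free here; the response waits on durability instead.
            CompletableFuture.allOf(a, b).join();
            System.out.println(log.journal());
        }
    }
}
```

In this sketch the caller of logEdit never blocks on IO, so a slow sync delays only the
completion of the futures and not the handler threads, which is the property the
description is after.  How the real patch batches edits and wires responses into the IPC
layer is not shown here.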



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
