hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7964) Add support for async edit logging
Date Thu, 28 Sep 2017 18:57:02 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184659#comment-16184659 ]

Daryn Sharp commented on HDFS-7964:

Wow that looks super broken.  I hope you don't get double edits or out of order edits during
a log roll...

bq. I found if editPendingQ in FSEditLogAsync.java blocked, all the IPC handlers will slow
down and performance degraded.

Yes, of course, and edit logging used to block frequently, with a huge reduction in throughput.
Did you actually profile a problem, or is this an attempt to micro-optimize?

For the 4k queue to block, sync time would have to be on the order of at least hundreds of
ms under very heavy write load, or multiple seconds under a moderate write load.  Glancing at
the metrics for the past hour, I see up to 10k write ops/sec + 70k read ops/sec.  We've seen
~285k ops/sec under read load abuse, so I think throughput is ok.
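
The arithmetic behind that estimate can be sketched as below. This is a back-of-the-envelope
model, not anything taken from FSEditLogAsync: it assumes "4k" means 4096 entries, that the
sync thread drains nothing while a sync is in flight, and that "moderate" load means about
1k edits/sec (that last figure is purely an assumption for illustration).  The class and method
names are hypothetical.

```java
public class QueueFillTime {
    /**
     * Milliseconds a stalled consumer can stay stalled before a bounded
     * queue of the given capacity fills at the given enqueue rate.
     */
    static double millisToFill(int queueCapacity, double editsPerSecond) {
        return queueCapacity * 1000.0 / editsPerSecond;
    }

    public static void main(String[] args) {
        int capacity = 4096; // "4k queue" read as 4096 entries (assumption)

        // Heavy write load: the 10k write ops/sec figure quoted above.
        System.out.printf("heavy (10k edits/s): %.0f ms%n",
                millisToFill(capacity, 10_000));

        // Moderate write load: 1k edits/sec is an assumed figure.
        System.out.printf("moderate (1k edits/s): %.0f ms%n",
                millisToFill(capacity, 1_000));
    }
}
```

Under these assumptions the queue absorbs roughly 410 ms of stalled sync at 10k edits/sec and
about 4.1 seconds at 1k edits/sec, which matches the "hundreds of ms" and "multiple seconds"
figures.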

> Add support for async edit logging
> ----------------------------------
>                 Key: HDFS-7964
>                 URL: https://issues.apache.org/jira/browse/HDFS-7964
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 2.0.2-alpha
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>             Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1
>         Attachments: HDFS-7964-branch-2.7.patch, HDFS-7964-branch-2.8.0.patch, HDFS-7964.patch,
HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch, HDFS-7964-rebase.patch
> Edit logging is a major source of contention within the NN.  logEdit is called within
the namespace write lock, while logSync is called outside of the lock to allow greater concurrency.
The handler thread remains busy until logSync returns to provide the client with a durability
guarantee for the response.
> Write-heavy RPC load and/or slow IO cause handlers to stall in logSync.  Although the
write lock is not held, readers are limited/starved and the call queue fills.  Combining an
edit log thread with postponed RPC responses from HADOOP-10300 will provide the same durability
guarantee but immediately free up the handlers.
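
The design described above (handlers enqueue an edit and return, a single thread syncs, and the
response is released only after the sync) can be sketched roughly as follows.  This is a minimal
illustration, not the FSEditLogAsync code: every class, field, and method name here is
hypothetical, an in-memory list stands in for the journal, and a CompletableFuture stands in
for the postponed RPC response.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class AsyncEditLogSketch implements AutoCloseable {
    /** One queued edit plus the future that releases the postponed response. */
    private static final class PendingEdit {
        final String op;
        final CompletableFuture<Long> synced = new CompletableFuture<>();
        PendingEdit(String op) { this.op = op; }
    }

    private final BlockingQueue<PendingEdit> pending;
    private final List<String> durableLog = new ArrayList<>(); // stand-in for the journal
    private final Thread syncThread;
    private volatile boolean running = true;
    private long lastTxId = 0; // only touched by the sync thread

    public AsyncEditLogSketch(int capacity) {
        pending = new ArrayBlockingQueue<>(capacity);
        syncThread = new Thread(this::runSyncLoop, "edit-sync");
        syncThread.setDaemon(true);
        syncThread.start();
    }

    /** Called by a handler: enqueue and return. Blocks only when the queue is full. */
    public CompletableFuture<Long> logEdit(String op) {
        PendingEdit edit = new PendingEdit(op);
        try {
            pending.put(edit);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while enqueuing edit", ie);
        }
        return edit.synced;
    }

    private void runSyncLoop() {
        List<PendingEdit> batch = new ArrayList<>();
        try {
            while (running || !pending.isEmpty()) {
                PendingEdit first = pending.poll(50, TimeUnit.MILLISECONDS);
                if (first == null) {
                    continue;
                }
                batch.add(first);
                pending.drainTo(batch); // batch everything queued behind it
                synchronized (durableLog) {
                    for (PendingEdit e : batch) {
                        durableLog.add(e.op);
                    }
                }
                // A real implementation would flush/sync the journal here.
                for (PendingEdit e : batch) {
                    e.synced.complete(++lastTxId); // release the postponed response
                }
                batch.clear();
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }

    /** Copy of everything "synced" so far, in order. */
    public List<String> snapshot() {
        synchronized (durableLog) {
            return new ArrayList<>(durableLog);
        }
    }

    @Override
    public void close() {
        running = false;
        try {
            syncThread.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        try (AsyncEditLogSketch log = new AsyncEditLogSketch(4096)) {
            CompletableFuture<Long> done = log.logEdit("mkdir /a");
            // The handler is free here; the response goes out when the future completes.
            done.thenAccept(txId -> System.out.println("synced txid " + txId)).join();
        }
    }
}
```

The bounded queue is the only place a handler can still stall in this sketch, which is why the
queue depth and the sync latency together decide whether handlers ever block.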

