hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-5241) Provide alternate queuing audit logger to reduce logging contention
Date Tue, 24 Sep 2013 22:52:10 GMT

     [ https://issues.apache.org/jira/browse/HDFS-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daryn Sharp updated HDFS-5241:

    Attachment: HDFS-5241.patch

No tests yet; requesting feedback before investing the time.

Provides an option to enable async logging via a single background thread.  The performance
gains are impressive under an ideal read-heavy load:
* fair lock = 26k ops/sec
* unfair lock = 58k ops/sec
* unfair lock + unbuffered appender = 120k ops/sec

A single thread consuming log messages from a queue populated by the 100 RPC handlers is sufficient
to improve performance.  Additional threads showed no significant improvement.
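
As a rough illustration of the single-consumer design described above, here is a minimal sketch
in plain Java.  It is not the code in HDFS-5241.patch: the class names QueuingAuditLogger and
QueuingAuditLoggerDemo, the sentinel used for shutdown, and the Consumer-based sink are all
assumptions made for the example, and a real version would hand messages to log4j instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch: handlers enqueue, one background thread writes.
final class QueuingAuditLogger implements AutoCloseable {
    // Sentinel compared by identity so it can never collide with a real message.
    private static final String POISON = new String("<shutdown>");

    private final BlockingQueue<String> queue;
    private final Thread writer;

    QueuingAuditLogger(int capacity, Consumer<String> sink) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.writer = new Thread(() -> {
            try {
                while (true) {
                    String message = queue.take();
                    if (message == POISON) {
                        return;
                    }
                    // Only this thread touches the sink, so the sink needs no lock.
                    sink.accept(message);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "audit-log-writer");
        writer.setDaemon(true);
        writer.start();
    }

    // Called by handler threads; blocks only while the queue is full.
    void log(String message) {
        try {
            queue.put(message);
        } catch (InterruptedException e) {
            // The message is dropped; restore the flag for the caller.
            Thread.currentThread().interrupt();
        }
    }

    // Enqueues the sentinel behind all pending messages and waits for the writer.
    @Override
    public void close() {
        try {
            queue.put(POISON);
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

final class QueuingAuditLoggerDemo {
    // Starts `handlers` threads that each log `perHandler` messages;
    // returns how many messages reached the sink.
    static int run(int handlers, int perHandler) {
        List<String> written = new ArrayList<>();
        try (QueuingAuditLogger logger = new QueuingAuditLogger(handlers, written::add)) {
            Thread[] threads = new Thread[handlers];
            for (int i = 0; i < handlers; i++) {
                final int id = i;
                threads[i] = new Thread(() -> {
                    for (int op = 0; op < perHandler; op++) {
                        logger.log("handler=" + id + " op=" + op);
                    }
                });
                threads[i].start();
            }
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return written.size();
    }

    public static void main(String[] args) {
        System.out.println("messages written: " + run(100, 50));
    }
}
```

The sketch drains the queue on close(); anything still queued when the process dies would be
lost.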

The problem is 100 threads colliding on log4j's synchronized method.  The contention is so high,
and the logging call takes long enough, that each blocked thread's futex has to call into the
kernel.  The context switch and the wait to be rescheduled ruin performance.  By comparison, the
time spent waiting to add a log message to the queue is negligible, so the futexes stay in userland.
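
For contrast, the contended pattern looks roughly like the sketch below: every handler thread
funnels through one synchronized method.  SynchronizedAuditSink and ContentionDemo are
hypothetical stand-ins for the log4j call path, not its actual classes, and the sketch only
shows the shape of the locking; it makes no attempt to reproduce the throughput numbers above.

```java
// Hypothetical stand-in for an appender guarded by a single monitor.
final class SynchronizedAuditSink {
    private int lines;
    private long chars;

    // Every caller must hold the same monitor for the whole "write".
    synchronized void append(String message) {
        chars += message.length() + 1; // +1 for the newline a real appender would add
        lines++;
    }

    synchronized int lines() {
        return lines;
    }
}

final class ContentionDemo {
    // Starts `handlers` threads that all call the synchronized sink directly.
    static int run(int handlers, int perHandler) {
        SynchronizedAuditSink sink = new SynchronizedAuditSink();
        Thread[] threads = new Thread[handlers];
        for (int i = 0; i < handlers; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                for (int op = 0; op < perHandler; op++) {
                    sink.append("handler=" + id + " op=" + op);
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sink.lines();
    }

    public static void main(String[] args) {
        System.out.println("lines appended: " + run(100, 50));
    }
}
```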

The performance sweet spot is a queue sized to the number of handlers.  As long as the background
thread can log messages faster than a handler can process the next call, the handler is guaranteed
a spot in the queue without a context switch.
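
The sizing argument can be checked for the worst case, where the background thread is stalled
and every handler enqueues one message at once.  The sketch below is an assumption-laden
illustration (the QueueSizing class and its method are hypothetical): it uses the non-blocking
offer() call to count how many messages find a free slot for a given capacity.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class QueueSizing {
    // Offers `attempts` messages to an undrained queue of the given capacity and
    // returns how many were accepted without the caller having to block.
    static int acceptedWithoutBlocking(int capacity, int attempts) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        int accepted = 0;
        for (int i = 0; i < attempts; i++) {
            // offer() returns false instead of blocking when the queue is full.
            if (queue.offer("audit event " + i)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        int handlers = 100;
        System.out.println("capacity = handlers:     "
            + acceptedWithoutBlocking(handlers, handlers));
        System.out.println("capacity = handlers / 2: "
            + acceptedWithoutBlocking(handlers / 2, handlers));
    }
}
```

With capacity equal to the handler count, every handler's single outstanding message fits; with
a smaller queue, the surplus handlers would have to wait in put().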

It's a configurable but undocumented option for now, since the audit log becomes prone to data
loss and to slightly offset timestamps.

The call queue tends to run relatively dry, so I expect my other connection handling patches
like HADOOP-9956 will have a larger impact.

> Provide alternate queuing audit logger to reduce logging contention
> -------------------------------------------------------------------
>                 Key: HDFS-5241
>                 URL: https://issues.apache.org/jira/browse/HDFS-5241
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HDFS-5241.patch
>
> The default audit logger has extremely poor performance.  The internal synchronization
> of log4j causes massive contention between the call handlers (100 by default) which drastically
> limits the throughput of the NN.

