hadoop-common-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-9956) RPC listener inefficiently assigns connections to readers
Date Thu, 12 Sep 2013 18:13:52 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-9956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daryn Sharp updated HADOOP-9956:

    Attachment: HADOOP-9956.patch

I removed the serialized assignment of connections to readers by replacing the back-and-forth
locking between the listener and readers with a thread-safe queue.  The listener can now rapidly
hand new connections to the readers without waiting on a busy reader.
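
To make the idea concrete, here is a minimal sketch of the queue hand-off, not the code in the
attached patch.  The class and method names (ReaderSketch, addConnection, drainPending) are
hypothetical, and the connection type is left generic so the sketch runs without real sockets.
The listener only enqueues and wakes the selector; the reader thread alone drains the queue and
does the registration, so the selector is never updated from two threads.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch: names are illustrative and are not taken from HADOOP-9956.patch.
class ReaderSketch<C> {
    // Thread-safe hand-off point between the listener thread and this reader.
    private final BlockingQueue<C> pendingConnections = new LinkedBlockingQueue<>();
    private final Selector selector;

    ReaderSketch() {
        try {
            this.selector = Selector.open();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Called by the listener thread. It enqueues and returns without taking
    // any lock that the reader holds while it is busy.
    void addConnection(C connection) {
        pendingConnections.add(connection);
        selector.wakeup(); // causes a blocked (or the next) select() to return promptly
    }

    // Called only by the reader thread, so registration stays single-threaded.
    // The 'register' callback stands in for channel.register(selector, ...).
    int drainPending(Consumer<C> register) {
        int drained = 0;
        C connection;
        while ((connection = pendingConnections.poll()) != null) {
            register.accept(connection);
            drained++;
        }
        return drained;
    }

    void close() {
        try {
            selector.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real reader loop, drainPending would presumably run each time select() returns, before the
selected keys are processed.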

Of note: during a stampede of connections, the current serialized assignment acts as an
(unintentional?) throttle that may somewhat mitigate the risk of running out of fds.  _If_ many
of the connections are short-lived and are closed while the stampede is still being accepted,
the risk of running out of fds is lower with the current code than with this patch.  I think
that's a bad assumption to rely on, but I'm investigating ways to rate limit accepting sockets
to reduce the chance of running out of fds.
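
As a rough illustration of one possible direction (an assumption for discussion, not something
in the patch), the listener could cap the number of concurrently open connections with a
semaphore.  Strictly speaking this is a cap on open connections rather than a rate limiter, but
it bounds fd usage directly.  The class name AcceptThrottleSketch and its methods are
hypothetical.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch: bounds the number of open connections so that a
// stampede cannot exhaust file descriptors. Not taken from the patch.
class AcceptThrottleSketch {
    private final Semaphore permits;

    AcceptThrottleSketch(int maxOpenConnections) {
        this.permits = new Semaphore(maxOpenConnections);
    }

    // The listener calls this before accept(); false means "do not accept yet".
    boolean tryBeginAccept() {
        return permits.tryAcquire();
    }

    // Called when a connection is closed, freeing its slot (and its fd).
    void connectionClosed() {
        permits.release();
    }

    int availableSlots() {
        return permits.availablePermits();
    }
}
```

With a limit of 2, the third tryBeginAccept() call returns false until connectionClosed() has
been called for one of the first two connections.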

Please provide early feedback on this approach so I know if I'm proceeding down a valid path.

> RPC listener inefficiently assigns connections to readers
> ---------------------------------------------------------
>                 Key: HADOOP-9956
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9956
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: ipc
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HADOOP-9956.patch
> The socket listener and readers use a complex synchronization to update the reader's
> NIO {{Selector}}.  Updating active selectors is not thread-safe so precautions are required.
> However, the current locking choreography results in a serialized distribution of new
> connections to the parallel socket readers.  A slower/busier reader can stall the listener
> and throttle performance.
> The problem manifests as unexpectedly low cpu utilization by the listener and readers
> (~20-30%) under heavy load.  The call queue is shallow when it should be overflowing.

