hadoop-common-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-9956) RPC listener inefficiently assigns connections to readers
Date Thu, 12 Sep 2013 18:13:52 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-9956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daryn Sharp updated HADOOP-9956:
--------------------------------

    Attachment: HADOOP-9956.patch

I removed the serialized assignment of connections to readers by replacing the back-and-forth
locking between the listener and the readers with a thread-safe queue.  The listener can now
hand new connections to the readers quickly instead of waiting on a slow or busy reader.
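A minimal sketch of the queue-based handoff described above, assuming a reader thread that owns one NIO {{Selector}}.  The class and method names ({{QueueingReader}}, {{addConnection}}, {{runOnce}}) are hypothetical; this is not the code in HADOOP-9956.patch.  It only illustrates the idea: the listener enqueues a connection and wakes the reader's selector, and the reader thread alone registers channels with its own selector.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch (not the HADOOP-9956 patch): a reader that owns its Selector
// and receives new connections from the listener through a thread-safe queue.
final class QueueingReader implements Runnable {
    private final Selector selector;
    private final BlockingQueue<SocketChannel> pending = new LinkedBlockingQueue<>();
    private final ByteBuffer buffer = ByteBuffer.allocate(8192);
    private volatile boolean running = true;

    QueueingReader() throws IOException {
        this.selector = Selector.open();
    }

    // Called from the listener thread. It only enqueues and wakes the selector,
    // so a slow or busy reader cannot make the listener wait.
    void addConnection(SocketChannel channel) {
        pending.add(channel);
        selector.wakeup();
    }

    // Runs on the reader thread only, so the Selector is never modified
    // from another thread while select() is in progress.
    private int registerPending() throws IOException {
        int added = 0;
        SocketChannel channel;
        while ((channel = pending.poll()) != null) {
            channel.configureBlocking(false);
            channel.register(selector, SelectionKey.OP_READ);
            added++;
        }
        return added;
    }

    // One pass of the reader loop; returns how many queued connections were registered.
    int runOnce(long timeoutMillis) throws IOException {
        int added = registerPending();
        selector.select(timeoutMillis); // 0 = block until a channel is ready or wakeup()
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            if (key.isValid() && key.isReadable()) {
                SocketChannel channel = (SocketChannel) key.channel();
                buffer.clear();
                if (channel.read(buffer) < 0) { // peer closed the connection
                    key.cancel();
                    channel.close();
                }
                // A real RPC reader would decode request bytes from the buffer here.
            }
        }
        return added;
    }

    int registeredCount() {
        return selector.keys().size();
    }

    @Override
    public void run() {
        while (running) {
            try {
                runOnce(0);
            } catch (IOException e) {
                // A real server would log the error and clean up the failing connection.
            }
        }
    }

    void shutdown() {
        running = false;
        selector.wakeup();
    }
}
```

A wakeup that arrives between {{registerPending()}} and {{select()}} makes the next {{select()}} return immediately, so a newly queued connection is registered on the following pass rather than lost.  Keeping registration on the reader thread also sidesteps the fact that, in the JDKs of that era, {{SelectableChannel.register}} may block when invoked concurrently with a selection operation on the same selector.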

Of note: during a stampede of connections, the current serialized assignment provides an
(unintentional?) throttle that may somewhat mitigate the risk of running out of fds.  _If_ many of
the connections are short-lived and close while the stampede is still being accepted, the risk of
running out of fds is lower with the current code than with this patch.  I think that's a bad
assumption, but I'm investigating ways to rate-limit accepting sockets to reduce the chance of
running out of fds.
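One possible way to bound descriptor use during a stampede, sketched under the assumption that a cap on open accepted connections is an acceptable stand-in for a true rate limit: the listener takes a permit before each accept and returns it when a connection closes, so excess sockets wait in the kernel's accept backlog instead of consuming fds.  {{AcceptThrottle}} and its methods are hypothetical, and this is not what the attached patch does.

```java
import java.io.IOException;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: cap the number of open accepted connections so a stampede
// waits in the kernel's accept backlog instead of consuming file descriptors.
final class AcceptThrottle {
    private final Semaphore permits;

    AcceptThrottle(int maxOpenConnections) {
        this.permits = new Semaphore(maxOpenConnections);
    }

    // Accepts pending sockets while permits remain. The server channel must be
    // non-blocking so accept() returns null once the backlog is empty.
    List<SocketChannel> acceptAvailable(ServerSocketChannel server) throws IOException {
        List<SocketChannel> accepted = new ArrayList<>();
        while (permits.tryAcquire()) {
            SocketChannel channel;
            try {
                channel = server.accept();
            } catch (IOException e) {
                permits.release(); // don't leak the permit on a failed accept
                throw e;
            }
            if (channel == null) { // backlog empty: return the unused permit
                permits.release();
                break;
            }
            accepted.add(channel);
        }
        return accepted;
    }

    // Closing a connection frees its descriptor and lets one more socket be accepted.
    void close(SocketChannel channel) throws IOException {
        try {
            channel.close();
        } finally {
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

In a selector-driven listener, the listener would also need to stop selecting {{OP_ACCEPT}} while no permits remain and re-enable it when one is released; otherwise {{select()}} keeps reporting the pending connections and the loop spins.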

Please provide early feedback on this approach so I know if I'm proceeding down a valid path.

                
> RPC listener inefficiently assigns connections to readers
> ---------------------------------------------------------
>
>                 Key: HADOOP-9956
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9956
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: ipc
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HADOOP-9956.patch
>
>
> The socket listener and readers use complex synchronization to update each reader's NIO
> {{Selector}}.  Updating active selectors is not thread-safe, so precautions are required.
> However, the current locking choreography results in a serialized distribution of new
> connections to the parallel socket readers.  A slower or busier reader can stall the listener
> and throttle performance.
> The problem manifests as unexpectedly low CPU utilization by the listener and readers
> (~20-30%) under heavy load.  The call queue is shallow when it should be overflowing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
