hadoop-common-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9956) RPC listener inefficiently assigns connections to readers
Date Fri, 13 Sep 2013 17:37:53 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13766692#comment-13766692 ]

Daryn Sharp commented on HADOOP-9956:

New connections already have intrinsically higher priority.  All connection channels are registered
with the reader's selector, which only returns channels that are ready for reading, i.e., not idle.
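
To illustrate the point that a selector hands back only ready channels, here is a minimal sketch using the standard java.nio Selector API. It assumes that Pipe channels can stand in for client SocketChannels (both are SelectableChannels), so no network is needed; the class ReadyOnlyDemo and its method are hypothetical, not Hadoop code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Set;

/** Hypothetical demo: pipes stand in for client connections. */
class ReadyOnlyDemo {
    /**
     * Registers a "busy" and an "idle" connection with one selector, lets only
     * the busy one send bytes, and reports whether select() returned exactly
     * the busy channel.
     */
    static boolean onlyBusyChannelSelected() {
        try (Selector selector = Selector.open()) {
            Pipe busy = Pipe.open();
            Pipe idle = Pipe.open();
            try {
                busy.source().configureBlocking(false);
                idle.source().configureBlocking(false);
                SelectionKey busyKey = busy.source().register(selector, SelectionKey.OP_READ);
                idle.source().register(selector, SelectionKey.OP_READ);

                // Only the busy client writes a request; the idle one stays silent.
                busy.sink().write(ByteBuffer.wrap(new byte[] {1, 2, 3}));

                selector.select(1000); // block until a channel is readable (max 1 s)
                Set<SelectionKey> ready = selector.selectedKeys();
                return ready.size() == 1 && ready.contains(busyKey);
            } finally {
                busy.source().close();
                busy.sink().close();
                idle.source().close();
                idle.sink().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The idle connection stays registered but never appears in the selected-key set, so the reader spends its loop only on connections that actually sent something.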

Prematurely closing sockets is a very bad idea.  I've thought through various approaches,
and closing is the worst.  Closing a socket between calls is OK because the client will
reconnect on the next call.  The problem arises when a client has a request in flight on the
network, or when the server has received it but the reader hasn't serviced it yet.  The client
then has no option but to throw an exception, because it doesn't know whether the call is idempotent.

A lot of effort has been spent addressing idempotency issues for HA NNs, but other RPC clients
won't gracefully handle this case.  Imagine sockets repeatedly being closed on job submissions
to a heavily loaded RM that is aggressively closing connections.  A workflow manager like
Oozie will end up resubmitting, creating duplicate jobs.
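
The in-flight hazard can be modeled with a toy server whose job submission is not idempotent. This is a hypothetical sketch, not Hadoop's ipc.Client retry logic: FakeServer, ConnectionClosedException, naiveSubmit and cautiousSubmit are invented for illustration, and the dropNextResponse flag assumes the server has already acted on the request when the socket is closed.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of an RPC client facing a connection closed mid-call. */
class RetryDemo {
    /** Thrown when the connection drops before the response arrives. */
    static final class ConnectionClosedException extends RuntimeException {
        ConnectionClosedException(String msg) { super(msg); }
    }

    /** Stand-in for a busy RM: submitJob() is NOT idempotent. */
    static final class FakeServer {
        final AtomicInteger jobsSubmitted = new AtomicInteger();
        /** When true, the server acts on the next call, then the socket is
         *  closed before the response is sent (the "in flight" case). */
        boolean dropNextResponse;

        int submitJob() {
            int jobId = jobsSubmitted.incrementAndGet(); // side effect already happened
            if (dropNextResponse) {
                dropNextResponse = false;
                throw new ConnectionClosedException("connection closed with call in flight");
            }
            return jobId;
        }
    }

    /** Naive client: reconnect and resend on any closed connection. */
    static int naiveSubmit(FakeServer server) {
        try {
            return server.submitJob();
        } catch (ConnectionClosedException e) {
            return server.submitJob(); // the job may now exist twice
        }
    }

    /** Cautious client: resend only when the caller declares the call idempotent. */
    static int cautiousSubmit(FakeServer server, boolean idempotent) {
        try {
            return server.submitJob();
        } catch (ConnectionClosedException e) {
            if (idempotent) {
                return server.submitJob();
            }
            throw e; // unknown whether the server already acted on the call
        }
    }
}
```

With dropNextResponse set, naiveSubmit leaves two jobs on the server, which is the duplicate-submission outcome described above; cautiousSubmit with idempotent set to false surfaces the failure instead and leaves the decision to the caller.
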
> RPC listener inefficiently assigns connections to readers
> ---------------------------------------------------------
>                 Key: HADOOP-9956
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9956
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: ipc
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HADOOP-9956.patch
> The socket listener and readers use complex synchronization to update the reader's
> NIO {{Selector}}.  Updating active selectors is not thread-safe, so precautions are required.
> However, the current locking choreography results in a serialized distribution of new
> connections to the parallel socket readers.  A slower/busier reader can stall the listener
> and throttle performance.
> The problem manifests as unexpectedly low CPU utilization by the listener and readers
> (~20-30%) under heavy load.  The call queue is shallow when it should be overflowing.
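
The description does not include the patch itself. One common way to keep the listener from waiting on a busy reader's selector is a hand-off queue: the listener enqueues each accepted channel and calls Selector.wakeup(), and the reader registers pending channels on its own thread between select() calls, so only one thread ever touches that selector. The following is a minimal sketch under that assumption, not the HADOOP-9956 patch; QueuedReader, addConnection and demo are hypothetical names, and Pipe channels stand in for accepted SocketChannels.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical reader: only its own thread ever touches its Selector. */
class QueuedReader implements Runnable {
    private final Selector selector;
    private final BlockingQueue<SelectableChannel> pending = new LinkedBlockingQueue<>();
    private final ByteBuffer buf = ByteBuffer.allocate(8192);
    final AtomicInteger registered = new AtomicInteger();
    private volatile boolean running = true;

    QueuedReader() {
        try {
            selector = Selector.open();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Listener side: enqueue and return, never waiting on this reader's select(). */
    void addConnection(SelectableChannel ch) {
        pending.add(ch);
        selector.wakeup(); // make a blocked (or the next) select() return
    }

    void shutdown() {
        running = false;
        selector.wakeup();
    }

    @Override
    public void run() {
        try {
            while (running) {
                SelectableChannel ch;
                while ((ch = pending.poll()) != null) { // register on the reader thread
                    ch.configureBlocking(false);
                    ch.register(selector, SelectionKey.OP_READ);
                    registered.incrementAndGet();
                }
                selector.select();
                for (SelectionKey key : selector.selectedKeys()) {
                    // Placeholder for request parsing: drain bytes, drop closed peers.
                    ReadableByteChannel rc = (ReadableByteChannel) key.channel();
                    buf.clear();
                    if (rc.read(buf) < 0) {
                        key.cancel();
                        rc.close();
                    }
                }
                selector.selectedKeys().clear();
            }
            selector.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Usage: the calling thread plays the listener for `connections` pipes. */
    static int demo(int connections) {
        QueuedReader reader = new QueuedReader();
        Thread readerThread = new Thread(reader, "reader-0");
        readerThread.start();
        List<Pipe> pipes = new ArrayList<>();
        try {
            for (int i = 0; i < connections; i++) {
                Pipe p = Pipe.open();
                pipes.add(p);
                reader.addConnection(p.source());
            }
            long deadline = System.nanoTime() + 5_000_000_000L; // wait at most 5 s
            while (reader.registered.get() < connections && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            reader.shutdown();
            readerThread.join();
            for (Pipe p : pipes) { p.source().close(); p.sink().close(); }
            return reader.registered.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Because wakeup() also affects a select() that has not started yet, a connection enqueued just after the reader drains the queue is still picked up on the next loop iteration rather than waiting for unrelated traffic.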

