hadoop-common-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9955) RPC idle connection closing is extremely inefficient
Date Thu, 14 Nov 2013 18:31:25 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822726#comment-13822726 ]

Kihwal Lee commented on HADOOP-9955:
------------------------------------

When the concurrent hash map is created, its initial size is set to the max call queue size.
This may not always be ideal. In the production name nodes I've seen, 4X that value would make
more sense. Both the number of concurrent connections and the max call queue length (determined
by the number of handlers) are influenced by the size of the cluster (containers, jobs, etc.) and
the load, but the two seem only loosely coupled. E.g. a small number of clients can generate
enough load to fill up the call queue. There may be a better parameter we can use to determine
a reasonable initial size for {{connections}}.

It could be a function of {{idleScanThreshold}}. This threshold would normally be set to the
number of persistent connections + the number of connections from steady-state average load +
slack, so the initial size for {{connections}} could be set to the max call queue size, to
{{some_factor * idleScanThreshold}}, or to the max of the two.

{code}
      this.connections = Collections.newSetFromMap(
          new ConcurrentHashMap<Connection,Boolean>(
              maxQueueSize, 0.75f, readThreads+2));
{code}
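
As a rough sketch of the "max of the two" option, the sizing could look something like the following. The class name, {{IDLE_SCAN_FACTOR}}, and its value of 2 are placeholders for illustration and are not part of the patch; only {{Collections.newSetFromMap}} and the three-argument {{ConcurrentHashMap}} constructor are taken from the snippet above.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, for illustration only; not code from HADOOP-9955.patch.
final class ConnectionSetSizing {
  // Assumed multiplier standing in for "some_factor"; the right value is an open question.
  public static final int IDLE_SCAN_FACTOR = 2;

  // Returns the larger of maxQueueSize and factor * idleScanThreshold,
  // computed in long arithmetic and clamped so it cannot overflow int.
  public static int initialCapacity(int maxQueueSize, int idleScanThreshold, int factor) {
    long scaled = (long) factor * idleScanThreshold;
    long capacity = Math.max((long) maxQueueSize, scaled);
    return (int) Math.min(capacity, (long) Integer.MAX_VALUE);
  }

  // Builds a concurrent set the same way as the snippet above,
  // but with the initial capacity chosen by initialCapacity().
  public static <T> Set<T> newConnectionSet(
      int maxQueueSize, int idleScanThreshold, int readThreads) {
    return Collections.newSetFromMap(
        new ConcurrentHashMap<T, Boolean>(
            initialCapacity(maxQueueSize, idleScanThreshold, IDLE_SCAN_FACTOR),
            0.75f, readThreads + 2));
  }
}
```

With this shape, a server whose call queue is small relative to its steady-state connection count would start from the threshold-based size, while one with a large call queue would keep the current behavior.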

> RPC idle connection closing is extremely inefficient
> ----------------------------------------------------
>
>                 Key: HADOOP-9955
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9955
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: ipc
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HADOOP-9955.patch, HADOOP-9955.patch
>
>
> The RPC server listener loops accepting connections, distributing the new connections
> to socket readers, and then conditionally & periodically performs a scan for idle connections.
> The idle scan chooses a _random index range_ to scan in a _synchronized linked list_.
> With 20k+ connections, walking the range of indices in the linked list is extremely expensive.
> During the sweep, other threads (socket responder and readers) that want to close connections
> are blocked, and no new connections are being accepted.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
