hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (YARN-2314) ContainerManagementProtocolProxy can create thousands of threads for a large cluster
Date Tue, 14 Oct 2014 14:29:35 GMT

     [ https://issues.apache.org/jira/browse/YARN-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Jason Lowe reassigned YARN-2314:

    Assignee: Jason Lowe

bq. Basically the cache doesn't have more functionalities other than just cache the connection.

It doesn't even do that, because if we cache the connection to the NM then we leak threads.
 When a cache entry is purged the RPC Client thread (tied to the NM socket connection) can
linger because the RPC layer doesn't provide a way to force a connection to be closed due
to protocol refcounting.  We need to set the RPC idle timeout to 0 as a workaround to force
the connections to close so we don't leak threads.  Therefore all the cache is doing is caching
the proxy objects with no connection behind them.  Those objects will reconnect to the NM
each time we make a call.
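
To make the effect concrete, here is a toy model of the behavior described above. It is not Hadoop code: `RpcLayer`, `ProxyCache` and `simulate` are hypothetical names invented for this sketch, and it assumes one connection (and one client thread) per NM, a cache purge that drops only the proxy object, and connections that close only once their idle timeout expires. In Hadoop itself the idle timeout is, as far as I know, the `ipc.client.connection.maxidletime` client property.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model only; none of these classes exist in Hadoop. */
public class IdleTimeoutSketch {

    /** Stand-in for the RPC layer: one open connection (and thread) per node. */
    static final class RpcLayer {
        private final long idleTimeoutMs;
        // node -> time of last call, for connections that are still open
        private final Map<String, Long> open = new HashMap<>();
        int connects = 0;

        RpcLayer(long idleTimeoutMs) {
            this.idleTimeoutMs = idleTimeoutMs;
        }

        void call(String node, long nowMs) {
            // Connections only go away once they have been idle long enough.
            open.values().removeIf(last -> nowMs - last >= idleTimeoutMs);
            if (!open.containsKey(node)) {
                connects++; // new socket plus client thread
            }
            open.put(node, nowMs);
            if (idleTimeoutMs == 0) {
                open.remove(node); // timeout 0: closed as soon as the call completes
            }
        }

        int liveThreads() {
            return open.size();
        }
    }

    /** LRU cache of proxy objects; a purge cannot close the connection. */
    static final class ProxyCache {
        private final RpcLayer rpc;
        private final LinkedHashMap<String, Object> proxies;

        ProxyCache(RpcLayer rpc, final int maxSize) {
            this.rpc = rpc;
            this.proxies = new LinkedHashMap<String, Object>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
                    return size() > maxSize; // drops the proxy object only
                }
            };
        }

        void startContainer(String node, long nowMs) {
            if (proxies.get(node) == null) {
                proxies.put(node, new Object());
            }
            rpc.call(node, nowMs);
        }

        int size() {
            return proxies.size();
        }
    }

    /** Returns {cached proxies, live RPC threads, total connects}. */
    static int[] simulate(int nodes, int cacheSize, long idleTimeoutMs) {
        RpcLayer rpc = new RpcLayer(idleTimeoutMs);
        ProxyCache cache = new ProxyCache(rpc, cacheSize);
        // Two passes over the cluster, one call per node per millisecond.
        for (int round = 0; round < 2; round++) {
            for (int i = 0; i < nodes; i++) {
                cache.startContainer("nm-" + i, (long) round * nodes + i);
            }
        }
        return new int[] {cache.size(), rpc.liveThreads(), rpc.connects};
    }

    public static void main(String[] args) {
        for (long timeout : new long[] {10_000L, 0L}) {
            int[] r = simulate(1000, 100, timeout);
            System.out.println("idleTimeoutMs=" + timeout + " cached=" + r[0]
                + " liveThreads=" + r[1] + " connects=" + r[2]);
        }
    }
}
```

With 1000 nodes and a cache size of 100, the model keeps 1000 live threads behind 100 cached proxies when the idle timeout is 10 seconds. With the timeout at 0 there are no lingering threads, but every call reconnects, which is why the cached proxies buy very little.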

I'm not sure saving the proxy objects themselves is worth it -- it would be interesting to prove
this cache helps in a meaningful way before we assume we need it.  But I can update the patch
to provide a config property to keep it anyway.  I hope to have that up later today.

> ContainerManagementProtocolProxy can create thousands of threads for a large cluster
> ------------------------------------------------------------------------------------
>                 Key: YARN-2314
>                 URL: https://issues.apache.org/jira/browse/YARN-2314
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 2.1.0-beta
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: disable-cm-proxy-cache.patch, nmproxycachefix.prototype.patch
> ContainerManagementProtocolProxy has a cache of NM proxies, and the size of this cache
> is configurable.  However the cache can grow far beyond the configured size when running on
> a large cluster and blow AM address/container limits.  More details in the first comment.

This message was sent by Atlassian JIRA
