hadoop-yarn-issues mailing list archives

From "Naganarasimha G R (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6523) RM requires large memory in sending out security tokens as part of Node Heartbeat in large cluster
Date Tue, 25 Apr 2017 02:15:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982284#comment-15982284 ]

Naganarasimha G R commented on YARN-6523:
-----------------------------------------

The right approach depends on why we are sending credentials for all apps, which I am not
completely clear about. IMO it should be sufficient to send only the tokens for the apps
(containers) active on the node.
Possible solutions:
# Send only the app credentials related to the node on each heartbeat.
# Send only the app credentials related to the node, and on each heartbeat only the delta
modifications for that node since the last heartbeat.
# Cache the SystemCredentialsForAppsProto objects themselves and reuse them rather than
recreating them for each node's heartbeat (if it is required to send all the apps' tokens to the node).
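
As a rough illustration of options 1 and 3 above, here is a minimal Java sketch. It is not YARN code: the class HeartbeatCredentialsSketch, the stand-in type AppCredentialsMessage (playing the role of a per-app SystemCredentialsForAppsProto) and all method names are hypothetical, and tokens are modelled as plain ByteBuffers. It assumes the RM knows which apps are active on a node and that a cached message can be dropped whenever an app's tokens change.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch, not YARN code: per-node filtering plus a shared message cache. */
public class HeartbeatCredentialsSketch {

    /** Stand-in for the per-app message object that would go into a heartbeat response. */
    static final class AppCredentialsMessage {
        final String appId;
        final ByteBuffer tokens;

        AppCredentialsMessage(String appId, ByteBuffer tokens) {
            this.appId = appId;
            // Read-only view over the same bytes, so one instance can be shared safely.
            this.tokens = tokens.asReadOnlyBuffer();
        }
    }

    // Option 3: one message per app, reused across all nodes and heartbeats.
    private final Map<String, AppCredentialsMessage> cache = new ConcurrentHashMap<>();
    private final AtomicInteger built = new AtomicInteger();

    /** Returns the cached message for an app, building it only on first use. */
    AppCredentialsMessage messageFor(String appId, ByteBuffer tokens) {
        return cache.computeIfAbsent(appId, id -> {
            built.incrementAndGet();
            return new AppCredentialsMessage(id, tokens);
        });
    }

    /** Drops the cached message, e.g. after the app's tokens were renewed or the app finished. */
    void invalidate(String appId) {
        cache.remove(appId);
    }

    /** Number of message objects built so far (for illustration only). */
    int messagesBuilt() {
        return built.get();
    }

    /** Option 1: include only the apps that are active on the node being answered. */
    Map<String, AppCredentialsMessage> responseFor(Set<String> appsActiveOnNode,
                                                   Map<String, ByteBuffer> allAppTokens) {
        Map<String, AppCredentialsMessage> response = new HashMap<>();
        for (String appId : appsActiveOnNode) {
            ByteBuffer tokens = allAppTokens.get(appId);
            if (tokens != null) {
                response.put(appId, messageFor(appId, tokens));
            }
        }
        return response;
    }
}
```

In this sketch, answering many heartbeats for the same set of active apps builds each message once, and calling invalidate for an app forces a rebuild on the next response. Option 2 (delta modifications) would additionally need per-node state recording what was last sent, which is not shown here.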

P.S. credit goes to [~gu chi] for analysis of this issue.


> RM requires large memory in sending out security tokens as part of Node Heartbeat in large cluster
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6523
>                 URL: https://issues.apache.org/jira/browse/YARN-6523
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: RM
>    Affects Versions: 2.8.0, 2.7.3
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>            Priority: Critical
>
> Currently, as part of the heartbeat response, the RM sets all applications' tokens even though
> not all applications may be active on the node. On top of that, NodeHeartbeatResponsePBImpl
> converts the tokens for each app into SystemCredentialsForAppsProto. Hence too many
> SystemCredentialsForAppsProto objects are created for each node on each heartbeat.
> We hit an OOM while testing 2000 concurrent apps on a 500-node cluster with 8GB RAM
> configured for the RM.
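
To give a sense of scale for the description above, the following Java sketch counts the per-app objects built when every heartbeat response carries every app's tokens. The class and method names are hypothetical. The heartbeat interval is not stated in the report, so it is left as a parameter, and any value passed for it is an assumption.

```java
/** Hypothetical back-of-the-envelope estimate, not YARN code. */
public class HeartbeatLoadEstimate {

    /** Objects built when each node's response contains one object per concurrent app. */
    static long objectsPerRound(long concurrentApps, long nodes) {
        return Math.multiplyExact(concurrentApps, nodes);
    }

    /** Same count expressed per second, for an assumed heartbeat interval in milliseconds. */
    static double objectsPerSecond(long concurrentApps, long nodes, long heartbeatIntervalMs) {
        return objectsPerRound(concurrentApps, nodes) * 1000.0 / heartbeatIntervalMs;
    }
}
```

With the reported figures, objectsPerRound(2000, 500) is 1,000,000 objects for a single round of heartbeats across the cluster.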


