hadoop-yarn-issues mailing list archives

From "Naganarasimha G R (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6523) RM requires large memory in sending out security tokens as part of Node Heartbeat in large cluster
Date Mon, 08 May 2017 16:15:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001009#comment-16001009 ]

Naganarasimha G R commented on YARN-6523:

[~jlowe], in our offline discussion you mentioned:
bq. I believe there's still some optimization that can be done given that once a token is retrieved
by the RM on behalf of an application that token is sent for every heartbeat to every node
in the cluster until that application completes.  That's very wasteful.  Doing a sequence
number version thing as I suggested earlier with a precomputed system credentials would drastically
cut down on the traffic and garbage created for every heartbeat.  However I agree in light
of the custom release findings that the priority of fixing this is far lower than before.

Agreed. For a long-running app, tokens will keep being exchanged unnecessarily after 7 days,
which causes needless traffic and extra garbage for memory reclamation. The
{{sequence number version thing}} seems to be a good fit; I will try to work on it further.
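
To make the {{sequence number version thing}} concrete, here is a minimal sketch of how it could look. All class and method names below are hypothetical and are not existing YARN APIs. The assumption is that the RM rebuilds one shared, immutable credentials snapshot only when the token set changes, and that each node reports the last sequence number it has seen, so an unchanged snapshot is not resent.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; none of these names exist in YARN. */
class VersionedSystemCredentials {

    /** Immutable snapshot, built once per change and shared by all heartbeat responses. */
    static final class Snapshot {
        final long sequenceNumber;
        final Map<String, byte[]> tokensByApp;

        Snapshot(long sequenceNumber, Map<String, byte[]> tokensByApp) {
            this.sequenceNumber = sequenceNumber;
            // Defensive copy so later updates do not alter a published snapshot.
            this.tokensByApp = Collections.unmodifiableMap(new HashMap<>(tokensByApp));
        }
    }

    private final Map<String, byte[]> current = new HashMap<>();
    private volatile Snapshot snapshot = new Snapshot(0L, Collections.emptyMap());

    /** Called when the RM obtains or renews tokens for an application. */
    synchronized void putTokens(String appId, byte[] tokens) {
        current.put(appId, tokens);
        snapshot = new Snapshot(snapshot.sequenceNumber + 1, current);
    }

    /** Called when an application completes. */
    synchronized void removeApp(String appId) {
        if (current.remove(appId) != null) {
            snapshot = new Snapshot(snapshot.sequenceNumber + 1, current);
        }
    }

    /**
     * Returns the shared snapshot only if the node's sequence number is stale;
     * null means the node is up to date and nothing needs to be sent.
     */
    Snapshot forHeartbeat(long nodeSequenceNumber) {
        Snapshot s = snapshot;
        return s.sequenceNumber == nodeSequenceNumber ? null : s;
    }
}
```

With this shape, a heartbeat from a node that is already up to date allocates nothing new, and a stale node receives a reference to the one precomputed snapshot instead of a freshly converted copy.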

> RM requires large memory in sending out security tokens as part of Node Heartbeat in large cluster
> --------------------------------------------------------------------------------------------------
>                 Key: YARN-6523
>                 URL: https://issues.apache.org/jira/browse/YARN-6523
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: RM
>    Affects Versions: 2.8.0, 2.7.3
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>            Priority: Critical
> Currently, as part of the heartbeat response, the RM sets the tokens of all applications, even though
> not all applications may be active on the node. On top of that, NodeHeartbeatResponsePBImpl converts the
> tokens for each app into a SystemCredentialsForAppsProto. Hence, for each node and each heartbeat, too
> many SystemCredentialsForAppsProto objects were being created.
> We hit an OOM while testing 2000 concurrent apps on a 500-node cluster with 8GB RAM
> configured for the RM.
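
For a rough sense of scale, the figures in the description can be multiplied out. This is only an estimate, assuming one SystemCredentialsForAppsProto per application in every heartbeat response, as the description states; the class and method names are hypothetical.

```java
/** Hypothetical helper; only the 2000-app and 500-node figures come from the issue. */
class HeartbeatObjectEstimate {

    /** Proto objects created when every node heartbeats once, one object per app per node. */
    static long protoObjectsPerRound(long concurrentApps, long nodes) {
        return concurrentApps * nodes;
    }

    public static void main(String[] args) {
        // 2000 apps * 500 nodes = 1,000,000 objects per round of heartbeats.
        System.out.println(protoObjectsPerRound(2000, 500));
    }
}
```

How often such a round occurs depends on the configured node heartbeat interval, so the sustained allocation rate scales with that setting.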

