hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3449) Recover appTokenKeepAliveMap upon nodemanager restart
Date Mon, 06 Apr 2015 14:24:12 GMT

    [ https://issues.apache.org/jira/browse/YARN-3449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481249#comment-14481249 ]

Jason Lowe commented on YARN-3449:
----------------------------------

While the NM is aggregating logs the application is still present in the state store, and
the application should be recovered as still active after an NM restart.  The NM will then
register with those applications listed as still active.  When the RM later tells the NM that
those applications should be cleaned up, the applications should be added to the keep alive
list as normal.  Thus I think the appTokenKeepAliveMap state should already be recovered properly
without explicitly persisting it -- or am I missing something?
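A toy model of the recovery argument above may help. It assumes, as a simplification, that the RM answers a registration by putting into its cleanup list any app the NM reports as active that the RM already considers finished. StateStore, RmModel, NmModel, and their fields and methods are hypothetical names for illustration, not Hadoop classes, and app IDs are plain strings.

```java
import java.util.*;

// Hypothetical toy model of the restart-recovery argument; none of these
// classes are Hadoop APIs.
class StateStore {
    // Apps persisted by the NM; an app stays here until log aggregation is done.
    final Set<String> apps = new HashSet<>();
}

class RmModel {
    // Apps the RM already considers finished.
    final Set<String> finishedApps = new HashSet<>();

    // Assumption: the RM lists for cleanup any app the NM reports as active
    // that the RM knows has finished.
    List<String> appsToCleanup(Collection<String> reportedActive) {
        List<String> cleanup = new ArrayList<>();
        for (String appId : reportedActive) {
            if (finishedApps.contains(appId)) {
                cleanup.add(appId);
            }
        }
        return cleanup;
    }
}

class NmModel {
    final Set<String> activeApps = new HashSet<>();
    // Kept in memory only; deliberately NOT written to the state store.
    final Set<String> keepAliveApps = new HashSet<>();

    NmModel(StateStore store) {
        // Recovery: every app still in the state store comes back as active.
        activeApps.addAll(store.apps);
    }

    // Register with the recovered apps; the RM's cleanup list refills keep-alive.
    void registerWith(RmModel rm) {
        keepAliveApps.addAll(rm.appsToCleanup(activeApps));
    }
}
```

In this model a freshly constructed NmModel (standing in for a restarted NM) starts with an empty keepAliveApps set, and a single registerWith call repopulates it from the recovered activeApps and the RM's response, without keepAliveApps ever being persisted.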

> Recover appTokenKeepAliveMap upon nodemanager restart
> -----------------------------------------------------
>
>                 Key: YARN-3449
>                 URL: https://issues.apache.org/jira/browse/YARN-3449
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.6.0, 2.7.0
>            Reporter: Junping Du
>            Assignee: Junping Du
>
> appTokenKeepAliveMap in NodeStatusUpdaterImpl is used to keep an application alive after
> it has finished while the NM still needs the app token for log aggregation (when security
> and log aggregation are enabled).
> Applications are inserted into this map only when they appear in getApplicationsToCleanup()
> of an RM heartbeat response, and the RM sends this information only once, in
> RMNodeImpl.updateNodeHeartbeatResponseForCleanup(). A work-preserving NM restart should
> therefore persist appTokenKeepAliveMap in the NMStateStore and recover it after the restart.
> Otherwise, the RM could terminate the application earlier than expected, and log aggregation
> could fail if security is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
