hadoop-yarn-issues mailing list archives

From "sandflee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3987) am container complete msg ack to NM once RM receive it
Date Wed, 29 Jul 2015 01:16:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645326#comment-14645326 ]

sandflee commented on YARN-3987:

Yes, the old AM containers in the NM aren't cleaned up. In our case, the AM crashes right
after it starts; the RM then creates a new appAttempt and launches a new AM, and it will not
expire, which leaves the completed container in NM memory and the NM stateStore. We set
max-am-attempts to a very large number, so the completed AM containers pile up in the NM.
For a completed AM container, the RM could send the ack msg to the NM directly; there seems
to be no need to wait for the new AM to pull the complete msg. What do you think? [~jianhe]
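
To make the proposal concrete, here is a toy simulation of the two behaviors. It is only a sketch: `ToyNM`, `ToyRM`, `ack_am_on_receipt`, and `simulate_crash_loop` are hypothetical names invented for illustration, not YARN classes or configuration keys, and the attached patches may work differently.

```python
class ToyRM:
    """Hypothetical stand-in for the RM (not a YARN class)."""

    def __init__(self, ack_am_on_receipt):
        self.ack_am_on_receipt = ack_am_on_receipt  # the proposed behavior
        self.pending_for_am = []  # completions waiting for an AM to pull them
        self.pulled = set()       # completions an AM has already pulled
        self.seen = set()         # completions already reported by the NM

    def on_heartbeat(self, reports):
        """Take the NM's completed containers, return the ids it may forget."""
        acks = []
        for cid, is_am in reports:
            if (is_am and self.ack_am_on_receipt) or cid in self.pulled:
                acks.append(cid)
            elif cid not in self.seen:
                self.pending_for_am.append(cid)
            self.seen.add(cid)
        return acks

    def am_pull(self):
        """A running AM pulls the pending completions."""
        done, self.pending_for_am = self.pending_for_am, []
        self.pulled.update(done)
        return done


class ToyNM:
    """Hypothetical stand-in for the NM (not a YARN class)."""

    def __init__(self):
        self.completed = {}  # container id -> is it an AM container?

    def container_finished(self, cid, is_am):
        self.completed[cid] = is_am

    def heartbeat(self, rm):
        # Report everything still held, then drop whatever the RM acked.
        for cid in rm.on_heartbeat(list(self.completed.items())):
            self.completed.pop(cid, None)


def simulate_crash_loop(attempts, ack_am_on_receipt):
    """Every AM attempt crashes right after launch, so no AM ever pulls."""
    rm, nm = ToyRM(ack_am_on_receipt), ToyNM()
    for i in range(attempts):
        nm.container_finished(f"am_attempt_{i}", is_am=True)
        nm.heartbeat(rm)
    return len(nm.completed)


if __name__ == "__main__":
    print("wait for AM pull:", simulate_crash_loop(1000, False))
    print("ack on receipt:  ", simulate_crash_loop(1000, True))
```

In this model, waiting for an AM pull leaves one entry in the NM per failed attempt, while acking AM containers on receipt keeps the NM's completed set empty.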

> am container complete msg ack to NM once RM receive it
> ------------------------------------------------------
>                 Key: YARN-3987
>                 URL: https://issues.apache.org/jira/browse/YARN-3987
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: sandflee
>            Assignee: sandflee
>         Attachments: YARN-3987.001.patch, YARN-3987.002.patch
> In our cluster we set max-am-attempts to a very large number, and unfortunately our
AM crashes after launch, leaving too many completed containers (AM containers) in the NM.
A completed container is removed from the NM and NMStateStore only once the container
completion has been passed to the AM, but if the AM cannot be launched, the completed AM
container cannot be cleaned up and may eat up NM heap memory.

This message was sent by Atlassian JIRA