hadoop-yarn-issues mailing list archives

From "Chengbing Liu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished
Date Wed, 31 Dec 2014 11:40:13 GMT

     [ https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chengbing Liu updated YARN-2997:
--------------------------------
    Attachment: YARN-2997.2.patch

Updated patch.

It handles the following issues:
* If a container has completed and its application is still running, the NM
sends duplicate reports about the container, which is unnecessary.
* Currently, if a heartbeat between the NM and RM is lost while the NM is sending the status of a
completed container whose application has finished, that status is never sent again. In the updated
patch, the NM stores all completed container statuses and resends them after a lost heartbeat.
* Some test cases are fixed based on the above considerations.
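The bookkeeping described in the list above can be sketched roughly as follows. This is a minimal, hypothetical model, not the code in YARN-2997.2.patch: the class and method names (`CompletedContainerTracker`, `onContainerCompleted`, `nextHeartbeat`, `onHeartbeatResult`) are invented for illustration, container IDs and statuses are plain strings, and it assumes the NM learns whether each heartbeat was delivered.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not the actual YARN-2997 patch): keeps completed
 * container statuses on the NM side so each one is reported until a heartbeat
 * carrying it is delivered, and is not repeated after that.
 */
public class CompletedContainerTracker {
    // Completed statuses not yet confirmed delivered, keyed by container ID.
    private final Map<String, String> pending = new LinkedHashMap<>();
    // IDs included in the heartbeat that is currently awaiting a result.
    private final List<String> inFlight = new ArrayList<>();

    /** Record a newly completed container (status is a placeholder string). */
    public void onContainerCompleted(String containerId, String status) {
        pending.put(containerId, status);
    }

    /** Build the completed-container part of the next heartbeat. */
    public List<String> nextHeartbeat() {
        inFlight.clear();
        inFlight.addAll(pending.keySet());
        List<String> report = new ArrayList<>();
        for (String id : inFlight) {
            report.add(id + ":" + pending.get(id));
        }
        return report;
    }

    /** A delivered heartbeat clears its statuses; a lost one leaves them pending. */
    public void onHeartbeatResult(boolean delivered) {
        if (delivered) {
            for (String id : inFlight) {
                pending.remove(id);
            }
        }
        inFlight.clear();
    }

    public static void main(String[] args) {
        CompletedContainerTracker t = new CompletedContainerTracker();
        t.onContainerCompleted("container_01", "EXIT_0");
        System.out.println(t.nextHeartbeat()); // [container_01:EXIT_0]
        t.onHeartbeatResult(false);            // heartbeat lost
        System.out.println(t.nextHeartbeat()); // resent: [container_01:EXIT_0]
        t.onHeartbeatResult(true);             // heartbeat delivered
        System.out.println(t.nextHeartbeat()); // []
    }
}
```

In this sketch a status stays pending until a heartbeat carrying it is known to be delivered, which gives at-least-once reporting across lost heartbeats, and it is dropped afterwards, which avoids the repeated reports while the application is still running.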

Please help review the patch, thanks!

> NM keeps sending finished containers to RM until app is finished
> ----------------------------------------------------------------
>
>                 Key: YARN-2997
>                 URL: https://issues.apache.org/jira/browse/YARN-2997
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: Chengbing Liu
>         Attachments: YARN-2997.2.patch, YARN-2997.patch
>
>
> We have seen many occurrences of the following in the RM log:
> {quote}
> INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
> {quote}
> It is caused by the NM sending completed containers repeatedly until the app is finished.
> On the RM side, the container has already been released, hence {{getRMContainer}} returns null.
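The cause described in the issue can be illustrated with a deliberately simplified model. This is hypothetical and is not FairScheduler code: a `HashMap` stands in for the scheduler's container bookkeeping, removing from it stands in for looking the container up via {{getRMContainer}} and releasing it, and the printed message only imitates the real log line.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of the RM-side effect in YARN-2997.
 * Not FairScheduler code: a HashMap stands in for the scheduler's
 * container bookkeeping.
 */
public class DuplicateCompletionModel {
    // Containers the RM still tracks, keyed by container ID.
    private final Map<String, String> liveContainers = new HashMap<>();
    // Completion reports that found no tracked container.
    private int nullCompletions = 0;

    public void allocate(String containerId) {
        liveContainers.put(containerId, "RUNNING");
    }

    /** Handle one completed-container report arriving in an NM heartbeat. */
    public void onCompletedContainerReport(String containerId) {
        // The first report releases the container; later duplicates find nothing.
        String rmContainer = liveContainers.remove(containerId);
        if (rmContainer == null) {
            nullCompletions++;
            System.out.println("Null container completed: " + containerId);
        }
    }

    public int getNullCompletions() {
        return nullCompletions;
    }

    public static void main(String[] args) {
        DuplicateCompletionModel rm = new DuplicateCompletionModel();
        rm.allocate("container_01");
        // Model the NM repeating the same report on three consecutive heartbeats.
        for (int heartbeat = 0; heartbeat < 3; heartbeat++) {
            rm.onCompletedContainerReport("container_01");
        }
        System.out.println(rm.getNullCompletions()); // 2
    }
}
```

Only the first report finds a tracked container; every repeat while the application keeps running produces one more null lookup, matching the log noise quoted above.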



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
