hadoop-yarn-issues mailing list archives

From "Jian He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2249) RM may receive container release request on AM resync before container is actually recovered
Date Thu, 07 Aug 2014 21:53:14 GMT

    [ https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089909#comment-14089909 ]

Jian He commented on YARN-2249:

Thanks Wangda for the review.
bq. we can cache outstanding container release request until x secs after restart reached.
And could you elaborate why you use NM liveness expire time? 
I chose the NM expiry time as the cache timeout because containers are forcibly killed once the NM
expires, so we no longer need to cache the release requests after that.
bq. we only need cache release request for a period of time after AM reconnected to RM.
Right, changed to cache the release request only within the timeout.
bq. We should notify AM about container completed message when we decide to not recover a
Good point, added.
bq. Can we wait for some state instead of Thread.sleep(3000);?
Since the container's gone, there's no state to wait for. I think this is fine.
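
The caching approach discussed in this comment can be sketched roughly as follows. This is a minimal illustration, not the code from the YARN-2249 patch: the class PendingReleaseCache, its methods, and the string container IDs are all hypothetical, and the timeout is simply assumed to equal the NM expiry interval. Time is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of caching release requests after an RM restart. */
public class PendingReleaseCache {
    private final long amResyncTimeMs;  // when the AM re-registered with the new RM
    private final long timeoutMs;       // assumed equal to the NM expiry interval
    private final Set<String> liveContainers = new HashSet<>();
    private final Set<String> pendingReleases = new HashSet<>();

    public PendingReleaseCache(long amResyncTimeMs, long timeoutMs) {
        this.amResyncTimeMs = amResyncTimeMs;
        this.timeoutMs = timeoutMs;
    }

    /** AM asks to release a container. Returns true if it was released immediately. */
    public boolean release(String containerId, long nowMs) {
        if (liveContainers.remove(containerId)) {
            return true;  // already recovered: release right away
        }
        if (nowMs - amResyncTimeMs < timeoutMs) {
            // Not recovered yet: remember the request, but only within the timeout.
            pendingReleases.add(containerId);
        }
        return false;
    }

    /** NM reports a running container. Returns true if the container is recovered. */
    public boolean recover(String containerId, long nowMs) {
        if (nowMs - amResyncTimeMs >= timeoutMs) {
            pendingReleases.clear();  // past the timeout nothing needs to stay cached
        }
        if (pendingReleases.remove(containerId)) {
            // The release arrived first: do not recover the container. A real RM
            // would also report the container as completed to the AM here.
            return false;
        }
        liveContainers.add(containerId);
        return true;
    }

    public boolean isLive(String containerId) {
        return liveContainers.contains(containerId);
    }

    public int pendingCount() {
        return pendingReleases.size();
    }
}
```

In this sketch a release that arrives before the NM report is held in pendingReleases and applied when the container shows up, and a release that arrives after the timeout is not cached at all.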

> RM may receive container release request on AM resync before container is actually recovered
> --------------------------------------------------------------------------------------------
>                 Key: YARN-2249
>                 URL: https://issues.apache.org/jira/browse/YARN-2249
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-2249.1.patch, YARN-2249.1.patch, YARN-2249.2.patch
> AM resync on RM restart will send outstanding container release requests back to the
> new RM. In the meantime, NMs report the container statuses back to the RM to recover the containers.
> If the RM receives the container release request before the container is actually recovered in
> the scheduler, the container won't be released and the release request will be lost.
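
The ordering problem in this description can be illustrated with a toy model. It is only a sketch under simplifying assumptions: LostReleaseDemo and its methods are hypothetical names, not YARN classes, and the scheduler is reduced to a set of recovered container IDs.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of a scheduler that silently drops releases for unknown containers. */
public class LostReleaseDemo {
    private final Set<String> recovered = new HashSet<>();

    /** Release request from the AM; it has no effect if the container is not yet known. */
    public boolean release(String containerId) {
        return recovered.remove(containerId);
    }

    /** Container status report from an NM after the RM restart. */
    public void recover(String containerId) {
        recovered.add(containerId);
    }

    public boolean isRunning(String containerId) {
        return recovered.contains(containerId);
    }

    public static void main(String[] args) {
        LostReleaseDemo rm = new LostReleaseDemo();
        boolean released = rm.release("container_1"); // AM resync arrives first
        rm.recover("container_1");                    // NM report arrives afterwards
        // prints "released=false, stillRunning=true"
        System.out.println("released=" + released
                + ", stillRunning=" + rm.isRunning("container_1"));
    }
}
```

Because the release is processed before the container exists in the model, it has no effect, and the container stays running after recovery.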

