hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4148) When killing app, RM releases app's resource before they are released by NM
Date Mon, 09 Jan 2017 22:46:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813092#comment-15813092 ]

Jason Lowe commented on YARN-4148:

The unit test failures appear to be unrelated.  They pass for me locally with the patch applied,
and there are JIRAs that are tracking those failures.  The TestDelegationTokenRenewer failure
is being tracked by YARN-5816 and the TestRMRestart failure is tracked by YARN-5548.

Thanks for the review, [~djp]!  If you agree the failures are unrelated then feel free to
commit, or I'll do so in a few days unless I hear otherwise.

> When killing app, RM releases app's resource before they are released by NM
> ---------------------------------------------------------------------------
>                 Key: YARN-4148
>                 URL: https://issues.apache.org/jira/browse/YARN-4148
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Jun Gong
>            Assignee: Jason Lowe
>         Attachments: YARN-4148.001.patch, YARN-4148.002.patch, YARN-4148.003.patch, YARN-4148.wip.patch,
> When killing an app, the RM scheduler releases the app's resources as soon as possible and
> may then allocate those resources to new requests, even though the NM has not released them yet.
> The problem was found when we added support for GPU as a resource (YARN-4122). Test environment:
> an NM had 6 GPUs, app A used all 6 GPUs, and app B was requesting 3 GPUs. When app A was killed,
> the RM released A's 6 GPUs and allocated 3 GPUs to B. But when B tried to start a container on the NM,
> the NM found it did not have 3 GPUs to allocate because it had not yet released A's GPUs.
> I think the problem also exists for CPU/memory. It might cause OOM when memory is overused.
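
The race in the description can be reproduced with a toy model. The sketch below is not YARN code and is not taken from the attached patches; NodeLedger, its eagerRelease flag, and all method names are hypothetical, chosen only to mirror the 6-GPU scenario above. With eager release, the scheduler-side ledger frees A's GPUs at kill time, so B is granted GPUs the node may still be holding. With deferred release, the ledger keeps counting A's GPUs until the node reports them released.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy model of scheduler-side GPU accounting for one node; not YARN code. */
public class NodeLedger {
    private final int totalGpus;
    // true  = free resources as soon as the app is killed (the behaviour reported in the issue)
    // false = keep counting them until the node confirms the release
    private final boolean eagerRelease;
    // GPUs the scheduler still counts against the node, keyed by app id.
    private final Map<String, Integer> held = new HashMap<>();

    public NodeLedger(int totalGpus, boolean eagerRelease) {
        this.totalGpus = totalGpus;
        this.eagerRelease = eagerRelease;
    }

    /** GPUs the scheduler believes are free on the node. */
    public int available() {
        int used = 0;
        for (int n : held.values()) {
            used += n;
        }
        return totalGpus - used;
    }

    /** Grants the request only if the ledger shows enough free GPUs. */
    public boolean tryAllocate(String app, int gpus) {
        if (gpus <= 0 || gpus > available()) {
            return false;
        }
        held.merge(app, gpus, Integer::sum);
        return true;
    }

    /** The kill request reaches the scheduler; the node may still be tearing containers down. */
    public void killApp(String app) {
        if (eagerRelease) {
            held.remove(app);
        }
    }

    /** The node reports that the app's containers are gone and their GPUs are free. */
    public void nodeConfirmedRelease(String app) {
        held.remove(app);
    }

    public static void main(String[] args) {
        NodeLedger eager = new NodeLedger(6, true);
        eager.tryAllocate("A", 6);
        eager.killApp("A");
        // Granted although the node has not released A's GPUs yet.
        System.out.println("eager: B allocated = " + eager.tryAllocate("B", 3));

        NodeLedger deferred = new NodeLedger(6, false);
        deferred.tryAllocate("A", 6);
        deferred.killApp("A");
        System.out.println("deferred, before node report: B allocated = "
                + deferred.tryAllocate("B", 3));
        deferred.nodeConfirmedRelease("A");
        System.out.println("deferred, after node report: B allocated = "
                + deferred.tryAllocate("B", 3));
    }
}
```

In the deferred mode, B's request stays pending until the node's report arrives, which trades a short allocation delay for never handing out resources the node cannot yet provide.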
