hadoop-yarn-issues mailing list archives

From "Jian He (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-2456) Possible deadlock in CapacityScheduler when RM is recovering apps
Date Tue, 02 Sep 2014 21:56:21 GMT

     [ https://issues.apache.org/jira/browse/YARN-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Jian He updated YARN-2456:
    Attachment: YARN-2456.1.patch

The patch changes RMState#appState to a TreeMap, so that application states are kept sorted
by ApplicationId.
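To illustrate the ordering the patch relies on, here is a minimal Java sketch (Java 16+ for records). `AppId` is a hypothetical stand-in for YARN's `ApplicationId`, assumed to order by cluster timestamp and then by sequence number. `recoveryOrder` and `sortedState` are likewise illustrative names, not RM code. A `HashMap` iterates in hash order, whereas a `TreeMap` iterates in key order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RecoveryOrderSketch {

    // Hypothetical stand-in for YARN's ApplicationId: ordered by cluster
    // timestamp first, then by the per-RM sequence number.
    record AppId(long clusterTimestamp, int id) implements Comparable<AppId> {
        @Override
        public int compareTo(AppId other) {
            int byTimestamp = Long.compare(clusterTimestamp, other.clusterTimestamp);
            return byTimestamp != 0 ? byTimestamp : Integer.compare(id, other.id);
        }

        @Override
        public String toString() {
            return "application_" + clusterTimestamp + "_" + String.format("%04d", id);
        }
    }

    // Returns the order in which a loop over the stored state visits the apps.
    static List<AppId> recoveryOrder(Map<AppId, String> appState) {
        return new ArrayList<>(appState.keySet());
    }

    // Copies the state into a TreeMap, whose iteration follows AppId.compareTo.
    static Map<AppId, String> sortedState(Map<AppId, String> unordered) {
        return new TreeMap<>(unordered);
    }

    public static void main(String[] args) {
        Map<AppId, String> state = new HashMap<>();
        for (int i = 12; i >= 1; i--) {
            state.put(new AppId(1409700000000L, i), "app-" + i);
        }
        System.out.println("HashMap order: " + recoveryOrder(state));
        System.out.println("TreeMap order: " + recoveryOrder(sortedState(state)));
    }
}
```

The HashMap line may print in any order; the TreeMap line always lists the IDs in ascending order.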

The assumption here is that ApplicationId order implicitly reflects application submission
order. This is not entirely accurate in the rare case where AppId1 is generated before
AppId2 but App1 is actually submitted after App2.
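The caveat can be made concrete with a small sketch. Integers stand in for ApplicationIds, and a hypothetical submit-time map records when each app was actually submitted. `recoveredTooEarly` is an illustrative helper, not part of YARN: it reports apps that an ID-ordered (TreeMap) recovery would visit ahead of some app that was submitted before them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class IdOrderCaveat {

    // Returns the IDs whose position in ID order disagrees with submission
    // order: some app with a larger ID was submitted earlier than they were.
    static List<Integer> recoveredTooEarly(Map<Integer, Long> submitTimeById) {
        TreeMap<Integer, Long> byId = new TreeMap<>(submitTimeById);
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : byId.entrySet()) {
            // Scan every larger ID; an earlier submit time there means this app
            // is recovered ahead of an app that was actually submitted first.
            for (long laterIdSubmit : byId.tailMap(e.getKey(), false).values()) {
                if (laterIdSubmit < e.getValue()) {
                    result.add(e.getKey());
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // ID 1 was generated before ID 2, but submitted after it.
        Map<Integer, Long> submitTimeById = Map.of(1, 200L, 2, 100L, 3, 300L);
        System.out.println(recoveredTooEarly(submitTimeById)); // [1]
    }
}
```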

> Possible deadlock in CapacityScheduler when RM is recovering apps
> -----------------------------------------------------------------
>                 Key: YARN-2456
>                 URL: https://issues.apache.org/jira/browse/YARN-2456
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-2456.1.patch
> Consider this scenario:
> 1. The RM is configured with a single queue, and only one application can be active at a time.
> 2. Submit App1, which uses up the queue's whole capacity.
> 3. Submit App2, which remains pending.
> 4. Restart the RM.
> 5. App2 is recovered before App1, so App2 is added to the activeApplications list. App1 now remains pending (because of the max-active-app limit).
> 6. All containers of App1 are recovered when the NM registers, and they use up the whole queue capacity again.
> 7. Since the queue is full, App2 cannot proceed to allocate its AM container.
> 8. Meanwhile, App1 cannot proceed to become active because of the max-active-app limit.
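The scenario can be replayed with a deliberately simplified model of one leaf queue. Everything in this sketch is hypothetical: the `App` record, the capacity and AM-size constants, and the progress rule (an active app progresses if its AM is already running or there is room to launch one; a pending app cannot progress) are assumptions chosen to mirror steps 1 to 8, not CapacityScheduler code.

```java
import java.util.ArrayList;
import java.util.List;

public class RecoveryDeadlockSketch {

    // Hypothetical, simplified view of an app during RM recovery.
    record App(String name, boolean hasRunningAm, int recoveredUsage) {}

    static final int QUEUE_CAPACITY = 10;  // resource units, illustrative
    static final int MAX_ACTIVE_APPS = 1;  // "only one application can be active"
    static final int AM_SIZE = 1;          // units needed to launch an AM

    // Returns true if, after recovering apps in the given order, some app is
    // still pending and no active app can make progress.
    static boolean isDeadlocked(List<App> recoveryOrder) {
        List<App> active = new ArrayList<>();
        List<App> pending = new ArrayList<>();
        // Steps 4-5: apps are re-added in recovery order; only the first
        // MAX_ACTIVE_APPS become active, the rest stay pending.
        for (App app : recoveryOrder) {
            if (active.size() < MAX_ACTIVE_APPS) {
                active.add(app);
            } else {
                pending.add(app);
            }
        }
        // Step 6: recovered containers count against the queue for every app,
        // active or pending.
        int used = 0;
        for (App app : recoveryOrder) {
            used += app.recoveredUsage();
        }
        int free = QUEUE_CAPACITY - used;
        // Steps 7-8: an active app progresses if its AM is already running or
        // there is room to launch one.
        boolean anyActiveCanProgress = false;
        for (App app : active) {
            if (app.hasRunningAm() || free >= AM_SIZE) {
                anyActiveCanProgress = true;
            }
        }
        return !pending.isEmpty() && !anyActiveCanProgress;
    }

    public static void main(String[] args) {
        App app1 = new App("App1", true, QUEUE_CAPACITY); // uses the whole queue
        App app2 = new App("App2", false, 0);             // AM never launched
        System.out.println("App2 first: deadlocked=" + isDeadlocked(List.of(app2, app1))); // true
        System.out.println("App1 first: deadlocked=" + isDeadlocked(List.of(app1, app2))); // false
    }
}
```

Since App1 was submitted first, it presumably holds the lower ApplicationId, so recovering in ID order as the TreeMap change intends corresponds to the second call.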

This message was sent by Atlassian JIRA
