hadoop-yarn-issues mailing list archives

From "Jian He (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-1368) Common work to re-populate containers’ state into scheduler
Date Wed, 30 Apr 2014 06:54:14 GMT

     [ https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-1368:

    Attachment: YARN-1368.preliminary.patch

Preliminary patch to re-populate the RMContainer, SchedulerNode, SchedulerApplicationAttempt,
AppSchedulingInfo and Queue states.
- ResourceTrackerService receives the container information and sends it to RMNode, which in turn
sends the container statuses to the scheduler to do the recovery.
- The majority of the recovery logic is in AbstractYarnScheduler#recoverContainersOnNode(), which
recovers RMContainer, SchedulerNode, Queue, SchedulerApplicationAttempt and AppSchedulingInfo.
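
To illustrate the flow described above, here is a minimal sketch of a scheduler rebuilding its bookkeeping from the containers a node reports at registration. The classes below (RecoverySketch, ReportedContainer, ToyScheduler) and their fields are hypothetical, simplified stand-ins and are not the classes in the patch; only the method name recoverContainersOnNode is taken from the description.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-ins; these are NOT the real YARN classes.
class RecoverySketch {

    /** What a node might report for one running container (assumed fields). */
    static final class ReportedContainer {
        final String containerId;
        final String appAttemptId;
        final String queueName;
        final int memoryMb;

        ReportedContainer(String containerId, String appAttemptId,
                          String queueName, int memoryMb) {
            this.containerId = containerId;
            this.appAttemptId = appAttemptId;
            this.queueName = queueName;
            this.memoryMb = memoryMb;
        }
    }

    /** Toy scheduler tracking used memory per node, per attempt and per queue. */
    static final class ToyScheduler {
        final Map<String, Integer> usedByNode = new HashMap<>();
        final Map<String, Integer> usedByAttempt = new HashMap<>();
        final Map<String, Integer> usedByQueue = new HashMap<>();
        final Set<String> liveContainers = new HashSet<>();

        /** Re-populates all three views from the containers reported by one node. */
        void recoverContainersOnNode(String nodeId, List<ReportedContainer> reported) {
            for (ReportedContainer c : reported) {
                // Skip containers already known, so a repeated report is harmless.
                if (!liveContainers.add(c.containerId)) {
                    continue;
                }
                usedByNode.merge(nodeId, c.memoryMb, Integer::sum);
                usedByAttempt.merge(c.appAttemptId, c.memoryMb, Integer::sum);
                usedByQueue.merge(c.queueName, c.memoryMb, Integer::sum);
            }
        }
    }
}
```

The duplicate check is a design choice of this sketch only: it keeps the accounting stable if the same node report is processed twice.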

To do:
- Noticed that FiCaSchedulerNode and FSSchedulerNode are almost the same. Is there any reason for
keeping both? Thinking of merging the common methods into SchedulerNode.
- RM_WORK_PRESERVING_RECOVERY_ENABLED will be used to guard the new changes.
- The ContainerStatus sent in NM registration doesn’t capture enough information for reconstructing
the containers. We may replace it with a new object or just add more fields to encapsulate
all the necessary information for reconstructing the container.
- More changes to the recover interfaces, edge cases and the transition logic in RMApp/RMAppAttempt.
- More thorough test cases.
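
As a rough sketch of two of the to-do items above (guarding the new path behind a flag, and carrying richer per-container information at registration), something along these lines could work. Everything here is an assumption made for illustration: the ContainerReport type, its fields, the configuration key string, and the use of a plain Map in place of the real configuration object.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names are taken from the patch.
class GuardedRecoverySketch {

    // Illustrative key only; the real property name is whatever the patch defines.
    static final String RECOVERY_ENABLED_KEY = "example.rm.work-preserving-recovery.enabled";

    /** A richer report than a bare status: enough to rebuild the container (assumed fields). */
    static final class ContainerReport {
        final String containerId;
        final String appAttemptId;
        final int memoryMb;
        final int vcores;
        final int priority;

        ContainerReport(String containerId, String appAttemptId,
                        int memoryMb, int vcores, int priority) {
            this.containerId = containerId;
            this.appAttemptId = appAttemptId;
            this.memoryMb = memoryMb;
            this.vcores = vcores;
            this.priority = priority;
        }
    }

    /** Returns the reports to recover, or nothing when the flag is off (the default). */
    static List<ContainerReport> containersToRecover(Map<String, String> conf,
                                                     List<ContainerReport> reported) {
        boolean enabled = Boolean.parseBoolean(conf.getOrDefault(RECOVERY_ENABLED_KEY, "false"));
        if (!enabled) {
            return Collections.<ContainerReport>emptyList();
        }
        return reported;
    }
}
```

Defaulting the flag to off keeps the existing behaviour unchanged unless recovery is explicitly enabled.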

RMContainer, SchedulerNode, SchedulerApplicationAttempt and AppSchedulingInfo can be recovered
in a common way. CSQueue and FSQueue may each need to implement their own recoverContainer method.
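
A minimal sketch of that split, assuming a shared recovery path that delegates only the queue bookkeeping to each queue type. The interface and the two queue classes below are hypothetical illustrations and do not reflect how CSQueue or FSQueue actually account for resources; only the method name recoverContainer comes from the text above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; these types are illustrations, not the real queue classes.
class QueueRecoverySketch {

    /** The one piece each queue type would implement for itself. */
    interface RecoverableQueue {
        void recoverContainer(String appId, int memoryMb);
        int usedMemoryMb();
    }

    /** Hierarchy-style queue: charges itself, then its parent chain. */
    static final class HierarchicalQueue implements RecoverableQueue {
        private final HierarchicalQueue parent;
        private int usedMb;

        HierarchicalQueue(HierarchicalQueue parent) {
            this.parent = parent;
        }

        @Override
        public void recoverContainer(String appId, int memoryMb) {
            usedMb += memoryMb;
            if (parent != null) {
                parent.recoverContainer(appId, memoryMb);
            }
        }

        @Override
        public int usedMemoryMb() {
            return usedMb;
        }
    }

    /** Per-application-style queue: tracks usage per application. */
    static final class PerAppQueue implements RecoverableQueue {
        private final Map<String, Integer> usedByApp = new HashMap<>();

        @Override
        public void recoverContainer(String appId, int memoryMb) {
            usedByApp.merge(appId, memoryMb, Integer::sum);
        }

        @Override
        public int usedMemoryMb() {
            int total = 0;
            for (int mb : usedByApp.values()) {
                total += mb;
            }
            return total;
        }

        int usedBy(String appId) {
            return usedByApp.getOrDefault(appId, 0);
        }
    }

    /** Common path: shared steps would run here, then the queue-specific hook. */
    static void recoverContainer(RecoverableQueue queue, String appId, int memoryMb) {
        queue.recoverContainer(appId, memoryMb);
    }
}
```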

> Common work to re-populate containers’ state into scheduler
> -----------------------------------------------------------
>                 Key: YARN-1368
>                 URL: https://issues.apache.org/jira/browse/YARN-1368
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Anubhav Dhoot
>         Attachments: YARN-1368.preliminary.patch
> YARN-1367 adds support for the NM to tell the RM about all currently running containers
upon registration. The RM needs to send this information to the schedulers along with the
NODE_ADDED_EVENT so that the schedulers can recover the current allocation state of the cluster.

This message was sent by Atlassian JIRA
