hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler
Date Wed, 30 Apr 2014 23:24:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986213#comment-13986213 ]

Wangda Tan commented on YARN-1368:

Thanks [~jianhe] for this proposal. I think recovering containers from the NM heartbeat is a
reasonable approach; +1 for the general idea.
Some minor comments:
bq. Noticed that FiCaSchedulerNode and FSSchedulerNode are almost the same. Any reason for
keeping both ? thinking to merge the common methods into SchedulerNode.
IMO we'd better keep both for now. To avoid touching too many parts in this JIRA, we can
split merging their common logic into a separate task.
bq. ContainerStatus sent in NM registration doesn’t capture enough information for re-constructing
the containers. we may replace that with a new object or just adding more fields to encapsulate
all the necessary information for re-constructing the container.
Personally, I think creating a new type specialized for container recovery is better, since ContainerStatus
is also used in the node heartbeat. Including too many fields in each heartbeat isn't safe or
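
To illustrate the separate-type idea, here is a minimal sketch. It is not the actual YARN API: HeartbeatContainerStatus, RecoveredContainerReport, SchedulerNodeSketch and all of their fields are hypothetical names, and the choice of memory, vcores and priority as "the necessary information" is an assumption made only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: none of these types are real YARN classes.
public class ContainerRecoverySketch {

    /** Lean status sent on every heartbeat (the role ContainerStatus plays today). */
    static final class HeartbeatContainerStatus {
        final String containerId;
        final String state;   // e.g. "RUNNING" or "COMPLETE"
        final int exitStatus;

        HeartbeatContainerStatus(String containerId, String state, int exitStatus) {
            this.containerId = containerId;
            this.state = state;
            this.exitStatus = exitStatus;
        }
    }

    /** Richer record sent only at NM registration, for scheduler recovery (assumed fields). */
    static final class RecoveredContainerReport {
        final String containerId;
        final String state;
        final int memoryMb;
        final int vcores;
        final int priority;

        RecoveredContainerReport(String containerId, String state,
                                 int memoryMb, int vcores, int priority) {
            this.containerId = containerId;
            this.state = state;
            this.memoryMb = memoryMb;
            this.vcores = vcores;
            this.priority = priority;
        }
    }

    /** Toy scheduler node that re-populates its allocation state from the reports. */
    static final class SchedulerNodeSketch {
        private final int totalMemoryMb;
        private int usedMemoryMb;
        private final List<String> runningContainers = new ArrayList<>();

        SchedulerNodeSketch(int totalMemoryMb) {
            this.totalMemoryMb = totalMemoryMb;
        }

        /** Only containers still running count against the node's resources. */
        void recover(RecoveredContainerReport report) {
            if (!"RUNNING".equals(report.state)) {
                return;
            }
            usedMemoryMb += report.memoryMb;
            runningContainers.add(report.containerId);
        }

        int availableMemoryMb() {
            return totalMemoryMb - usedMemoryMb;
        }

        List<String> runningContainers() {
            return runningContainers;
        }
    }

    public static void main(String[] args) {
        SchedulerNodeSketch node = new SchedulerNodeSketch(8192);
        node.recover(new RecoveredContainerReport("container_1", "RUNNING", 2048, 1, 0));
        node.recover(new RecoveredContainerReport("container_2", "COMPLETE", 1024, 1, 0));
        System.out.println("available MB after recovery: " + node.availableMemoryMb());
    }
}
```

The point of the split is that the heartbeat payload stays small, while the larger record is sent only once per NM registration.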

> Common work to re-populate containers’ state into scheduler
> -----------------------------------------------------------
>                 Key: YARN-1368
>                 URL: https://issues.apache.org/jira/browse/YARN-1368
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Anubhav Dhoot
>         Attachments: YARN-1368.preliminary.patch
> YARN-1367 adds support for the NM to tell the RM about all currently running containers
upon registration. The RM needs to send this information to the schedulers along with the
NODE_ADDED_EVENT so that the schedulers can recover the current allocation state of the cluster.

This message was sent by Atlassian JIRA
