hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Arun Suresh (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-4597) Add SCHEDULE to NM container lifecycle
Date Mon, 07 Nov 2016 20:13:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15645284#comment-15645284
] 

Arun Suresh edited comment on YARN-4597 at 11/7/16 8:13 PM:
------------------------------------------------------------

Rebased patch after YARN-2995 and latest commits to trunk.

[~jianhe], thanks for the comments..
bq. maybe rename ContainerScheduler#runningContainers to scheduledContainers
Given that we the SCHEDULED state is a state that comes before RUNNING. and the *runningContainers*
collection actually holds containers that have been marked to run by the scheduler, am thinking
scheduled containers may not be apt here. How about *scheduledToRunContainers* ?

bq. The ContainerLaunch#killedBeforeStart flag, looks like the exising flag 'shouldLaunchContainer'
serves the same purpose, can we reuse that ? if so, the container#isMarkedToKill is also not
needed.
Hmm.. my understanding is that it is slightly different. The *shouldLaunchContainer* is IIUC,
used during the recovery process to signal that the container should not be launched. What
*killedBeforeStart* aims to do is to notify the *ContainerLaunch* (which runs in a different
thread) that the Scheduler, which had requested to start the container earlier, in the last
moment changed its mind and decided to kill it. Using shouldLaunchContainer also causes the
CONTAINER_LAUNCH event to be fired which I do not want.

bq. NodeManager#containerScheduler variable not used, remove
Done

bq. I think this comment is not addressed ? "In case we exceed the max-queue length, we are
killing the container directly instead of queueing the container, in this case, we should
not store the container as queued?"
Yeah.. meant to comment on it. This is actually the desired behavior. Once queue limit is
reached, no new opportunistic containers should also be queued. The AM is free to request
it again. The MRAppMaster, for eg. re-requests the same task as a GUARANTEED container.

Hope this made sense ?


was (Author: asuresh):
Rebased patch after YARN-2995 and latest commits to trunk.

[~jianhe], thanks for the comments..
bq. maybe rename ContainerScheduler#runningContainers to scheduledContainers
Given that we the SCHEDULED state is a state that comes before RUNNING. and the *runningContainers*
collection actually holds containers that have been marked to run by the scheduler, am thinking
scheduled containers may not be apt here. How about *scheduledToRunContainers* ?

bq. The ContainerLaunch#killedBeforeStart flag, looks like the exising flag 'shouldLaunchContainer'
serves the same purpose, can we reuse that ? if so, the container#isMarkedToKill is also not
needed.
Hmm.. my understanding is that it a slightly different. The *shouldLaunchContainer* is IIUC,
used during the recovery process to signal that the container should be launched or not. What
*killedBeforeStart* aims to do is to notify the *ContainerLaunch* (which runs in a different
thread) that the Scheduler might have requested to start the container earlier, but in the
last minute decided to kill it. Using shouldLaunchContainer also causes the CONTAINER_LAUNCH
event to be fired which I do not want.

bq. NodeManager#containerScheduler variable not used, remove
Done

bq. I think this comment is not addressed ? "In case we exceed the max-queue length, we are
killing the container directly instead of queueing the container, in this case, we should
not store the container as queued?"
Yeah.. meant to comment on it. This is actually the desired behavior. Once queue limit is
reached, no new opportunistic containers should also be queued. The AM is free to request
it again. The MRAppMaster, for eg. re-requests the same task as a GUARANTEED container.

Hope this made sense ?

> Add SCHEDULE to NM container lifecycle
> --------------------------------------
>
>                 Key: YARN-4597
>                 URL: https://issues.apache.org/jira/browse/YARN-4597
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>            Reporter: Chris Douglas
>            Assignee: Arun Suresh
>              Labels: oct16-hard
>         Attachments: YARN-4597.001.patch, YARN-4597.002.patch, YARN-4597.003.patch, YARN-4597.004.patch,
YARN-4597.005.patch, YARN-4597.006.patch
>
>
> Currently, the NM immediately launches containers after resource localization. Several
features could be more cleanly implemented if the NM included a separate stage for reserving
resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message