hadoop-yarn-issues mailing list archives

From "Arun Suresh (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-4597) Add SCHEDULE to NM container lifecycle
Date Wed, 12 Oct 2016 13:56:21 GMT

    [ https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568792#comment-15568792 ]

Arun Suresh edited comment on YARN-4597 at 10/12/16 1:56 PM:
-------------------------------------------------------------

Thanks for taking a look [~jianhe],

bq. Wondering why KillWhileExitingTransition is added..
I had put it in there while debugging something and left it there since I thought it was harmless,
but yes, it looks like it does override the exit code. Will remove it. Good catch.

* w.r.t. {{ContainerState#SCHEDULED}}: Actually, I think we should expose this. We currently
club NEW, LOCALIZING, LOCALIZED etc. into RUNNING, which is misleading since the container is
not actually running. SCHEDULED implies that some of the container's dependencies (resources
for localization + some internal queuing/scheduling policy) have not yet been met.
Prior to this, YARN-2877 had introduced the QUEUED return state, which would be visible to
applications if queuing was enabled. This patch technically just renames QUEUED to SCHEDULED.
Also, all containers will go through the SCHEDULED state, not just the opportunistic ones (although
for guaranteed containers this will just be a pass-through state).
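
To make the proposed mapping concrete, here is a rough sketch of how internal NM states might collapse into the state reported to applications. The enum and method names below are hypothetical stand-ins for illustration, not the actual YARN classes, and the internal states beyond NEW, LOCALIZING and LOCALIZED are assumptions:

```java
/** Illustrative only: hypothetical stand-ins, not the real YARN enums. */
final class ExposedStateSketch {

  /** Assumed subset of NM-internal container states. */
  enum InternalState { NEW, LOCALIZING, LOCALIZED, SCHEDULED, RUNNING, DONE }

  /** Assumed states reported back to applications. */
  enum ExposedState { SCHEDULED, RUNNING, COMPLETE }

  /** Pre-launch states are reported as SCHEDULED rather than RUNNING. */
  static ExposedState expose(InternalState state) {
    switch (state) {
      case NEW:
      case LOCALIZING:
      case LOCALIZED:
      case SCHEDULED:
        // Dependencies (localization, queuing policy) are not yet met.
        return ExposedState.SCHEDULED;
      case RUNNING:
        return ExposedState.RUNNING;
      default:
        return ExposedState.COMPLETE;
    }
  }
}
```

With a mapping like this, a guaranteed container would still report SCHEDULED, just very briefly, before moving to RUNNING.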

Another thing I was hoping to get some input on: currently, the {{ContainerScheduler}} runs
in the same thread as the AsyncDispatcher started by the ContainerManager.
Also, the scheduler is triggered only by events. I was wondering if there is any merit in pushing
these events into a blocking queue as they arrive and having a separate thread take care of
them. This would preserve the serial nature of operation (and thereby keep the code simple
by not needing synchronized collections) and would not hold up the dispatcher from delivering
other events while the scheduler is scheduling.
A minor disadvantage is that the NM will probably consume a thread that, for the most part,
will be blocked on the queue. This thread could otherwise be used by one of the containers.
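
A minimal sketch of the queue-plus-thread idea, using only java.util.concurrent. The class name, the String event type and the stop sentinel are all made up for illustration and are not part of the patch; the real scheduler would take its own event type:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Illustrative only: hypothetical names, not the NM's ContainerScheduler. */
final class QueuedSchedulerSketch {
  // Distinct instance used as a stop marker; compared by identity below.
  private static final String STOP = new String("stop");

  private final BlockingQueue<String> events = new LinkedBlockingQueue<>();
  // Touched only by the worker thread, so no synchronized collection is needed.
  private final List<String> handled = new ArrayList<>();
  private final Thread worker = new Thread(this::drain, "container-scheduler");

  void start() {
    worker.start();
  }

  /** Called on the dispatcher thread; enqueues and returns immediately. */
  void handle(String event) {
    events.add(event);
  }

  /** Stops the worker and returns the events in the order they were processed. */
  List<String> stopAndGetHandled() {
    events.add(STOP);
    try {
      worker.join(); // join makes the worker's writes visible to the caller
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while stopping", e);
    }
    return handled;
  }

  private void drain() {
    try {
      while (true) {
        String event = events.take(); // blocks while the queue is empty
        if (event == STOP) {
          return;
        }
        handled.add(event); // stand-in for the actual scheduling work
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

Since a single thread drains the queue, events are still processed one at a time in arrival order, while the dispatcher only pays the cost of an enqueue.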


> Add SCHEDULE to NM container lifecycle
> --------------------------------------
>
>                 Key: YARN-4597
>                 URL: https://issues.apache.org/jira/browse/YARN-4597
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Chris Douglas
>            Assignee: Arun Suresh
>         Attachments: YARN-4597.001.patch, YARN-4597.002.patch
>
>
> Currently, the NM immediately launches containers after resource localization. Several
> features could be more cleanly implemented if the NM included a separate stage for reserving
> resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
