hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Arun Suresh (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-4597) Add SCHEDULE to NM container lifecycle
Date Fri, 11 Nov 2016 15:11:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15656417#comment-15656417
] 

Arun Suresh edited comment on YARN-4597 at 11/11/16 3:11 PM:
-------------------------------------------------------------

Appreciate the review [~kkaranasos],

1.
bq. The Container has two new methods (sendLaunchEvent and sendKillEvent), which are public
and are not following..
sendKillEvent is used by the Scheduler (which is in another package) to kill a container.
Since this patch introduces an external entity that launches and kills a container, viz. the
Scheduler, I feel it is apt to keep both as public methods. I prefer it to 'dispatcher.getEventHandler().handle..'.


2.
The Container needs to be added to the {{nodeUpdateQueue}} if the container is to be move
from ACQUIRED to RUNNING state (this is a state transition all containers should go thru).
Regarding the {{launchedContainers}}, Lets have both Opportunistic and Guaranteed containers
flow through a common code-path... and introduce specific behaviors if required in subsequent
patches as and when required.

3.
bq. In the OpportunisticContainerAllocatorAMService we are now calling the SchedulerNode::allocate,
and then we do not update the used resources but we do update some other counters, which leads
to inconsistencies.
Hmmm... I do see that the numContainers are not decremented correctly when release. Thanks...
but it looks like it would more likely just impact reporting  / UI, nothing functional (Will
update the patch). Can you specify which other counters specifically ? Like I mentioned above..
lets run all containers thru as much of the common code path before we add new counters etc.

4.
bq. Maybe as part of a different JIRA, we should at some point extend the container.metrics
in the ContainerImpl to keep track of the scheduled/queued containers.
Yup.. +1 to that.

The rest of your comments make sense... will update patch.

bq. let's stress-test the code in a cluster before committing to make sure everything is good
It has been tested on a 3 node cluster and MR Pi jobs (with opportunistic containers) and
I didn't hit any major issues. We can always open follow-up JIRAs for specific performance
related issues as and when we find it. Besides, stess-testing is not really a precondition
to committing a patch.



was (Author: asuresh):
Appreciate the review [~kkaranasos],

1.
bq. The Container has two new methods (sendLaunchEvent and sendKillEvent), which are public
and are not following..
sendKillEvent is used by the Scheduler (which is in another package) to kill a container.
Since this patch introduces an external entity that launches and kills a container, viz. the
Scheduler, I feel it is apt to keep both as public methods. I prefer it to 'dispatcher.getEventHandler().handle..'.


2.
The Container needs to be added to the {{nodeUpdateQueue}} if the container is to be move
from ACQUIRED to RUNNING state (this is a state transition all containers should go thru).
Regarding the {{launchedContainers}}, Lets have both Opportunistic and Guaranteed containers
flow through a common code-path... and introduce specific behaviors if required in subsequent
patches as and when required.

3.
bq. In the OpportunisticContainerAllocatorAMService we are now calling the SchedulerNode::allocate,
and then we do not update the used resources but we do update some other counters, which leads
to inconsistencies.
Hmmm... I do see that the numContainers are not decremented correctly when release. Thanks...
but it looks like it would more likely just impact reporting  / UI, nothing functional (Will
update the patch). Can you specify which other counters specifically ? Like I mentioned in
the previous patch.. lets run all containers thru as much of the common code path before we
add new counters etc.

4.
bq. Maybe as part of a different JIRA, we should at some point extend the container.metrics
in the ContainerImpl to keep track of the scheduled/queued containers.
Yup.. +1 to that.

The rest of your comments make sense... will update patch.

bq. let's stress-test the code in a cluster before committing to make sure everything is good
It has been tested on a 3 node cluster and MR Pi jobs (with opportunistic containers) and
I didn't hit any major issues. We can always open follow-up JIRAs for specific performance
related issues as and when we find it. Besides, stess-testing is not really a precondition
to committing a patch.


> Add SCHEDULE to NM container lifecycle
> --------------------------------------
>
>                 Key: YARN-4597
>                 URL: https://issues.apache.org/jira/browse/YARN-4597
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>            Reporter: Chris Douglas
>            Assignee: Arun Suresh
>              Labels: oct16-hard
>         Attachments: YARN-4597.001.patch, YARN-4597.002.patch, YARN-4597.003.patch, YARN-4597.004.patch,
YARN-4597.005.patch, YARN-4597.006.patch, YARN-4597.007.patch, YARN-4597.008.patch, YARN-4597.009.patch
>
>
> Currently, the NM immediately launches containers after resource localization. Several
features could be more cleanly implemented if the NM included a separate stage for reserving
resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message