hadoop-yarn-issues mailing list archives

From "Arun Suresh (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-4597) Add SCHEDULE to NM container lifecycle
Date Fri, 21 Oct 2016 04:31:59 GMT

    [ https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15593945#comment-15593945 ]

Arun Suresh edited comment on YARN-4597 at 10/21/16 4:31 AM:
-------------------------------------------------------------

[~jianhe], thanks again for taking a look.

bq. I think there might be some behavior change or bug for scheduling guaranteed containers
when the opportunistic-queue is enabled. Previously, when launching a container, the NM would
not check the current vmem and cpu usage; it assumed that whatever the RM allocated could be
launched. Now, the NM checks these limits and won't launch the container if it hits them.
Yup, we do a *hasResources* check only when a container starts and when a container is killed.
We assumed that the resources requested by a container are constant; essentially we considered
only the actual *allocated* resources, which we assume will not vary during the lifetime of
the container... which implies there is no point in checking this at any time other than
container start and kill.
But as you stated, if we consider container resource *utilization*, based on the work [~kasha]
is doing in YARN-1011, then yes, we should have a timer thread that periodically checks vmem
and cpu usage and starts (and kills) containers based on that.
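Roughly, I am picturing something like the sketch below. All names and integration points here
are hypothetical, purely to illustrate the timer-thread idea; this is not the YARN-1011 code.

{code:java}
// Hypothetical sketch: a timer thread that periodically samples vmem/cpu
// usage and notifies the scheduler so it can start queued containers or
// preempt opportunistic ones.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class UtilizationMonitor {
  private final ScheduledExecutorService timer =
      Executors.newSingleThreadScheduledExecutor();
  private final long vmemLimitBytes;
  private final float cpuVcoreLimit;

  public UtilizationMonitor(long vmemLimitBytes, float cpuVcoreLimit) {
    this.vmemLimitBytes = vmemLimitBytes;
    this.cpuVcoreLimit = cpuVcoreLimit;
  }

  /** Poll utilization at a fixed interval. */
  public void start(long intervalMs) {
    timer.scheduleAtFixedRate(this::checkOnce, intervalMs, intervalMs,
        TimeUnit.MILLISECONDS);
  }

  private void checkOnce() {
    long vmemUsed = readVmemUsage();   // placeholder: containers-monitor hook
    float cpuUsed = readCpuUsage();    // placeholder: containers-monitor hook
    if (vmemUsed > vmemLimitBytes || cpuUsed > cpuVcoreLimit) {
      // Over the limit: ask the scheduler to kill opportunistic containers.
      notifySchedulerShedLoad();
    } else {
      // Headroom available: ask the scheduler to re-process queued containers.
      notifySchedulerStartQueued();
    }
  }

  // NM integration points, left empty in this sketch.
  private long readVmemUsage() { return 0L; }
  private float readCpuUsage() { return 0f; }
  private void notifySchedulerShedLoad() { }
  private void notifySchedulerStartQueued() { }
}
{code}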

bq. the ResourceUtilizationManager looks like it only incorporates some utility methods; not
sure how we will make this pluggable later.
Following on from my point above, the idea is to have a {{ResourceUtilizationManager}} that can
provide different implementations of {{getCurrentUtilization}}, {{addResource}} and
{{subtractResource}}, which the ContainerScheduler uses to calculate the resources to free up.
For instance, the current default one only takes into account the actual resources *allocated*
to containers... for YARN-1011, we might replace that with the resources *utilized* by running
containers and provide a different value for {{getCurrentUtilization}}. The timer thread I
mentioned in the previous point, which can be a part of this new ResourceUtilizationManager,
can send events to the scheduler to re-process queued containers when utilization has changed.
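As a sketch, the pluggable interface could look roughly like this. Only the three method names
come from the discussion above; the signatures are illustrative, and the actual patch may pass
a Container or ContainerId rather than a plain Resource.

{code:java}
// Illustrative interface sketch for a pluggable utilization tracker.
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceUtilization;

public interface ResourceUtilizationManager {

  // Value the ContainerScheduler schedules against: the default flavour would
  // track *allocated* resources, a YARN-1011 flavour could return the
  // actually *utilized* resources instead.
  ResourceUtilization getCurrentUtilization();

  // Account for a container that is starting.
  void addResource(Resource containerResource);

  // Release the accounting when a container finishes or is killed.
  void subtractResource(Resource containerResource);
}
{code}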

bq. The logic to select opportunistic containers: we may kill more opportunistic containers
than required. e.g...
Good catch. In {{resourcesToFreeUp}}, I needed to decrement any already-marked-for-kill
opportunistic containers. That was there earlier; I had removed it while testing something and
forgot to put it back :)
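The accounting fix is essentially the following. This is a simplified, illustrative sketch with
hypothetical names, not the exact patch code.

{code:java}
// Sketch: when computing what must be freed for a guaranteed container,
// subtract the resources of opportunistic containers already marked for
// kill, so we don't select more victims than necessary.
import java.util.Collection;

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public final class FreeUpAccounting {
  private FreeUpAccounting() { }

  static Resource resourcesToFreeUp(Resource guaranteedDemand,
      Resource currentlyAvailable, Collection<Resource> alreadyMarkedForKill) {
    // What the guaranteed container needs beyond what is already free.
    Resource needed = Resources.subtract(guaranteedDemand, currentlyAvailable);
    // Decrement what will come back from containers already selected to die.
    for (Resource marked : alreadyMarkedForKill) {
      needed = Resources.subtract(needed, marked);
    }
    // Clamp so we never report a negative requirement.
    return Resources.componentwiseMax(needed, Resources.none());
  }
}
{code}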

bq. we don't need to synchronize on the currentUtilization object? I don't see any other place
it's synchronized
Yup, it isn't required. Varun pointed out the same thing. I thought I had fixed it; I think I
might have missed 'git add'ing the change.

w.r.t. adding the new transitions, I was seeing some error messages in some test cases. I will
rerun and see if they are required… but in any case, having them there should be harmless,
right?
 
The rest of your comments make sense... I will address them shortly.





> Add SCHEDULE to NM container lifecycle
> --------------------------------------
>
>                 Key: YARN-4597
>                 URL: https://issues.apache.org/jira/browse/YARN-4597
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Chris Douglas
>            Assignee: Arun Suresh
>         Attachments: YARN-4597.001.patch, YARN-4597.002.patch, YARN-4597.003.patch
>
>
> Currently, the NM immediately launches containers after resource localization. Several
> features could be more cleanly implemented if the NM included a separate stage for reserving
> resources.




