hadoop-yarn-issues mailing list archives

From "Arun Suresh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6831) Miscellaneous refactoring changes of ContainScheduler
Date Tue, 18 Jul 2017 17:12:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091823#comment-16091823 ]

Arun Suresh commented on YARN-6831:
-----------------------------------

I was thinking about removing *maxOppQueueLength*, which led me to think about the following.
In YARN-5972, we are trying to get the NM to pause an opportunistic container instead of killing
it. Both the cgroup freezer and Windows job objects implement freezing in the following way:
when a process is frozen, its CPU share is reduced to 0 and its working set remains in memory
as long as there is no external memory pressure. If the OS can't keep the frozen process in
memory, its memory is swapped out to disk and restored when the process is thawed. This implies
that the number of paused containers is limited by the total swap space on the NM. That limit should
be another local NM config, maybe something like *maxConsumedOpportunisticResources*, which
places an additional limit on the number of running opportunistic containers.
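
To make the idea concrete, here is a rough sketch of how such a limit could be enforced. It assumes the limit is expressed as a memory budget in bytes (for example, derived from the swap space on the NM) and that each container's memory footprint is known when it is paused. The class and method names (PausedContainerBudget, tryPause, resume) are hypothetical and are not existing NodeManager APIs.

```java
/**
 * Hypothetical sketch, not part of the NodeManager code base.
 * Tracks how much memory is held by paused (frozen) opportunistic
 * containers and refuses to pause more once a configured budget is used up.
 */
final class PausedContainerBudget {
  // Assumed to come from a local NM config, e.g. sized to the node's swap space.
  private final long maxPausedMemoryBytes;
  private long pausedMemoryBytes = 0;

  PausedContainerBudget(long maxPausedMemoryBytes) {
    if (maxPausedMemoryBytes < 0) {
      throw new IllegalArgumentException("budget must be non-negative");
    }
    this.maxPausedMemoryBytes = maxPausedMemoryBytes;
  }

  /**
   * Reserves budget for a container that is about to be paused.
   * Returns false if the container does not fit; the caller would then
   * fall back to killing the container instead of pausing it.
   */
  synchronized boolean tryPause(long containerMemoryBytes) {
    if (containerMemoryBytes < 0) {
      throw new IllegalArgumentException("memory must be non-negative");
    }
    // Compare against the remaining budget to avoid overflow on addition.
    if (containerMemoryBytes > maxPausedMemoryBytes - pausedMemoryBytes) {
      return false;
    }
    pausedMemoryBytes += containerMemoryBytes;
    return true;
  }

  /** Releases the budget held by a container that is resumed or removed. */
  synchronized void resume(long containerMemoryBytes) {
    pausedMemoryBytes = Math.max(0, pausedMemoryBytes - containerMemoryBytes);
  }

  synchronized long pausedMemoryBytes() {
    return pausedMemoryBytes;
  }
}
```

With a 4096-byte budget, pausing a 3072-byte container succeeds, a further 2048-byte pause is rejected, and it succeeds once the first container is resumed. Counting bytes rather than containers is a design choice in this sketch, since the swap footprint depends on container size and not on container count.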

> Miscellaneous refactoring changes of ContainScheduler 
> ------------------------------------------------------
>
>                 Key: YARN-6831
>                 URL: https://issues.apache.org/jira/browse/YARN-6831
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>
> While reviewing YARN-6706, Karthik pointed out a few issues for improvement in ContainerScheduler:
> *Make ResourceUtilizationTracker pluggable. That way, we could use a different tracker
> when oversubscription is enabled.
> *ContainerScheduler
>   ##Why do we need maxOppQueueLength given queuingLimit?
>   ##Is there value in splitting runningContainers into runningGuaranteed and runningOpportunistic?
>   ##getOpportunisticContainersStatus method implementation feels awkward. How about capturing
> the state in the field here, and have metrics etc. pull from here?
>   ##startContainersFromQueue: Local variable resourcesAvailable is unnecessary
> *OpportunisticContainersStatus
>   ##Let us clearly differentiate between allocated, used and utilized. Maybe, we should
> rename current Used methods to Allocated?
>   ##I prefer either full name Opportunistic (in method) or Opp (shortest name that makes
> sense). Opport is neither short nor fully descriptive.
>   ##Have we considered folding ContainerQueuingLimit class into this?
> We decided to move the issues into this follow-up JIRA to keep YARN-6706 moving forward
> to unblock oversubscription work.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
