hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Arun Suresh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6831) Miscellaneous refactoring changes of ContainScheduler
Date Mon, 17 Jul 2017 21:36:01 GMT

    [ https://issues.apache.org/jira/browse/YARN-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16090609#comment-16090609

Arun Suresh commented on YARN-6831:

Thanks for raising this [~haibochen] / [~kasha]. Some thoughts:

bq. Why do we need maxOppQueueLength given queuingLimit?
So, maxOppQueueLength is more like an *active* limit. The CS (ContainerScheduler) will not
admit any more containers than that value. While the queuingLimit is more *reactive* and dynamically
calculated by the RM and passed down to the NM in a HB response. The RM constantly calculates
the mean/median of the queueLengths on all nodes and it tells the NM to shed containers from
the queue if it is too high. I agree that the *maxOppQueueLength* can probably be removed
though. But given your observation in YARN-6706 that test cases depends on this, my opinion
is that we will keep it, and put a very high value by default - and mark it as VisibileForTesting

bq. Is there value in splitting runningContainers into runningGuaranteed and runningOpportunistic
Hmm… I was actually thinking of removing the *runningContainers* itself. It was introduced
to keep track of all running containers (containers whose state is running) AND those that
have been scheduled but not yet running. I think it may be better to encapsulate that as a
proper container state, something like *SCHEDULED_TO_RUN* via a proper transition.
Adding more data structures might be problematic later on, since we can hit minor race conditions
when transferring containers from runningGuaranteed to running Opportunistic (during promotion)
and vice-versa (during demotion) if we are not careful about synchronization etc. Also, given
the fact that a NM will not run more than say a couple of 100 containers, it might be better
to just iterate over all the containers when the scheduler needs to make a decision.
Another problem with keeping a separate map is during NM recovery, we have to populate this
specifically. we don’t do that for running containers now either – but I was think if
we removed the *runningContainers* map, we wont have to (we already have a state called *QUEUED*
in the NMStateStore which can be used to set the correct state in the recovered container)

bq. getOpportunisticContainersStatus method implementation feels awkward..
Kind of agree with you there, don’t recall exactly why we did it like that… think it was
to not have to create a new instance of the status at every heart beat. 

bq. Have we considered folding ContainerQueuingLimit class into this
My first instinct is to keep it separate. Don’t think we should mix the Queuing aspect of
the Container Scheduler with the ExecutionType aspect. Also, one is part of the NM heartbeat
request and the other comes back as response.

> Miscellaneous refactoring changes of ContainScheduler 
> ------------------------------------------------------
>                 Key: YARN-6831
>                 URL: https://issues.apache.org/jira/browse/YARN-6831
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
> While reviewing YARN-6706, Karthik pointed out a few issues for improvment in ContainerScheduler
> *Make ResourceUtilizationTracker pluggable. That way, we could use a different tracker
when oversubscription is enabled.
> *ContainerScheduler
>   ##Why do we need maxOppQueueLength given queuingLimit?
>   ##Is there value in splitting runningContainers into runningGuaranteed and runningOpportunistic?
>   ##getOpportunisticContainersStatus method implementation feels awkward. How about capturing
the state in the field here, and have metrics etc. pull from here?
>   ##startContainersFromQueue: Local variable resourcesAvailable is unnecessary
> *OpportunisticContainersStatus
>   ##Let us clearly differentiate between allocated, used and utilized. Maybe, we should
rename current Used methods to Allocated?
>   ##I prefer either full name Opportunistic (in method) or Opp (shortest name that makes
sense). Opport is neither short nor fully descriptive.
>   ##Have we considered folding ContainerQueuingLimit class into this?
> We decided to move the issues into this follow up jira to keep YARN-6706 moving forward
to unblock oversubscription work.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

View raw message