tez-issues mailing list archives

From "Siddharth Seth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEZ-344) Add an option to allow node-local only scheduling in the TaskScheduler
Date Wed, 07 Aug 2013 05:02:47 GMT

    [ https://issues.apache.org/jira/browse/TEZ-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731639#comment-13731639
] 

Siddharth Seth commented on TEZ-344:
------------------------------------

bq. No, the RM will not assign 2 containers for the same resource request unless the racks
are also common. If the racks are common and it does, then getMatchingRequests() will not
return any match for the container on host2 (assuming the host1 container already matched T1 or
vice versa). In that case, the TaskScheduler will release the container back to the RM. That,
I think, is the intent of this jira too.
In the rack-local case - the YARN scheduler allocates the wrong container, and we rely on getMatchingRequests
to reject this (Is this implemented in AMRMClient yet? From a quick look, that doesn't seem
to be the behaviour). That seems fair.

bq. I wrote the TaskScheduler with the design intention of working closely with YARN and
not working around YARN. Where there have been issues in YARN, I have fixed those so that YARN
works for everyone and the TaskScheduler is a model piece of code on how to interact
correctly with YARN. I am wary of adding additional delay logic in the TaskScheduler. This
is like adding more sleep in a call stack to make things somehow work. That's not the philosophy
of the TaskScheduler design.
If YARN works as it should in terms of scheduling, then the delay logic is more for the case
of container re-use where YARN is not involved.

Had an offline discussion about this with Bikas. This is what ends up happening.
When the app is using the full cluster, with re-use enabled, if a container becomes free
and no tasks match the container - we can either assign the container to a non-local
task or release it (there's a config in re-use to avoid anything non-local). If the container
gets released - it'll be allocated back immediately by YARN based on rack / other locality
- and a non-local task ends up running on this container (without the benefits of re-use).

Correct me if I'm wrong, but I believe the consensus was to skip the parameter and go directly
to some form of delay scheduling, i.e. keep completed containers around for a limited time
(equal to expected relaunch time / configurable / something based on average task runtime)
before assigning a non-local task to them. If, during this period, other containers become available
to which local work can be assigned - that's what will happen.
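To make the idea concrete, here is a minimal sketch (in Java, the language Tez is written in) of the delay-scheduling behaviour described above. This is purely illustrative - `HeldContainerPool` and its methods are hypothetical names, not Tez or YARN APIs: a freed container is held for a configurable grace period during which only node-local tasks may claim it, and non-local work becomes eligible only after the period expires.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the delay-scheduling proposal: hold a freed
// container for a grace period; node-local tasks can take it at any
// time, non-local tasks only after the grace period elapses.
public class HeldContainerPool {
    private final long graceMillis;                       // configurable hold time
    private final Map<String, Long> heldSince = new HashMap<>();

    public HeldContainerPool(long graceMillis) {
        this.graceMillis = graceMillis;
    }

    // Record that a container just became free at time nowMillis.
    public void hold(String containerId, long nowMillis) {
        heldSince.put(containerId, nowMillis);
    }

    // A node-local task may claim a held container immediately.
    public boolean canAssignLocal(String containerId) {
        return heldSince.containsKey(containerId);
    }

    // Non-local work is eligible only once the grace period has elapsed.
    public boolean canAssignNonLocal(String containerId, long nowMillis) {
        Long start = heldSince.get(containerId);
        return start != null && nowMillis - start >= graceMillis;
    }

    // Container was assigned (or released back to the RM); stop tracking it.
    public void release(String containerId) {
        heldSince.remove(containerId);
    }
}
```

The grace period would map to the "expected relaunch time / configurable / average task runtime" options mentioned above; the sketch ignores node matching and the actual assignment loop.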

Another approach to handling this is to assign non-local work to the available container,
but also assign the same task to a local container if and when one becomes available. This
would likely be better - but is much more involved in terms of getting speculation to work.
                
> Add an option to allow node-local only scheduling in the TaskScheduler
> ----------------------------------------------------------------------
>
>                 Key: TEZ-344
>                 URL: https://issues.apache.org/jira/browse/TEZ-344
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>              Labels: TEZ-0.2.0
>
> This, for now, is primarily to help with testing of Tez on clusters.
> Would have to go in with a warning since this could cause jobs to hang / run for a long
time.
> Longer term, this can be enhanced to set limits on how long to wait before assigning
non-local tasks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
