tez-issues mailing list archives

From "Siddharth Seth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEZ-344) Support delayed scheduling for re-used containers
Date Fri, 16 Aug 2013 07:54:47 GMT

    [ https://issues.apache.org/jira/browse/TEZ-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742012#comment-13742012 ]

Siddharth Seth commented on TEZ-344:

bq. If releaseDelay == 0, then we send (false, false), which means the container will neither be
released nor queued. It will be lost.
Good catch. That's the second bug you've found in the same method call. Let me see if it can
be simplified a bit.
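One possible simplification, sketched under assumptions: replace the (release, queue) boolean pair with a single enum so that the (false, false) state cannot be expressed at all, and treat a non-positive releaseDelay as "release immediately". IdleContainerPolicy, Action and decide() are hypothetical names for illustration, not the actual Tez scheduler code.

```java
/**
 * Hypothetical sketch (not the Tez implementation): decide what to do with a
 * container that has no task to run right now.
 */
final class IdleContainerPolicy {

    /** RELEASE hands the container back; QUEUE holds it for possible re-use. */
    enum Action { RELEASE, QUEUE }

    private IdleContainerPolicy() {}

    static Action decide(long releaseDelayMillis) {
        // A non-positive delay means "do not hold idle containers". Returning
        // RELEASE here avoids the "neither released nor queued" case, which
        // would leak the container.
        if (releaseDelayMillis <= 0) {
            return Action.RELEASE;
        }
        // Otherwise keep it around for up to releaseDelayMillis.
        return Action.QUEUE;
    }
}
```

With a single enum result every caller has to handle exactly one of the two outcomes, which is harder to get wrong than a pair of independent flags.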

bq. Instances of the following are probably incorrect now with reuse: unassignTask() and unassignContainer().
I haven't verified this, but I believe these stats, which were broken by the re-use patch, should be
better after this patch. Will check again. Was allocatedResources supposed to capture only assigned
containers, or all allocated containers?
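To illustrate the distinction behind the allocatedResources question, here is a minimal, hypothetical bookkeeping sketch that keeps the two numbers separate. It assumes that with re-use a container stays allocated across several tasks, so unassigning a task only adjusts the assigned total, while releasing the container adjusts both. ContainerStats and its methods are illustrative names, and memory in MB stands in for a full resource type.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: with container re-use, "allocated" and "assigned"
 * diverge, because one container can run several tasks before release.
 */
final class ContainerStats {
    private final Map<String, Integer> allocatedMb = new HashMap<>(); // containerId -> memory
    private final Map<String, String> assignedTask = new HashMap<>();  // containerId -> taskId
    private long allocatedTotalMb;
    private long assignedTotalMb;

    void containerAllocated(String containerId, int memoryMb) {
        if (allocatedMb.putIfAbsent(containerId, memoryMb) == null) {
            allocatedTotalMb += memoryMb;
        }
    }

    void assignTask(String containerId, String taskId) {
        Integer mem = allocatedMb.get(containerId);
        if (mem == null) {
            throw new IllegalStateException("unknown container " + containerId);
        }
        // Only count the container once, even if it is re-assigned.
        if (assignedTask.put(containerId, taskId) == null) {
            assignedTotalMb += mem;
        }
    }

    /** Task finished; the container is kept for re-use, so it stays allocated. */
    void unassignTask(String containerId) {
        if (assignedTask.remove(containerId) != null) {
            assignedTotalMb -= allocatedMb.get(containerId);
        }
    }

    /** Container is handed back, so it leaves both totals. */
    void containerReleased(String containerId) {
        unassignTask(containerId);
        Integer mem = allocatedMb.remove(containerId);
        if (mem != null) {
            allocatedTotalMb -= mem;
        }
    }

    long allocatedTotalMb() { return allocatedTotalMb; }

    long assignedTotalMb() { return assignedTotalMb; }
}
```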

bq. Could we have released these containers when we passed over them in assignAllocatedContainer(ANY_ASSIGNER)
instead of re-looping them again?
They may not hit that code path depending on the re-use fallback configuration.

bq. Where are the remaining delayed containers being released?
The thread releases all its containers when it falls out of its main loop via the interrupt.
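A minimal sketch of that shutdown pattern, assuming the held containers sit in a java.util.concurrent.DelayQueue and that an interrupt is the shutdown signal. DelayedContainerReleaser, hold() and the release callback are hypothetical names, not the actual Tez classes. DelayQueue.drainTo() only moves expired elements, so the sketch copies the whole queue (whose iterator covers expired and unexpired elements) before releasing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch: releases idle containers on expiry, and all of them on interrupt. */
final class DelayedContainerReleaser extends Thread {

    static final class HeldContainer implements Delayed {
        final String containerId;
        private final long expiryNanos;

        HeldContainer(String containerId, long delayMillis) {
            this.containerId = containerId;
            this.expiryNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMillis);
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(expiryNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            if (other instanceof HeldContainer) {
                return Long.compare(expiryNanos, ((HeldContainer) other).expiryNanos);
            }
            return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    private final DelayQueue<HeldContainer> queue = new DelayQueue<>();
    private final Consumer<String> releaser; // assumed callback that returns a container

    DelayedContainerReleaser(Consumer<String> releaser) {
        this.releaser = releaser;
    }

    void hold(String containerId, long delayMillis) {
        queue.add(new HeldContainer(containerId, delayMillis));
    }

    @Override
    public void run() {
        try {
            while (!isInterrupted()) {
                // Blocks until the head container's delay has expired.
                HeldContainer expired = queue.take();
                releaser.accept(expired.containerId);
            }
        } catch (InterruptedException e) {
            // The interrupt is the shutdown signal: fall out of the main loop.
        } finally {
            // Release everything still held, expired or not.
            List<HeldContainer> remaining = new ArrayList<>(queue);
            queue.clear();
            for (HeldContainer c : remaining) {
                releaser.accept(c.containerId);
            }
        }
    }
}
```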

bq. We may make a few more such improvements and after that we will probably be in a position
to evaluate if/how we can re-factor the code to make it simpler.
Agreed. There are going to be at least a couple more changes to this code, for example not assigning
unusable containers to a vertex-task. Once all of that is in place, we can look at making
it simpler, including using the DelayedContainerManager for all allocations.
> Support delayed scheduling for re-used containers
> -------------------------------------------------
>                 Key: TEZ-344
>                 URL: https://issues.apache.org/jira/browse/TEZ-344
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>              Labels: TEZ-0.2.0
>         Attachments: TEZ-344.2.txt, TEZ-344.3.txt, TEZ-344.wip.txt
> This, for now, is primarily to help with testing of Tez on clusters.
> Would have to go in with a warning since this could cause jobs to hang / run for a long
> Longer term, this can be enhanced to set limits on how long to wait before assigning
> non-local tasks.
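The longer-term enhancement is essentially delay scheduling with a cap on the wait. A minimal sketch, assuming each task records when it became schedulable and a single limit (maxWaitMillis, a hypothetical parameter) applies to all tasks: a container on a preferred host is taken immediately, and any other container is taken only once the task has waited past the limit. LocalityDelayPolicy and accept() are illustrative names, not existing Tez APIs.

```java
import java.util.Set;

/**
 * Hypothetical sketch: prefer data-local containers, but stop waiting after
 * maxWaitMillis so that jobs cannot hang indefinitely.
 */
final class LocalityDelayPolicy {
    private final long maxWaitMillis;

    LocalityDelayPolicy(long maxWaitMillis) {
        this.maxWaitMillis = maxWaitMillis;
    }

    /**
     * @param preferredHosts     hosts holding the task's input
     * @param containerHost      host of the offered container
     * @param waitingSinceMillis when the task first became schedulable
     * @param nowMillis          current time, passed in to keep this testable
     */
    boolean accept(Set<String> preferredHosts, String containerHost,
                   long waitingSinceMillis, long nowMillis) {
        if (preferredHosts.isEmpty() || preferredHosts.contains(containerHost)) {
            return true; // no locality preference, or a local match
        }
        // Non-local container: only take it once the task has waited long enough.
        return nowMillis - waitingSinceMillis >= maxWaitMillis;
    }
}
```

Passing Long.MAX_VALUE as the limit approximates an unbounded wait for a local container.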

This message is automatically generated by JIRA.
