tez-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEZ-344) Support delayed scheduling for re-used containers
Date Fri, 16 Aug 2013 02:02:50 GMT

    [ https://issues.apache.org/jira/browse/TEZ-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741813#comment-13741813 ]

Bikas Saha commented on TEZ-344:
--------------------------------

If releaseDelay is 0 then we send (false, false), which means the container will be neither
released nor queued. It will be lost.
{code}
+          // Don't attempt to delay containers if delay is 0.
           assignedContainers = assignAllocatedContainers(
-              Collections.singletonList(container), true);
+              Lists.newArrayList(container), true, false, reuseContainerDelay > 0);
{code}

Two instances of the following are probably incorrect now with reuse: the ones in unassignTask() and unassignContainer().
{code}
Resources.subtractFrom(allocatedResources, container.getResource());
{code}

We should probably add an assert in serviceStop that we have released every container we got.
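
A minimal sketch of such a check, assuming a small bookkeeping helper. HeldContainerTracker and its methods are hypothetical names, not part of the patch, and plain String ids stand in for the real container ids:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of the patch: records which containers the
// scheduler currently holds so that shutdown can verify none were lost.
final class HeldContainerTracker {
  private final Set<String> held = new HashSet<>();

  // Call when a container is handed to the scheduler.
  synchronized void onAllocated(String containerId) {
    held.add(containerId);
  }

  // Call whenever a container is released back.
  synchronized void onReleased(String containerId) {
    held.remove(containerId);
  }

  // Snapshot of the containers that are still held.
  synchronized Set<String> outstanding() {
    return Collections.unmodifiableSet(new HashSet<>(held));
  }

  // Intended to be called from serviceStop(); fails loudly if anything leaked.
  synchronized void assertAllReleased() {
    if (!held.isEmpty()) {
      throw new IllegalStateException("Containers never released: " + held);
    }
  }
}
```

With something like this in place, the (false, false) case above would show up as a failure at shutdown instead of a silently lost container.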

Could we have released these containers when we passed over them in assignAllocatedContainer(ANY_ASSIGNER)
instead of looping over them again?
{code}
+    private void addDelayedContainers(Iterable<Container> containers) {
+      // If there's no pending requests matching a specific container, release
+      // it instead of delaying it.
+      releaseContainersWithNoPendingRequests(containers);
{code}

Where are the remaining delayed containers being released?
{code}
+    public void shutdown() {
+      this.running =false;
+      this.interrupt();
+    }
{code}

It may have been simpler to loop every n milliseconds, collect all containers that are within
their delay and try to assign them, instead of the nextSched based queue that we are maintaining
now. Looking ahead a little with using delayedQueue as the main loop, that might be the direction
we eventually take.
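
A rough sketch of that periodic-sweep alternative, under stated assumptions: DelayedContainerSweeper and all of its members are hypothetical names, String ids stand in for containers, and releasing a container once its delay has expired is an assumption about the intended behavior rather than something the patch does.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of the periodic-sweep alternative; names are illustrative.
final class DelayedContainerSweeper {
  // containerId -> time (ms) at which the reuse delay expires
  private final Map<String, Long> expiryByContainer = new LinkedHashMap<>();
  private final List<String> released = new ArrayList<>();

  // Start holding a container for up to delayMillis.
  void hold(String containerId, long nowMillis, long delayMillis) {
    expiryByContainer.put(containerId, nowMillis + delayMillis);
  }

  // One pass of the loop: containers past their delay are released (assumed
  // behavior); containers still within their delay are offered to tryAssign.
  void sweep(long nowMillis, Predicate<String> tryAssign) {
    Iterator<Map.Entry<String, Long>> it = expiryByContainer.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> e = it.next();
      if (nowMillis >= e.getValue()) {
        released.add(e.getKey());   // delay exhausted, give it back
        it.remove();
      } else if (tryAssign.test(e.getKey())) {
        it.remove();                // assigned to a task
      }
    }
  }

  int heldCount() {
    return expiryByContainer.size();
  }

  List<String> released() {
    return released;
  }
}
```

A driver would call sweep(System.currentTimeMillis(), ...) every n milliseconds, for example from a ScheduledExecutorService, which avoids maintaining a queue ordered by next schedule time.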

Haven't looked at the tests in detail. I am sure you have stared at them closely. Overall
they look like they are testing the feature like a black box, which is good. Future refactorings
may be tested with existing tests.

If you feel any of the above are bugs then please fix them. Otherwise, I am ok with the patch.
We may make a few more such improvements and after that we will probably be in a position
to evaluate if/how we can re-factor the code to make it simpler.

                
> Support delayed scheduling for re-used containers
> -------------------------------------------------
>
>                 Key: TEZ-344
>                 URL: https://issues.apache.org/jira/browse/TEZ-344
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>              Labels: TEZ-0.2.0
>         Attachments: TEZ-344.2.txt, TEZ-344.3.txt, TEZ-344.wip.txt
>
>
> This, for now, is primarily to help with testing of Tez on clusters.
> Would have to go in with a warning since this could cause jobs to hang / run for a long time.
> Longer term, this can be enhanced to set limits on how long to wait before assigning non-local tasks.

