tez-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEZ-344) Support delayed scheduling for re-used containers
Date Fri, 16 Aug 2013 02:02:50 GMT

    [ https://issues.apache.org/jira/browse/TEZ-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741813#comment-13741813 ]

Bikas Saha commented on TEZ-344:

If releaseDelay == 0 then we send (false, false), which means the container will neither be
released nor queued. It will be lost.
+          // Don't attempt to delay containers if delay is 0.
           assignedContainers = assignAllocatedContainers(
-              Collections.singletonList(container), true);
+              Lists.newArrayList(container), true, false, reuseContainerDelay > 0);
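
A minimal sketch of the zero-delay concern, not the Tez implementation. The class, the enum and the
meaning of the two flags (release-if-unassigned, delay-if-unassigned) are assumptions made for
illustration; the patch's real parameter semantics may differ. It shows why both flags being false
loses the container, and one way to derive the flags from the delay so that cannot happen.

```java
/** Hypothetical model (not Tez code) of what happens to a container no task matched. */
final class ZeroDelaySketch {
    enum Outcome { RELEASED, DELAYED, LOST }

    /**
     * Assumed flag meanings: 'release' returns the container to the RM,
     * 'delay' queues it for a later assignment attempt.
     */
    static Outcome handleUnassigned(boolean release, boolean delay) {
        if (delay) {
            return Outcome.DELAYED;
        }
        if (release) {
            return Outcome.RELEASED;
        }
        // Neither released nor queued: nothing tracks the container any more.
        return Outcome.LOST;
    }

    /** Derive both flags from the delay so they are never false at the same time. */
    static Outcome forDelay(long reuseContainerDelayMs) {
        boolean delay = reuseContainerDelayMs > 0;
        return handleUnassigned(!delay, delay);
    }
}
```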

Two instances of the following are probably incorrect now with reuse: in unassignTask() and unassignContainer().
Resources.subtractFrom(allocatedResources, container.getResource());
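
One possible reading of this point, sketched under assumptions: with reuse, a container can outlive
the task that ran in it, so the allocated total should shrink only when the container is actually
released. AllocationLedger and its methods are hypothetical names, and memory in MB stands in for
the real Resource type.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical ledger (not Tez code) that ties resource accounting to container lifetime. */
final class AllocationLedger {
    private final Map<String, Integer> heldMemoryMb = new HashMap<>();
    private int allocatedMb = 0;

    void onAllocated(String containerId, int memoryMb) {
        // Count each container once, even if it is reported again.
        if (heldMemoryMb.putIfAbsent(containerId, memoryMb) == null) {
            allocatedMb += memoryMb;
        }
    }

    void onTaskUnassigned(String containerId) {
        // The container is kept for reuse, so the allocated total is unchanged.
    }

    void onReleased(String containerId) {
        // Subtract only for containers we still hold, so a repeated release is a no-op.
        Integer memoryMb = heldMemoryMb.remove(containerId);
        if (memoryMb != null) {
            allocatedMb -= memoryMb;
        }
    }

    int allocatedMb() {
        return allocatedMb;
    }
}
```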

We should probably add an assert in serviceStop that we have released every container we got.
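
A sketch of what such a check could look like, assuming the scheduler records every container it
receives. HeldContainerTracker and checkAllReleased() are hypothetical names; in the real service the
check would be called from serviceStop.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical tracker (not Tez code) for containers received but not yet released. */
final class HeldContainerTracker {
    private final Set<String> held = new HashSet<>();

    void onAllocated(String containerId) {
        held.add(containerId);
    }

    void onReleased(String containerId) {
        held.remove(containerId);
    }

    /** Intended for a stop hook: fails loudly if any container was never released. */
    void checkAllReleased() {
        if (!held.isEmpty()) {
            throw new IllegalStateException("Containers never released: " + held);
        }
    }
}
```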

Could we have released these containers when we passed over them in assignAllocatedContainer(ANY_ASSIGNER)
instead of looping over them again?
+    private void addDelayedContainers(Iterable<Container> containers) {
+      // If there's no pending requests matching a specific container, release
+      // it instead of delaying it.
+      releaseContainersWithNoPendingRequests(containers);

Where are the remaining delayed containers being released?
+    public void shutdown() {
+      this.running = false;
+      this.interrupt();
+    }

It may have been simpler to loop every n milliseconds, collect all containers that are within
their delay and try to assign them, instead of the nextSched-based queue that we are maintaining
now. Looking ahead a little, with delayedQueue used as the main loop, that might be the direction
we eventually take.
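
A rough sketch of the periodic-sweep alternative, under stated assumptions: PeriodicSweepSketch, Held
and the tryAssign callback are hypothetical, and time is passed in explicitly so a single pass can be
reasoned about. A real scheduler would call sweep() from a thread every n milliseconds. The shutdown()
method also shows one answer to where the remaining delayed containers get released.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical periodic sweep (not Tez code) over containers held for reuse. */
final class PeriodicSweepSketch {
    private static final class Held {
        final String id;
        final long expiresAtMs;

        Held(String id, long expiresAtMs) {
            this.id = id;
            this.expiresAtMs = expiresAtMs;
        }
    }

    private final long delayMs;
    private final List<Held> delayed = new ArrayList<>();
    final List<String> assigned = new ArrayList<>();
    final List<String> released = new ArrayList<>();

    PeriodicSweepSketch(long delayMs) {
        this.delayMs = delayMs;
    }

    void hold(String containerId, long nowMs) {
        delayed.add(new Held(containerId, nowMs + delayMs));
    }

    /** One pass: try to assign every held container, release those past their delay. */
    void sweep(long nowMs, Predicate<String> tryAssign) {
        Iterator<Held> it = delayed.iterator();
        while (it.hasNext()) {
            Held h = it.next();
            if (tryAssign.test(h.id)) {
                assigned.add(h.id);
                it.remove();
            } else if (nowMs >= h.expiresAtMs) {
                released.add(h.id);
                it.remove();
            }
        }
    }

    /** On shutdown, release everything still waiting so no container is leaked. */
    void shutdown() {
        for (Held h : delayed) {
            released.add(h.id);
        }
        delayed.clear();
    }
}
```

The sweep trades some precision (a container can be held up to n milliseconds past its delay) for
not having to maintain a next-schedule-time ordering.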

I haven't looked at the tests in detail. I am sure you have stared at them closely. Overall
they look like they are testing the feature as a black box, which is good. Future refactorings
can then be tested with the existing tests.

If you feel any of the above are bugs then please fix them. Otherwise, I am ok with the patch.
We may make a few more such improvements and after that we will probably be in a position
to evaluate if/how we can re-factor the code to make it simpler.

> Support delayed scheduling for re-used containers
> -------------------------------------------------
>                 Key: TEZ-344
>                 URL: https://issues.apache.org/jira/browse/TEZ-344
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>              Labels: TEZ-0.2.0
>         Attachments: TEZ-344.2.txt, TEZ-344.3.txt, TEZ-344.wip.txt
> This, for now, is primarily to help with testing of Tez on clusters.
> Would have to go in with a warning since this could cause jobs to hang / run for a long
> Longer term, this can be enhanced to set limits on how long to wait before assigning
> non-local tasks.

