tez-issues mailing list archives

From "Siddharth Seth (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (TEZ-1522) Scheduling can result in out of order execution and slowdown of upstream work
Date Fri, 31 Oct 2014 06:34:34 GMT

     [ https://issues.apache.org/jira/browse/TEZ-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated TEZ-1522:
--------------------------------
    Attachment: TEZ-1522.1.wip.txt

Patch I was experimenting with earlier today, which (temporarily) changes the NaturalOrderDAGScheduler
to make scheduling decisions based on when all sources of a vertex have been initialized.
This partly solves the problem of scheduling the wrong vertex (one that is still waiting for upstream
tasks). The exception is VertexSchedulers that throttle their own tasks, such as the ShuffleVertexManager.
It also ends up ensuring that the source vertices are fully configured, since it will not schedule
tasks otherwise.
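The gating rule described above can be sketched as follows, using the DAG from the issue description. This is a minimal illustration under stated assumptions, not the patch itself: `GatedDAGScheduler`, `schedule_task` and `vertex_initialized` are hypothetical names, not the NaturalOrderDAGScheduler API. A task request is held back until every source vertex of its vertex has been initialized. Vertices whose VertexScheduler throttles its own tasks (such as the ShuffleVertexManager) are not modeled.

```python
from typing import Dict, List, Set, Tuple

# DAG from this issue: vertex -> its source (upstream) vertices.
DAG: Dict[str, List[str]] = {
    "M2": [],
    "M7": [],
    "R3": ["M2"],        # scatter-gather edge M2 -> R3
    "M5": ["R3", "M7"],  # broadcast edges R3 -> M5 and M7 -> M5
    "R6": ["M5"],
}


class GatedDAGScheduler:
    """Hypothetical scheduler that holds back a vertex's tasks until all of
    its source vertices have been initialized (not a Tez API)."""

    def __init__(self, dag: Dict[str, List[str]]) -> None:
        self.dag = dag
        self.initialized: Set[str] = set()
        self.pending: Dict[str, List[int]] = {v: [] for v in dag}
        self.scheduled: List[Tuple[str, int]] = []  # released tasks, in order

    def _sources_ready(self, vertex: str) -> bool:
        return all(src in self.initialized for src in self.dag[vertex])

    def schedule_task(self, vertex: str, task_id: int) -> None:
        if self._sources_ready(vertex):
            self.scheduled.append((vertex, task_id))
        else:
            self.pending[vertex].append(task_id)  # wait for upstream init

    def vertex_initialized(self, vertex: str) -> None:
        self.initialized.add(vertex)
        # Release held tasks of any vertex whose sources are now all initialized.
        for v, held in self.pending.items():
            if held and self._sources_ready(v):
                self.scheduled.extend((v, t) for t in held)
                held.clear()


if __name__ == "__main__":
    s = GatedDAGScheduler(DAG)
    s.vertex_initialized("M2")
    s.vertex_initialized("M7")
    s.schedule_task("M5", 0)  # held: R3 is not initialized yet
    s.schedule_task("R3", 0)  # released: M2 is initialized
    print(s.scheduled)        # [('R3', 0)]
    s.vertex_initialized("R3")
    print(s.scheduled)        # [('R3', 0), ('M5', 0)]
```

The gate only checks initialization, not completion, so it orders vertices correctly without limiting how many downstream tasks run once the gate opens.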

Got help from [~gopalv] earlier today to try it out on a cluster, and it is doing what it is expected
to do.

> Scheduling can result in out of order execution and slowdown of upstream work
> -----------------------------------------------------------------------------
>
>                 Key: TEZ-1522
>                 URL: https://issues.apache.org/jira/browse/TEZ-1522
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Critical
>              Labels: performance
>         Attachments: TEZ-1522.1.wip.txt, TEZ-1522.am.log.gz, task_runtime.svg
>
>
> M2             M7
>     \              /
> (sg) \            /
>        R3        / (b)
>         \       /
>      (b) \     /
>           \   /
>             M5
>             |
>             R6 
> Please refer to the attachment (task runtime SVG). In this case, M5 got scheduled much earlier
> than R3 (green in the diagram) and retained lots of containers.
> R3 got fewer containers to work with.
> Attaching the output from the status monitor when the job ran; Map_5 has taken up almost
> all of the cluster resources, whereas Reducer_3 got a fraction of the capacity.
> Map_2: 1/1      Map_5: 0(+373)/1000     Map_7: 1/1      Reducer_3: 0/8000       Reducer_6: 0/1
> Map_2: 1/1      Map_5: 0(+374)/1000     Map_7: 1/1      Reducer_3: 0/8000       Reducer_6: 0/1
> Map_2: 1/1      Map_5: 0(+374)/1000     Map_7: 1/1      Reducer_3: 0(+1)/8000   Reducer_6: 0/1
> ....
> Map_2: 1/1      Map_5: 0(+374)/1000     Map_7: 1/1      Reducer_3: 14(+7)/8000  Reducer_6: 0/1
> Map_2: 1/1      Map_5: 0(+374)/1000     Map_7: 1/1      Reducer_3: 63(+14)/8000 Reducer_6: 0/1
> Map_2: 1/1      Map_5: 0(+374)/1000     Map_7: 1/1      Reducer_3: 159(+22)/8000    Reducer_6: 0/1
> Map_2: 1/1      Map_5: 0(+374)/1000     Map_7: 1/1      Reducer_3: 308(+29)/8000    Reducer_6: 0/1
> ...
> Creating this JIRA as a placeholder for a scheduler enhancement. One possibility could be to
> schedule fewer tasks in downstream vertices, based on the information available for the
> upstream vertex.
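
The enhancement proposed in the description can be sketched as a small throttle, under loudly stated assumptions. The sketch assumes that each status counter reads completed(+running)/total. `parse_status` and `downstream_cap` are hypothetical helpers, not part of Tez. The linear rule (allow ceil(downstream_total * upstream completed fraction) tasks, with a small floor) is one illustrative choice, not the ShuffleVertexManager's actual throttling logic.

```python
import re
from typing import Dict, Tuple

# One "Name: completed(+running)/total" counter; "(+running)" is optional.
COUNTER = re.compile(r"(\w+): (\d+)(?:\(\+(\d+)\))?/(\d+)")


def parse_status(line: str) -> Dict[str, Tuple[int, int, int]]:
    """Map each vertex on a status line to (completed, running, total)."""
    return {
        name: (int(done), int(running or 0), int(total))
        for name, done, running, total in COUNTER.findall(line)
    }


def downstream_cap(upstream: Tuple[int, int, int], downstream_total: int,
                   min_tasks: int = 1) -> int:
    """Hypothetical throttle: allow ceil(downstream_total * completed fraction
    of the upstream vertex) downstream tasks, but at least min_tasks."""
    done, _running, total = upstream
    if total == 0:
        return downstream_total  # nothing upstream to wait for
    allowed = (downstream_total * done + total - 1) // total  # integer ceil
    return min(downstream_total, max(min_tasks, allowed))


if __name__ == "__main__":
    line = ("Map_2: 1/1      Map_5: 0(+374)/1000     Map_7: 1/1      "
            "Reducer_3: 308(+29)/8000    Reducer_6: 0/1")
    stats = parse_status(line)
    print(stats["Map_5"])      # (0, 374, 1000)
    print(stats["Reducer_3"])  # (308, 29, 8000)
    # Cap for Map_5 given Reducer_3's progress, versus the 374 it actually held.
    print(downstream_cap(stats["Reducer_3"], stats["Map_5"][2]))  # 39
```

Under this rule, Map_5 would have been limited to 39 tasks at the last sampled point instead of holding 374 containers. The freed capacity would have been available to Reducer_3.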



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
