tez-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEZ-394) Better scheduling for uneven DAGs
Date Fri, 04 Jan 2019 21:24:00 GMT

    [ https://issues.apache.org/jira/browse/TEZ-394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16734580#comment-16734580 ]

Jason Lowe commented on TEZ-394:

Attaching a new patch that makes this behavior configurable and disabled by default.  This
avoids the bad preemption behavior that Gopal encountered when running with the default YARN
task scheduler but allows users to enable it in conjunction with a DAG-aware task scheduler
like DagAwareYarnTaskScheduler.

> Better scheduling for uneven DAGs
> ---------------------------------
>                 Key: TEZ-394
>                 URL: https://issues.apache.org/jira/browse/TEZ-394
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Rohini Palaniswamy
>            Assignee: Jason Lowe
>            Priority: Major
>         Attachments: TEZ-394.001.patch, TEZ-394.002.patch, TEZ-394.003.patch, TEZ-394.004.patch
>   Consider a series of joins or group-bys on dataset A with a few datasets that takes 10
> hours, followed by a final join with a dataset X. The vertex that loads dataset X will be one
> of the top vertices and will be initialized early even though its output is not consumed until
> the end, after 10 hours.
> 1) Could either use delayed start logic for better resource allocation
> 2) Else if they are started upfront, need to handle failure/recovery cases where the
> nodes which executed the MapTask might have gone down when the final join happens.
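
To make the "delayed start" idea in option 1 concrete, here is a minimal sketch of how the slack of
the vertex that loads dataset X could be computed with a forward and backward pass over the DAG.
It is a hypothetical illustration only: it is not Tez code and not what the attached patches do.
The class, the vertex names, and every runtime other than the 10-hour join chain are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not Tez code and not the TEZ-394 patch.
class DelayedStartSketch {

    /** A vertex with an assumed runtime in hours and the names of the vertices it reads from. */
    static final class Vertex {
        final String name;
        final int hours;
        final List<String> inputs;

        Vertex(String name, int hours, String... inputs) {
            this.name = name;
            this.hours = hours;
            this.inputs = List.of(inputs);
        }
    }

    /** The DAG from the description; every runtime except the 10-hour join chain is made up. */
    static List<Vertex> exampleDag() {
        List<Vertex> dag = new ArrayList<>();
        dag.add(new Vertex("loadA", 1));
        dag.add(new Vertex("loadX", 1));
        dag.add(new Vertex("joins", 10, "loadA"));
        dag.add(new Vertex("finalJoin", 1, "joins", "loadX"));
        return dag;
    }

    /**
     * Returns the latest hour each vertex can start without delaying the DAG.
     * The list must be in topological order (inputs before their consumers).
     */
    static Map<String, Integer> latestStart(List<Vertex> topoOrder) {
        // Forward pass: earliest finish time of each vertex and the overall DAG end time.
        Map<String, Integer> earliestFinish = new HashMap<>();
        int dagEnd = 0;
        for (Vertex v : topoOrder) {
            int start = 0;
            for (String in : v.inputs) {
                start = Math.max(start, earliestFinish.get(in));
            }
            earliestFinish.put(v.name, start + v.hours);
            dagEnd = Math.max(dagEnd, start + v.hours);
        }

        // Backward pass: a vertex must finish before the latest start of every consumer.
        Map<String, Integer> latestFinish = new HashMap<>();
        Map<String, Integer> latestStart = new LinkedHashMap<>();
        for (int i = topoOrder.size() - 1; i >= 0; i--) {
            Vertex v = topoOrder.get(i);
            int start = latestFinish.getOrDefault(v.name, dagEnd) - v.hours;
            latestStart.put(v.name, start);
            for (String in : v.inputs) {
                latestFinish.merge(in, start, Integer::min);
            }
        }
        return latestStart;
    }

    public static void main(String[] args) {
        latestStart(exampleDag()).forEach(
            (name, hour) -> System.out.println(name + " can start as late as hour " + hour));
    }
}
```

With these assumed runtimes, loadX has 10 hours of slack while loadA has none, which is the gap a
delayed start could use to avoid holding containers (and their output) idle for the whole run.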

This message was sent by Atlassian JIRA
