tez-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (TEZ-394) Better scheduling for uneven DAGs
Date Fri, 04 Jan 2019 21:23:00 GMT

     [ https://issues.apache.org/jira/browse/TEZ-394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated TEZ-394:
    Attachment: TEZ-394.004.patch

> Better scheduling for uneven DAGs
> ---------------------------------
>                 Key: TEZ-394
>                 URL: https://issues.apache.org/jira/browse/TEZ-394
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Rohini Palaniswamy
>            Assignee: Jason Lowe
>            Priority: Major
>         Attachments: TEZ-394.001.patch, TEZ-394.002.patch, TEZ-394.003.patch, TEZ-394.004.patch
> Consider a series of joins or group-bys on dataset A with a few other datasets that
> takes 10 hours, followed by a final join with a dataset X. The vertex that loads
> dataset X will be one of the top vertices and is initialized early, even though its
> output is not consumed until the end, after 10 hours.
> 1) Could use delayed start logic for better resource allocation.
> 2) Otherwise, if such vertices are started upfront, need to handle failure/recovery
> cases where the nodes that executed the MapTask might have gone down by the time the
> final join happens.
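
To make option 1 concrete, here is a minimal sketch of one way "delayed start" could be computed: an as-late-as-possible pass over the DAG, in the style of the critical path method. It is not the approach taken in the attached patches and does not use any Tez API. The function name, the vertex names, and the 1-hour load and final-join durations are all hypothetical; only the 10-hour join stage comes from the description above.

```python
from graphlib import TopologicalSorter


def delayed_start_times(durations, deps):
    """Return the latest start time of each vertex that does not delay the DAG.

    durations: vertex -> estimated run time (hypothetical estimates, in hours)
    deps:      vertex -> set of upstream vertices whose output it consumes
    """
    # Forward pass: earliest finish if every vertex starts as soon as its inputs exist.
    order = list(TopologicalSorter(deps).static_order())
    earliest_finish = {}
    for v in order:
        start = max((earliest_finish[u] for u in deps.get(v, ())), default=0)
        earliest_finish[v] = start + durations[v]
    makespan = max(earliest_finish.values())

    # Invert the dependency map so each vertex knows its consumers.
    consumers = {v: [] for v in durations}
    for v, upstream in deps.items():
        for u in upstream:
            consumers[u].append(v)

    # Backward pass: a vertex must finish before its earliest-needed consumer starts.
    latest_start = {}
    for v in reversed(order):
        latest_finish = min((latest_start[c] for c in consumers[v]), default=makespan)
        latest_start[v] = latest_finish - durations[v]
    return latest_start


# Illustrative DAG: a 10-hour chain of joins on A, then a final join with X.
durations = {"load_A": 1, "joins_on_A": 10, "load_X": 1, "final_join": 1}
deps = {"joins_on_A": {"load_A"}, "final_join": {"joins_on_A", "load_X"}}
starts = delayed_start_times(durations, deps)
print(starts["load_X"])  # start hour for the vertex that loads dataset X
```

With these assumed durations, the vertex loading X can be held back until just before the final join needs it, instead of holding resources (and risking lost output, as in option 2) for the whole 10 hours. A real scheduler would not know durations up front, so it would more likely trigger on upstream progress than on fixed times.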

This message was sent by Atlassian JIRA
