pig-dev mailing list archives

From "Jeff Hammerbacher (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1734) Pig needs a more efficient DAG execution
Date Wed, 17 Nov 2010 21:25:19 GMT

    [ https://issues.apache.org/jira/browse/PIG-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933171#action_12933171 ]

Jeff Hammerbacher commented on PIG-1734:
----------------------------------------

bq. However, note that the use of the workflow execution engine should not be enforced but should be optional.

Certainly agree that we shouldn't disrupt existing users.

> Pig needs a more efficient DAG execution
> ----------------------------------------
>
>                 Key: PIG-1734
>                 URL: https://issues.apache.org/jira/browse/PIG-1734
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>
> The current code uses Hadoop's JobControl to execute one stage at a time. The first stage includes all jobs with no dependencies, the second stage contains jobs that depend only on jobs completed in the first stage, the third stage contains the jobs that depend on jobs from stages 1 and 2, and so on.
> The problem with this simplistic approach is that each stage only starts when the previous stage is over, which means that some branches of the DAG are unnecessarily blocked.
> We would need to do our own DAG management to solve this issue, which would be a pretty significant undertaking. Something we should look at in the future.
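
To make the stage-by-stage problem concrete, here is a minimal Python sketch comparing the two scheduling policies described in the issue. It is not Pig's or Hadoop's code: the job names, durations, and the helpers stage_levels, stage_makespan and dag_makespan are hypothetical. It assumes an acyclic job graph, fixed job durations, and unlimited cluster capacity, so every job whose inputs are ready can start immediately.

```python
from collections import defaultdict


def stage_levels(deps):
    """Assign each job to a stage: 1 + the deepest stage among its dependencies.

    deps maps every job name to the list of jobs it depends on (assumed acyclic).
    """
    levels = {}

    def level(job):
        if job not in levels:
            levels[job] = 1 + max((level(d) for d in deps[job]), default=0)
        return levels[job]

    for job in deps:
        level(job)
    return levels


def stage_makespan(durations, deps):
    """Stage-based policy: a stage starts only after every job in the previous stage ends."""
    stages = defaultdict(list)
    for job, lvl in stage_levels(deps).items():
        stages[lvl].append(job)
    finish = {}
    clock = 0
    for lvl in sorted(stages):
        start = clock  # every job in this stage waits for the whole previous stage
        for job in stages[lvl]:
            finish[job] = start + durations[job]
        clock = max(finish[job] for job in stages[lvl])
    return clock, finish


def dag_makespan(durations, deps):
    """Dependency-driven policy: a job starts as soon as its own dependencies end."""
    finish = {}

    def done(job):
        if job not in finish:
            start = max((done(d) for d in deps[job]), default=0)
            finish[job] = start + durations[job]
        return finish[job]

    for job in deps:
        done(job)
    return max(finish.values()), finish


# Hypothetical DAG: one long job A next to a short chain B -> C -> D.
durations = {"A": 10, "B": 1, "C": 5, "D": 5}
deps = {"A": [], "B": [], "C": ["B"], "D": ["C"]}

print(stage_makespan(durations, deps)[0])  # 20
print(dag_makespan(durations, deps)[0])    # 11
```

With these made-up numbers, the stage-based policy makes the B -> C -> D branch wait for the long job A at the first stage boundary, so the DAG completes at time 20. The dependency-driven policy lets that branch run while A is still executing, and the DAG completes at time 11.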

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

