crunch-dev mailing list archives

From "Ioan Marius Curelariu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CRUNCH-390) Planner is not adding dependencies between jobs when planning is done in more than one stage.
Date Wed, 07 May 2014 07:41:39 GMT

     [ https://issues.apache.org/jira/browse/CRUNCH-390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ioan Marius Curelariu updated CRUNCH-390:
-----------------------------------------

    Attachment: 0001-Patched-the-MSCRPlanner-to-correctly-add-dependencie.patch

Added a patch that fixes the dependencies between jobs when planning is done in more than
one stage.
The patch also adds an integration test that demonstrates the issue and the fix.
I've successfully applied it to a freshly cloned repository.
Can you please review my change?
Thank you.

> Planner is not adding dependencies between jobs when planning is done in more than one
stage.
> ---------------------------------------------------------------------------------------------
>
>                 Key: CRUNCH-390
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-390
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.2
>            Reporter: Ioan Marius Curelariu
>            Assignee: Josh Wills
>         Attachments: 0001-Patched-the-MSCRPlanner-to-correctly-add-dependencie.patch
>
>
> The planner does the planning in multiple stages when it finds job dependencies
on ReadableData. One example of this case is when using the BloomFilterJoinStrategy.
> While the generated plan dot file looks good, the planner actually does not add dependencies
between jobs that are created in different planning stages.
> I have a pipeline that reads 3 input sources. It joins 2 of them using a bloom filter
join strategy. Later on, it joins this with the output of a job coming from the third source
path.
> If the jobs on the branch using the bloom filter finish before the one reading
the third source, the executor attempts to start the 4th job, which is supposed to join everything,
before the 3rd one finishes, resulting in an input Path not found exception.
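
The scenario described above can be illustrated with a small, self-contained sketch.
This is not Crunch code and does not use the MSCRPlanner or any Crunch API: the class
JobDependencySketch, the job names job1..job4 and the methods plan and startable are all
hypothetical, and the assumption that the bloom filter branch consists of jobs 1 and 2 is
made only for illustration. It models an executor that starts a job once every dependency
recorded for it has finished, and shows what happens when the edge from job 3 to job 4
is missing from the plan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model of the scheduling problem; none of these names come from Crunch.
class JobDependencySketch {

    // Builds a dependency map (job -> jobs it must wait for).
    // Assumption: job1 and job2 form the bloom filter branch, job3 reads the
    // third source, and job4 joins everything.
    public static Map<String, Set<String>> plan(boolean withCrossStageEdge) {
        Map<String, Set<String>> deps = new TreeMap<>();
        deps.put("job1", new HashSet<>());
        deps.put("job2", new HashSet<>(Arrays.asList("job1")));
        deps.put("job3", new HashSet<>());
        if (withCrossStageEdge) {
            deps.put("job4", new HashSet<>(Arrays.asList("job2", "job3")));
        } else {
            // The edge job3 -> job4 is missing, as in the reported bug.
            deps.put("job4", new HashSet<>(Arrays.asList("job2")));
        }
        return deps;
    }

    // Returns, in sorted order, the jobs that are neither finished nor running
    // and whose recorded dependencies have all finished.
    public static List<String> startable(Map<String, Set<String>> deps,
                                         Set<String> finished,
                                         Set<String> running) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : deps.entrySet()) {
            String job = e.getKey();
            if (!finished.contains(job) && !running.contains(job)
                    && finished.containsAll(e.getValue())) {
                result.add(job);
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // The bloom filter branch is done while job3 is still running.
        Set<String> finished = new HashSet<>(Arrays.asList("job1", "job2"));
        Set<String> running = new HashSet<>(Arrays.asList("job3"));

        System.out.println(startable(plan(false), finished, running)); // prints [job4]
        System.out.println(startable(plan(true), finished, running));  // prints []
    }
}
```

With the edge missing, job4 is considered startable while job3 is still running, so it
would try to read output that does not exist yet. With the edge present, job4 waits.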



--
This message was sent by Atlassian JIRA
(v6.2#6252)
