crunch-dev mailing list archives

From "Josh Wills (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CRUNCH-237) Improper job dependencies for certain types of long pipelines
Date Tue, 16 Jul 2013 00:30:49 GMT

     [ https://issues.apache.org/jira/browse/CRUNCH-237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Wills updated CRUNCH-237:
------------------------------

    Attachment: CRUNCH-237.patch

The patch. I will submit it in the next few minutes.
                
> Improper job dependencies for certain types of long pipelines
> -------------------------------------------------------------
>
>                 Key: CRUNCH-237
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-237
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.6.0
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>             Fix For: 0.7.0
>
>         Attachments: CRUNCH-237.patch
>
>
> The Crunch planner analyzes the dependencies between different phases of a MapReduce
> pipeline and uses those dependencies to ensure that the MapReduce jobs in the pipeline are
> executed in the correct sequence. For certain kinds of long pipelines, it's possible for the
> planner to miss a necessary dependency as follows:
> Pipeline spec: [Input] -> GBK -> [Out1] -> (GBK) -> (Out2) -> GBK -> [Out3]
> This pipeline has two explicit outputs (Out1 and Out3) and one implicit output (Out2).
> Additionally, assume that there is a map-side join against Out1 that happens in the map stage
> of the job that creates Out2. For this pipeline, the planner will mark a dependency between
> the job that creates Out1 and the job that creates Out3, but NOT between the job that creates
> Out1 and the job that creates Out2. This makes it possible for the Out2 job to run before
> Out1 is created, causing a failure.
> The easiest fix I could see was to add a step to the dependency chain so that every job
> created in a later stage of pipeline creation depends on all of the jobs from the earlier
> stages having been run. The attached patch and example integration test demonstrate this.
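
The rule described in the last paragraph of the issue can be sketched in a few lines of
Java. This is only an illustration, not the code in CRUNCH-237.patch and not Crunch's
planner API: the class StageDependencySketch, the Job type, the job names, and the stage
numbers assigned to each job are all hypothetical. It assumes each job records the planning
stage in which it was created, and it makes every job depend on every job from a strictly
earlier stage.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model for illustration only; these are not Crunch classes.
final class StageDependencySketch {

    /** A job, identified by name, plus the planning stage in which it was created. */
    static final class Job {
        final String name;
        final int stage;

        Job(String name, int stage) {
            this.name = name;
            this.stage = stage;
        }
    }

    /**
     * For each job, collects the names of all jobs created in strictly earlier
     * stages. Jobs in the same stage do not depend on each other.
     */
    static Map<String, Set<String>> stageDependencies(List<Job> jobs) {
        Map<String, Set<String>> deps = new LinkedHashMap<>();
        for (Job job : jobs) {
            Set<String> earlier = new TreeSet<>();
            for (Job other : jobs) {
                if (other.stage < job.stage) {
                    earlier.add(other.name);
                }
            }
            deps.put(job.name, earlier);
        }
        return deps;
    }

    public static void main(String[] args) {
        // Assumed stage assignment: one job per output, created in pipeline order.
        List<Job> jobs = new ArrayList<>();
        jobs.add(new Job("Out1Job", 0));
        jobs.add(new Job("Out2Job", 1));
        jobs.add(new Job("Out3Job", 2));

        // Under this rule the Out2 job waits for the Out1 job, which is the
        // dependency the issue says the planner missed.
        System.out.println(stageDependencies(jobs));
    }
}
```

The rule is deliberately conservative: it may add dependencies that are not strictly
needed, in exchange for never letting a later-stage job start before an earlier-stage
output exists.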

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
