crunch-dev mailing list archives

From "Josh Wills (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CRUNCH-241) Write side outputs from the Mapper stage of a MapReduce job
Date Mon, 22 Jul 2013 22:20:49 GMT

     [ https://issues.apache.org/jira/browse/CRUNCH-241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Wills resolved CRUNCH-241.
-------------------------------

       Resolution: Fixed
    Fix Version/s: 0.7.0

Committed this to master.
                
> Write side outputs from the Mapper stage of a MapReduce job
> -----------------------------------------------------------
>
>                 Key: CRUNCH-241
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-241
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>             Fix For: 0.7.0
>
>         Attachments: CRUNCH-241.patch
>
>
> Right now, Crunch always writes output files from the "last" stage of whatever kind of
> job it runs: either the reduce side of a MapReduce job, or the map side of a map-only job.
> This often forces us to process the same input twice, once for the map-side outputs and
> again for the reduce-side outputs.
>
> This change adds the ability for Crunch to write side outputs from the mapper phase of
> a MapReduce job (i.e., we can write output Targets from both the map side and the reduce
> side). This should help pipelines that perform both kinds of writes execute much faster.
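
The idea in the description above can be illustrated with a toy, in-memory model. This is
only a sketch under stated assumptions: it is plain Python, not Crunch or Hadoop code, and
every name in it (run_job, map_side_filter, input_reads, and so on) is hypothetical. It
shows a single scan of the input feeding both a map-side output and a reduce-side output,
so the input is read once rather than twice.

```python
from collections import defaultdict


def run_job(records, map_fn, reduce_fn, map_side_filter):
    """Toy single-pass job: one scan of the input feeds both outputs."""
    map_side_output = []         # stands in for a Target written by the mapper
    shuffle = defaultdict(list)  # stands in for the shuffle between map and reduce
    input_reads = 0              # counts how many times an input record is read

    for record in records:
        input_reads += 1
        if map_side_filter(record):
            # Side output: written during the map phase, before any reduce runs.
            map_side_output.append(record)
        for key, value in map_fn(record):
            shuffle[key].append(value)

    # Reduce phase: aggregate the grouped values for each key.
    reduce_side_output = {key: reduce_fn(values) for key, values in shuffle.items()}
    return map_side_output, reduce_side_output, input_reads


# Hypothetical input: keep the non-empty lines (map side) and count words (reduce side).
lines = ["a b", "", "b c b"]
kept, counts, reads = run_job(
    lines,
    map_fn=lambda line: [(word, 1) for word in line.split()],
    reduce_fn=sum,
    map_side_filter=lambda line: line != "",
)
```

In this model each input record is read exactly once. Without a map-side write, producing
the same two outputs would take one map-only pass for the filtered lines and a second full
pass for the word counts.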

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
