crunch-dev mailing list archives

From "Josh Wills (JIRA)" <>
Subject [jira] [Updated] (CRUNCH-231) Support legacy Mappers and Reducers in Crunch pipelines
Date Sun, 30 Jun 2013 16:49:19 GMT


Josh Wills updated CRUNCH-231:

    Attachment: mapred.patch

This is a patch I put together to determine whether such a thing was even possible, and it turns
out that with some crazy reflection and javassist hacking, it is. We end up wrapping the Mapper
and Reducer instances inside of DoFns, so they can be integrated into any Crunch pipeline (i.e.,
you can mix and match existing Mappers and Reducers with new DoFns and other library calls in
the same pipeline and underlying MapReduce execution).
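
To illustrate the wrapping idea only, here is a minimal, self-contained sketch. It is not the code in mapred.patch, and it does not use the reflection or javassist tricks mentioned above. The nested types (OutputCollector, LegacyMapper, Emitter, DoFn, Pair) are simplified, hypothetical stand-ins for the Hadoop and Crunch types of similar name, declared here so the example compiles without either library; the real interfaces carry extra parameters (a Reporter, a Context, configuration) that are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: every nested type below is a simplified, hypothetical stand-in,
// not the real org.apache.hadoop.mapred or org.apache.crunch class.
class LegacyMapperDemo {

  // Stand-in for the old-API output collector.
  interface OutputCollector<K, V> { void collect(K key, V value); }

  // Stand-in for an old-API Mapper (the real one also takes a Reporter).
  interface LegacyMapper<K1, V1, K2, V2> {
    void map(K1 key, V1 value, OutputCollector<K2, V2> out);
  }

  // Stand-ins for Crunch's Emitter, DoFn and Pair.
  interface Emitter<T> { void emit(T value); }

  abstract static class DoFn<S, T> {
    abstract void process(S input, Emitter<T> emitter);
  }

  static final class Pair<A, B> {
    final A first;
    final B second;
    Pair(A first, B second) { this.first = first; this.second = second; }
  }

  // The wrapper: a DoFn that delegates each input pair to the legacy mapper
  // and forwards whatever the mapper collects to the DoFn's emitter.
  static final class MapperDoFn<K1, V1, K2, V2> extends DoFn<Pair<K1, V1>, Pair<K2, V2>> {
    private final LegacyMapper<K1, V1, K2, V2> mapper;

    MapperDoFn(LegacyMapper<K1, V1, K2, V2> mapper) { this.mapper = mapper; }

    @Override
    void process(Pair<K1, V1> input, Emitter<Pair<K2, V2>> emitter) {
      mapper.map(input.first, input.second, (k, v) -> emitter.emit(new Pair<>(k, v)));
    }
  }

  // An "existing" mapper we do not want to rewrite: emits (word, length).
  static final class WordLengthMapper implements LegacyMapper<Long, String, String, Integer> {
    @Override
    public void map(Long offset, String line, OutputCollector<String, Integer> out) {
      for (String word : line.trim().split("\\s+")) {
        if (!word.isEmpty()) {
          out.collect(word, word.length());
        }
      }
    }
  }

  // Runs one line through the wrapped mapper and renders each output pair.
  static List<String> run(String line) {
    DoFn<Pair<Long, String>, Pair<String, Integer>> fn = new MapperDoFn<>(new WordLengthMapper());
    List<String> results = new ArrayList<>();
    fn.process(new Pair<>(0L, line), p -> results.add(p.first + "=" + p.second));
    return results;
  }

  public static void main(String[] args) {
    System.out.println(run("mix and match")); // prints [mix=3, and=3, match=5]
  }
}
```

Once the mapper sits behind a DoFn-shaped interface, the rest of a pipeline can treat it like any other function, which is the mix-and-match property described above. A Reducer could presumably be wrapped the same way, with the DoFn input being a key plus an iterable of values.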

The patch supports both the old mapred.* APIs and the newer mapreduce.* APIs, with the mapred
APIs being a bit easier/cleaner to support. There's still more integration testing that needs
to be filled in, but I thought I would post this to see if anyone wanted to weigh in on this
before I took it much further.
> Support legacy Mappers and Reducers in Crunch pipelines
> -------------------------------------------------------
>                 Key: CRUNCH-231
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>         Attachments: mapred.patch
> I've had a few requests for Crunch to support existing Mappers and Reducers using the
> underlying Java APIs as part of regular pipelines, so that users could evolve existing MapReduce
> jobs into Crunch pipelines gradually, instead of being forced to rewrite everything all at
> once in order to map it onto Crunch's model.
