crunch-user mailing list archives

From Narlin M <hpn...@gmail.com>
Subject Re: Crunch DoFn vs Mapper/reducer
Date Thu, 15 Aug 2013 13:54:50 GMT
Thanks for the reply, Josh. I understand its function a bit better now.


On Wed, Aug 14, 2013 at 5:50 PM, Josh Wills <jwills@cloudera.com> wrote:

> Hey Narlin,
>
> DoFns are similar to the Mapper and Reducer classes that you would write
> in classic MapReduce jobs; they don't spawn MapReduce jobs themselves. The
> Crunch planner will analyze the overall DAG of DoFns, groupByKeys, unions,
> and combineValues operations and compile the DAG into one or more MapReduce
> jobs, where each of the DoFns will be assigned to one of the Mappers or
> Reducers in those jobs. Crunch has its own Mapper and Reducer
> implementations (named CrunchMapper and CrunchReducer, naturally) that are
> responsible for executing the DoFns that are assigned to each phase of the
> job.
>
> In general, you should not need to use mapper and reducer classes when you
> use Crunch, although if you have legacy Mapper and Reducer classes that you
> would like to use in conjunction with the DoFns in a Crunch pipeline, there
> is a collection of methods in org.apache.crunch.lib.MapReduce in Crunch
> 0.7.0 that will wrap a given Mapper or Reducer class inside of a DoFn.
>
> Hope that helps.
>
> Best,
> Josh
>
>
>
> On Wed, Aug 14, 2013 at 12:59 PM, Narlin M <hpnole@gmail.com> wrote:
>
>> I have just recently started using Crunch, having been recommended to use
>> it instead of writing plain MapReduce jobs. As I was going through the
>> Crunch documentation, some questions came to mind. Am I correct in
>> saying that the DoFn family of functions will internally spawn MapReduce
>> jobs, so there is no need to write separate mapper or reducer classes? If
>> so, I agree that this will abstract some of the lower level details from
>> the programmer, but at the same time, does it not lower the programmer's
>> control over the processing logic?
>>
>> Also, will there be situations when separate mapper / reducer classes
>> will be required in addition to the DoFn functions?
>>
>> Thanks.
>>
>
>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
>
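
Editor's note: the planner behaviour Josh describes above can be sketched in plain Java. This is a toy model under stated assumptions, not the Crunch API: PlannerSketch, Op, Job, and plan are hypothetical names, and the sketch handles only a linear chain of DoFns and groupByKeys, whereas the real planner works on a full DAG that also includes unions and combineValues. The point it illustrates is that a DoFn never starts a job itself; it is assigned to the map or reduce phase of a job whose boundaries are set by the grouping operations.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types exist in Crunch.
class PlannerSketch {

    // One step in a linear pipeline: either a DoFn or a groupByKey.
    static final class Op {
        final String name;
        final boolean isGroupByKey;

        private Op(String name, boolean isGroupByKey) {
            this.name = name;
            this.isGroupByKey = isGroupByKey;
        }

        static Op doFn(String name) { return new Op(name, false); }
        static Op groupByKey() { return new Op("groupByKey", true); }
    }

    // One simulated MapReduce job: the DoFn names assigned to each phase.
    static final class Job {
        final List<String> mapPhase = new ArrayList<>();
        final List<String> reducePhase = new ArrayList<>();
        boolean hasShuffle = false;

        @Override
        public String toString() {
            return "map=" + mapPhase + " reduce=" + reducePhase;
        }
    }

    // Walks the chain once and assigns every DoFn to a phase of some job.
    static List<Job> plan(List<Op> ops) {
        List<Job> jobs = new ArrayList<>();
        Job current = new Job();
        for (Op op : ops) {
            if (op.isGroupByKey) {
                if (current.hasShuffle) {
                    // A job has a single shuffle, so a second grouping
                    // closes the current job and opens a new one.
                    jobs.add(current);
                    current = new Job();
                }
                current.hasShuffle = true;
            } else if (current.hasShuffle) {
                // DoFns after the shuffle run on the reduce side.
                current.reducePhase.add(op.name);
            } else {
                // DoFns before any shuffle run on the map side.
                current.mapPhase.add(op.name);
            }
        }
        jobs.add(current);
        return jobs;
    }

    public static void main(String[] args) {
        List<Op> pipeline = new ArrayList<>();
        pipeline.add(Op.doFn("tokenize"));
        pipeline.add(Op.doFn("filterStopWords"));
        pipeline.add(Op.groupByKey());
        pipeline.add(Op.doFn("sumCounts"));
        pipeline.add(Op.groupByKey());
        pipeline.add(Op.doFn("formatTop"));

        // Six operations, but only two simulated jobs.
        for (Job job : plan(pipeline)) {
            System.out.println(job);
        }
    }
}
```

In this model the two DoFns ahead of the first groupByKey share a single map phase, and the second groupByKey forces a second job whose map phase is empty. The real planner makes more careful placement decisions, so treat the output as an illustration of the idea rather than a prediction of what Crunch would produce.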
