Mailing-List: contact user-help@crunch.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@crunch.apache.org
Received-SPF: pass (athena.apache.org: domain of hpnole@gmail.com designates
 209.85.160.42 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAH29n6P6=ZEHr9yL2emjbtM_ZGb94kbBMCoqVJZPPPOT2VjzMg@mail.gmail.com>
References: 
 <CABoksgaKvKcprFeZYz25GeAe12o11oiDLRpDBBXXDaWW70KeZQ@mail.gmail.com>
	<CAH29n6P6=ZEHr9yL2emjbtM_ZGb94kbBMCoqVJZPPPOT2VjzMg@mail.gmail.com>
Date: Thu, 15 Aug 2013 08:54:50 -0500
Message-ID: 
 <CABoksgYdwVWMrAb0R5PrQiwYpmoG4xn2=QhYwHfTB+Mcedd84w@mail.gmail.com>
Subject: Re: Crunch DoFn vs Mapper/reducer
From: Narlin M <hpnole@gmail.com>
To: Crunch users <user@crunch.apache.org>
Content-Type: multipart/alternative; boundary=047d7b33d58e5549f004e3fcd16a

--047d7b33d58e5549f004e3fcd16a
Content-Type: text/plain; charset=ISO-8859-1

Thanks for the reply, Josh. I understand the role of DoFns a bit better now.


On Wed, Aug 14, 2013 at 5:50 PM, Josh Wills <jwills@cloudera.com> wrote:

> Hey Narlin,
>
> DoFns are similar to the Mapper and Reducer classes that you would write
> in classic MapReduce jobs; they don't spawn MapReduce jobs themselves. The
> Crunch planner will analyze the overall DAG of DoFns, groupByKeys, unions,
> and combineValues operations and compile the DAG into one or more MapReduce
> jobs, where each of the DoFns will be assigned to one of the Mappers or
> Reducers in those jobs. Crunch has its own Mapper and Reducer
> implementations (named CrunchMapper and CrunchReducer, naturally) that are
> responsible for executing the DoFns that are assigned to each phase of the
> job.
>
> In general, you should not need to use Mapper and Reducer classes when you
> use Crunch, although if you have legacy Mapper and Reducer classes that you
> would like to use in conjunction with the DoFns in a Crunch pipeline, there
> is a collection of methods in org.apache.crunch.lib.MapReduce in Crunch
> 0.7.0 that will wrap a given Mapper or Reducer class inside of a DoFn.
>
> Hope that helps.
>
> Best,
> Josh
>
>
>
> On Wed, Aug 14, 2013 at 12:59 PM, Narlin M <hpnole@gmail.com> wrote:
>
>> I have just recently started using Crunch, having been recommended to use
>> it instead of writing plain MapReduce jobs. As I was going through the
>> Crunch documentation, some questions came to mind. Am I correct in
>> saying that the DoFn family of functions will internally spawn MapReduce
>> jobs, so there is no need to write separate Mapper or Reducer classes? If
>> so, I agree that this will abstract away some of the lower-level details
>> from the programmer, but at the same time, doesn't it reduce the
>> programmer's control over the processing logic?
>>
>> Also, will there be situations where separate Mapper/Reducer classes
>> will be required in addition to the DoFns?
>>
>> Thanks.
>>
>
>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
>
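
A toy model, in plain Java, of Josh's point that DoFns are not jobs themselves but are chained together and executed inside the map or reduce phase of a job the planner builds. This does not use Crunch at all: ToyPipeline, ToyDoFn, then, runMapPhase, groupByKey and wordCount are hypothetical names made up for illustration, and the real planner and CrunchMapper/CrunchReducer are far more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

// Toy model only: none of these types come from Crunch.
class ToyPipeline {

    // Hypothetical stand-in for a DoFn: takes one input, emits zero or more outputs.
    interface ToyDoFn<S, T> {
        void process(S input, Consumer<T> emitter);

        // Fuse this DoFn with the next one so both run in the same pass,
        // the way several DoFns can share a single map (or reduce) task.
        default <U> ToyDoFn<S, U> then(ToyDoFn<T, U> next) {
            return (input, emitter) -> process(input, mid -> next.process(mid, emitter));
        }
    }

    // "Map phase": run the (possibly fused) DoFn over every input record.
    static <S, T> List<T> runMapPhase(List<S> inputs, ToyDoFn<S, T> fn) {
        List<T> out = new ArrayList<>();
        for (S s : inputs) {
            fn.process(s, out::add);
        }
        return out;
    }

    // "Shuffle" (groupByKey): collect values under their keys, sorted by key.
    static <K extends Comparable<K>, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> pairs) {
        Map<K, List<V>> groups = new TreeMap<>();
        for (Map.Entry<K, V> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Word count: two DoFns fused into the map phase, then a reduce step per key.
    static Map<String, Integer> wordCount(List<String> lines) {
        ToyDoFn<String, String> tokenize = (line, emit) -> {
            for (String w : line.split("\\s+")) {
                if (!w.isEmpty()) emit.accept(w);
            }
        };
        ToyDoFn<String, Map.Entry<String, Integer>> toPair =
            (word, emit) -> emit.accept(Map.entry(word.toLowerCase(), 1));

        List<Map.Entry<String, Integer>> mapped = runMapPhase(lines, tokenize.then(toPair));
        Map<String, List<Integer>> grouped = groupByKey(mapped);

        // "Reduce phase": one call per key, summing that key's grouped values.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : grouped.entrySet()) {
            int sum = 0;
            for (int v : g.getValue()) sum += v;
            counts.put(g.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("the cat", "The dog")));
    }
}
```

The piece worth noticing is then: it composes two DoFns into a single function, so both run in one pass over the input. That is roughly why several DoFns can share one mapper or reducer instead of each needing its own MapReduce job.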

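A similar toy sketch of the wrapping idea Josh mentions for org.apache.crunch.lib.MapReduce. This is not that class's API (its method signatures are not given in this thread): LegacyMapperAdapter, LegacyMapper, ToyDoFn, wrap and apply are hypothetical names, and a real Hadoop Mapper writes its output to a Mapper.Context rather than to a BiConsumer. The sketch only shows the adapter pattern: every pair the old mapper writes is forwarded as an output of a DoFn, so it can sit in a chain with other DoFns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Toy model only: none of these types come from Crunch or Hadoop.
class LegacyMapperAdapter {

    // Hypothetical DoFn-like function: one input in, zero or more outputs emitted.
    interface ToyDoFn<S, T> {
        void process(S input, Consumer<T> emitter);
    }

    // Hypothetical legacy mapper shape: map(key, value) writes (key, value) pairs
    // to a context, loosely mirroring Hadoop's Mapper.map(key, value, context).
    interface LegacyMapper<KI, VI, KO, VO> {
        void map(KI key, VI value, BiConsumer<KO, VO> context);
    }

    // Adapter: expose the legacy mapper as a DoFn over key/value pairs.
    // Each pair the mapper writes to its context becomes one DoFn output.
    static <KI, VI, KO, VO> ToyDoFn<Map.Entry<KI, VI>, Map.Entry<KO, VO>> wrap(
            LegacyMapper<KI, VI, KO, VO> mapper) {
        return (pair, emit) ->
            mapper.map(pair.getKey(), pair.getValue(), (k, v) -> emit.accept(Map.entry(k, v)));
    }

    // Run a DoFn over a list of inputs and collect everything it emits.
    static <S, T> List<T> apply(ToyDoFn<S, T> fn, List<S> inputs) {
        List<T> out = new ArrayList<>();
        for (S s : inputs) {
            fn.process(s, out::add);
        }
        return out;
    }

    public static void main(String[] args) {
        // A "legacy" mapper that emits (word, 1) for each word in a line.
        LegacyMapper<Long, String, String, Integer> tokenizer = (offset, line, ctx) -> {
            for (String w : line.split("\\s+")) {
                if (!w.isEmpty()) ctx.accept(w, 1);
            }
        };
        List<Map.Entry<String, Integer>> out =
            apply(wrap(tokenizer), List.of(Map.entry(0L, "to be or not")));
        System.out.println(out);
    }
}
```
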
--047d7b33d58e5549f004e3fcd16a--