hadoop-mapreduce-user mailing list archives

From MONTMORY Alain <alain.montm...@thalesgroup.com>
Subject RE: MR 0.20.2 job chaining
Date Tue, 26 Jul 2011 07:29:58 GMT
Hello,

You can also use the Cascading API (http://www.cascading.org/), which greatly simplifies job
chaining.

At Thales we tried both the native MR and the Cascading approaches, and we obtained very good
results (productivity and performance) with Cascading.

Regards,


-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com]
Sent: Monday, 25 July 2011 23:22
To: mapreduce-user@hadoop.apache.org; Ross
Subject: Re: MR 0.20.2 job chaining

What you may be looking for is a workflow system such as Oozie
(yahoo.github.com/oozie/) or Azkaban
(http://sna-projects.com/azkaban/).

If your needs are simple (2-3 jobs, not too many conditions, etc. per
workflow), you can check out the JobControl API that Hadoop offers
(http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/jobcontrol/package-summary.html).
It lets you declare that one job depends on another and so build
uncomplicated dependency chains.

P.S. Note that a run of map-only phases such as M-M-M-M can usually be
collapsed into a single M. If you want modularity in code to represent
the phases, check out ChainMapper
(http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/ChainMapper.html).
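
To make the M-M-M-M point concrete, here is a minimal plain-Java sketch.
It deliberately uses no Hadoop classes: the class name FusedMapPhases and
the three phases (TRIM, LOWER, TAG) are made up for illustration, and it
assumes each phase is a pure per-record function, which is the case where
several map passes can be fused into one.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.stream.Collectors;

public class FusedMapPhases {
    // Three hypothetical per-record "map phases" (illustrative only).
    static final Function<String, String> TRIM = String::trim;
    static final Function<String, String> LOWER = s -> s.toLowerCase(Locale.ROOT);
    static final Function<String, String> TAG = s -> "word:" + s;

    // M-M-M: each phase is a separate pass over the previous pass's output.
    static List<String> runAsSeparatePasses(List<String> records) {
        List<String> out = records;
        for (Function<String, String> phase : Arrays.asList(TRIM, LOWER, TAG)) {
            out = out.stream().map(phase).collect(Collectors.toList());
        }
        return out;
    }

    // M: the phases are composed once and applied in a single pass.
    static List<String> runAsSinglePass(List<String> records) {
        Function<String, String> fused = TRIM.andThen(LOWER).andThen(TAG);
        return records.stream().map(fused).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("  Hadoop ", "MapReduce");
        System.out.println(runAsSeparatePasses(input));
        System.out.println(runAsSinglePass(input));
    }
}
```

Both methods return the same records; the fused version simply avoids
materialising the intermediate outputs. As I understand it, ChainMapper
gives you the same effect inside one map task while keeping each phase in
its own Mapper class.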

On Mon, Jul 25, 2011 at 11:50 PM, Ross Nordeen <rjnordee@mtu.edu> wrote:
>
>
> Hello all,
>
> I am trying to write an MR program where the output from the mappers is dependent on
> the previous map processes.  I understand that a job scheduler exists to control such
> processes.  Would anyone be able to give some sample code of a working implementation
> of this in Hadoop 0.20.2?
>
> --
> Ross Nordeen
> Computer Networking And Systems Administration
> Michigan Technological University
> http://www.linkedin.com/in/rjnordee
>
>



-- 
Harsh J
