hadoop-mapreduce-user mailing list archives

From MONTMORY Alain <alain.montm...@thalesgroup.com>
Subject RE: MR 0.20.2 job chaining
Date Tue, 26 Jul 2011 07:29:58 GMT

You can also use the Cascading API (http://www.cascading.org/), which greatly simplifies the Job

At Thales we tried both the native MR and the Cascading approaches, and we obtained very good
results (productivity and performance) using Cascading...



-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com]
Sent: Monday, 25 July 2011 23:22
To: mapreduce-user@hadoop.apache.org; Ross
Subject: Re: MR 0.20.2 job chaining

What you may be looking for is a workflow system such as Oozie
(yahoo.github.com/oozie/) or Azkaban

If your needs are simple (2-3 jobs, not too many conditions, etc. per
workflow), you can check out the JobControl API
Hadoop offers to let you add dependent jobs and create uncomplicated
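
To make the idea of dependent jobs concrete, here is a minimal sketch in plain Java. It does not use Hadoop's JobControl classes; JobChainSketch, addJob and runOrder are hypothetical names made up for illustration, and the job names are invented. It only shows the bookkeeping a job controller has to do: a job may start once every job it depends on has finished.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a job-control layer; NOT the Hadoop JobControl API.
class JobChainSketch {
    // job name -> names of the jobs that must finish before it may start
    private final Map<String, Set<String>> dependencies = new LinkedHashMap<>();

    void addJob(String name, String... dependsOn) {
        Set<String> deps = new LinkedHashSet<>();
        for (String dep : dependsOn) {
            deps.add(dep);
        }
        dependencies.put(name, deps);
    }

    // Returns an order in which every job comes after the jobs it depends on.
    List<String> runOrder() {
        List<String> done = new ArrayList<>();
        Set<String> pending = new LinkedHashSet<>(dependencies.keySet());
        while (!pending.isEmpty()) {
            boolean progressed = false;
            for (String job : new ArrayList<>(pending)) {
                if (done.containsAll(dependencies.get(job))) {
                    // A real controller would submit the job here and wait for it.
                    done.add(job);
                    pending.remove(job);
                    progressed = true;
                }
            }
            if (!progressed) {
                throw new IllegalStateException("cyclic or missing dependency: " + pending);
            }
        }
        return done;
    }

    public static void main(String[] args) {
        JobChainSketch chain = new JobChainSketch();
        chain.addJob("aggregate", "clean"); // aggregate needs the output of clean
        chain.addJob("clean", "parse");     // clean needs the output of parse
        chain.addJob("parse");              // parse has no prerequisites
        System.out.println(chain.runOrder());
    }
}
```

In a real workflow each name would stand for a configured MapReduce job whose input path is the output path of the job it depends on.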

P.S. Note that a sequence of map phases such as M-M-M-M can usually be collapsed into a single M. If you
want modularity in code to represent phases, check out ChainMapper
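
The M-M-M-M remark can be illustrated without Hadoop at all. The sketch below is plain Java with made-up steps (TRIM, LOWER, LENGTH) and a hypothetical mapPhase helper; it is not ChainMapper. It assumes each step works on one record at a time and needs nothing from the other records, which is what makes the collapse valid: running three passes over the data gives the same result as one pass that applies the composed function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; the steps and names are not from any Hadoop API.
class MapPhaseSketch {
    // Three per-record steps, each standing in for one map phase.
    static final Function<String, String> TRIM = String::trim;
    static final Function<String, String> LOWER = String::toLowerCase;
    static final Function<String, Integer> LENGTH = String::length;

    // One "map phase": apply a step to every record independently.
    static <A, B> List<B> mapPhase(List<A> records, Function<A, B> step) {
        List<B> out = new ArrayList<>();
        for (A record : records) {
            out.add(step.apply(record));
        }
        return out;
    }

    // M-M-M: three separate passes, each one feeding the next.
    static List<Integer> chained(List<String> input) {
        return mapPhase(mapPhase(mapPhase(input, TRIM), LOWER), LENGTH);
    }

    // Single M: one pass that applies the composed steps to each record.
    static List<Integer> fused(List<String> input) {
        return mapPhase(input, TRIM.andThen(LOWER).andThen(LENGTH));
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(" Hadoop ", "MR");
        System.out.println(chained(input).equals(fused(input)));
    }
}
```

Keeping the steps as separate functions preserves the modularity in code while still running them as one pass over the data.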

On Mon, Jul 25, 2011 at 11:50 PM, Ross Nordeen <rjnordee@mtu.edu> wrote:
> Hello all,
> I am trying to write an MR program where the output from the mappers is dependent on
> the previous map processes.  I understand that a job scheduler exists to control such processes.
> Would anyone be able to give some sample code of a working implementation of this in hadoop
> --
> Ross Nordeen
> Computer Networking And Systems Administration
> Michigan Technological University
> http://www.linkedin.com/in/rjnordee

Harsh J
