hadoop-common-user mailing list archives

From Lukáš Vlček <lukas.vl...@gmail.com>
Subject Re: Chaining Multiple Map reduce jobs.
Date Wed, 08 Apr 2009 22:14:23 GMT
Hi,
I am by no means a Hadoop expert, but I think you cannot start a Map task
until the previous Reduce has finished. This means that you probably have
to store the Map output on disk first (because a] it may not fit into
memory and b] you would risk data loss if the system crashes).
As for job chaining, you can check the JobControl class (
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/jobcontrol/JobControl.html).

Also you can look at https://issues.apache.org/jira/browse/HADOOP-3702
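
To make the ordering constraint above concrete, here is a minimal sketch in plain Java. It deliberately uses no Hadoop classes; the names ChainSketch, runStage and runChain are hypothetical and exist only for illustration. Each stage groups tab-separated key/value lines by key (roughly what the reducers described in the question do), writes its result to a file, and only then does the next stage start reading that file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: imitates "job N+1 starts after job N has
// finished and reads its output from disk". It is not Hadoop code.
public class ChainSketch {

    /** One stage: read "key<TAB>value" lines, group values by key, write the result. */
    public static void runStage(Path input, Path output) throws IOException {
        // TreeMap keeps keys sorted, loosely mirroring the sort before a reduce.
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : Files.readAllLines(input)) {
            if (line.isEmpty()) {
                continue;
            }
            String[] kv = line.split("\t", 2);
            String value = kv.length > 1 ? kv[1] : "";
            groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(value);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            out.add(e.getKey() + "\t" + String.join(",", e.getValue()));
        }
        // The output file is written only once the whole grouping step is done.
        Files.write(output, out);
    }

    /** Runs the given number of stages strictly one after another. */
    public static Path runChain(Path input, Path workDir, int stages) throws IOException {
        Path current = input;
        for (int i = 1; i <= stages; i++) {
            Path next = workDir.resolve("stage-" + i + ".txt");
            runStage(current, next); // returns only after the stage has finished
            current = next;          // the next stage reads this file from disk
        }
        return current;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("chain-sketch");
        Path input = dir.resolve("input.txt");
        Files.write(input, Arrays.asList("b\t1", "a\t2", "b\t3"));
        Path result = runChain(input, dir, 2);
        for (String line : Files.readAllLines(result)) {
            System.out.println(line);
        }
    }
}
```

With real Hadoop jobs the same ordering would be declared rather than hand-coded: as far as I know, the Job class in the jobcontrol package has an addDependingJob method for this, and JobControl runs each job once the jobs it depends on have succeeded.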

Regards,
Lukas

On Wed, Apr 8, 2009 at 11:30 PM, asif md <asif.d2d3@gmail.com> wrote:

> Hi everyone,
>
> I have to chain multiple MapReduce jobs (actually 2 to 4 jobs); each of
> the jobs depends on the output of the preceding job. In the reducer of
> each job I'm doing very little (just grouping by key from the maps). I
> want to give the output of one MapReduce job to the next job without
> having to go to disk. Does anyone have any ideas on how to do this?
>
> Thanks.
>



-- 
http://blog.lukas-vlcek.com/
