hadoop-common-user mailing list archives

From jason hadoop <jason.had...@gmail.com>
Subject Re: Chaining Multiple Map reduce jobs.
Date Thu, 09 Apr 2009 03:25:00 GMT
Chapter 8 of my book covers this in detail; the alpha chapter should be
available at the Apress web site.
Chain mapping rules!
http://www.apress.com/book/view/1430219424
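
The chain mapping mentioned here is presumably the feature tracked in HADOOP-3702 (linked in the quoted reply below), which lets several map stages run inside one job, before and after a single reduce, with records handed from stage to stage in memory. A minimal sketch of that idea in plain Java follows. It is a conceptual model, not Hadoop code, and `ChainSketch`, `Stage`, `kv`, `runMaps`, `groupAndSum` and `demo` are all hypothetical names chosen for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual model only: none of these names are Hadoop classes.
class ChainSketch {

    // Hypothetical stand-in for a map stage: one record in, zero or more out.
    interface Stage {
        List<Map.Entry<String, Integer>> map(Map.Entry<String, Integer> record);
    }

    // Hypothetical helper that builds one key/value record.
    static Map.Entry<String, Integer> kv(String key, int value) {
        return new SimpleEntry<String, Integer>(key, value);
    }

    // Runs the stages back to back; each stage's output is handed to the
    // next stage as an in-memory list, with nothing written out in between.
    static List<Map.Entry<String, Integer>> runMaps(
            List<Map.Entry<String, Integer>> input, List<Stage> stages) {
        List<Map.Entry<String, Integer>> current = input;
        for (Stage stage : stages) {
            List<Map.Entry<String, Integer>> next = new ArrayList<>();
            for (Map.Entry<String, Integer> record : current) {
                next.addAll(stage.map(record));
            }
            current = next;
        }
        return current;
    }

    // Stand-in for a light reducer: group by key and sum the values.
    static List<Map.Entry<String, Integer>> groupAndSum(
            List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> record : records) {
            sums.merge(record.getKey(), record.getValue(), Integer::sum);
        }
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : sums.entrySet()) {
            out.add(kv(e.getKey(), e.getValue()));
        }
        return out;
    }

    // map -> reduce -> map, all within one modelled "job".
    static List<Map.Entry<String, Integer>> demo() {
        Stage doubleValue =
                r -> Collections.singletonList(kv(r.getKey(), r.getValue() * 2));
        Stage keepLarge = r -> {
            List<Map.Entry<String, Integer>> kept = new ArrayList<>();
            if (r.getValue() > 5) {
                kept.add(r);
            }
            return kept;
        };
        List<Map.Entry<String, Integer>> input =
                Arrays.asList(kv("a", 1), kv("b", 2), kv("a", 3));
        List<Map.Entry<String, Integer>> mapped =
                runMaps(input, Collections.singletonList(doubleValue));
        List<Map.Entry<String, Integer>> reduced = groupAndSum(mapped);
        return runMaps(reduced, Collections.singletonList(keepLarge));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a=8]
    }
}
```

Only map stages fold together this way. A second group-by-key still needs its own job, which reads the previous job's output files.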

On Wed, Apr 8, 2009 at 3:30 PM, Nathan Marz <nathan@rapleaf.com> wrote:

> You can also try decreasing the replication factor for the intermediate
> files between jobs. This will make writing those files faster.
>
>
> On Apr 8, 2009, at 3:14 PM, Lukáš Vlček wrote:
>
>  Hi,
>> I am by no means a Hadoop expert, but I think you cannot start a Map task
>> until the previous Reduce has finished. This means that you probably
>> have to store the Map output to disk first (because a] it may
>> not fit into memory and b] you would risk data loss if the system
>> crashes).
>> As for the job chaining you can check JobControl class
>> (http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/jobcontrol/JobControl.html)
>>
>> Also you can look at https://issues.apache.org/jira/browse/HADOOP-3702
>>
>> Regards,
>> Lukas
>>
>> On Wed, Apr 8, 2009 at 11:30 PM, asif md <asif.d2d3@gmail.com> wrote:
>>
>>  hi everyone,
>>>
>>> I have to chain multiple MapReduce jobs (actually 2 to 4 jobs); each of
>>> the jobs depends on the output of the preceding job. In the reducer of
>>> each job I'm doing very little (just grouping by key from the maps). I
>>> want to give the output of one MapReduce job to the next job without
>>> having to go to the disk. Does anyone have any ideas on how to do this?
>>>
>>> Thanx.
>>>
>>>
>>
>>
>> --
>> http://blog.lukas-vlcek.com/
>>
>
>
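
The JobControl approach suggested in the quoted reply runs each job only once the jobs it depends on have finished, and each job still writes its output to the file system for the next one to read. The ordering idea can be sketched in plain Java as below. This models the dependency bookkeeping only and does not use the Hadoop API; `JobOrderSketch`, `runOrder`, `demo` and the job names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Conceptual model only: this is not the Hadoop JobControl class.
class JobOrderSketch {

    // Given job -> jobs it depends on, returns an order in which every job
    // runs after all of its dependencies.
    static List<String> runOrder(Map<String, List<String>> dependsOn) {
        List<String> order = new ArrayList<>();
        Set<String> finished = new HashSet<>();
        boolean progressed = true;
        while (finished.size() < dependsOn.size() && progressed) {
            progressed = false;
            for (Map.Entry<String, List<String>> job : dependsOn.entrySet()) {
                boolean ready = finished.containsAll(job.getValue());
                if (ready && !finished.contains(job.getKey())) {
                    finished.add(job.getKey()); // stands in for running the job
                    order.add(job.getKey());
                    progressed = true;
                }
            }
        }
        if (finished.size() < dependsOn.size()) {
            throw new IllegalStateException("cyclic or unknown dependency");
        }
        return order;
    }

    // Three jobs in a line, registered in reverse order on purpose.
    static List<String> demo() {
        Map<String, List<String>> dependsOn = new LinkedHashMap<>();
        dependsOn.put("job3", Arrays.asList("job2"));
        dependsOn.put("job2", Arrays.asList("job1"));
        dependsOn.put("job1", Collections.<String>emptyList());
        return runOrder(dependsOn);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [job1, job2, job3]
    }
}
```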


-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
