hadoop-mapreduce-user mailing list archives

From Roger Chen <rogc...@ucdavis.edu>
Subject Re: How to Create an effective chained MapReduce program.
Date Mon, 05 Sep 2011 19:50:02 GMT
The binary file lets you pass the output of the first reducer directly to
the second mapper. For example, if the first job writes Text, IntWritable
pairs with SequenceFileOutputFormat, then the second mapper receives Text,
IntWritable pairs as its input when it reads them with
SequenceFileInputFormat, so there is nothing to parse. The idea of chaining
is that you already know what kind of output the first reducer produces and
want to perform some secondary operation on it.
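
To make this concrete, here is a minimal sketch of two chained jobs. It is an
illustration under assumptions, not code from this thread: the class name
ChainSketch, the SecondMapper class, the firstJob/secondJob helpers, the job
names, and the three path arguments are all hypothetical. The first stage
borrows Hadoop's library classes TokenCounterMapper and IntSumReducer only as
a stand-in that emits Text, IntWritable pairs. It assumes the newer
org.apache.hadoop.mapreduce API; on older releases, new Job(conf, name) takes
the place of Job.getInstance(conf, name).

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class ChainSketch {

    // Second-stage mapper: keys and values arrive already typed as
    // Text / IntWritable, so no string parsing is needed.
    public static class SecondMapper
            extends Mapper<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void map(Text key, IntWritable value, Context context)
                throws IOException, InterruptedException {
            // Hypothetical secondary operation: pass the pair through as-is.
            context.write(key, value);
        }
    }

    // Stage 1: stand-in word count that writes (Text, IntWritable) pairs
    // into a binary SequenceFile instead of plain text.
    static Job firstJob(Configuration conf, Path in, Path mid) throws IOException {
        Job job = Job.getInstance(conf, "stage-1");
        job.setJarByClass(ChainSketch.class);
        job.setMapperClass(TokenCounterMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileInputFormat.addInputPath(job, in);
        FileOutputFormat.setOutputPath(job, mid);
        return job;
    }

    // Stage 2: reads stage 1's SequenceFile output as typed pairs.
    static Job secondJob(Configuration conf, Path mid, Path out) throws IOException {
        Job job = Job.getInstance(conf, "stage-2");
        job.setJarByClass(ChainSketch.class);
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setMapperClass(SecondMapper.class);
        job.setNumReduceTasks(0); // map-only, to keep the sketch short
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, mid);
        FileOutputFormat.setOutputPath(job, out);
        return job;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path in = new Path(args[0]);
        Path mid = new Path(args[1]);
        Path out = new Path(args[2]);

        // Run stage 2 only if stage 1 finished successfully.
        if (!firstJob(conf, in, mid).waitForCompletion(true)) {
            System.exit(1);
        }
        System.exit(secondJob(conf, mid, out).waitForCompletion(true) ? 0 : 1);
    }
}
```

The key and value classes declared for the first job's output have to match
the input type parameters of the second mapper. If you just want to look at
the intermediate result by eye, hadoop fs -text should print a SequenceFile
in readable form.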

One last thing on chaining jobs: it's often worth looking to see if you can
consolidate all of your separate map and reduce tasks into a single
map/reduce operation. There are many situations where it is more intuitive
to write a number of map/reduce operations and chain them together, but more
efficient to have just a single operation.

On Mon, Sep 5, 2011 at 12:21 PM, ilyal levin <nipponilyal@gmail.com> wrote:

> Thanks for the reply.
> I tried it, but it creates a binary file that I cannot read (I need
> the result of the first job).
> The other question is how to use this file in the next chained mapper, i.e.
> how can I retrieve the keys and the values in the map function?
> Ilyal
> On Mon, Sep 5, 2011 at 7:41 PM, Joey Echeverria <joey@cloudera.com> wrote:
>> Have you tried SequenceFileOutputFormat and SequenceFileInputFormat?
>> -Joey
>> On Mon, Sep 5, 2011 at 11:49 AM, ilyal levin <nipponilyal@gmail.com>
>> wrote:
>> > Hi
>> > I'm trying to write a chained MapReduce program. I'm doing so with a
>> > simple loop where, in each iteration, I create a job and execute it, and
>> > each job's output is the next job's input.
>> > How can I configure the OutputFormat of the current job and the
>> > InputFormat of the next job so that I do not use TextInputFormat
>> > (TextOutputFormat)? If I do use it, I need to parse the input file in
>> > the map function.
>> > I.e., if possible, I want the next job to treat the input file as
>> > <key,value> pairs and not plain Text.
>> > Thanks a lot.
>> >
>> >
>> >
>> --
>> Joseph Echeverria
>> Cloudera, Inc.
>> 443.305.9434

Roger Chen
UC Davis Genome Center
