hadoop-mapreduce-user mailing list archives

From ilyal levin <nipponil...@gmail.com>
Subject Re: How to Create an effective chained MapReduce program.
Date Tue, 06 Sep 2011 07:16:55 GMT
I need it because the intermediate data is also part of the solution to the
problem my algorithm solves.
I somehow need to log this information.
The key is Text and the value is ArrayWritable (TextArrayWritable).
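
A minimal plain-Java sketch of the binary-to-text dump described above. It does not use the Hadoop API: the class IntermediateDump, its methods, and the record layout (key, element count, elements) are assumptions made up for this illustration, with a String standing in for Text and a String[] standing in for TextArrayWritable. It relies only on java.io.DataOutputStream and DataInputStream, which implement the DataOutput/DataInput interfaces that Hadoop Writable types serialize through.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an intermediate binary file; not Hadoop code.
final class IntermediateDump {

    // Append one binary record: key, element count, then each element.
    static void writeRecord(DataOutputStream out, String key, String[] values)
            throws IOException {
        out.writeUTF(key);
        out.writeInt(values.length);
        for (String v : values) {
            out.writeUTF(v);
        }
    }

    // Decode every record and render it as one "key<TAB>v1,v2,..." text line.
    static List<String> dumpAsText(byte[] binary) throws IOException {
        List<String> lines = new ArrayList<>();
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(binary))) {
            while (in.available() > 0) {
                String key = in.readUTF();
                int count = in.readInt();
                String[] values = new String[count];
                for (int i = 0; i < count; i++) {
                    values[i] = in.readUTF();
                }
                lines.add(key + "\t" + String.join(",", values));
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            writeRecord(out, "node1", new String[] {"a", "b"});
            writeRecord(out, "node2", new String[] {"c"});
        }
        // The binary bytes stay available for the next stage;
        // the text lines are only for logging.
        for (String line : dumpAsText(buf.toByteArray())) {
            System.out.println(line);
        }
    }
}
```

The point of the sketch is that the binary form and the readable form can coexist: the chain keeps passing the typed records along, while a separate pass turns each record into one line of text for inspection.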



On Tue, Sep 6, 2011 at 8:57 AM, Niels Basjes <niels@basj.es> wrote:

> Hi,
>
> In the past I've had the same situation where I needed the data for
> debugging. Back then I chose to create a second job with simply
> SequenceFileInputFormat, IdentityMapper, IdentityReducer and finally
> TextOutputFormat.
>
> In my situation that worked great for my purpose.
>
> --
> Kind regards,
> Niels Basjes
>
> On 6 Sep 2011 at 01:54, "ilyal levin" <nipponilyal@gmail.com> wrote:
>
> >
> > OK, so now I'm using SequenceFileInputFormat and SequenceFileOutputFormat,
> > and it works fine, but the output of the reducer is now a binary file (not
> > text), so I can't read the data. How can I solve this? I need the data (in
> > text form) of the intermediate stages in the chain.
> >
> > Thanks
> >
> >
> > On Tue, Sep 6, 2011 at 1:33 AM, ilyal levin <nipponilyal@gmail.com> wrote:
> >>
> >> Thanks for the help.
> >>
> >>
> >> On Mon, Sep 5, 2011 at 10:50 PM, Roger Chen <rogchen@ucdavis.edu> wrote:
> >>>
> >>> The binary file allows you to pass the output of the first reducer to the
> >>> second mapper. For example, if the first job writes Text, IntWritable pairs
> >>> with SequenceFileOutputFormat, then the second mapper receives Text,
> >>> IntWritable input directly. The idea of chaining is that you already know
> >>> what kind of output the first reducer is going to give, and that you want
> >>> to perform some secondary operation on it.
> >>>
> >>> One last thing on chaining jobs: it's often worth looking to see if you
> >>> can consolidate all of your separate map and reduce tasks into a single
> >>> map/reduce operation. There are many situations where it is more intuitive
> >>> to write a number of map/reduce operations and chain them together, but
> >>> more efficient to have just a single operation.
> >>>
> >>>
> >>>
> >>> On Mon, Sep 5, 2011 at 12:21 PM, ilyal levin <nipponilyal@gmail.com> wrote:
> >>>>
> >>>> Thanks for the reply.
> >>>> I tried it, but it creates a binary file which I cannot read (I need the
> >>>> result of the first job).
> >>>> The other thing is: how can I use this file in the next chained mapper,
> >>>> i.e. how can I retrieve the keys and the values in the map function?
> >>>>
> >>>>
> >>>> Ilyal
> >>>>
> >>>>
> >>>> On Mon, Sep 5, 2011 at 7:41 PM, Joey Echeverria <joey@cloudera.com> wrote:
> >>>>>
> >>>>> Have you tried SequenceFileOutputFormat and SequenceFileInputFormat?
> >>>>>
> >>>>> -Joey
> >>>>>
> >>>>> On Mon, Sep 5, 2011 at 11:49 AM, ilyal levin <nipponilyal@gmail.com> wrote:
> >>>>> > Hi,
> >>>>> > I'm trying to write a chained MapReduce program. I'm doing so with a
> >>>>> > simple loop where in each iteration I create a job and execute it, and
> >>>>> > every time the current job's output is the next job's input.
> >>>>> > How can I configure the OutputFormat of the current job and the
> >>>>> > InputFormat of the next job so that I do not have to use
> >>>>> > TextInputFormat (TextOutputFormat)? If I do use it, I need to parse the
> >>>>> > input file in the map function.
> >>>>> > That is, if possible, I want the next job to "consider" the input file
> >>>>> > as <key,value> and not plain Text.
> >>>>> > Thanks a lot.
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Joseph Echeverria
> >>>>> Cloudera, Inc.
> >>>>> 443.305.9434
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Roger Chen
> >>> UC Davis Genome Center
> >>
> >>
> >
>
>
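
Roger Chen's point in the quoted thread, that a binary intermediate file lets the second mapper receive typed keys and values without parsing text, can be illustrated with a small plain-Java sketch. This is not Hadoop code: the class TypedChain, the methods stageOne and stageTwo, and the record layout are hypothetical, with (String, int) pairs standing in for (Text, IntWritable). Only java.io and java.util are used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical two-stage chain; each stage only sees typed records.
final class TypedChain {

    // Stage 1 ("first reducer"): write (String, int) pairs in binary form.
    static byte[] stageOne(Map<String, Integer> counts) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeInt(e.getValue());
            }
        }
        return buf.toByteArray();
    }

    // Stage 2 ("second mapper"): read the pairs back already typed and
    // group the keys by their count. No string splitting or number parsing.
    static TreeMap<Integer, List<String>> stageTwo(byte[] input)
            throws IOException {
        TreeMap<Integer, List<String>> byCount = new TreeMap<>();
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(input))) {
            while (in.available() > 0) {
                String key = in.readUTF();
                int value = in.readInt();
                byCount.computeIfAbsent(value, k -> new ArrayList<>()).add(key);
            }
        }
        return byCount;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("apple", 2);
        counts.put("pear", 1);
        counts.put("plum", 2);
        System.out.println(stageTwo(stageOne(counts)));
    }
}
```

The second stage works only because it knows in advance that the first stage emits a string key followed by an int value, which mirrors the remark that with chaining you already know what kind of output the first reducer produces.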
