hadoop-mapreduce-user mailing list archives

From Joey Echeverria <j...@cloudera.com>
Subject Re: How to Create an effective chained MapReduce program.
Date Tue, 06 Sep 2011 00:16:04 GMT
Why do you need to see the intermediate data as text?

What are the types of your key and values?

-Joey
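
For anyone who only needs to inspect intermediate output, one option is to keep the SequenceFile formats between jobs and dump a file as text on demand. The following is a minimal sketch, not a definitive tool: the class name `SeqFileDump` and the single command-line argument (the path of one part file) are illustrative assumptions, and it relies on the key and value classes being on the classpath and having a readable `toString()`.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

// Hypothetical helper: prints every record of one SequenceFile as "key<TAB>value".
public class SeqFileDump {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path(args[0]);           // e.g. one part file of a job's output
        FileSystem fs = path.getFileSystem(conf);

        // This constructor is deprecated in newer Hadoop releases but still available.
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        try {
            // The file header records the key and value classes, so they are
            // instantiated reflectively instead of being hard-coded here.
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value);
            }
        } finally {
            reader.close();
        }
    }
}
```

From the command line, `hadoop fs -text <path>` should give a similar textual view of a SequenceFile without any custom code.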
On Sep 5, 2011 6:54 PM, "ilyal levin" <nipponilyal@gmail.com> wrote:
> OK, so now I'm using SequenceFileInputFormat and SequenceFileOutputFormat
> and it works fine, but the output of the reducer is now a binary file
> (not text), so I can't read the data. How can I solve this? I need the
> data (in text form) of the intermediate stages in the chain.
>
> Thanks
>
> On Tue, Sep 6, 2011 at 1:33 AM, ilyal levin <nipponilyal@gmail.com> wrote:
>
>> Thanks for the help.
>>
>>
>> On Mon, Sep 5, 2011 at 10:50 PM, Roger Chen <rogchen@ucdavis.edu> wrote:
>>
>>> The binary file will allow you to pass the output from the first reducer
>>> to the second mapper. For example, if you output Text, IntWritable from
>>> the first one in SequenceFileOutputFormat, then you are able to retrieve
>>> Text, IntWritable input at the head of the second mapper. The idea of
>>> chaining is that you already know what kind of output the first reducer
>>> is going to give, and that you want to perform some secondary operation
>>> on it.
>>>
>>> One last thing on chaining jobs: it's often worth looking to see if you
>>> can consolidate all of your separate map and reduce tasks into a single
>>> map/reduce operation. There are many situations where it is more
>>> intuitive to write a number of map/reduce operations and chain them
>>> together, but more efficient to have just a single operation.
>>>
>>>
>>>
>>> On Mon, Sep 5, 2011 at 12:21 PM, ilyal levin <nipponilyal@gmail.com> wrote:
>>>
>>>> Thanks for the reply.
>>>> I tried it, but it creates a binary file which I cannot read (I
>>>> need the result of the first job).
>>>> The other thing is: how can I use this file in the next chained mapper?
>>>> I.e., how can I retrieve the keys and the values in the map function?
>>>>
>>>>
>>>> Ilyal
>>>>
>>>>
>>>> On Mon, Sep 5, 2011 at 7:41 PM, Joey Echeverria <joey@cloudera.com> wrote:
>>>>
>>>>> Have you tried SequenceFileOutputFormat and SequenceFileInputFormat?
>>>>>
>>>>> -Joey
>>>>>
>>>>> On Mon, Sep 5, 2011 at 11:49 AM, ilyal levin <nipponilyal@gmail.com>
>>>>> wrote:
>>>>> > Hi
>>>>> > I'm trying to write a chained MapReduce program. I'm doing so with a
>>>>> > simple loop where in each iteration I create a job and execute it, and
>>>>> > every time the current job's output is the next job's input.
>>>>> > How can I configure the OutputFormat of the current job and the
>>>>> > InputFormat of the next job so that I do not use TextInputFormat
>>>>> > (TextOutputFormat)? If I do use it, I need to parse the input file in
>>>>> > the map function.
>>>>> > I.e., if possible I want the next job to "consider" the input file as
>>>>> > <key,value> and not plain Text.
>>>>> > Thanks a lot.
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Joseph Echeverria
>>>>> Cloudera, Inc.
>>>>> 443.305.9434
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Roger Chen
>>> UC Davis Genome Center
>>>
>>
>>
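
The loop described in the thread, with each job writing a SequenceFile that the next job reads as typed records, might look roughly like the sketch below. Everything named here is an assumption for illustration: the `ChainDriver`, `StepMapper`, and `StepReducer` classes, the three command-line arguments, the `step-N` directory naming, and the Text/IntWritable record types (borrowed from Roger's example). It also assumes the initial input is already a SequenceFile of Text/IntWritable pairs and that `Job.getInstance` is available (Hadoop 2.x or later; older releases used `new Job(conf, name)`).

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class ChainDriver {

    // Hypothetical mapper: records arrive already typed, so no text parsing is needed.
    public static class StepMapper extends Mapper<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void map(Text key, IntWritable value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value);
        }
    }

    // Hypothetical reducer: sums the values for each key.
    public static class StepReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable sum = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable v : values) {
                total += v.get();
            }
            sum.set(total);
            context.write(key, sum);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path(args[0]);              // assumed: SequenceFile<Text, IntWritable>
        String workDir = args[1];                    // assumed: parent directory for step outputs
        int iterations = Integer.parseInt(args[2]);  // assumed: number of chained jobs

        for (int i = 0; i < iterations; i++) {
            Path output = new Path(workDir, "step-" + i);

            Job job = Job.getInstance(conf, "chain step " + i);
            job.setJarByClass(ChainDriver.class);
            job.setMapperClass(StepMapper.class);
            job.setReducerClass(StepReducer.class);

            // Binary key/value files in and out, so types survive between jobs.
            job.setInputFormatClass(SequenceFileInputFormat.class);
            job.setOutputFormatClass(SequenceFileOutputFormat.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            FileInputFormat.addInputPath(job, input);
            FileOutputFormat.setOutputPath(job, output);

            if (!job.waitForCompletion(true)) {
                System.exit(1);                      // stop the chain if a step fails
            }
            input = output;                          // this step's output feeds the next step
        }
    }
}
```

Each `step-N` directory is kept, so the intermediate results of every stage remain available for inspection after the chain finishes.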
