hadoop-mapreduce-user mailing list archives

From Jingkei Ly <jly.l...@googlemail.com>
Subject Re: how to write this MapReduce
Date Mon, 26 Oct 2009 14:48:06 GMT
Assuming your input files are sorted by key and partitioned identically, you
should be able to use the map-side join framework to do the job you describe
(effectively an outer join) without going through the Reduce phase at all.

There are instructions on how to use it here:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/join/package-summary.html
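
To make the idea concrete, here is a minimal plain-Java sketch of the sorted
merge that an outer join over key-sorted inputs performs. It is not the Hadoop
join API: the class name SortedOuterJoin, the join() method, and the in-memory
lists standing in for files are all hypothetical, and it assumes each input is
sorted by key with at most one record per key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Hadoop.
public class SortedOuterJoin {

    /**
     * Outer-joins several inputs that are each sorted by key.
     * Each record is a {key, value} pair. An input that lacks a key
     * contributes an empty string for that key.
     */
    public static List<String> join(List<List<String[]>> inputs) {
        int n = inputs.size();
        int[] pos = new int[n];                 // read cursor for each input
        List<String> out = new ArrayList<>();
        while (true) {
            // Find the smallest key among the current heads of all inputs.
            String min = null;
            for (int i = 0; i < n; i++) {
                if (pos[i] < inputs.get(i).size()) {
                    String k = inputs.get(i).get(pos[i])[0];
                    if (min == null || k.compareTo(min) < 0) {
                        min = k;
                    }
                }
            }
            if (min == null) {
                break;                          // every input is exhausted
            }
            // Take the value from each input whose head carries that key.
            String[] values = new String[n];
            for (int i = 0; i < n; i++) {
                values[i] = "";
                if (pos[i] < inputs.get(i).size()
                        && inputs.get(i).get(pos[i])[0].equals(min)) {
                    values[i] = inputs.get(i).get(pos[i])[1];
                    pos[i]++;
                }
            }
            out.add(min + "-(" + String.join(",", values) + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> file1 = Arrays.asList(
                new String[] {"key1", "value1A"},
                new String[] {"key2", "value2A"},
                new String[] {"key3", "value3A"});
        List<String[]> file2 = Arrays.asList(
                new String[] {"key1", "value1B"},
                new String[] {"key2", "value2B"},
                new String[] {"key3", "value3B"});
        List<String[]> file3 = Arrays.asList(
                new String[] {"key1", "value1C"},
                new String[] {"key2", "value2C"},
                new String[] {"key3", "value3C"});
        // Prints key1-(value1A,value1B,value1C) and so on, one line per key.
        for (String line : join(Arrays.asList(file1, file2, file3))) {
            System.out.println(line);
        }
    }
}
```

Because every input is read once, in order, nothing has to be shuffled or
re-sorted, which is why this kind of join can run entirely in the map phase.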

2009/10/26 Anty <anty.rao@gmail.com>

> Would MultipleInputs work for this situation?
> Does anyone have any ideas about this?
>
>
> On Mon, Oct 26, 2009 at 7:44 PM, Anty <anty.rao@gmail.com> wrote:
>
>> Hi all,
>> I have the following use case: I have three files, each containing key-value pairs:
>> file1:                         file2:                         file3:
>> key1-value1A           key1-value1B           key1-value1C
>> key2-value2A           key2-value2B           key2-value2C
>> key3-value3A           key3-value3B           key3-value3C
>> ...                    ...                    ...
>> Now I want to write a MapReduce job that generates a single file,
>> file4:
>> key1-(value1A,value1B,value1C)
>> key2-(value2A,value2B,value2C)
>> key3-(value3A,value3B,value3C)
>> ..........
>> Any suggestions would be appreciated.
>> --
>> Best Regards
>> Anty Rao
>>
>
>
>
> --
> Best Regards
> Anty Rao
>
