hadoop-mapreduce-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: example of multiple inputs?
Date Thu, 30 Dec 2010 17:19:46 GMT
You only need two InputFormats: one for the SequenceFile
(SequenceFileInputFormat, or its subclasses SequenceFileAsBinaryInputFormat
and SequenceFileAsTextInputFormat, or your own extension), and the other
for the text (TextInputFormat, perhaps). Since both your Mappers are going
to work with the same key and value types, you need only one Mapper
implementation doing what you want it to do. Then use
MultipleInputs.addInputPath() to add each input path, with its
InputFormat, to your job. [API link:
http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/lib/MultipleInputs.html]
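A sketch of what that wiring might look like with the old (0.20, org.apache.hadoop.mapred) API linked above. It rests on a few assumptions that are not in the original reply. The text input is assumed to hold "key<TAB>value" lines, so KeyValueTextInputFormat stands in for TextInputFormat, because TextInputFormat would hand the Mapper a byte-offset key rather than the shared key. The sequence file is read with SequenceFileAsTextInputFormat. With those two choices, both inputs reach a single Mapper as (Text, Text). The class names TwoSourceJob, SharedMapper and CountingReducer, and the "count values per key" reduce logic, are hypothetical. The code needs Hadoop on the classpath.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SequenceFileAsTextInputFormat;
import org.apache.hadoop.mapred.lib.MultipleInputs;

public class TwoSourceJob {

  // Hypothetical single Mapper for both inputs: each InputFormat below yields (Text, Text).
  public static class SharedMapper extends MapReduceBase
      implements Mapper<Text, Text, Text, BytesWritable> {
    public void map(Text key, Text value,
                    OutputCollector<Text, BytesWritable> output,
                    Reporter reporter) throws IOException {
      // Text.getBytes() may return a padded buffer, so copy only getLength() bytes.
      byte[] bytes = Arrays.copyOf(value.getBytes(), value.getLength());
      output.collect(key, new BytesWritable(bytes));
    }
  }

  // Hypothetical Reducer: for each key it receives the values from both sources.
  public static class CountingReducer extends MapReduceBase
      implements Reducer<Text, BytesWritable, Text, LongWritable> {
    public void reduce(Text key, Iterator<BytesWritable> values,
                       OutputCollector<Text, LongWritable> output,
                       Reporter reporter) throws IOException {
      long n = 0;
      while (values.hasNext()) {
        values.next();
        n++;
      }
      output.collect(key, new LongWritable(n));
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(TwoSourceJob.class);
    conf.setJobName("two-source-example");

    // args[0]: sequence file input; args[1]: tab-separated text input (assumed layout).
    MultipleInputs.addInputPath(conf, new Path(args[0]),
        SequenceFileAsTextInputFormat.class, SharedMapper.class);
    MultipleInputs.addInputPath(conf, new Path(args[1]),
        KeyValueTextInputFormat.class, SharedMapper.class);

    conf.setMapOutputKeyClass(Text.class);
    conf.setMapOutputValueClass(BytesWritable.class);
    conf.setReducerClass(CountingReducer.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(LongWritable.class);
    FileOutputFormat.setOutputPath(conf, new Path(args[2]));

    JobClient.runJob(conf);
  }
}
```

If the two sources really did need different parsing, the same addInputPath() call accepts a different Mapper class per path, as long as both emit the same map output key and value types.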

The Mapper can simply do its processing, emit to its default
OutputCollector, and be done with it. The Reducer will then receive keys
grouped across both sources. It is as simple as that.
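To make the grouping step concrete without a cluster, here is a plain-Java simulation of the same flow. It uses no Hadoop classes and is not Hadoop's implementation. Two "map" functions turn differently formatted records into the same (key, bytes) shape. A sorted "shuffle" groups them by key, and one "reduce" pass per key sees values from both sources. The class and record names and the "key<TAB>value" text layout are hypothetical. The reduce step only counts values, because Hadoop does not guarantee the order of values within a key.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the MultipleInputs data flow; no Hadoop classes used.
public class MultipleInputsSketch {

    // Hypothetical record from the sequence-file source: already a (key, value) pair.
    public static final class SeqRecord {
        final String key;
        final String value;

        public SeqRecord(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // "Map" for the sequence-file source: keep the key, emit the value as bytes.
    static Map.Entry<String, byte[]> mapSeq(SeqRecord r) {
        return new AbstractMap.SimpleEntry<>(r.key, r.value.getBytes(StandardCharsets.UTF_8));
    }

    // "Map" for the text source: assumes each line is "key<TAB>value".
    static Map.Entry<String, byte[]> mapText(String line) {
        int tab = line.indexOf('\t');
        String key = line.substring(0, tab);
        String value = line.substring(tab + 1);
        return new AbstractMap.SimpleEntry<>(key, value.getBytes(StandardCharsets.UTF_8));
    }

    // "Shuffle": group every mapped value under its key, with keys kept sorted.
    static Map<String, List<byte[]>> shuffle(List<Map.Entry<String, byte[]>> mapped) {
        Map<String, List<byte[]>> grouped = new TreeMap<>();
        for (Map.Entry<String, byte[]> e : mapped) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return grouped;
    }

    public static Map<String, Integer> run(List<SeqRecord> seqInput, List<String> textInput) {
        // Both sources feed one list of map outputs with identical types.
        List<Map.Entry<String, byte[]>> mapped = new ArrayList<>();
        for (SeqRecord r : seqInput) mapped.add(mapSeq(r));
        for (String line : textInput) mapped.add(mapText(line));

        // "Reduce": one pass per key over values from either source; here it counts them.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<byte[]>> group : shuffle(mapped).entrySet()) {
            counts.put(group.getKey(), group.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = run(
            List.of(new SeqRecord("a", "x"), new SeqRecord("b", "y")),
            List.of("a\tz", "c\tw"));
        System.out.println(counts); // {a=2, b=1, c=1}
    }
}
```

Key "a" appears once in each source, so its single reduce call sees two values. That is the grouping the reply describes.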

On Thu, Dec 30, 2010 at 10:04 PM, Yin Lou <yin.lou.07@gmail.com> wrote:
> Hi,
>
> I have two data sources in different formats, one a sequence file and the other
> text. They share the same key, so I'd like to have the following:
>
> map1: <k, v1> -> <k, v2>
> map2: <k, v1'> -> <k, v2'>
> Both v2 and v2' are of the same type, say, BytesWritable.
>
> I wonder if anyone could give me an example of MultipleInputs so that I can
> process these two data sources in the reducer.
>
> Thanks,
> Yin
>



-- 
Harsh J
www.harshj.com
