hadoop-mapreduce-user mailing list archives

From Sahana Bhat <sana.b...@gmail.com>
Subject Re: Multiple Mappers and One Reducer
Date Wed, 07 Sep 2011 09:32:56 GMT
Hi,

I understand that, in the normal case, a single input file is split across
'n' instances of the same mapper.

The scenario I have is:
Two files that do not have the same number of columns (but share similar
data in a few columns) need to be processed, and after the computation a
single output file has to be generated.

Note: CV = computed value

File1, belonging to one dataset, has data for:
Date,counter1,counter2,CV1,CV2

File2, belonging to another dataset, has data for:
Date,counter1,counter2,CV3,CV4,CV5

The computation to be carried out on these two files is:
CV6 = (CV1 * CV5) / 100

The final output file should contain these columns, in this order:
Date,counter1,counter2,CV6

The idea is to have two different mappers (not two instances of the same
mapper), one running on each file, and a single reducer that emits the final
result file.
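
A minimal sketch of this idea in plain Java, with no Hadoop dependency, in
case it helps make the data flow concrete. It assumes (this is not stated
above) that rows from the two files are matched on Date,counter1,counter2.
The class name JoinSketch, the method names, and the "A:"/"B:" tags are
hypothetical and chosen only for illustration. In Hadoop itself, the
MultipleInputs class is the usual way to bind a different mapper to each
input path; the sketch only imitates the map, group, and reduce steps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free simulation of two mappers feeding one reducer.
class JoinSketch {

    // Builds the join key. Assumption: rows match on Date,counter1,counter2.
    static String key(String[] f) {
        return f[0].trim() + "," + f[1].trim() + "," + f[2].trim();
    }

    // "Mapper 1" for File1 rows: Date,counter1,counter2,CV1,CV2.
    // Emits (key, CV1) with the value tagged "A:" so the reducer knows its origin.
    static String[] mapFile1(String line) {
        String[] f = line.split(",");
        return new String[] { key(f), "A:" + f[3].trim() };
    }

    // "Mapper 2" for File2 rows: Date,counter1,counter2,CV3,CV4,CV5.
    // Emits (key, CV5) with the value tagged "B:".
    static String[] mapFile2(String line) {
        String[] f = line.split(",");
        return new String[] { key(f), "B:" + f[5].trim() };
    }

    // Single reducer: sees every tagged value for one key and computes
    // CV6 = (CV1 * CV5) / 100. Returns null if either side is missing.
    static String reduce(String key, List<String> values) {
        Double cv1 = null;
        Double cv5 = null;
        for (String v : values) {
            if (v.startsWith("A:")) {
                cv1 = Double.parseDouble(v.substring(2));
            } else if (v.startsWith("B:")) {
                cv5 = Double.parseDouble(v.substring(2));
            }
        }
        if (cv1 == null || cv5 == null) {
            return null;
        }
        double cv6 = (cv1 * cv5) / 100;
        return key + "," + cv6;
    }

    // Imitates the shuffle: groups mapper output by key, then reduces each group.
    static List<String> run(List<String> file1, List<String> file2) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : file1) {
            String[] kv = mapFile1(line);
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        for (String line : file2) {
            String[] kv = mapFile2(line);
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            String row = reduce(e.getKey(), e.getValue());
            if (row != null) {
                out.add(row);
            }
        }
        return out;
    }
}
```

For example, with a made-up File1 row 2011-09-07,1,2,50,7 and a made-up File2
row 2011-09-07,1,2,3,4,20, run(...) returns the single row
2011-09-07,1,2,10.0. Keys that appear in only one file are dropped here; that
is a design choice of the sketch, not a requirement stated above.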

Thanks,
Sahana

On Wed, Sep 7, 2011 at 2:40 PM, Harsh J <harsh@cloudera.com> wrote:

> Sahana,
>
> Yes. But, isn't that how it is normally? What makes you question this
> capability?
>
> On Wed, Sep 7, 2011 at 2:37 PM, Sahana Bhat <sana.bhat@gmail.com> wrote:
> > Hi,
> >          Is it possible to have multiple mappers where each mapper is
> > operating on a different input file and whose result (which is a key value
> > pair from different mappers) is processed by a single reducer?
> > Regards,
> > Sahana
>
>
>
> --
> Harsh J
>
