hadoop-mapreduce-user mailing list archives

From praveenesh kumar <praveen...@gmail.com>
Subject Re: Multiple Mappers and One Reducer
Date Wed, 07 Sep 2011 10:04:35 GMT
Harsh, can you please explain how to use MultipleInputs with a Job object on
hadoop 0.20.2? As you can see, MultipleInputs there uses a JobConf object.
I want to use the Job object, as in the new hadoop 0.21 API.
I remember you talked about pulling things out of the new API and adding them
to our project.
Can you please shed more light on how we can do this?

Thanks,
Praveenesh.

On Wed, Sep 7, 2011 at 2:57 AM, Harsh J <harsh@cloudera.com> wrote:

> Sahana,
>
> Yes, this is possible as well. Please take a look at the MultipleInputs
> API @
> http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/lib/MultipleInputs.html
>
> It will allow you to add each path with its own mapper
> implementation, and you can then have a common reducer since the key
> is what you'll be matching against.
>
> On Wed, Sep 7, 2011 at 3:02 PM, Sahana Bhat <sana.bhat@gmail.com> wrote:
> > Hi,
> >         I understand that given a file, the file is split across 'n' mapper
> > instances, which is the normal case.
> > The scenario I have is:
> > 1. Two files which are not totally identical in terms of number of columns
> > (but have data that is similar in a few columns) need to be processed and
> > after computation a single output file has to be generated.
> > Note: CV - computed value
> > File1, belonging to one dataset, has data for:
> > Date,counter1,counter2,CV1,CV2
> > File2, belonging to another dataset, has data for:
> > Date,counter1,counter2,CV3,CV4,CV5
> > Computation to be carried out on these two files is:
> > CV6 = (CV1*CV5)/100
> > And the final emitted output file should have data in the sequence:
> > Date,counter1,counter2,CV6
> > The idea is to have two mappers (not instances), one run on each file, and
> > a single reducer that emits the final result file.
> > Thanks,
> > Sahana
> > On Wed, Sep 7, 2011 at 2:40 PM, Harsh J <harsh@cloudera.com> wrote:
> >>
> >> Sahana,
> >>
> >> Yes. But, isn't that how it is normally? What makes you question this
> >> capability?
> >>
> >> On Wed, Sep 7, 2011 at 2:37 PM, Sahana Bhat <sana.bhat@gmail.com> wrote:
> >> > Hi,
> >> >          Is it possible to have multiple mappers where each mapper is
> >> > operating on a different input file and whose result (which is a
> >> > key-value pair from different mappers) is processed by a single reducer?
> >> > Regards,
> >> > Sahana
> >>
> >>
> >>
> >> --
> >> Harsh J
> >
> >
>
>
>
> --
> Harsh J
>
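
For reference, the two-mappers-one-reducer idea discussed in the quoted thread can be sketched in plain Java with no Hadoop dependency. This is only an illustration of the data flow, not Hadoop code: the class name TaggedJoinSketch, the helper methods, the "CV1="/"CV5=" value tags, and the choice of Date,counter1,counter2 as the join key are all assumptions made for the example. In a real job, each map function would be a Mapper class registered for its own path through MultipleInputs, and the grouping step would be done by the framework's shuffle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free sketch of a tagged reduce-side join.
class TaggedJoinSketch {

    // Assumed join key: the three columns shared by both files.
    static String keyOf(String[] f) {
        return f[0].trim() + "," + f[1].trim() + "," + f[2].trim();
    }

    // Stand-in for the mapper on File1: Date,counter1,counter2,CV1,CV2
    static String[] mapFile1(String line) {
        String[] f = line.split(",");
        return new String[] { keyOf(f), "CV1=" + f[3].trim() };
    }

    // Stand-in for the mapper on File2: Date,counter1,counter2,CV3,CV4,CV5
    static String[] mapFile2(String line) {
        String[] f = line.split(",");
        return new String[] { keyOf(f), "CV5=" + f[5].trim() };
    }

    // Stand-in for the single reducer: CV6 = (CV1*CV5)/100
    static String reduce(String key, List<String> values) {
        Double cv1 = null;
        Double cv5 = null;
        for (String v : values) {
            if (v.startsWith("CV1=")) {
                cv1 = Double.valueOf(v.substring(4));
            } else if (v.startsWith("CV5=")) {
                cv5 = Double.valueOf(v.substring(4));
            }
        }
        if (cv1 == null || cv5 == null) {
            return null; // key present in only one file: nothing to emit
        }
        return key + "," + (cv1 * cv5) / 100;
    }

    // Simulates map -> group by key -> reduce over both inputs.
    static List<String> join(List<String> file1, List<String> file2) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : file1) {
            String[] kv = mapFile1(line);
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        for (String line : file2) {
            String[] kv = mapFile2(line);
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            String row = reduce(e.getKey(), e.getValue());
            if (row != null) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        List<String> file1 = Arrays.asList("2011-09-07,10,20,50,7");
        List<String> file2 = Arrays.asList("2011-09-07,10,20,1,2,40");
        for (String row : join(file1, file2)) {
            System.out.println(row); // Date,counter1,counter2,CV6
        }
    }
}
```

Tagging each value with its origin is what lets a single reducer tell CV1 from CV5 once both mappers emit under the same key. Keys that appear in only one file are dropped here; whether that is the right behaviour depends on the data.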
