hadoop-mapreduce-user mailing list archives

From Vikas Jadhav <vikascjadha...@gmail.com>
Subject Re: Modifying Hadoop For join Operation
Date Thu, 24 Jan 2013 19:11:38 GMT
Hi, thanks @Harsh for replying.

I am attaching a paper called Map-Join-Reduce.

I want to implement a similar kind of architecture.

Currently MapReduce processes a join job using either a map-side or a reduce-side join.

A reduce-side join has a drawback:

 --> for large datasets there is a lot of traffic (data movement) from
     the mappers to the reducers (one option is to filter out records
     using a Bloom filter-like technique).
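A minimal Python sketch of the Bloom filter idea, assuming the join key is the first field of each record and that the filter is built from the smaller dataset before the map phase (in Hadoop it would have to be shipped to every mapper, e.g. through the distributed cache; that wiring is not shown). The names BloomFilter, build_filter and map_with_filter are hypothetical. A Bloom filter never drops a record whose key exists in the smaller dataset, but it can let some non-matching records through (false positives), so the reduce-side join must still check for real matches.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: m bit slots, k hash functions derived from SHA-256."""

    def __init__(self, m: int = 1024, k: int = 3) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per slot, for clarity over compactness

    def _positions(self, key: str):
        # Derive k positions by salting the key with the hash-function index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[p] for p in self._positions(key))


def build_filter(small_side, key_index=0, m=1024, k=3):
    """Hypothetical helper: build the filter from the join keys of the smaller dataset."""
    bf = BloomFilter(m, k)
    for record in small_side:
        bf.add(record[key_index])
    return bf


def map_with_filter(records, bf, tag, key_index=0):
    """Hypothetical map step: emit (join_key, (tag, record)) only if the key may match."""
    for record in records:
        key = record[key_index]
        if bf.might_contain(key):
            yield key, (tag, record)
```

For example, with S = [("k1", "s1"), ("k2", "s2")] as the small side, mapping R = [("k1", "r1"), ("k3", "r3")] always emits the "k1" record and usually drops "k3", so less data is shuffled to the reducers.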

 For this I want to process all joins in a single MapReduce job:
1) MAP PHASE - processes all datasets and filters out records that
   cannot take part in the join
2) REDUCE PHASE - divided into a join step and a reducer step

   join    - joins all datasets
   reducer - does the aggregation
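One way to picture this single-job plan, as a local Python simulation rather than Hadoop code: the mappers tag each record with its source dataset, the shuffle groups records by join key, and each reduce call first runs a join step (the cross product of the matching records from every dataset) and then a reducer step that aggregates. It assumes R, S and T all join on the same key; the aggregation (counting joined tuples per key) and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def map_phase(datasets):
    """datasets: dict of tag -> list of (join_key, value). Emit tagged pairs."""
    for tag, records in datasets.items():
        for key, value in records:
            yield key, (tag, value)


def shuffle(pairs):
    """Group tagged values by join key, as the MapReduce shuffle would."""
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return groups


def join_step(key, tagged_values, tags):
    """Inner join for one key: cross product of each dataset's values."""
    by_tag = {t: [] for t in tags}
    for tag, value in tagged_values:
        by_tag[tag].append(value)
    # If any dataset has no value for this key, product() yields nothing.
    for combo in product(*(by_tag[t] for t in tags)):
        yield key, combo


def aggregate_step(joined):
    """Hypothetical aggregation: count joined tuples per key."""
    counts = defaultdict(int)
    for key, _ in joined:
        counts[key] += 1
    return dict(counts)


def run_job(datasets):
    tags = sorted(datasets)
    groups = shuffle(map_phase(datasets))
    joined = (row for key in sorted(groups) for row in join_step(key, groups[key], tags))
    return aggregate_step(joined)
```

With R = [(1, "r1"), (2, "r2")], S = [(1, "s1"), (1, "s2")] and T = [(1, "t1"), (3, "t3")], only key 1 appears in all three datasets, giving 1 × 2 × 1 = 2 joined tuples, so run_job returns {1: 2}. The join and the aggregation happen inside one reduce, so no second job (and no second shuffle) is needed.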

   For R join S join T:

                             Reduce phase
mapR --+
       +--> join: mapR join mapS => RS --+
mapS --+                                 +--> join: RS join mapT => RST --> reducer (aggregation)
mapT ------------------------------------+
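When the two joins use different attributes (say R.a = S.a and then RS.b = T.b), the pipeline in the diagram can be sketched as two chained in-memory hash joins followed by an aggregation. This Python sketch assumes each stage sees all of its input (for instance, T being small enough to replicate to every reducer) and does not deal with how records are partitioned across reducers; the attribute names a, b, g, v and the function names are hypothetical.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """Inner hash join of two iterables of dicts: build on `left`, probe with `right`."""
    table = defaultdict(list)
    for row in left:
        table[row[left_key]].append(row)
    for row in right:
        for match in table.get(row[right_key], []):
            yield {**match, **row}


def r_join_s_join_t(R, S, T):
    # Join stage 1: R.a = S.a -> RS (R is the build side).
    rs = hash_join(R, S, "a", "a")
    # Join stage 2: RS.b = T.b -> RST (T is the build side; RS is streamed).
    return hash_join(T, rs, "b", "b")


def aggregate(rows, group_key, value_key):
    """Reducer stage: sum value_key for each group_key."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)
```

For example, R = [{"a": 1, "r": 10}], S = [{"a": 1, "b": "x"}, {"a": 1, "b": "y"}] and T = [{"b": "x", "g": "G1", "v": 5}, {"b": "y", "g": "G1", "v": 7}, {"b": "z", "g": "G2", "v": 1}] produce two RST rows, and aggregate(..., "g", "v") returns {"G1": 12}. Because RS is streamed straight into the second join, it is never written out between jobs.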

If you have any ideas, please share them.

Any other suggestions are also welcome if they reduce the completion time
for joining large datasets.
Thank you



On Thu, Jan 24, 2013 at 8:39 PM, Harsh J <harsh@cloudera.com> wrote:

> Hi,
>
> Can you also define 'efficient way' and the idea you have in mind to
> implement that isn't already doable today?
>
> On Thu, Jan 24, 2013 at 6:51 PM, Vikas Jadhav <vikascjadhav87@gmail.com>
> wrote:
> > Does anyone have an idea how I should modify the Hadoop code to
> > perform join operations in an efficient way?
> > Thanks.
> >
> > --
> > Thanks and regards,
> > Vikas Jadhav
>
> --
> Harsh J
>



--
Thanks and regards,
Vikas Jadhav
