hadoop-common-user mailing list archives

From Amogh Vasekar <am...@yahoo-inc.com>
Subject RE: Question regarding Map side Join
Date Tue, 14 Jul 2009 04:47:09 GMT
Yes, it is. However, I assume file 2 is small enough to be distributed to all compute
nodes without much delay; otherwise the whole point of a map-side join is defeated.
If the keys in file 2 are unique, all you need to implement is a simple lookup. Otherwise,
iterate over the values that share a key to produce the join.
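
The lookup approach can be sketched roughly as follows. This is plain Python working on in-memory lists, not Hadoop code; the function names load_small_table and map_side_join are made up for illustration. In a real job, file 2 would typically be shipped to every node (for example via the DistributedCache) and loaded once per mapper before any records are processed. Storing a list of values per key covers both cases: unique keys (a single lookup) and repeated keys (iterate over the list).

```python
from collections import defaultdict


def load_small_table(lines):
    """Build an in-memory lookup from the small file (file 2): key -> list of values."""
    table = defaultdict(list)
    for line in lines:
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip malformed lines
        table[parts[0]].append(parts[1].strip())
    return table


def map_side_join(big_lines, table):
    """Emulate the map step: for each record of file 1, emit one joined row
    per matching value in the small table. Unmatched keys emit nothing,
    which gives inner-join behaviour."""
    for line in big_lines:
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts[0], parts[1].strip()
        for small_value in table.get(key, ()):
            yield key, small_value, value


# Sample data copied from the question below.
file1 = [
    "31    Rafferty",
    "33    Jones",
    "33    Steinberg",
    "34    Robinson",
    "34    Smith",
    "<null>    Jasper",
]
file2 = [
    "31    sales",
    "33    Engg",
    "34    Clerical",
    "35    Marketing",
]

joined = list(map_side_join(file1, load_small_table(file2)))
for row in joined:
    print("\t".join(row))
```

With the sample data this prints one line per matching pair in the form key, file 2 value, file 1 value, and drops the <null> and 35 keys because they appear in only one file.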

-----Original Message-----
From: Pankil Doshi [mailto:forpankil@gmail.com] 
Sent: Tuesday, July 14, 2009 4:49 AM
To: core-user@hadoop.apache.org
Subject: Question regarding Map side Join

I have a question regarding map-side join.
I finally got a copy of your book. I tried implementing it, and I have a few
questions about it.

File 1:
31    Rafferty
33    Jones
33    Steinberg
34    Robinson
34    Smith
<null>    Jasper

File 2:
31    sales
33    Engg
34    Clerical
35    Marketing

Results I got using map-side join:

File1 inner join with File2
31    Rafferty
31    sales
33    Jones
33    Engg
33    Steinberg
33    Engg


File2 inner join with File1

31    sales
31    Rafferty
33    Engg
33    Jones
33    Engg
33    Steinberg
34    Clerical
34    Robinson
34    Clerical
34    Smith


But I am looking for a result like the one below:

31    sales    Rafferty
33    Engg    Jones
33    Engg    Steinberg
34    Clerical    Robinson
34    Clerical    Smith


Is it possible using a map-side join only?

I am looking for a simple join that keeps only the keys present in both files.

Pankil
