hadoop-common-user mailing list archives

From Eguzki Astiz Lezaun <egu...@tid.es>
Subject map reduce to achieve Cartesian product
Date Wed, 16 Dec 2009 13:35:25 GMT
Hi,

First, I apologise if this question has been asked before (I am quite
sure it has been). If so, I would very much appreciate a reply with a
link to the answer.

My question is quite simple.

I have two files (datasets), each containing a list of integers.

example:
dataset A: (a,b,c)
dataset B: (d,e,f)

I would like to design a map-reduce job that produces this output:

(a,d)
(a,e)
(a,f)
(b,d)
(b,e)
(b,f)
(c,d)
(c,e)
(c,f)

I guess this is a typical Cartesian product of two datasets.
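
To make the expected result concrete, here is a minimal single-machine
sketch in plain Python (not map-reduce, and not a proposed solution). The
names dataset_a, dataset_b and cartesian_product are made up for
illustration, and the letters stand in for the integers in the real files.

```python
from itertools import product

# Placeholder values standing in for the integers in each dataset.
dataset_a = ["a", "b", "c"]
dataset_b = ["d", "e", "f"]


def cartesian_product(left, right):
    """Return every (x, y) pair with x taken from left and y from right."""
    return list(product(left, right))


pairs = cartesian_product(dataset_a, dataset_b)
for pair in pairs:
    print(pair)
```

Every element of A is paired with every element of B, so no shared key is
involved. That is the behaviour I would like to reproduce as a map-reduce
job.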

I found ways to do joins using map-reduce, but they require a common key
in both datasets, which is not the case here.

Any clue how to do this?

Thanks in advance.
-- 
Eguzki Astiz Lezaun
Technology and Architecture Strategy
C\ VIA AUGUSTA, 177 	Tel: +34 93 36 53179
08021 BARCELONA 	www.tid.es

Telefónica Investigación y Desarrollo

