hadoop-user mailing list archives

From Yunming Zhang <zhangyunming1...@gmail.com>
Subject Can anyone point me to a good Map Reduce in memory Join implementation?
Date Fri, 15 Feb 2013 21:25:06 GMT

I am working on an in-memory join implemented in MapReduce.

It can be summarized as a join between two data sets, R and S: one of them (S) is too large to
fit into memory, while the other (R) fits into memory reasonably well
(size of R << size of S). The typical implementation:
1) distributes or broadcasts R to all map tasks (each mapper loads R into memory, hashed by
join key),
2) maps (streams) over S, dividing S into pieces and using each piece as the input to one map task,
3) within each map task, looks up the join key of every tuple of S in R,
4) leaves only a trivial reduce computation.
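
The four steps above can be sketched in plain Python, outside Hadoop. This is only an illustration of the hash-join logic under stated assumptions, not a Hadoop implementation: the function names and the sample tuples are hypothetical, and in a real job R would be shipped to every map task (for example through Hadoop's distributed cache) and loaded once before the task starts reading its part of S.

```python
from collections import defaultdict


def build_hash_table(r_tuples):
    """Step 1: load the small data set R into memory, hashed by join key."""
    table = defaultdict(list)
    for key, r_value in r_tuples:
        table[key].append(r_value)
    return table


def map_side_join(r_table, s_split):
    """Steps 2-3: stream over one piece of S and probe R for every tuple."""
    for key, s_value in s_split:
        # .get() avoids creating empty entries for keys that are absent from R.
        for r_value in r_table.get(key, ()):
            yield key, r_value, s_value


# Hypothetical sample data: R is small, S stands in for the large data set.
R = [(1, "alice"), (2, "bob"), (2, "bobby")]
S = [(2, "order-a"), (3, "order-b"), (1, "order-c")]

r_table = build_hash_table(R)   # each map task would build this once
s_splits = [S[:2], S[2:]]       # S divided into per-task inputs

# Step 4: no real reduce work is needed; the map outputs are the join result.
joined = [row for split in s_splits for row in map_side_join(r_table, split)]
```

This sketch performs an inner join: tuples of S whose key is missing from R (key 3 here) are dropped, and a key that occurs several times in R produces one output row per match.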

If anyone could point me to a good implementation that I could use as a reference, that would
be great.
I do plan to write my own implementation, but it would be helpful to see whether
established implementations already exist.
