hadoop-user mailing list archives

From Viral Bajaria <viral.baja...@gmail.com>
Subject Re: Can anyone point me to a good Map Reduce in memory Join implementation?
Date Fri, 15 Feb 2013 21:47:03 GMT
Why not look at Hive? It already implements the join you are looking
for, and its MAPJOIN feature loads the small table into memory.

On Fri, Feb 15, 2013 at 1:25 PM, Yunming Zhang wrote:

> Hi,
> I am trying to do some work with an in-memory join implementation in
> MapReduce. It can be summarized as a join between two data sets, R and S:
> one of them is too large to fit into memory, and the other fits into memory
> reasonably well
> (size of R << size of S). The typical implementation:
> 1) distributes or broadcasts R to all map tasks (each mapper loads R into
> memory, hashed by join key);
> 2) maps (streams) over S, dividing S into chunks and using each chunk as
> the input to one map task;
> 3) within each map task, for every tuple in S, looks up the join key in R;
> 4) leaves the reduce computation trivial.
> If anyone could point me to a good implementation that I could use as a
> reference, that would be great.
> I do plan to write my own implementation, but it would be helpful to
> see whether there are established implementations out there.
> Thanks,
> Yunming
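
The four steps in the quoted message can be sketched in plain Python, without Hadoop, as a minimal illustration rather than a reference implementation. The function names `build_hash_table` and `map_side_join`, the tuple layout, and the sample data are all hypothetical. The sketch assumes an inner join, a join key in the first field of every record, and an R small enough to hold in a dict; a real job would ship R to each map task instead of passing it as an argument.

```python
from collections import defaultdict


def build_hash_table(r_records, key_index=0):
    """Step 1: load the small relation R into memory, hashed by join key."""
    table = defaultdict(list)
    for rec in r_records:
        table[rec[key_index]].append(rec)
    return table


def map_side_join(s_chunk, r_table, key_index=0):
    """Steps 2-3: stream over one chunk of S and probe R for each tuple."""
    for s_rec in s_chunk:
        # .get() avoids inserting empty entries for keys that R lacks
        for r_rec in r_table.get(s_rec[key_index], ()):
            # emit R's fields followed by S's non-key fields
            yield r_rec + s_rec[:key_index] + s_rec[key_index + 1:]


# Hypothetical sample data: R is small, S arrives as two chunks
# (one chunk per simulated map task).
R = [(1, "alice"), (2, "bob")]
S_CHUNKS = [
    [(1, "order-a"), (3, "order-b")],
    [(2, "order-c"), (1, "order-d")],
]

r_table = build_hash_table(R)

# Step 4: there is no real reduce work; the map outputs are just concatenated.
joined = [row for chunk in S_CHUNKS for row in map_side_join(chunk, r_table)]
```

Each simulated map task only reads its own chunk of S plus the shared hash table, which is why no shuffle or reduce-side grouping is needed. The tuple with key 3 is dropped because the sketch implements an inner join.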
