hadoop-hdfs-user mailing list archives

From David Boyd <db...@lorenzresearch.com>
Subject Re: Can anyone point me to a good Map Reduce in memory Join implementation?
Date Sat, 16 Feb 2013 02:22:40 GMT
Use Pig. It has a specific join directive for in-memory joins against
small data sets (JOIN ... USING 'replicated'). The whole thing might
require about half a dozen lines of code.

On 2/15/2013 4:25 PM, Yunming Zhang wrote:
> Hi,
> I am trying to do some work with an in-memory join implementation in Map Reduce.
> It can be summarized as a join between two data sets, R and S: one of
> them is too large to fit into memory, while the other fits into memory
> reasonably well (size of R << size of S). The typical implementation:
> 1) distributes or broadcasts R to all map tasks (each mapper loads R into
> memory, hashed by join key);
> 2) maps (streams) over S, dividing S into chunks that serve as input to
> the map tasks;
> 3) within each map task, looks up the join key in R for every tuple in S;
> 4) the reduce computation is trivial.
> If anyone could point me to a good implementation that I could use as a
> reference, that would be great.
> I do plan to write my own implementation, but it would be helpful to
> see whether there are established implementations out there.
> Thanks,
> Yunming
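The four steps in the quoted message can be illustrated with a minimal, single-process Python sketch. It assumes rows are tuples whose first field is the join key, and it performs an inner join (rows of S with no match in R are dropped). It does not use the Hadoop API: the helper names build_hash_table, map_task, split_into_chunks and broadcast_join are hypothetical, and simple list slicing stands in for Hadoop's input splits and for shipping R to every mapper.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple

# Hypothetical row shape: a tuple whose first field is the join key.
Row = Tuple


def build_hash_table(r_rows: Iterable[Row]) -> Dict[Hashable, List[Row]]:
    """Step 1: load the small relation R into memory, hashed by join key."""
    table: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in r_rows:
        table[row[0]].append(row)  # R may contain duplicate keys
    return table


def map_task(s_chunk: Iterable[Row], r_table: Dict[Hashable, List[Row]]) -> Iterator[Row]:
    """Steps 2-3: stream one chunk of S and probe the in-memory table of R."""
    for s_row in s_chunk:
        # .get() does not insert missing keys into the defaultdict.
        for r_row in r_table.get(s_row[0], ()):
            # Emit the key, then R's non-key fields, then S's non-key fields.
            yield (s_row[0],) + r_row[1:] + s_row[1:]


def split_into_chunks(rows: List[Row], n: int) -> List[List[Row]]:
    """Hypothetical stand-in for input splits: divide S into n chunks."""
    return [rows[i::n] for i in range(n)]


def broadcast_join(r_rows: List[Row], s_rows: List[Row], num_map_tasks: int = 2) -> List[Row]:
    # Every simulated map task receives the same in-memory copy of R.
    r_table = build_hash_table(r_rows)
    results: List[Row] = []
    for s_chunk in split_into_chunks(s_rows, num_map_tasks):
        results.extend(map_task(s_chunk, r_table))
    # Step 4: no real reduce work; map outputs are simply concatenated.
    return results


if __name__ == "__main__":
    R = [(1, "alice"), (2, "bob")]
    S = [(1, "order-a"), (2, "order-b"), (1, "order-c"), (3, "order-d")]
    print(broadcast_join(R, S))
    # prints [(1, 'alice', 'order-a'), (1, 'alice', 'order-c'), (2, 'bob', 'order-b')]
```

The design relies on R being small: the hash table is built once per map task and every tuple of S costs only a dictionary lookup, so no shuffle on the join key is needed. In a real Hadoop job, R would typically be shipped to each mapper (for example through the distributed cache) and the table built once when the mapper starts, but those details are outside this sketch.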

========= mailto:dboyd@lorenzresearch.com ============
David W. Boyd
Vice President, Operations
Lorenz Research, a Data Tactics corporation
7901 Jones Branch, Suite 610
McLean, VA 22102
office:   +1-703-506-3735, ext 308
fax:     +1-703-506-6703
cell:     +1-703-402-7908
============== http://www.lorenzresearch.com/ ============
