From: Yunming Zhang <zhangyunming1990@gmail.com>
To: user@hadoop.apache.org
Subject: Can anyone point me to a good Map Reduce in memory Join implementation?
Date: Fri, 15 Feb 2013 15:25:06 -0600

Hi,

I am trying to do some work with an in-memory join MapReduce implementation. It can be summarized as a join between two data sets, R and S, where one of them is too large to fit in memory and the other fits in memory reasonably well (size of R << size of S).
The typical implementation:

1) distributes or broadcasts R to all map tasks (each mapper loads R in memory, hashed by join key),
2) maps (streams) over S, dividing S into records that are fed as input to each map task,
3) within each map task, looks up the join key in R for every tuple of S,
4) leaves the reduce computation trivial.

(A rough sketch of what I have in mind is appended after my signature.)

If anyone could point me to a good implementation that I could use as a reference, that would be great. I do plan to write my own implementation, but it would be helpful to take a look and see whether there are established implementations out there.

Thanks,
Yunming
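P.S. For concreteness, here is a minimal sketch of the kind of map-side hash join I have in mind, written against the Hadoop mapreduce API. The config property "join.r.path", the tab-separated record layout, and the class/field names are just my assumptions for illustration, not taken from any established implementation. The small relation R is read from HDFS in setup() and hashed by join key; each map() call then probes that hash map with one tuple of S.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-side (broadcast) hash join: R is small enough to hold in memory,
// while S is streamed through map() one tuple at a time.
public class MemoryJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

    // In-memory hash of the small relation R, keyed by join key.
    private final Map<String, String> rByKey = new HashMap<String, String>();

    @Override
    protected void setup(Context context) throws IOException {
        Configuration conf = context.getConfiguration();
        // "join.r.path" is a made-up property naming R's location on HDFS;
        // shipping R via the distributed cache would be the more usual approach.
        Path rPath = new Path(conf.get("join.r.path"));
        FileSystem fs = rPath.getFileSystem(conf);
        BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(rPath)));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // Assume tab-separated records whose first field is the join key.
                String[] fields = line.split("\t", 2);
                rByKey.put(fields[0], fields.length > 1 ? fields[1] : "");
            }
        } finally {
            reader.close();
        }
    }

    @Override
    protected void map(LongWritable offset, Text sTuple, Context context)
            throws IOException, InterruptedException {
        // One tuple of the large relation S, probed against the in-memory hash of R.
        String[] fields = sTuple.toString().split("\t", 2);
        String rValue = rByKey.get(fields[0]);
        if (rValue != null) {
            // Emit the joined tuple; the reduce side can be the identity,
            // or the job can be configured with zero reducers.
            String sValue = fields.length > 1 ? fields[1] : "";
            context.write(new Text(fields[0]), new Text(sValue + "\t" + rValue));
        }
    }
}

The driver would set "join.r.path" in the Configuration (or ship R through the distributed cache instead) and could set the number of reduce tasks to zero, since the mapper already emits joined tuples.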