crunch-user mailing list archives

From 陈竞 <cj.mag...@gmail.com>
Subject confused about the MapsideJoinStrategy, why use LoadLeftSideMapsideJoinStrategy, what if left table is too large to store in memory?
Date Tue, 10 May 2016 01:29:39 GMT
Hi, I'm confused about how to use MapsideJoinStrategy. The original
constructor is deprecated, and LoadLeftSideMapsideJoinStrategy is
recommended instead. The main change is that it loads the left-side table
into memory, even though the left side is the larger one. When I want to
use a map-side join, the left-side table is usually too large to fit in
memory.

For example, I have two tables, A and B, and I need A left join B, with
size(A) >> size(B). Naturally we want a map-side join with A as the left
side and B as the right side, loading B into memory for the lookup, which
is very simple. However, with LoadLeftSideMapsideJoinStrategy we have to
use B as the left side and A as the right side, which brings no improvement
and adds an extra DoFn to reverse the result.
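
To make the case concrete, here is a minimal plain-Java sketch of the join I
have in mind. It does not use the Crunch API at all; the class
MapSideJoinSketch, the Joined row type and the leftJoin method are made-up
names for illustration only. The small table B is loaded into an in-memory
map, and the large table A is streamed one record at a time, so A never has
to fit in memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Crunch code: a map-side "A left join B"
// where only the small table B is held in memory.
public class MapSideJoinSketch {

    /** One output row: the key, the value from A, and the value from B (null if unmatched). */
    public static final class Joined {
        public final String key;
        public final String left;
        public final String right;

        public Joined(String key, String left, String right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }

        @Override
        public String toString() {
            return key + ":" + left + "," + right;
        }
    }

    public static List<Joined> leftJoin(Iterable<Map.Entry<String, String>> largeA,
                                        Iterable<Map.Entry<String, String>> smallB) {
        // Load the small table B fully into memory, grouped by key.
        Map<String, List<String>> inMemory = new HashMap<>();
        for (Map.Entry<String, String> b : smallB) {
            inMemory.computeIfAbsent(b.getKey(), k -> new ArrayList<>()).add(b.getValue());
        }

        // Stream A record by record and look each key up in the in-memory map.
        // The output is collected in a list only to keep the demo simple;
        // a real job would emit each row instead.
        List<Joined> out = new ArrayList<>();
        for (Map.Entry<String, String> a : largeA) {
            List<String> matches = inMemory.get(a.getKey());
            if (matches == null) {
                // Left outer join: keep the A record even without a match in B.
                out.add(new Joined(a.getKey(), a.getValue(), null));
            } else {
                for (String bValue : matches) {
                    out.add(new Joined(a.getKey(), a.getValue(), bValue));
                }
            }
        }
        return out;
    }
}
```

Here A stays on the left of the output without any extra step to swap the
pairs back, which is the behaviour I would expect from a map-side join.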


-- 
陈竞 (Chen Jing), High Performance Computer Center, Institute of Computing Technology, Chinese Academy of Sciences
Jing Chen HPCC.ICT.AC China
