hadoop-common-user mailing list archives

From Shi Yu <sh...@uchicago.edu>
Subject Parallel map side join
Date Fri, 10 Jun 2011 22:56:35 GMT

How do I configure a map-side join so that it runs in multiple mappers in parallel?

Suppose I have the data sets a1, a2, a3 and the data sets b1, b2, b3.

I want to join a1 with b1, a2 with b2, and a3 with b3, and have the
joins run in parallel. I think it should be possible to configure
mapper 1 to join a1 with b1, mapper 2 to join a2 with b2, and so on. How
should I configure this in Hadoop? Does CompositeInputFormat
accept multiple series of inputs? Thanks.
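
For reference, here is a minimal sketch of the kind of configuration being asked about. It rests on several assumptions not stated above: that a1, a2, a3 are part files in one directory (here /data/A) and b1, b2, b3 are part files in another (/data/B), that both sides are sorted by the join key and partitioned identically, and that the old `mapred` API is in use, where CompositeInputFormat reads its join expression from the `mapred.join.expr` property. The class and method names (`JoinExprSketch`, `joinExpr`) and the paths are hypothetical; the snippet only builds the expression string and does not need Hadoop on the classpath.

```java
// Sketch only: builds the join expression string that the old-API
// CompositeInputFormat reads from the "mapred.join.expr" job property.
// JoinExprSketch and joinExpr are hypothetical names, not Hadoop classes.
final class JoinExprSketch {

    // Wraps each input directory in tbl(<input format>,"<path>") and puts
    // them under a single join operator such as "inner" or "outer".
    static String joinExpr(String op, String inputFormatClass, String... dirs) {
        StringBuilder sb = new StringBuilder(op).append('(');
        for (int i = 0; i < dirs.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("tbl(").append(inputFormatClass)
              .append(",\"").append(dirs[i]).append("\")");
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // Assumed layout: /data/A holds a1, a2, a3 and /data/B holds b1, b2, b3.
        String expr = joinExpr("inner",
                "org.apache.hadoop.mapred.KeyValueTextInputFormat",
                "/data/A", "/data/B");
        System.out.println(expr);
        // In a real job (old mapred API) this would be wired in roughly as:
        //   conf.setInputFormat(CompositeInputFormat.class);
        //   conf.set("mapred.join.expr", expr);
    }
}
```

In an actual job, `CompositeInputFormat.compose(op, inputFormatClass, paths...)` can produce this kind of string directly. Under the partitioning assumptions above, each map task should receive the i-th partition from every source, so a1 is joined with b1 in one mapper, a2 with b2 in another, and so on. The part files may also need to be kept from being split, for example by raising the minimum split size; treat that as something to verify for your Hadoop version.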

