hadoop-common-user mailing list archives

From Chris Douglas <chri...@yahoo-inc.com>
Subject Re: MapSide Join and left outer or right outer joins?
Date Thu, 03 Jul 2008 01:07:06 GMT
Hi Jason-

> It only seems like full outer or full inner joins are supported. I  
> was hoping to just do a left outer join.
>
> Is this supported or planned?


The full inner/outer joins are examples, really. You can define your
own operations by extending org.apache.hadoop.mapred.join.JoinRecordReader
or org.apache.hadoop.mapred.join.MultiFilterRecordReader and registering
your new identifier with the parser by setting the property
"mapred.join.define.<ident>" to your class name.

For a left outer join, JoinRecordReader is the correct base. The
existing InnerJoinRecordReader and OuterJoinRecordReader
implementations should make its use clear.
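
To make the difference between the join types concrete, here is a minimal
standalone sketch of the per-key emit decision. It does not use the Hadoop
classes: the class JoinEmitRules, its method names, and the boolean[]
"present" array are all hypothetical stand-ins (present[i] models whether
source i contributed a value for the current key, and source 0 is assumed
to be the left table). In a real subclass of JoinRecordReader this logic
would presumably live in the method that the inner and outer readers
override; check their source for the exact signature.

```java
import java.util.Arrays;

/**
 * Standalone model of join emit rules (hypothetical, not Hadoop code).
 * present[i] is true when source i has a value for the current key.
 */
public class JoinEmitRules {

    /** Inner join: emit only when every source has the key. */
    public static boolean innerCombine(boolean[] present) {
        for (boolean p : present) {
            if (!p) {
                return false;
            }
        }
        return true;
    }

    /**
     * Full outer join: emit for every key offered. This assumes the caller
     * only offers keys that appear in at least one source.
     */
    public static boolean outerCombine(boolean[] present) {
        return true;
    }

    /**
     * Left outer join: emit whenever the left source (index 0, by
     * assumption) has the key, regardless of the other sources.
     */
    public static boolean leftOuterCombine(boolean[] present) {
        return present.length > 0 && present[0];
    }

    public static void main(String[] args) {
        boolean[][] cases = {
            {true, true},   // key in both sources
            {true, false},  // key only in the left source
            {false, true},  // key only in the right source
        };
        for (boolean[] c : cases) {
            System.out.println(Arrays.toString(c)
                + " inner=" + innerCombine(c)
                + " outer=" + outerCombine(c)
                + " leftOuter=" + leftOuterCombine(c));
        }
    }
}
```

A right outer join would be the same test against the last source instead
of the first. The real class would then be registered under an identifier
of your choosing through the "mapred.join.define.<ident>" property
mentioned above.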

> On the flip side doing the Outer Join is about 8x faster than doing  
> a map/reduce over our dataset.

Cool! Out of curiosity, how are you managing your splits? -C
