hadoop-common-user mailing list archives

From Chris Douglas <chri...@yahoo-inc.com>
Subject Re: "Join" example
Date Fri, 08 Aug 2008 20:56:55 GMT
The contrib/data_join framework is different from the map-side join
framework under org.apache.hadoop.mapred.join, which is what the "join"
example uses.

To see what the example is doing in an outer join, generate a few
sample text input files with tab-separated keys and values:

join/a.txt:
AAAAAAAA        a0
BBBBBBBB        a1
CCCCCCCC        a2
CCCCCCCC        a3

join/b.txt:
AAAAAAAA        b0
BBBBBBBB        b1
BBBBBBBB        b2
BBBBBBBB        b3

join/c.txt:
AAAAAAAA        c0
BBBBBBBB        c1
DDDDDDDD        c2
DDDDDDDD        c3
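If you would rather script the sample files than type them, here is a
minimal Java sketch that writes the three tab-separated inputs above. The
class name MakeJoinSamples and the default output directory "join" are
illustrative assumptions chosen to match the paths in the command; if the
job reads from HDFS rather than the local filesystem, the files still
need to be copied in (for example with bin/hadoop fs -put).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Hadoop): writes the three sample inputs
// as "key<TAB>value" lines, one file per input, into a local directory.
public class MakeJoinSamples {

    static final String[][] A = {
        {"AAAAAAAA", "a0"}, {"BBBBBBBB", "a1"}, {"CCCCCCCC", "a2"}, {"CCCCCCCC", "a3"}};
    static final String[][] B = {
        {"AAAAAAAA", "b0"}, {"BBBBBBBB", "b1"}, {"BBBBBBBB", "b2"}, {"BBBBBBBB", "b3"}};
    static final String[][] C = {
        {"AAAAAAAA", "c0"}, {"BBBBBBBB", "c1"}, {"DDDDDDDD", "c2"}, {"DDDDDDDD", "c3"}};

    // Write rows to a file, separating key and value with a single tab.
    static void write(Path file, String[][] rows) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            sb.append(row[0]).append('\t').append(row[1]).append('\n');
        }
        Files.write(file, sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    // Create the directory if needed, then write a.txt, b.txt and c.txt into it.
    static void writeAll(Path dir) throws IOException {
        Files.createDirectories(dir);
        write(dir.resolve("a.txt"), A);
        write(dir.resolve("b.txt"), B);
        write(dir.resolve("c.txt"), C);
    }

    public static void main(String[] args) throws IOException {
        // Defaults to ./join so the paths match join/a.txt etc. in the command.
        writeAll(Paths.get(args.length > 0 ? args[0] : "join"));
    }
}
```

Usage: "java MakeJoinSamples" creates join/a.txt, join/b.txt and
join/c.txt in the current directory; pass a different directory as the
first argument if needed.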

Run the example with each as an input:

host$ bin/hadoop jar hadoop-*-examples.jar join \
   -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat \
   -outKey org.apache.hadoop.io.Text \
   -joinOp outer \
   join/a.txt join/b.txt join/c.txt joinout

Examine the result in joinout/part-00000:

host$ bin/hadoop fs -text joinout/part-00000 | less
AAAAAAAA        [a0,b0,c0]
BBBBBBBB        [a1,b1,c1]
BBBBBBBB        [a1,b2,c1]
BBBBBBBB        [a1,b3,c1]
CCCCCCCC        [a2,,]
CCCCCCCC        [a3,,]
DDDDDDDD        [,,c2]
DDDDDDDD        [,,c3]
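To make the outer-join semantics concrete: every key that appears in any
input produces one output tuple per combination of that key's values
across the inputs, and an input that lacks the key leaves an empty slot.
The plain-Java sketch below reproduces the listing above from the sample
data without any Hadoop classes. It illustrates the observed result and
is not the framework's implementation. The class and method names are
hypothetical, and the ordering when several inputs have multiple values
for one key (last input varies fastest) is the sketch's own choice,
because the sample data doesn't exercise that case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical plain-Java illustration (no Hadoop classes): reproduces the
// outer-join listing for tab-separated "key<TAB>value" inputs.
public class OuterJoinSketch {

    // The three sample inputs, in the same order as on the command line.
    static List<List<String>> sampleInputs() {
        return Arrays.asList(
            Arrays.asList("AAAAAAAA\ta0", "BBBBBBBB\ta1", "CCCCCCCC\ta2", "CCCCCCCC\ta3"),
            Arrays.asList("AAAAAAAA\tb0", "BBBBBBBB\tb1", "BBBBBBBB\tb2", "BBBBBBBB\tb3"),
            Arrays.asList("AAAAAAAA\tc0", "BBBBBBBB\tc1", "DDDDDDDD\tc2", "DDDDDDDD\tc3"));
    }

    // Group one input's values by key, keeping values in file order.
    // Mirroring KeyValueTextInputFormat's default, split at the first tab;
    // a line without a tab becomes a key with an empty value.
    static TreeMap<String, List<String>> group(List<String> lines) {
        TreeMap<String, List<String>> byKey = new TreeMap<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            String key = tab < 0 ? line : line.substring(0, tab);
            String value = tab < 0 ? "" : line.substring(tab + 1);
            byKey.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return byKey;
    }

    static List<String> outerJoin(List<List<String>> inputs) {
        List<TreeMap<String, List<String>>> grouped = new ArrayList<>();
        TreeSet<String> allKeys = new TreeSet<>(); // union of all keys, sorted
        for (List<String> lines : inputs) {
            TreeMap<String, List<String>> byKey = group(lines);
            grouped.add(byKey);
            allKeys.addAll(byKey.keySet());
        }
        List<String> out = new ArrayList<>();
        for (String key : allKeys) {
            // An input lacking the key contributes a single empty slot.
            List<List<String>> slots = new ArrayList<>();
            for (TreeMap<String, List<String>> byKey : grouped) {
                slots.add(byKey.getOrDefault(key, Arrays.asList("")));
            }
            crossProduct(key, slots, 0, new String[slots.size()], out);
        }
        return out;
    }

    // Emit every combination of slot values; the last input varies fastest.
    static void crossProduct(String key, List<List<String>> slots, int i,
                             String[] tuple, List<String> out) {
        if (i == slots.size()) {
            out.add(key + "\t[" + String.join(",", tuple) + "]");
            return;
        }
        for (String v : slots.get(i)) {
            tuple[i] = v;
            crossProduct(key, slots, i + 1, tuple, out);
        }
    }

    public static void main(String[] args) {
        for (String line : outerJoin(sampleInputs())) {
            System.out.println(line);
        }
    }
}
```

Running main prints the same eight lines as joinout/part-00000 above.
Note that the sketch represents a missing value as an empty string, so
unlike the real join output it cannot distinguish "absent" from "present
but empty".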


On Aug 7, 2008, at 11:39 PM, Wei Wu wrote:

> There are some examples in $HADOOPHOME/src/contrib/data_join, which
> I hope would help.
> Wei
> -----Original Message-----
> From: John DeTreville [mailto:jdd@yahoo-inc.com]
> Sent: Friday, August 08, 2008 2:34 AM
> To: core-user@hadoop.apache.org
> Subject: "Join" example
> Hadoop ships with a few example programs. One of these is "join,"
> which I believe demonstrates map-side joins. I'm finding its usage
> instructions a little impenetrable; could anyone send me instructions
> that are more like "type this" then "type this" then "type this"?
> Thanks in advance.
> Cheers,
> John
