hadoop-common-user mailing list archives

From Chris Douglas <chri...@yahoo-inc.com>
Subject Re: "Join" example
Date Fri, 08 Aug 2008 20:56:55 GMT
The contrib/data_join framework is different from the map-side join
framework under o.a.h.mapred.join, which is what the "join" example uses.

To see what the example does in an outer join, create a few small,
tab-separated text input files:

join/a.txt:

AAAAAAAA        a0
BBBBBBBB        a1
CCCCCCCC        a2
CCCCCCCC        a3

join/b.txt:

AAAAAAAA        b0
BBBBBBBB        b1
BBBBBBBB        b2
BBBBBBBB        b3

join/c.txt:

AAAAAAAA        c0
BBBBBBBB        c1
DDDDDDDD        c2
DDDDDDDD        c3
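
Because KeyValueTextInputFormat splits each line at the first tab, the
separator has to be a real tab character, which is easy to lose when
copying from an email. As a minimal sketch (the helper name write_samples
and the local join/ directory are assumptions for illustration, not part
of Hadoop), the three files could be generated like this:

```python
from pathlib import Path

# Sample rows from the message above: (key, value) pairs per input file.
SAMPLES = {
    "a.txt": [("AAAAAAAA", "a0"), ("BBBBBBBB", "a1"),
              ("CCCCCCCC", "a2"), ("CCCCCCCC", "a3")],
    "b.txt": [("AAAAAAAA", "b0"), ("BBBBBBBB", "b1"),
              ("BBBBBBBB", "b2"), ("BBBBBBBB", "b3")],
    "c.txt": [("AAAAAAAA", "c0"), ("BBBBBBBB", "c1"),
              ("DDDDDDDD", "c2"), ("DDDDDDDD", "c3")],
}


def write_samples(directory="join"):
    """Write each sample file with one tab-separated key/value per line."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    for name, rows in SAMPLES.items():
        # Join key and value with a literal tab, one record per line.
        text = "".join(f"{key}\t{value}\n" for key, value in rows)
        (out / name).write_text(text)
    return sorted(p.name for p in out.iterdir())


if __name__ == "__main__":
    write_samples()
```

This writes to the local filesystem only. If the job reads from HDFS, the
directory would still need to be copied there first, for example with
bin/hadoop fs -put.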

Run the example with all three files as inputs:

host$ bin/hadoop jar hadoop-*-examples.jar join \
   -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat \
   -outKey org.apache.hadoop.io.Text \
   -joinOp outer \
   join/a.txt join/b.txt join/c.txt joinout

Examine the result in joinout/part-00000:

host$ bin/hadoop fs -text joinout/part-00000 | less
AAAAAAAA        [a0,b0,c0]
BBBBBBBB        [a1,b1,c1]
BBBBBBBB        [a1,b2,c1]
BBBBBBBB        [a1,b3,c1]
CCCCCCCC        [a2,,]
CCCCCCCC        [a3,,]
DDDDDDDD        [,,c2]
DDDDDDDD        [,,c3]
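
The pattern in that output is: for each key, emit every combination of
the values from each source, leaving an empty slot for any source that
lacks the key. The sketch below reproduces the same rows in memory. It is
only an illustration of the outer-join semantics, not how the
o.a.h.mapred.join framework is implemented, and the function name
outer_join is an assumption for the example.

```python
from collections import defaultdict
from itertools import product


def outer_join(*sources):
    """Outer-join lists of (key, value) pairs, one list per source."""
    # For each key, keep one list of values per source, in source order.
    per_key = defaultdict(lambda: [[] for _ in sources])
    for index, rows in enumerate(sources):
        for key, value in rows:
            per_key[key][index].append(value)

    result = []
    for key in sorted(per_key):
        # A source with no rows for this key contributes one empty slot.
        slots = [values or [""] for values in per_key[key]]
        # Cross product of the per-source values gives one tuple per row.
        for combo in product(*slots):
            result.append((key, "[" + ",".join(combo) + "]"))
    return result


a = [("AAAAAAAA", "a0"), ("BBBBBBBB", "a1"),
     ("CCCCCCCC", "a2"), ("CCCCCCCC", "a3")]
b = [("AAAAAAAA", "b0"), ("BBBBBBBB", "b1"),
     ("BBBBBBBB", "b2"), ("BBBBBBBB", "b3")]
c = [("AAAAAAAA", "c0"), ("BBBBBBBB", "c1"),
     ("DDDDDDDD", "c2"), ("DDDDDDDD", "c3")]

if __name__ == "__main__":
    for key, joined in outer_join(a, b, c):
        print(f"{key}\t{joined}")
```

With the sample data, key BBBBBBBB has one value in a, three in b, and one
in c, so it yields 1 x 3 x 1 = 3 rows, matching the listing above.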

-C

On Aug 7, 2008, at 11:39 PM, Wei Wu wrote:

> There are some examples in $HADOOPHOME/src/contrib/data_join, which
> I hope would help.
>
> Wei
>
> -----Original Message-----
> From: John DeTreville [mailto:jdd@yahoo-inc.com]
> Sent: Friday, August 08, 2008 2:34 AM
> To: core-user@hadoop.apache.org
> Subject: "Join" example
>
> Hadoop ships with a few example programs. One of these is "join,"
> which I believe demonstrates map-side joins. I'm finding its usage
> instructions a little impenetrable; could anyone send me instructions
> that are more like "type this" then "type this" then "type this"?
>
> Thanks in advance.
>
> Cheers,
> John
>

