hadoop-common-user mailing list archives

From jingguo yao <yaojing...@gmail.com>
Subject What is the output format of org.apache.hadoop.examples.Join?
Date Fri, 05 Apr 2013 00:46:22 GMT
I am reading the following mail:


I ran the following command (I am using Hadoop 1.0.4):

bin/hadoop jar hadoop-examples-1.0.4.jar join \
   -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat \
   -outKey org.apache.hadoop.io.Text \
   -joinOp outer \
   join/a.txt join/b.txt join/c.txt joinout

Then I ran "bin/hadoop fs -text joinout/part-00000" and saw the following:

AAAAAAAA        a0      [,,]
AAAAAAAA        b0      [,,]
AAAAAAAA        c0      [,,]
BBBBBBBB        a1      [,,]
BBBBBBBB        b1      [,,]
BBBBBBBB        b2      [,,]
BBBBBBBB        b3      [,,]
BBBBBBBB        c1      [,,]
CCCCCCCC        a2      [,,]
CCCCCCCC        a3      [,,]
DDDDDDDD        c2      [,,]
DDDDDDDD        c3      [,,]

But Chris said that the result should be:

AAAAAAAA        [a0,b0,c0]
BBBBBBBB        [a1,b1,c1]
BBBBBBBB        [a1,b2,c1]
BBBBBBBB        [a1,b3,c1]
CCCCCCCC        [a2,,]
CCCCCCCC        [a3,,]
DDDDDDDD        [,,c2]
DDDDDDDD        [,,c3]
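
To make the expected result concrete, here is a minimal Python sketch of a full outer join over three keyed sources. It is only an illustration of the join semantics, not Hadoop's actual implementation. The input contents and the names outer_join and format_row are my assumptions: the records are reconstructed from the values that appear in the outputs above, since the contents of a.txt, b.txt and c.txt are not shown here.

```python
from itertools import product

# Hypothetical input contents, reconstructed from the outputs quoted above.
A = [("AAAAAAAA", "a0"), ("BBBBBBBB", "a1"), ("CCCCCCCC", "a2"), ("CCCCCCCC", "a3")]
B = [("AAAAAAAA", "b0"), ("BBBBBBBB", "b1"), ("BBBBBBBB", "b2"), ("BBBBBBBB", "b3")]
C = [("AAAAAAAA", "c0"), ("BBBBBBBB", "c1"), ("DDDDDDDD", "c2"), ("DDDDDDDD", "c3")]


def outer_join(*sources):
    """Full outer join: one tuple per combination of values sharing a key."""
    grouped = []
    for src in sources:
        by_key = {}
        for key, value in src:
            by_key.setdefault(key, []).append(value)
        grouped.append(by_key)

    rows = []
    for key in sorted(set().union(*grouped)):
        # A source with no record for this key contributes one empty slot.
        slots = [g.get(key, [""]) for g in grouped]
        for combo in product(*slots):
            rows.append((key, list(combo)))
    return rows


def format_row(key, values):
    """Render a row as key, tab, bracketed comma-separated tuple."""
    return f"{key}\t[{','.join(values)}]"


lines = [format_row(k, v) for k, v in outer_join(A, B, C)]
print("\n".join(lines))
```

With these assumed inputs, the sketch emits one tuple per key combination, with an empty slot for any source that has no record for the key, which is the shape of the result Chris describes.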

Has the output format of Join changed in Hadoop 1.0.4?

