hadoop-common-user mailing list archives

From Edmund Kohlwey <ekohl...@gmail.com>
Subject Join Documentation Correct?
Date Wed, 18 Nov 2009 20:42:09 GMT
I'm using Cloudera's distribution of Hadoop, version 0.20.1+133.

The javadocs for the package org.apache.hadoop.mapred.join state: "For a
given key, each operation will consider the cross product of all values
for all sources at that node".

I'm doing an inner join between two tables with a text key. One table
has multiple values for the same key. From the documentation, I would
expect the output to contain the cross product of the values for that
key. Instead, I'm getting only a single row. Does anyone know whether
this is a bug or the intended behavior (in which case the documentation
is wrong)?

table 1
k1 -> a

table 2
k1 -> c
k1 -> d

I should get:
table 1 inner join table 2
k1 -> ac
k1 -> ad

Instead I'm getting:
table 1 inner join table 2
k1 -> ac
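To make the expected semantics concrete, here is a minimal standalone Java sketch of an inner join that emits the cross product of all values for each shared key, which is how I read the javadoc sentence quoted above. It uses no Hadoop classes, and the names `CrossProductJoin` and `innerJoin` are hypothetical. It models only the in-memory result, not how the org.apache.hadoop.mapred.join framework reads its sorted, identically partitioned inputs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the documented semantics, not Hadoop code.
public class CrossProductJoin {

    // For every key present in both tables, emit one row per (left, right)
    // value pair, i.e. the cross product of the values for that key.
    static List<String> innerJoin(Map<String, List<String>> left,
                                  Map<String, List<String>> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : left.entrySet()) {
            List<String> rightValues = right.get(e.getKey());
            if (rightValues == null) {
                continue; // inner join: key missing on one side, so no rows
            }
            for (String l : e.getValue()) {
                for (String r : rightValues) {
                    out.add(e.getKey() + "->" + l + r);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> table1 = new LinkedHashMap<>();
        table1.put("k1", List.of("a"));
        Map<String, List<String>> table2 = new LinkedHashMap<>();
        table2.put("k1", List.of("c", "d"));
        // Prints k1->ac and k1->ad, one per line.
        for (String row : innerJoin(table1, table2)) {
            System.out.println(row);
        }
    }
}
```

With the two example tables, the sketch produces both rows, k1->ac and k1->ad, rather than the single row I'm actually seeing.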
