hadoop-common-user mailing list archives

From Jason Venner <jason.had...@gmail.com>
Subject Re: Join Documentation Correct?
Date Thu, 19 Nov 2009 12:37:16 GMT
Are you certain that your records are being split into key and value the way
you expect? That is the usual reason for odd join behavior.
I haven't used the join code past 0.19.1, however.
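
A minimal sketch of that check, assuming the inputs are read with a
tab-separated key/value format such as KeyValueTextInputFormat (an assumption;
the original message does not say which input format is in use). The class and
method names are hypothetical and only mimic splitting a line at the first
separator. A line with no separator ends up entirely in the key, so its key
would not match "k1" in a join.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

class SplitCheck {
    // Hypothetical helper: split a line at the first occurrence of sep.
    // If sep is absent, the whole line becomes the key and the value is empty.
    static Map.Entry<String, String> split(String line, char sep) {
        int pos = line.indexOf(sep);
        if (pos < 0) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        // The last line uses a space instead of a tab (assumed bad input).
        String[] lines = {"k1\tc", "k1\td", "k1 d"};
        for (String line : lines) {
            Map.Entry<String, String> kv = split(line, '\t');
            System.out.println("key=[" + kv.getKey() + "] value=[" + kv.getValue() + "]");
        }
    }
}
```

Printing a few records this way from each table makes it easy to see whether
both sides really produce the same key.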

On Wed, Nov 18, 2009 at 12:42 PM, Edmund Kohlwey <ekohlwey@gmail.com> wrote:

> I'm using Cloudera's distribution for Hadoop 0.20.1 + 133
>
> The javadocs for package org.apache.hadoop.mapred.join state: "For a given
> key, each operation will consider the cross product of all values for all
> sources at that node."
>
> I'm doing an inner join between two tables with a text key. One table has
> multiple values for the same key. I would expect, from the documentation, to
> see the cross product of the values for a given key represented in the
> output. Instead I'm simply getting a single row. Does anyone know if this is
> a bug or if it's the intended functionality (and the documentation is
> flawed)?
>
> table 1
> k1 -> a
>
> table 2
> k1 -> c
> k1 -> d
>
> I should get:
> table 1 inner join table 2
> k1->ac
> k1->ad
>
> Instead I'm getting:
> table 1 inner join table 2
> k1->ac
>
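
For reference, the result that the quoted javadoc sentence describes can be
sketched in plain Java. This only illustrates the expected cross-product
semantics with in-memory maps; it is not the org.apache.hadoop.mapred.join
implementation, and JoinSketch and innerJoin are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class JoinSketch {
    // Inner join: for every key present on both sides, emit one row per
    // (left value, right value) pair, i.e. the cross product of the values.
    static List<String> innerJoin(Map<String, List<String>> left,
                                  Map<String, List<String>> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : left.entrySet()) {
            List<String> rightValues = right.get(e.getKey());
            if (rightValues == null) {
                continue; // key missing on the right side: no output rows
            }
            for (String l : e.getValue()) {
                for (String r : rightValues) {
                    out.add(e.getKey() + "->" + l + r);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> table1 = new TreeMap<>();
        table1.put("k1", Arrays.asList("a"));
        Map<String, List<String>> table2 = new TreeMap<>();
        table2.put("k1", Arrays.asList("c", "d"));
        System.out.println(innerJoin(table1, table2)); // prints [k1->ac, k1->ad]
    }
}
```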



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
