hadoop-common-user mailing list archives

From "Runping Qi" <runp...@yahoo-inc.com>
Subject RE: JOIN-type operations with Hadoop...
Date Thu, 13 Sep 2007 14:19:00 GMT

Check out the data_join package under hadoop/contrib.
It offers a generic framework for doing various joining operations.
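The general idea behind a reduce-side join is to tag each record with the file it came from, group all records by the join key B, and combine the two sides in the reducer. Below is a minimal, framework-free sketch of that idea in plain Python, assuming tuple rows (A, B, C) and (B, D, E). The function names are hypothetical and are not the data_join package's API. In Hadoop the map and reduce steps would be real Mapper/Reducer code, and the framework would perform the grouping (shuffle) itself.

```python
from collections import defaultdict


def map_phase(left_rows, right_rows):
    """Emit (key, (tag, payload)) pairs; the tag records which file a row came from."""
    for a, b, c in left_rows:
        yield b, ("L", (a, c))
    for b, d, e in right_rows:
        yield b, ("R", (d, e))


def shuffle(pairs):
    """Group mapped values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, tagged_values):
    """Pair every left-side row with every right-side row sharing this key (inner join)."""
    lefts = [payload for tag, payload in tagged_values if tag == "L"]
    rights = [payload for tag, payload in tagged_values if tag == "R"]
    for a, c in lefts:
        for d, e in rights:
            yield (a, key, c, d, e)


def reduce_side_join(left_rows, right_rows):
    """Run map, shuffle and reduce in sequence; keys are processed in sorted order."""
    groups = shuffle(map_phase(left_rows, right_rows))
    out = []
    for key in sorted(groups):
        out.extend(reduce_phase(key, groups[key]))
    return out


if __name__ == "__main__":
    left = [("a1", "b1", "c1"), ("a2", "b1", "c2"), ("a3", "b2", "c3"), ("a4", "b9", "c4")]
    right = [("b1", "d1", "e1"), ("b2", "d2", "e2")]
    for row in reduce_side_join(left, right):
        print(" ".join(row))
```

Because B is unique in the second file, each reduce group holds at most one right-side record, so every A B C row with a match yields exactly one A B C D E row. Rows whose B has no match (b9 above) are dropped, which is inner-join behavior.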


Runping


> -----Original Message-----
> From: C G [mailto:parallelguy@yahoo.com]
> Sent: Thursday, September 13, 2007 7:11 AM
> To: hadoop-user@lucene.apache.org
> Subject: JOIN-type operations with Hadoop...
> 
> Consider two row-based files.  The first has fields:
> 
>       A B C
> 
>   the second has fields:
> 
>      B D E
> 
>   I want to join these files on the key B, to create records of the form:
> 
>     A B C D E
> 
>   So B can be thought of as a primary key, and the second file will only
> contain distinct values of B, i.e. no repeats.
> 
>   I'm trying to reason through how to do this type of join operation in
> Hadoop but am unsure how to proceed with different "types" of files.
> 
>   Does the community have any wisdom to share?
> 
>   Thanks,
>   C G
> 
> 

