hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: JOIN-type operations with Hadoop...
Date Thu, 13 Sep 2007 18:10:22 GMT


See pig.

This one:  http://research.yahoo.com/project/pig

Not this one: http://en.wikipedia.org/wiki/Pig

On 9/13/07 10:45 AM, "Ashish Thusoo" <athusoo@facebook.com> wrote:

> On a related note - has anyone seen proposals or ideas for languages on
> top of hadoop map/reduce (could even be languages for some sort of code
> generators) to make writing the joins easy. It is quite a nightmare to
> write these joins especially when it involves multiple data sources. We
> are thinking of doing something similar. I wanted to find out if someone
> else has some ideas to share.
> 
> Thanks,
> Ashish
> 
> -----Original Message-----
> From: Joydeep Sen Sarma [mailto:jssarma@facebook.com]
> Sent: Thursday, September 13, 2007 7:43 AM
> To: hadoop-user@lucene.apache.org
> Subject: RE: JOIN-type operations with Hadoop...
> 
> We use the directory namespace to distinguish different types of files.
> We wrote a simple wrapper around TextInputFormat/SequenceFileInputFormat
> such that the key returned is the pathname (or some component of the
> pathname). That way you can look at the key, decide what kind of
> record structure the value encodes, and take the proper action.
> 
> Ping me if you want an example; I will be happy to share.
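
[Editorial sketch, not part of the original message.] The path-as-key idea above can be illustrated roughly as follows. This is plain Python rather than a Hadoop InputFormat, and the directory names ("abc", "bde"), the whitespace-separated field layout, and the function names are all assumptions made for illustration, not the wrapper the author used.

```python
import posixpath

# Hypothetical layout: each kind of file lives in its own directory,
# and the directory name tells us how to parse a record.
PARSERS = {
    "abc": lambda line: dict(zip(("A", "B", "C"), line.split())),
    "bde": lambda line: dict(zip(("B", "D", "E"), line.split())),
}

def tag_with_source(path, lines):
    """Mimic the wrapper: emit (directory name, raw line) for each record."""
    source = posixpath.basename(posixpath.dirname(path))
    for line in lines:
        yield source, line

def parse(source, line):
    """Choose the record structure from the key, as a mapper would."""
    return PARSERS[source](line)

records = [
    parse(key, value)
    for key, value in tag_with_source("/data/abc/part-00000", ["a1 b1 c1"])
]
```

Here the mapper never has to guess the record type from the value itself; the key carries that information.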
> 
> 
> -----Original Message-----
> From: C G [mailto:parallelguy@yahoo.com]
> Sent: Thursday, September 13, 2007 7:11 AM
> To: hadoop-user@lucene.apache.org
> Subject: JOIN-type operations with Hadoop...
> 
> Consider two row based files.  The first has fields:
>    
>       A B C
>    
>   the second has fields:
>    
>      B D E 
>    
>   I want to join these files on the key B, to create records of the
> form:
>    
>     A B C D E
>    
>   So B can be thought of as a primary key, and the second file will only
> contain distinct values of B, i.e. no repeats.
>    
>   I'm trying to reason through how to do this type of join operation in
> Hadoop but am unsure how to proceed with different "types" of files.
>    
>   Does the community have any wisdom to share?
>    
>   Thanks,
>   C G
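
[Editorial sketch, not part of the original message.] One common way to approach the join described above is a reduce-side join: each mapper emits B as the key and tags the value with the file it came from, and the reducer combines the tagged values for each B. The following is a minimal in-memory simulation in plain Python, not Hadoop code. It assumes whitespace-separated fields and inner-join semantics, and the tags and function names are hypothetical.

```python
from collections import defaultdict

def map_left(line):
    """First file (A B C): key on B, tag the value as coming from the left."""
    a, b, c = line.split()
    return b, ("L", (a, c))

def map_right(line):
    """Second file (B D E): key on B, tag the value as coming from the right."""
    b, d, e = line.split()
    return b, ("R", (d, e))

def reduce_join(key, tagged_values):
    """Combine left and right values that share the same B (inner join)."""
    left = [v for tag, v in tagged_values if tag == "L"]
    right = [v for tag, v in tagged_values if tag == "R"]
    for d, e in right:  # at most one entry if B is unique in the second file
        for a, c in left:
            yield (a, key, c, d, e)

def join(left_lines, right_lines):
    """Simulate map, shuffle (group by key), and reduce in memory."""
    groups = defaultdict(list)
    for line in left_lines:
        key, value = map_left(line)
        groups[key].append(value)
    for line in right_lines:
        key, value = map_right(line)
        groups[key].append(value)
    joined = []
    for key in sorted(groups):
        joined.extend(reduce_join(key, groups[key]))
    return joined

rows = join(
    ["a1 b1 c1", "a2 b1 c2", "a3 b2 c3"],
    ["b1 d1 e1", "b3 d3 e3"],
)
```

In this sample, only b1 appears in both inputs, so rows holds the two joined A B C D E records for b1; b2 and b3 are dropped because the sketch does an inner join.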
> 

