hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: JOIN-type operations with Hadoop...
Date Mon, 17 Sep 2007 19:31:06 GMT


I tried to use it, but it is currently tied to the 0.13.1 release of hadoop
as specially patched with the HadoopExe class by the pig group.

I hear that they are working on a new release and also working towards
having a real open source version with nightly builds RSN, but there hasn't
been any externally visible progress lately.

 


On 9/17/07 11:14 AM, "Ashish Thusoo" <athusoo@facebook.com> wrote:

> Thanks for the pointer.
> 
> We did take a look at pig and did find that it has some of the
> constructs that we have been talking about. How stable is the pig
> software? Has anyone on this list used it?
> 
> Thanks,
> Ashish
> 
> -----Original Message-----
> From: Ted Dunning [mailto:tdunning@veoh.com]
> Sent: Thursday, September 13, 2007 11:10 AM
> To: hadoop-user@lucene.apache.org
> Subject: Re: JOIN-type operations with Hadoop...
> 
> 
> 
> See pig.
> 
> This one:  http://research.yahoo.com/project/pig
> 
> Not this one: http://en.wikipedia.org/wiki/Pig
> 
> On 9/13/07 10:45 AM, "Ashish Thusoo" <athusoo@facebook.com> wrote:
> 
>> On a related note - has anyone seen proposals or ideas for languages on
>> top of hadoop map/reduce (could even be languages for some sort of code
>> generators) to make writing the joins easy? It is quite a nightmare to
>> write these joins especially when it involves multiple data sources. We
>> are thinking of doing something similar. I wanted to find out if someone
>> else has some ideas to share.
>> 
>> Thanks,
>> Ashish
>> 
>> -----Original Message-----
>> From: Joydeep Sen Sarma [mailto:jssarma@facebook.com]
>> Sent: Thursday, September 13, 2007 7:43 AM
>> To: hadoop-user@lucene.apache.org
>> Subject: RE: JOIN-type operations with Hadoop...
>> 
>> We use the directory namespace to distinguish different types of files.
>> Wrote a simple wrapper around TextInputFormat/SequenceFileInputFormat -
>> such that the key returned is the pathname (or some component of the
>> pathname). That way u can look at the key - and then decide what kind of
>> record structure the value encodes and take the proper action.
>> 
>> Ping me if u want an example and will be happy to share.
>> 
>> 
>> -----Original Message-----
>> From: C G [mailto:parallelguy@yahoo.com]
>> Sent: Thursday, September 13, 2007 7:11 AM
>> To: hadoop-user@lucene.apache.org
>> Subject: JOIN-type operations with Hadoop...
>> 
>> Consider two row based files.  The first has fields:
>>    
>>       A B C
>>    
>>   the second has fields:
>>    
>>      B D E 
>>    
>>   I want to join these files on the key B, to create records of the
>> form:
>>    
>>     A B C D E
>>    
>>   So B can be thought of as a primary key, and the second file will only
>> have distinct values of B...i.e. no repeats.
>>    
>>   I'm trying to reason through how to do this type of join operation in
>> Hadoop but am unsure how to proceed with different "types" of files.
>>    
>>   Does the community have any wisdom to share?
>>    
>>   Thanks,
>>   C G
>> 
> 
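
The join C G asks about, combined with Joydeep's idea of using the
pathname to tell record types apart, can be sketched as a reduce-side
join. What follows is a minimal Python simulation of the map, shuffle and
reduce steps. It is not Hadoop code and not anyone's actual implementation
from this thread. The directory names "abc" and "bde", the function names,
and the whitespace-separated record layout are all assumptions made for
illustration.

```python
from collections import defaultdict

# Hypothetical directory names used to tell the two record types apart.
LEFT_DIR = "abc"   # rows of the form: A B C
RIGHT_DIR = "bde"  # rows of the form: B D E, with B unique in this file


def map_record(path, line):
    """Map step: emit (join key B, tagged payload) for one input line.

    The first path component decides which record layout the line uses.
    """
    fields = line.split()
    source = path.split("/")[0]
    if source == LEFT_DIR:
        a, b, c = fields
        return b, ("L", (a, c))
    if source == RIGHT_DIR:
        b, d, e = fields
        return b, ("R", (d, e))
    raise ValueError(f"unrecognised input path: {path}")


def reduce_key(b, tagged_values):
    """Reduce step: pair every left row for key b with the one right row."""
    left = [v for tag, v in tagged_values if tag == "L"]
    right = [v for tag, v in tagged_values if tag == "R"]
    if not right:
        return []  # inner join: drop keys missing from the second file
    d, e = right[0]  # B is assumed unique in the second file
    return [(a, b, c, d, e) for a, c in left]


def join(inputs):
    """Simulate map, shuffle (group by key) and reduce over (path, line)."""
    groups = defaultdict(list)
    for path, line in inputs:
        key, value = map_record(path, line)
        groups[key].append(value)
    joined = []
    for key in sorted(groups):
        joined.extend(reduce_key(key, groups[key]))
    return joined


sample = [
    ("abc/part-0", "a1 b1 c1"),
    ("abc/part-0", "a2 b1 c2"),
    ("abc/part-1", "a3 b2 c3"),  # b2 has no match in the second file
    ("bde/part-0", "b1 d1 e1"),
    ("bde/part-0", "b3 d3 e3"),  # b3 has no match in the first file
]
rows = join(sample)
```

Here rows holds two joined records, both for key b1. Tagging each value
with its source is what lets a single reducer handle two record layouts.
In a real job the framework would do the grouping by key that join()
does by hand here.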

