hadoop-common-user mailing list archives

From Alan Gates <ga...@yahoo-inc.com>
Subject Re: looking for some help with pig syntax
Date Tue, 28 Aug 2007 19:47:02 GMT
Sorry, I misunderstood what you were trying to generate.  Perhaps the 
following will come closer:

t1 = load table1 as id, listOfId; -- <1, <2,3,4>>
t2 = load table2 as id, f1; -- <2,a>,<3,b>,<4,c>
a = foreach t1 generate id, flatten(listOfId); -- <1,2>,<1,3>,<1,4>
b = join a by $1, t2 by id; -- <2,1,2,2,a>,<3,1,3,3,b>,<4,1,4,4,c>
c = group b by $1; -- <1,{<2,1,2,2,a>,<3,1,3,3,b>,<4,1,4,4,c>}>
d = foreach c generate group, c.b::$4; -- <1, {<a>,<b>,<c>}>

where <> represents a tuple and {} a bag.

I'm not 100% sure of the syntax c.b::$4 in d; you may have to fiddle with it
to get it right.
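
For checking the intermediate results independently of Pig syntax, here is a
minimal plain-Python sketch of the same flatten / join / group dataflow. It is
not Pig and does not reproduce Pig's tuple layout: the variable names are
hypothetical, the sample data is taken from the comments above, and it assumes
ids in table2 are unique (which a foreign key target would imply).

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for table1 and table2.
t1 = [(1, [2, 3, 4])]                # <id, listOfId>
t2 = [(2, "a"), (3, "b"), (4, "c")]  # <id, f1>

# Step a: flatten listOfId while keeping the parent id.
a = [(parent_id, child_id) for parent_id, ids in t1 for child_id in ids]

# Step b: join the flattened child id against table2's id.
# Assumption: ids in t2 are unique, so a dict lookup is enough.
f1_by_id = dict(t2)
b = [
    (parent_id, child_id, f1_by_id[child_id])
    for parent_id, child_id in a
    if child_id in f1_by_id
]

# Steps c and d: group by the parent id and keep only the f1 values.
grouped = defaultdict(list)
for parent_id, _child_id, f1 in b:
    grouped[parent_id].append(f1)
d = sorted(grouped.items())

print(d)  # [(1, ['a', 'b', 'c'])]
```

The key point is the same as in the Pig version: the parent id has to be
carried through the flatten so that there is something to group on afterwards.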

Alan.




Joydeep Sen Sarma wrote:
> Will it?
>
> Trying an example:
>
> t1 = {<1, <2, 3, 4>>}
> t2 = {<2, "alpha">,<3,"beta">,<4,"gamma">}
>
> desired outcome c = {<1, <"alpha", "beta", "gamma">>} /* or alternatively */
>                 c = {<1, <<2,"alpha">,<3,"beta">,<4,"gamma">>>}
>
> but as proposed (I hope I am reading the pig document correctly):
>
> t1a = {<2,3,4>}
> b = {<2, 2, "alpha">}
>
> // no point going further - this doesn't seem to be doing what I want ..
>
>
> -----Original Message-----
> From: Alan Gates [mailto:gates@yahoo-inc.com] 
> Sent: Tuesday, August 28, 2007 10:45 AM
> To: hadoop-user@lucene.apache.org
> Cc: utkarsh@yahoo-inc.com
> Subject: Re: looking for some help with pig syntax
>
> I think the following will do what you want.
>
> t1 = load table1 as id, listOfId;
> t2 = load table2 as id, f1;
> t1a = foreach t1 generate flatten(listOfId); -- flattens listOfId into a set of ids
> b = join t1a by $0, t2 by id; -- join the two together.
> c = foreach b generate t2.id, t2.f1; -- project just the id and f1 entries.
>
> Alan.
>
> Joydeep Sen Sarma wrote:
>   
>> Specifically, how can we express this query:
>>
>>  
>>
>> Table1 contains: id, (list of ids)
>>
>> Table2 contains: id, f1
>>
>>  
>>
>> Where the Table1:list is a variable length list of foreign keys (id) into
>> Table2.
>>
>>  
>>
>> We would like to join every element of Table1:list with the corresponding
>> Table2:id. I.e., the final output should be of the form:
>>
>>  
>>
>> Table3 contains: id, (list of f1)
>>
>>  
>>
>> Couldn't quite figure out how to do this - does Pig Latin support nested
>> foreach loops? If there's a more appropriate mailing list - please
>> re-direct.
>>
>>  
>>
>> Thanks,
>>
>>  
>>
>> Joydeep
