hadoop-pig-dev mailing list archives

From hc busy <hc.b...@gmail.com>
Subject Re: more bagging fun
Date Wed, 10 Mar 2010 17:31:24 GMT
So, pig team, what is the right way to accomplish this?

On Tue, Mar 9, 2010 at 10:50 PM, Mridul Muralidharan
<mridulm@yahoo-inc.com> wrote:

> On Tuesday 09 March 2010 04:13 AM, hc busy wrote:
>
>> okay. Here's the bag that I have:
>>
>>  {group: (a: int,b: chararray,c: chararray,d: int), TABLE: {number1: int,
>> number2:int}}
>>
>>
>>
>> and I want to do this
>>
>> grunt>  CALCULATE = FOREACH TABLE_group GENERATE group, SUM(TABLE.number1 / TABLE.number2);
>>
>
>
> TABLE.number1 actually gives you the bag of number1 values found in TABLE -
> but I am never really sure of the semantics in these situations since I am
> slightly nervous that it is impl dependent ... my understanding is, what you
> are attempting should not work, but I could be wrong.
>
> I do know that TABLE.(number1, number2) will consistently project and pair
> up the fields : so to 'fix' this, you can write your own DIVIDE_SUM which
> does something like this :
>
> grunt>  CALCULATE= FOREACH TABLE_group GENERATE group,
> DIVIDE_SUM(TABLE.(number1 , number2));
>
> And DIVIDE_SUM udf impl takes in a bag with tuples containing schema
> (numerator, denominator) : and returns :
>
> result == sum ( foreach tuple ( tuple.numerator / tuple.denominator ) );
>
>
> Obviously, this is not as 'elegant' as your initial code and is definitely
> more cumbersome ... so clarifying this behavior with someone from pig team
> will definitely be better before you attempt this.
>
>
> Regards,
> Mridul
>
>
>
>> grunt>  DUMP CALCULATE;
>>
>> 2010-03-08 14:02:41,055 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1039: Incompatible types in Multiplication Operator left hand side:bag right hand side:bag
>>
>>
>>
>> This seems useful that I may want to calculate an agg. of some arithmetic
>> operations on member of a bag. Any suggestions?
>>
>> ... Looking at the documentation it looks like I want to do something like
>>
>> SUM(TABLE.(number1 / number2))
>>
>> but that doesn't work either :-(
>>
>
>
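
For reference, the computation Mridul describes for DIVIDE_SUM (sum over each tuple of numerator / denominator) could look roughly like the sketch below. It is plain Java with no Pig dependency, so it only shows the core loop that a real UDF would wrap. The class name DivideSum, the use of int[] pairs to stand in for the tuples projected by TABLE.(number1, number2), the choice of floating-point rather than integer division, and skipping zero denominators are all assumptions made for illustration and are not taken from the thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the DIVIDE_SUM logic discussed above.
// Each int[] holds {numerator, denominator}, mimicking one tuple
// of the bag produced by TABLE.(number1, number2).
public class DivideSum {

    // Returns the sum of numerator / denominator over all pairs in the bag.
    static double divideSum(List<int[]> bag) {
        double sum = 0.0;
        for (int[] pair : bag) {
            int numerator = pair[0];
            int denominator = pair[1];
            if (denominator == 0) {
                // Assumption: a zero denominator contributes nothing
                // rather than raising an error.
                continue;
            }
            // Assumption: floating-point division, not integer division.
            sum += (double) numerator / denominator;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<int[]> bag = Arrays.asList(
                new int[] {1, 2},   // 0.5
                new int[] {3, 4},   // 0.75
                new int[] {5, 0});  // skipped
        System.out.println(divideSum(bag)); // prints 1.25
    }
}
```

In an actual Pig UDF the same loop would presumably sit inside the exec method of a class extending EvalFunc, iterating over the input bag's tuples instead of a List.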
