hadoop-general mailing list archives

From "Owen O'Malley" <omal...@apache.org>
Subject Re: Help: frustration with types and whatnot while trying to do a conditional sum
Date Tue, 30 Nov 2010 16:14:14 GMT
Pig has moved to its own mailing lists. Please follow up over there.
-- Owen

On Tue, Nov 30, 2010 at 8:05 AM, Jonathan Coveney <jcoveney@gmail.com> wrote:

> I appreciate any help you can give. I've searched around and haven't found
> anything directly related... I've gone through documentation but can't find
> a real reason why this doesn't work.
>
> Here is the gist of my code (n1 is arbitrary, just to group by; n2 is either
> null or a large integer):
>
> table = LOAD stuff AS (n1:chararray, n2:chararray, other irrelevant stuff);
> pared = foreach table generate n1, n2;
> grouped = group pared by n1;
> counted = foreach grouped generate group,
>     (double)SUM((IsEmpty(pared.n2) ? 0:1))/(double)COUNT(pared.n2) as ratio:double;
> ordered = order counted by ratio desc;
> limited = limit ordered 200;
> dump limited;
>
> This gets this error:
>
> ERROR 1045: Could not infer the matching function for
> org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an
> explicit cast.
>
> If I remove the doubled parentheses around the SUM argument in the counted
> line, I get this instead:
>
> ERROR 1000: Error during parsing. Invalid alias: SUM in {group:
> chararray,pared: {n1: chararray,n2: chararray}}
>
> I THINK the error is that SUM wants a column of a bag as its input, not
> actual integers... so I thought I'd try to make that happen by giving the
> input the form it wants.
>
> To get around this, I thought the following might work (changing only
> these lines):
>
> pared = foreach table generate n1, (IsEmpty(n2) ? 0 : 1) as ooz:int;
> grouped = group pared by n1;
> counted  = foreach grouped generate group,
> (double)SUM(pared.n1)/(double)COUNT(pared.n2) as ratio:double;
>
> But this gives this error:
> ERROR 1000: Error during parsing. Invalid alias: n2 in {n1: chararray,ooz:
> int}
>
> I have no real clue why this fails... I tried breaking it up into two steps
> and it made no difference.
>
> I'd ideally like to do this without making a UDF, as I feel the base
> functionality should support it. Not sure.
>
> Either way, I'd appreciate any help or pointers, as well as any rationale
> as to why it does or doesn't work within the Pig framework. The whole bag
> system is still somewhat counterintuitive to me.
>
> Thank you for your time
>
