hadoop-general mailing list archives

From Jonathan Coveney <jcove...@gmail.com>
Subject Help: frustration with types and whatnot while trying to do a conditional sum
Date Tue, 30 Nov 2010 16:05:46 GMT
I appreciate any help you can give. I've searched around and haven't found
anything directly related... I've gone through documentation but can't find
a real reason why this doesn't work.

Here is the gist of my code (n1 is arbitrary, just to group by; n2 is either
null or a large integer):

table = LOAD stuff AS (n1:chararray, n2:chararray, other irrelevant stuff);
pared = foreach table generate n1, n2;
grouped = group pared by n1;
counted  = foreach grouped generate group, (double)SUM((IsEmpty(pared.n2) ?
0:1))/(double)COUNT(pared.n2) as ratio:double;
ordered = order counted by ratio desc;
limited = limit ordered 200;
dump limited;
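
To be concrete about what I'm after, here is roughly the same calculation as a
small Python sketch. This is not Pig and says nothing about how Pig should do
it; the sample rows and the function name non_null_ratio are made up for
illustration. For each n1, I want the fraction of rows whose n2 is not null,
then the top 200 by that ratio.

```python
from collections import defaultdict

def non_null_ratio(rows):
    """Return {n1: fraction of rows in that group whose n2 is not None}."""
    totals = defaultdict(int)    # rows seen per n1
    non_null = defaultdict(int)  # rows per n1 with a non-null n2
    for n1, n2 in rows:
        totals[n1] += 1
        if n2 is not None:
            non_null[n1] += 1
    return {key: non_null[key] / totals[key] for key in totals}

# Hypothetical sample data: (n1, n2) pairs, with None standing in for null.
rows = [("a", "123"), ("a", None), ("b", "456"), ("b", "789"), ("c", None)]

ratios = non_null_ratio(rows)
# Equivalent of "order by ratio desc" followed by "limit 200".
top = sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)[:200]
print(top)
```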

This produces the following error:

ERROR 1045: Could not infer the matching function for
org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an
explicit cast.

If I remove the extra pair of parentheses inside SUM, I get this instead:

ERROR 1000: Error during parsing. Invalid alias: SUM in {group:
chararray,pared: {n1: chararray,n2: chararray}}

I think the problem is that SUM wants a column of a bag as its input, not
individual integers, so I tried to reshape the input into that form.

I thought this might work (changing only these lines):

pared = foreach beacon_fact generate n1, (IsEmpty(n2) ? 0 : 1) as ooz:int;
grouped = group pared by n1;
counted  = foreach grouped generate group,
(double)SUM(pared.n1)/(double)COUNT(pared.n2) as ratio:double;

But this gives this error:
ERROR 1000: Error during parsing. Invalid alias: n2 in {n1: chararray,ooz:
int}

I have no real clue why this fails. I tried breaking it up into two steps,
and it made no difference.

I'd ideally like to do this without making a UDF, as I feel the base
functionality should support it. Not sure.

Either way, I'd appreciate any help or pointers, as well as any rationale as
to why it does or doesn't work within the pig framework. The whole bag
system is still somewhat counterintuitive.

Thank you for your time
