pig-user mailing list archives

From Daniel Dai <jiany...@yahoo-inc.com>
Subject Re: Custom UDF + Grouping - Unexpected Output
Date Thu, 09 Dec 2010 00:04:12 GMT
This is not expected. I suspect something is wrong inside
NormalizeListUDF. Make sure the bag your UDF returns contains tuples with
the schema (int, int). If you can post your UDF, I can give a better answer.

Daniel
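To make Daniel's suggested check concrete, here is a minimal plain-Java sketch. It is not code from the thread, and it does not use the Pig API. It models each Pig tuple as a List<Object> and a bag as a list of those, and reports the first tuple that does not have exactly two Integer fields. The class and method names are hypothetical. A real unit test for the UDF would build Pig Tuple and DataBag objects instead.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical check: models each Pig tuple as a List<Object> and reports the
// first tuple in a "bag" that does not match the (int, int) schema.
class PairSchemaCheck {

    // True only when the tuple has exactly two fields and both are Integers.
    static boolean isIntPair(List<Object> tuple) {
        return tuple.size() == 2
                && tuple.get(0) instanceof Integer
                && tuple.get(1) instanceof Integer;
    }

    // Returns the index of the first tuple that breaks the schema, or -1 if all pass.
    static int firstBadTuple(List<List<Object>> bag) {
        for (int i = 0; i < bag.size(); i++) {
            if (!isIntPair(bag.get(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<List<Object>> good = Arrays.asList(
                Arrays.<Object>asList(2, 3),
                Arrays.<Object>asList(2, 4));
        // A tuple with a third, null field fails the check.
        List<List<Object>> bad = Arrays.asList(
                Arrays.<Object>asList(2, 3),
                Arrays.<Object>asList(2, 4, null));
        System.out.println(firstBadTuple(good)); // -1
        System.out.println(firstBadTuple(bad));  // 1
    }
}
```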

Michael Moss wrote:
> Hello,
>
> I'm having an issue with a script that uses an EvalFunc I wrote. The issue
> is that the final output contains characters I am not expecting (commas
> followed by what I'm guessing are null fields, which I do not see).
>
> Snippet:
> C = FOREACH B GENERATE FLATTEN(B) as (f1:int,f2:int);
> grunt> DUMP C;
> (2,3)
> (2,4)
> (2,5)
> (3,4)
> (3,5)
> (4,5)
> (2,3)
> (2,4)
> (2,5)
> (3,4)
> (3,5)
> (4,5)
>
> D = GROUP C by (f1,f2);
> grunt> describe D;
> D: {group: (f1: int,f2: int),C: {f1: int,f2: int}}
>
> grunt> DUMP D;
> ((2,3,),{(2,3,),(2,3,)})
> ((2,4,),{(2,4,),(2,4,)})
> ((2,5,),{(2,5,),(2,5,)})
> ((3,4,),{(3,4,),(3,4,)})
> ((3,5,),{(3,5,),(3,5,)})
> ((4,5,),{(4,5,),(4,5,)})
>
> My question is: what are these extra commas/null fields in each tuple? I
> expected the first row to read as:
> ((2,3),{(2,3),(2,3)})
>
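A note on reading the "(2,3,)" rows: assuming DUMP prints a tuple's fields comma-separated inside parentheses and prints a null field as nothing (which the output above appears to reflect), a trailing comma means the last field is null. The hypothetical formatter below imitates that rendering. It shows that DUMP alone cannot tell two ints (2,3) apart from a single chararray "2,3", while a chararray "2,3" followed by a null prints exactly like the rows of D. One hypothesis consistent with this and with the ILLUSTRATE error is that each tuple from the UDF holds a single chararray; that is an assumption, not something confirmed in the thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical formatter that imitates how DUMP appears to render a tuple:
// fields joined by commas inside parentheses, with null fields printed as empty.
class DumpFormat {

    static String format(List<Object> tuple) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < tuple.size(); i++) {
            Object field = tuple.get(i);
            if (field != null) {
                sb.append(field); // non-null fields use their toString()
            }
            if (i < tuple.size() - 1) {
                sb.append(',');
            }
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // Two int fields: the shape the script expects.
        System.out.println(format(Arrays.<Object>asList(2, 3)));        // (2,3)
        // One chararray field holding "2,3": looks identical in the output.
        System.out.println(format(Arrays.<Object>asList("2,3")));       // (2,3)
        // A chararray plus a null field: prints with a trailing comma.
        System.out.println(format(Arrays.<Object>asList("2,3", null))); // (2,3,)
    }
}
```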
> It seems related, but when I run 'ILLUSTRATE C', I get an exception:
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
> at java.util.ArrayList.RangeCheck(ArrayList.java:547)
> at java.util.ArrayList.get(ArrayList.java:322)
> at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
> at org.apache.pig.pen.util.ExampleTuple.get(ExampleTuple.java:80)
> at org.apache.pig.pen.util.DisplayExamples.MakeArray(DisplayExamples.java:190)
> at org.apache.pig.pen.util.DisplayExamples.printTabular(DisplayExamples.java:86)
> at org.apache.pig.pen.util.DisplayExamples.printTabular(DisplayExamples.java:69)
> at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:143)
> at org.apache.pig.PigServer.getExamples(PigServer.java:785)
> at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:555)
> at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:246)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
> at org.apache.pig.Main.main(Main.java:357)
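The trace ends in DefaultTuple.get calling ArrayList.get with "Index: 1, Size: 1". That suggests ILLUSTRATE asked a tuple for its second field when the tuple held only one. Below is a minimal plain-Java reproduction, with an ArrayList standing in for the tuple. The class name is hypothetical, and the exact exception message depends on the JDK version (older JDKs print "Index: 1, Size: 1", newer ones word it differently).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reproduction: the trace shows DefaultTuple.get delegating to
// ArrayList.get, so asking a one-field list for index 1 fails the same way.
class OneFieldTuple {

    // Returns the exception message if field 1 cannot be read, or null if it can.
    static String readSecondField(List<Object> tuple) {
        try {
            tuple.get(1);
            return null;
        } catch (IndexOutOfBoundsException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        List<Object> tuple = new ArrayList<>();
        tuple.add("2,3"); // one field where the schema declares two
        System.out.println(readSecondField(tuple));
    }
}
```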
>
> Excruciating detail below:
>
> My script:
> REGISTER udf.jar
> A = LOAD '/pig_input/co.txt' as (line:chararray);
> B = FOREACH A GENERATE com.thumbplay.pig.NormalizeListUDF(line) as B;
> C = FOREACH B GENERATE FLATTEN(B) as (f1:int,f2:int);
> D = GROUP C by (f1,f2);
> E = FOREACH D GENERATE group, COUNT(C);
> STORE E INTO 'output' USING PigStorage(',');
>
> Here's what I'm trying to do:
> For input:
> A,1,2,3
> B,1,2,3
>
> Produce combinations for each row (My UDF does this):
> (1,2),(1,3),(2,3)
> (1,2),(1,3),(2,3)
>
> Flatten them:
> (1,2),
> (1,3),
> (2,3),
> (1,2),
> (1,3),
> (2,3)
>
> Group and count them:
> (1,2),2
> (1,3),2
> (2,3),2
>   
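The intended pipeline can be modeled in plain Java to pin down the expected result. This is a minimal sketch under stated assumptions, not the NormalizeListUDF source (which was not posted). It assumes each line is "label,n1,n2,...", skips the label, emits every ordered pair of the remaining numbers as a two-Integer list, flattens all rows, and counts identical pairs the way GROUP plus COUNT would. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the intended pipeline; it does not use the Pig API.
// Assumes each input line is "label,n1,n2,..." as in the example input.
class PairCounts {

    // Step 1 (the combinations step described for the UDF): every pair (a, b)
    // where a appears before b in the row. Each pair is exactly two Integers.
    static List<List<Integer>> combinations(String line) {
        String[] parts = line.split(",");
        List<List<Integer>> pairs = new ArrayList<>();
        for (int i = 1; i < parts.length; i++) {          // parts[0] is the label
            for (int j = i + 1; j < parts.length; j++) {
                pairs.add(Arrays.asList(Integer.parseInt(parts[i].trim()),
                                        Integer.parseInt(parts[j].trim())));
            }
        }
        return pairs;
    }

    // Steps 2 and 3: flatten every row's pairs into one stream, then group
    // identical pairs and count them (like GROUP C BY (f1,f2) plus COUNT(C)).
    static Map<List<Integer>, Long> countPairs(List<String> lines) {
        Map<List<Integer>, Long> counts = new LinkedHashMap<>(); // keeps first-seen order
        for (String line : lines) {
            for (List<Integer> pair : combinations(line)) {
                counts.merge(pair, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<List<Integer>, Long> counts = countPairs(Arrays.asList("A,1,2,3", "B,1,2,3"));
        for (Map.Entry<List<Integer>, Long> e : counts.entrySet()) {
            List<Integer> p = e.getKey();
            System.out.println("(" + p.get(0) + "," + p.get(1) + ")," + e.getValue());
        }
    }
}
```

Running main prints (1,2),2 then (1,3),2 then (2,3),2, matching the expected output in the message. The useful property of the model is that every emitted pair carries exactly two integer fields, which is the shape Daniel suggests verifying in the UDF.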

