Date: Wed, 8 Dec 2010 16:50:23 -0500
Subject: Custom UDF + Grouping - Unexpected Output
From: Michael Moss
To: user@pig.apache.org

Hello,

I'm having an issue with a script that uses an EvalFunc I wrote. The final output contains characters I am not expecting: trailing commas, followed by what I'm guessing are null fields that are not displayed.

Snippet:

C = FOREACH B GENERATE FLATTEN(B) as (f1:int,f2:int);

grunt> DUMP C;
(2,3)
(2,4)
(2,5)
(3,4)
(3,5)
(4,5)
(2,3)
(2,4)
(2,5)
(3,4)
(3,5)
(4,5)

D = GROUP C by (f1,f2);

grunt> describe D;
D: {group: (f1: int,f2: int),C: {f1: int,f2: int}}

grunt> DUMP D;
((2,3,),{(2,3,),(2,3,)})
((2,4,),{(2,4,),(2,4,)})
((2,5,),{(2,5,),(2,5,)})
((3,4,),{(3,4,),(3,4,)})
((3,5,),{(3,5,),(3,5,)})
((4,5,),{(4,5,),(4,5,)})

My question is, what are these extra comma/null fields in each tuple? I expected the first row to read as:

((2,3),{(2,3),(2,3)})

It seems related, but when I run 'ILLUSTRATE C', I get an exception:

java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
    at org.apache.pig.pen.util.ExampleTuple.get(ExampleTuple.java:80)
    at org.apache.pig.pen.util.DisplayExamples.MakeArray(DisplayExamples.java:190)
    at org.apache.pig.pen.util.DisplayExamples.printTabular(DisplayExamples.java:86)
    at org.apache.pig.pen.util.DisplayExamples.printTabular(DisplayExamples.java:69)
    at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:143)
    at org.apache.pig.PigServer.getExamples(PigServer.java:785)
    at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:555)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:246)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
    at org.apache.pig.Main.main(Main.java:357)

Excruciating detail below:

My script:

REGISTER udf.jar
A = LOAD '/pig_input/co.txt' as (line:chararray);
B = FOREACH A GENERATE com.thumbplay.pig.NormalizeListUDF(line) as B;
C = FOREACH B GENERATE FLATTEN(B) as (f1:int,f2:int);
D = GROUP C by (f1,f2);
E = FOREACH D GENERATE group, COUNT(C);
STORE E INTO 'output' USING PigStorage(',');

Here's what I'm trying to do:

For input:
A,1,2,3
B,1,2,3

Produce combinations for each row (my UDF does this):
(1,2),(1,3),(2,3)
(1,2),(1,3),(2,3)

Flatten them:
(1,2), (1,3), (2,3), (1,2), (1,3), (2,3)

Group and count them:
(1,2),2
(1,3),2
(2,3),2
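
For reference, the intended pipeline (combinations per row, flatten, group, count) can be sketched in plain Java with no Pig dependency. This is only an illustration of the expected result, not the actual NormalizeListUDF, whose source is not shown here; the class name PairCombinations and the methods pairs and countPairs are hypothetical, and the sketch assumes each input line is a label followed by comma-separated integers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the pipeline described above; not the real UDF.
final class PairCombinations {

    // Assumes a line like "A,1,2,3": field 0 is a label, the rest are ints.
    // Returns every 2-element combination of the numeric fields, in order.
    static List<int[]> pairs(String line) {
        String[] parts = line.split(",");
        List<int[]> out = new ArrayList<>();
        for (int i = 1; i < parts.length; i++) {
            for (int j = i + 1; j < parts.length; j++) {
                int first = Integer.parseInt(parts[i].trim());
                int second = Integer.parseInt(parts[j].trim());
                // Each pair holds exactly two values, matching (f1:int,f2:int).
                out.add(new int[] {first, second});
            }
        }
        return out;
    }

    // Flattens the pairs from all lines, groups identical pairs, counts them.
    static Map<String, Long> countPairs(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (int[] p : pairs(line)) {
                String key = "(" + p[0] + "," + p[1] + ")";
                counts.merge(key, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countPairs(Arrays.asList("A,1,2,3", "B,1,2,3"));
        // Prints one "pair,count" line per distinct pair.
        counts.forEach((pair, n) -> System.out.println(pair + "," + n));
    }
}
```

For the two sample input rows, this produces the three pairs (1,2), (1,3) and (2,3), each with a count of 2, which is the shape of output the Pig script is meant to store.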