pig-dev mailing list archives

From "Scott Carey (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2511) Enable '*' to skip any fields that have already been generated and cast in other parts of the GENERATE, as in: foo = FOREACH my_relation GENERATE manipulate(foo1) as foo1, *;
Date Thu, 22 Mar 2012 18:00:23 GMT

    [ https://issues.apache.org/jira/browse/PIG-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235810#comment-13235810 ]

Scott Carey commented on PIG-2511:
----------------------------------

This one is annoying me again today.

I have an inbound relation with ~20 fields.  One of them is a bag of about 100 tuples.  All
I want to do is flatten it and project out two fields.

B = FOREACH A GENERATE *, FLATTEN(x.(foo, bar)) as flatx;

Ok, now I have a problem:

The bag of 100 is still in the relation, copied 100 times.  To get rid of it I need to list
every field one by one instead of using *.  No, PIG-1693 is not useful.  The field order is
subject to change.  This chunk needs to be resilient to changes in the inbound aliases that
do not change the semantic meaning of fields.
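
For illustration, the workaround described above might look like the sketch below.  The
names f1, f2 and f3 are hypothetical stand-ins for the ~20 real fields, and it assumes the
tuples in bag x carry fields named foo and bar; none of these names come from the actual
script.

```pig
-- Hypothetical sketch: f1, f2, f3 stand in for the ~20 inbound fields.
-- Naming each field instead of using * keeps the bag x out of the output,
-- so it is not copied onto every flattened row.
B = FOREACH A GENERATE f1, f2, f3, FLATTEN(x.(foo, bar)) AS (foo, bar);
```

Any field added or renamed upstream means editing this list by hand, which is exactly the
fragility at issue.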

Then the next step is to project out the foo and bar from flatx, which will require listing
the 20 fields AGAIN.

This issue is generally worse when you are using FLATTEN than with simple projection, since
it is much more important to drop the fields for performance reasons.  Some sane syntax here
could easily cut the size of most of my scripts by more than half!
                
> Enable '*' to skip any fields that have already been generated and cast in other parts of the GENERATE, as in: foo = FOREACH my_relation GENERATE manipulate(foo1) as foo1, *;
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2511
>                 URL: https://issues.apache.org/jira/browse/PIG-2511
>             Project: Pig
>          Issue Type: New Feature
>          Components: grunt, parser
>    Affects Versions: 0.9.1
>            Reporter: Russell Jurney
>              Labels: grunt, latin, newbie, pig
>
> This should work:
> grunt> good_dates = foreach filtered generate CustomFormatToISO(date, 'EEE, dd MMM yyyy HH:mm:ss Z') AS date, *;
> 2012-02-06 14:56:23,286 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1108:
> <line 8, column 30> Duplicate schema alias: date
> 2012-02-06 14:56:23,286 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.plan.PlanValidationException: ERROR 1108:
> <line 8, column 30> Duplicate schema alias: date
> 	at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.validate(SchemaAliasVisitor.java:74)
> 	at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:104)
> 	at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:240)
> 	at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
> 	at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
> 	at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:99)
> 	at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:74)
> 	at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
> 	at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
> 	at org.apache.pig.PigServer$Graph.compile(PigServer.java:1661)
> 	at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1610)
> 	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1582)
> 	at org.apache.pig.PigServer.registerQuery(PigServer.java:584)
> 	at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
> 	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
> 	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
> 	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
> 	at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
> 	at org.apache.pig.Main.run(Main.java:495)
> 	at org.apache.pig.Main.main(Main.java:111)

