hadoop-pig-dev mailing list archives

From "Santhosh Srinivasan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-577) outer join query looses name information
Date Wed, 24 Dec 2008 20:00:51 GMT

    [ https://issues.apache.org/jira/browse/PIG-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659152#action_12659152 ]

Santhosh Srinivasan commented on PIG-577:

When a null constant is used in a bincond inside a flatten, the following cases should be
handled:

Assumption: one of the columns in the bincond is a null constant.

1. If the other column is a simple type or a map, cast the null to that type.
2. If the other column is a complex type other than a map, remove the null constant and
replace it with a bag, tuple, or map constant containing the appropriate elements. For
example, if the other column is a bag whose tuples contain three columns (say int, float,
chararray), replace the null constant with a bag that contains a tuple of three null
constants. The same reasoning applies to a tuple column.
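The two rules can be sketched as a small type-directed rewrite. This is a toy model under stated assumptions: the `Kind`/`FieldType` classes and the `replacementFor` method are hypothetical and are not Pig's `Schema` API, and the rewritten constant is returned as a description string rather than as a real Pig constant.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical toy model of Pig field types (NOT Pig's Schema/FieldSchema API),
// used only to illustrate the two replacement rules for a null in a bincond.
final class NullConstantRewrite {

    enum Kind { INT, FLOAT, CHARARRAY, BYTEARRAY, MAP, TUPLE, BAG }

    /** Simple types and maps have no children; a tuple lists its column types;
     *  a bag holds exactly one child, the type of its tuples. */
    static final class FieldType {
        final Kind kind;
        final List<FieldType> children;

        private FieldType(Kind kind, List<FieldType> children) {
            this.kind = kind;
            this.children = children;
        }

        static FieldType simple(Kind kind) { return new FieldType(kind, Collections.emptyList()); }
        static FieldType tuple(FieldType... columns) { return new FieldType(Kind.TUPLE, Arrays.asList(columns)); }
        static FieldType bag(FieldType tupleType) { return new FieldType(Kind.BAG, Collections.singletonList(tupleType)); }
    }

    /** Describes the constant that replaces the null in (cond ? null : other),
     *  given the type of the other branch. */
    static String replacementFor(FieldType other) {
        switch (other.kind) {
            case TUPLE:
                // Rule 2: a tuple with one null per column of the other branch.
                return tupleOfNulls(other);
            case BAG:
                // Rule 2: a bag holding a single tuple of nulls with the inner tuple's arity.
                return "{" + tupleOfNulls(other.children.get(0)) + "}";
            default:
                // Rule 1: simple types and maps keep the null, cast to the other branch's type.
                return "(" + other.kind.name().toLowerCase(Locale.ROOT) + ") null";
        }
    }

    private static String tupleOfNulls(FieldType tuple) {
        return "(" + String.join(", ", Collections.nCopies(tuple.children.size(), "null")) + ")";
    }

    public static void main(String[] args) {
        FieldType row = FieldType.tuple(FieldType.simple(Kind.INT),
                FieldType.simple(Kind.FLOAT), FieldType.simple(Kind.CHARARRAY));
        System.out.println(replacementFor(FieldType.bag(row)));               // {(null, null, null)}
        System.out.println(replacementFor(FieldType.simple(Kind.CHARARRAY))); // (chararray) null
    }
}
```

The bag case mirrors the (int, float, chararray) example above: the replacement carries exactly as many nulls as the other branch has columns, so its shape matches that branch.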

Upon flattening, the complex types will yield the appropriate number of columns.
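To see why the column count comes out right, here is a toy model of flattening several bags in one GENERATE, assuming bags are `List<List<Object>>` and tuples are `List<Object>`. `FlattenSketch`, `bagOfNullTuple` and the sample values are hypothetical; the code only mimics the cross-product behaviour of flatten and is not Pig's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model: a tuple is a List<Object>, a bag is a List of tuples.
// It mimics the cross product produced by flattening several bags in one
// GENERATE; it is not Pig's physical FOREACH operator.
final class FlattenSketch {

    /** Rule-2 replacement: a bag holding one tuple of `arity` nulls. */
    static List<List<Object>> bagOfNullTuple(int arity) {
        List<Object> tuple = new ArrayList<>(Collections.<Object>nCopies(arity, null));
        return Collections.singletonList(tuple);
    }

    /** GENERATE group, flatten(bag1), flatten(bag2), ...: one output row per
     *  combination of tuples, each row being group followed by their columns. */
    @SafeVarargs
    static List<List<Object>> flatten(Object group, List<List<Object>>... bags) {
        List<List<Object>> rows = new ArrayList<>();
        rows.add(new ArrayList<>(Collections.singletonList(group)));
        for (List<List<Object>> bag : bags) {
            List<List<Object>> next = new ArrayList<>();
            for (List<Object> prefix : rows) {
                for (List<Object> tuple : bag) {
                    List<Object> row = new ArrayList<>(prefix);
                    row.addAll(tuple);
                    next.add(row);
                }
            }
            rows = next;
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical cogroup row: key "alice" matched one voter tuple but no
        // student tuples, so A's empty bag was replaced by a bag of three nulls.
        List<List<Object>> a = bagOfNullTuple(3);
        List<List<Object>> b = Collections.singletonList(Arrays.<Object>asList("alice", 31, "dem", 12.5f));
        for (List<Object> row : flatten("alice", a, b)) {
            System.out.println(row); // [alice, null, null, null, alice, 31, dem, 12.5]
        }
    }
}
```

The output row has group + 3 + 4 = 8 columns, the same width as the schema of D in the issue below. In this model an empty bag yields no rows at all, which is why the query substitutes a value for empty bags before flattening.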

Handling null constants for complex types has implications when the constant is materialized
via dump or store. If the null constant is replaced with an appropriate bag/tuple/map, the
materialized constant will look like {(,,)} or (,,) or []. This conflicts with our existing
view of nulls being empty when

> outer join query looses name information
> ----------------------------------------
>                 Key: PIG-577
>                 URL: https://issues.apache.org/jira/browse/PIG-577
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>             Fix For: types_branch
> The following query:
> A = LOAD 'student_data' AS (name: chararray, age: int, gpa: float);
> B = LOAD 'voter_data' AS (name: chararray, age: int, registration: chararray, contributions: float);
> C = COGROUP A BY name, B BY name;
> D = FOREACH C GENERATE group, flatten((IsEmpty(A) ? null : A)), flatten((IsEmpty(B) ? null : B));
> describe D;
> E = FOREACH D GENERATE A::gpa, B::contributions;
> The query gives the following error (even though describe shows the correct schema, D: {group: chararray,A::name: chararray,A::age: int,A::gpa: float,B::name: chararray,B::age: int,B::registration: chararray,B::contributions: float}):
> java.io.IOException: Invalid alias: A::gpa in {group: chararray,bytearray,bytearray}
>         at org.apache.pig.PigServer.parseQuery(PigServer.java:298)
>         at org.apache.pig.PigServer.registerQuery(PigServer.java:263)
>         at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:439)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:249)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
>         at org.apache.pig.Main.main(Main.java:306)
> Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: A::gpa in {group: chararray,bytearray,bytearray}
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:5930)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:5788)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:3974)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:3871)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.java:3825)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:3734)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:3660)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:3626)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:3552)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:3462)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:3419)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:2894)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:2309)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:966)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:742)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:537)
>         at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:60)
>         at org.apache.pig.PigServer.parseQuery(PigServer.java:295)
>         ... 6 more
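The error message shows the parser resolving `A::gpa` against `{group: chararray,bytearray,bytearray}`, i.e. each flattened bincond contributed a single unnamed bytearray column. The toy model below reproduces that shape and shows how letting the null borrow the other branch's schema (the rewrite proposed in the comment) restores the `A::`/`B::` names. It assumes that an untyped null makes the bincond degrade to an unnamed bytearray; this is inferred from the error text, not taken from Pig's source, and `FlattenSchemaSketch`, `BranchSchema` and `flattenBincond` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model of how FLATTEN of a bincond contributes columns to a
// FOREACH schema. Class and method names are illustrative, not Pig's API.
final class FlattenSchemaSketch {

    /** One bincond branch: a relation alias plus "name: type" columns,
     *  or an untyped null constant (alias and columns both null). */
    static final class BranchSchema {
        final String alias;
        final List<String> columns;

        BranchSchema(String alias, List<String> columns) {
            this.alias = alias;
            this.columns = columns;
        }

        static BranchSchema untypedNull() { return new BranchSchema(null, null); }
    }

    /** Columns contributed by flatten((cond ? left : right)). */
    static List<String> flattenBincond(BranchSchema left, BranchSchema right, boolean rewriteNulls) {
        BranchSchema l = left;
        BranchSchema r = right;
        if (rewriteNulls) {
            // Proposed fix: an untyped null borrows the other branch's schema,
            // i.e. it becomes a bag/tuple of nulls with the same columns.
            if (l.columns == null) l = r;
            if (r.columns == null) r = l;
        }
        if (l.columns == null || r.columns == null) {
            // Assumption: with one branch untyped, the bincond degrades to an
            // unnamed bytearray and FLATTEN yields a single unnamed column.
            return Collections.singletonList("bytearray");
        }
        List<String> out = new ArrayList<>();
        for (String column : l.columns) {
            out.add(l.alias + "::" + column);
        }
        return out;
    }

    public static void main(String[] args) {
        BranchSchema a = new BranchSchema("A", Arrays.asList("name: chararray", "age: int", "gpa: float"));
        BranchSchema b = new BranchSchema("B", Arrays.asList("name: chararray", "age: int",
                "registration: chararray", "contributions: float"));
        for (boolean rewrite : new boolean[] {false, true}) {
            List<String> d = new ArrayList<>(Collections.singletonList("group: chararray"));
            d.addAll(flattenBincond(BranchSchema.untypedNull(), a, rewrite));
            d.addAll(flattenBincond(BranchSchema.untypedNull(), b, rewrite));
            System.out.println("D: {" + String.join(",", d) + "}");
        }
    }
}
```

Running `main` prints `D: {group: chararray,bytearray,bytearray}` without the rewrite and the full schema reported by describe (`group: chararray,A::name: chararray,...,B::contributions: float`) with it.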
