hadoop-pig-dev mailing list archives

From "Pradeep Kamath (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-449) Schemas for bags should contain tuples all the time
Date Mon, 08 Dec 2008 23:03:44 GMT

     [ https://issues.apache.org/jira/browse/PIG-449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Pradeep Kamath updated PIG-449:

    Assignee: Pradeep Kamath  (was: Santhosh Srinivasan)
      Status: Patch Available  (was: Open)

A new flag has been introduced in Schema to mark bag schemas that have only one field
schema, that of a tuple, which in turn contains the field schemas for the elements in the bag
(bag schemas of this kind occur in the two cases explained in the code comment below). The
flag will be used to solve the problems reported in this issue: access to a field of such a
bag is resolved as access to the corresponding field in the inner tuple schema. The comment
for this flag is pasted here for reference:
    // In bags which have a schema with a tuple which contains
    // the fields present in it, if we access the second field (say)
    // we are actually trying to access the second field in the
    // tuple in the bag. This is currently true for two cases:
    // 1) bag constants - the schema of bag constant has a tuple
    // which internally has the actual elements
    // 2) When bags are loaded from input data, if the user 
    // specifies a schema with the "bag" type, he has to specify
    // the bag as containing a tuple with the actual elements in 
    // the schema declaration. However in both the cases above,
    // the user can still say b.i where b is the bag and i is 
    // an element in the bag's tuple schema. So in these cases,
    // the access should translate to a lookup for "i" in the 
    // tuple schema present in the bag. To indicate this, the
    // flag below is used. It is false by default because, 
    // currently we use bag as the type for relations. However 
    // the schema of a relation does NOT have a tuple fieldschema
    // with items in it. Instead, the schema directly has the 
    // field schema of the items. So for a relation "b", the 
    // above b.i access would be a direct single level access
    // of i in b's schema. This is treated as the "default" case
    private boolean twoLevelAccessRequired = false;

The changes are in getPosition() in Schema.java, which now uses the above flag to do a
two-level access whenever a bag of the above kind is accessed. In addition, getSchema() of
LOForEach and getFieldSchema() of LOProject now use the inner tuple schema for bags of this
kind. A new unit test, TestDataBagAccess, has also been added to cover various access
scenarios for bag schemas that have a tuple field schema with a list of item field schemas.
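
To illustrate the two-level lookup described above, here is a minimal, self-contained sketch.
It is not the code from PIG-449.patch: SimpleSchema, Field and bagSchema() are hypothetical
stand-ins for Pig's Schema classes, and only the name twoLevelAccessRequired and the idea of
redirecting the lookup into the inner tuple schema come from the description above. The sketch
also does not model how b.t should resolve once the flag is set.

```java
import java.util.Arrays;
import java.util.List;

public class TwoLevelAccessSketch {

    /** Hypothetical stand-in for a field schema: an alias plus an optional nested schema. */
    static final class Field {
        final String alias;
        final SimpleSchema inner; // null for scalar fields such as int or double

        Field(String alias, SimpleSchema inner) {
            this.alias = alias;
            this.inner = inner;
        }
    }

    /** Hypothetical stand-in for Schema; it only models alias lookup. */
    static final class SimpleSchema {
        final List<Field> fields;
        // Same name as the flag in the patch; false is the "relation" default.
        boolean twoLevelAccessRequired = false;

        SimpleSchema(Field... fields) {
            this.fields = Arrays.asList(fields);
        }

        /** Returns the position of the alias, or -1 if it is not found. */
        int getPosition(String alias) {
            if (twoLevelAccessRequired
                    && fields.size() == 1
                    && fields.get(0).inner != null) {
                // The bag holds a single tuple field: look the alias up inside it.
                return fields.get(0).inner.getPosition(alias);
            }
            // Default case: single-level lookup in this schema.
            for (int i = 0; i < fields.size(); i++) {
                if (alias.equals(fields.get(i).alias)) {
                    return i;
                }
            }
            return -1;
        }
    }

    /** Builds the schema {t: (i, d, c)} from the issue, with the flag set as requested. */
    static SimpleSchema bagSchema(boolean twoLevel) {
        SimpleSchema tuple = new SimpleSchema(
                new Field("i", null), new Field("d", null), new Field("c", null));
        SimpleSchema bag = new SimpleSchema(new Field("t", tuple));
        bag.twoLevelAccessRequired = twoLevel;
        return bag;
    }

    public static void main(String[] args) {
        // Without the flag, "i" is not visible at the bag level (the reported error).
        System.out.println(bagSchema(false).getPosition("i"));
        // With the flag, "i" resolves to its position in the inner tuple schema.
        System.out.println(bagSchema(true).getPosition("i"));
    }
}
```

With the flag left at false, the lookup for "i" fails in the same way as the "Invalid alias"
error quoted below, while a relation schema that lists its fields directly keeps using the
single-level path.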

> Schemas for bags should contain tuples all the time
> ---------------------------------------------------
>                 Key: PIG-449
>                 URL: https://issues.apache.org/jira/browse/PIG-449
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Santhosh Srinivasan
>            Assignee: Pradeep Kamath
>             Fix For: types_branch
>         Attachments: PIG-449.patch
> The front end treats relations as operators that return bags. When the schema of a load
> statement is specified, the bag is associated with the schema specified by the user. Ideally,
> the schema corresponds to the tuple contained in the bag.
> With PIG-380, the schema for bag constants is computed by the front end. The schema
> for the bag contains the tuple, which in turn contains the schema of the columns. This results
> in errors when columns are accessed directly, as they are for load statements.
> The front end should then treat access to the columns as a double dereference, i.e.,
> access the tuple inside the bag and then the column inside the tuple.
> {code}
> grunt> a = load '/user/sms/data/student.data' using PigStorage(' ') as (name, age,
> grunt> b = foreach a generate {(16, 4.0e-2, 'hello')} as b:{t:(i: int, d: double, c: chararray)};
> grunt> describe b;
> b: {b: {t: (i: integer,d: double,c: chararray)}}
> grunt> c = foreach b generate b.i;
> 111064 [main] ERROR org.apache.pig.tools.grunt.GruntParser  - java.io.IOException: Invalid alias: i in {t: (i: integer,d: double,c: chararray)}
>         at org.apache.pig.PigServer.parseQuery(PigServer.java:293)
>         at org.apache.pig.PigServer.registerQuery(PigServer.java:258)
>         at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:432)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:242)
>         at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:93)
>         at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:58)
>         at org.apache.pig.Main.main(Main.java:282)
> Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: i in {t: (i: integer,d: double,c: chararray)}
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:5851)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:5709)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BracketedSimpleProj(QueryParser.java:5242)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:4040)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:3909)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.java:3863)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:3772)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:3698)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:3664)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:3590)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:3500)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:3457)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:2933)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:2336)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:973)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:748)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:549)
>         at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:60)
>         at org.apache.pig.PigServer.parseQuery(PigServer.java:290)
>         ... 6 more
> 111064 [main] ERROR org.apache.pig.tools.grunt.GruntParser  - Invalid alias: i in {t: (i: integer,d: double,c: chararray)}
> 111064 [main] ERROR org.apache.pig.tools.grunt.GruntParser  - java.io.IOException: Invalid alias: i in {t: (i: integer,d: double,c: chararray)}
> grunt> c = foreach b generate b.t;
> grunt> describe c;
> c: {t: {i: integer,d: double,c: chararray}}
> {code}
