pig-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-847) Setting twoLevelAccessRequired field in a bag schema should not be required to access fields in the tuples of the bag
Date Fri, 14 Jan 2011 06:25:46 GMT

    [ https://issues.apache.org/jira/browse/PIG-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981642#action_12981642 ]

Ashutosh Chauhan commented on PIG-847:

bq. I propose to remove twoLevelAccess: all bags implicitly contain a tuple, and bag projection
implicitly goes to the item inside the tuple.
+1 for removing twoLevelAccess and all the confusion it causes. Will this decision have any
bearing on bags holding other types? People have suggested adding a datatype for a collection
of objects (such as integers, longs, etc.). If we mandate that bags always contain tuples, are
we ruling out the possibility of implementing bags that contain other types?
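The proposal quoted above (every bag implicitly holds a tuple, and projection always steps into it) can be sketched with a small Python model. This is not Pig's code: `FieldSchema`, `make_bag`, and `project` are hypothetical names, and wrapping bare fields in a tuple is an assumption about how the rule might be enforced.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class FieldSchema:
    """Hypothetical, simplified stand-in for a Pig field schema."""
    alias: Optional[str]
    type: str  # e.g. "int", "chararray", "tuple", "bag"
    inner: list[FieldSchema] = field(default_factory=list)


def make_bag(alias: str, fields: list[FieldSchema]) -> FieldSchema:
    """Build a bag schema under the proposed rule: a bag always holds exactly
    one tuple schema. Bare fields are wrapped in an anonymous tuple."""
    if len(fields) == 1 and fields[0].type == "tuple":
        return FieldSchema(alias, "bag", fields)
    return FieldSchema(alias, "bag", [FieldSchema(None, "tuple", list(fields))])


def project(bag: FieldSchema, key: Union[str, int]) -> FieldSchema:
    """Resolve <bag>.<fieldname> or <bag>.<fieldposition>.
    Always steps through the implicit tuple; no flag is consulted."""
    if bag.type != "bag":
        raise TypeError("projection target is not a bag")
    (tup,) = bag.inner  # invariant guaranteed by make_bag
    if isinstance(key, int):
        return tup.inner[key]
    for f in tup.inner:
        if f.alias == key:
            return f
    raise KeyError(key)


name, age = FieldSchema("name", "chararray"), FieldSchema("age", "int")
relation = make_bag("rel", [name, age])                            # fields given directly
udf_bag = make_bag("b", [FieldSchema("t", "tuple", [name, age])])  # tuple given explicitly
print(project(relation, "age").type, project(udf_bag, 1).alias)
```

Under this sketch, a "bag of ints" can only be expressed as a bag of one-field tuples, which is exactly the restriction the comment asks about.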

> Setting twoLevelAccessRequired field in a bag schema should not be required to access fields in the tuples of the bag
> ---------------------------------------------------------------------------------------------------------------------
>                 Key: PIG-847
>                 URL: https://issues.apache.org/jira/browse/PIG-847
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.3.0
>            Reporter: Pradeep Kamath
>            Assignee: Daniel Dai
>             Fix For: 0.9.0
> Currently Pig interprets the result type of a relation as a bag. However, the schema of
> the relation directly contains the schema describing the fields in the tuples of the relation.
> By contrast, when a UDF wants to return a bag, when there is a bag in the input data, or when
> the user creates a bag constant, the schema of the bag has a single field schema, which is that
> of the tuple; the tuple's schema holds the types of the fields. To access the fields of such a
> bag directly, using something like <bagname>.<fieldname> or <bagname>.<fieldposition>, the bag
> schema must have twoLevelAccess set to true so that Pig's type system can traverse the tuple
> schema and reach the field in question. This is confusing; we should see whether we can avoid
> needing this extra flag. One possible solution is to treat all bags the same way, whether they
> represent relations or real bags. Another is to introduce a special "relation" datatype for the
> result type of a relation and use the bag type only for true bags. In that case, a bag schema
> would always need to contain a tuple schema describing the fields.
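To make the two schema shapes in the description concrete, here is a minimal Python model of the current behaviour. It is a sketch, not Pig's actual classes: `FieldSchema`, its `two_level_access` attribute, and `project` are hypothetical stand-ins for a field schema, the twoLevelAccess flag, and projection resolution.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class FieldSchema:
    """Hypothetical stand-in for a Pig field schema: an alias, a type name,
    and, for tuples and bags, a nested list of field schemas."""
    alias: Optional[str]
    type: str  # e.g. "int", "chararray", "tuple", "bag"
    inner: list[FieldSchema] = field(default_factory=list)
    two_level_access: bool = False  # only meaningful for bags


def project(bag: FieldSchema, key: Union[str, int]) -> FieldSchema:
    """Resolve <bagname>.<fieldname> or <bagname>.<fieldposition> the way the
    issue describes: step into the tuple only when the flag is set."""
    if bag.type != "bag":
        raise TypeError("projection target is not a bag")
    fields = bag.inner
    if bag.two_level_access:
        # The bag's only field is the tuple; descend to the tuple's fields.
        (tup,) = fields
        fields = tup.inner
    if isinstance(key, int):
        return fields[key]
    for f in fields:
        if f.alias == key:
            return f
    raise KeyError(key)


# Relation-style schema: the bag lists the tuple's fields directly.
relation = FieldSchema("rel", "bag", [FieldSchema("name", "chararray"), FieldSchema("age", "int")])
print(project(relation, "age").type)

# UDF / bag-constant style schema: one field schema, which is the tuple.
tup = FieldSchema("t", "tuple", [FieldSchema("name", "chararray"), FieldSchema("age", "int")])
udf_bag = FieldSchema("b", "bag", [tup])
udf_bag.two_level_access = True  # without this, "name" raises KeyError
print(project(udf_bag, "name").type)
```

The same projection succeeds on the relation-style schema but fails on the UDF-style one unless the flag is set, which is the inconsistency the issue asks to remove.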

This message is automatically generated by JIRA.
