hadoop-pig-dev mailing list archives

From "Pradeep Kamath" <prade...@yahoo-inc.com>
Subject RE: two-level access problem?
Date Tue, 03 Nov 2009 21:38:08 GMT
The twoLevelAccessRequired flag is not quite a long-term solution to the problem. The problem
is that we treat the output of relations as bags, but their schemas do NOT have
twoLevelAccessRequired set to true. Only bag constants and bags from input data have this flag
set to true. We need to either move to *all* bag schemas containing a tuple schema that holds
the real schema and reflects the layout of the bag, or think of an alternative. Implementing
the solution may involve many more details that will need to be looked at. Once we arrive at
a solution, this flag should no longer be needed and should be removed. Otherwise Resource
Schema would also need to have this notion of two-level access for bag fields.
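
To make the distinction concrete, here is a minimal sketch of the two lookup paths. It is not Pig's actual Schema code: the class TwoLevelAccessSketch, its nested Field and Schema types, and the methods resolve, relationSchema and loadedBagSchema are all hypothetical names made up for illustration. Only the flag name twoLevelAccessRequired comes from the thread. The sketch assumes a relation's schema lists its fields directly, while a bag constant or loaded bag wraps them in a single tuple field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for schema classes; NOT Pig's real API.
final class TwoLevelAccessSketch {

    /** A named field; bag and tuple fields carry an inner schema. */
    static final class Field {
        final String alias;
        final Schema inner; // null for simple (non-nested) fields

        Field(String alias, Schema inner) {
            this.alias = alias;
            this.inner = inner;
        }
    }

    /** An ordered list of fields plus the flag discussed in the thread. */
    static final class Schema {
        final List<Field> fields = new ArrayList<>();
        boolean twoLevelAccessRequired = false; // false is the "relation" default

        Schema add(Field f) {
            fields.add(f);
            return this;
        }

        /** Position of alias at this level only, or -1 if absent. */
        int positionOf(String alias) {
            for (int i = 0; i < fields.size(); i++) {
                if (fields.get(i).alias.equals(alias)) {
                    return i;
                }
            }
            return -1;
        }
    }

    /** Resolves the access b.alias, where bagSchema is the schema of b. */
    static int resolve(Schema bagSchema, String alias) {
        if (bagSchema.twoLevelAccessRequired) {
            // Two-level case: the bag schema holds one tuple field,
            // and the real fields live inside that tuple's schema.
            Schema tupleSchema = bagSchema.fields.get(0).inner;
            return tupleSchema.positionOf(alias);
        }
        // Single-level case: the fields sit directly in the schema.
        return bagSchema.positionOf(alias);
    }

    /** Relation-style schema: {i, j} with no enclosing tuple. */
    static Schema relationSchema() {
        return new Schema().add(new Field("i", null)).add(new Field("j", null));
    }

    /** Bag-constant or loaded-bag style schema: {t: (i, j)}, flag set. */
    static Schema loadedBagSchema() {
        Schema tuple = new Schema().add(new Field("i", null)).add(new Field("j", null));
        Schema bag = new Schema().add(new Field("t", tuple));
        bag.twoLevelAccessRequired = true;
        return bag;
    }

    public static void main(String[] args) {
        System.out.println(resolve(relationSchema(), "j"));   // prints 1
        System.out.println(resolve(loadedBagSchema(), "j"));  // prints 1
        System.out.println(loadedBagSchema().positionOf("j")); // prints -1
    }
}
```

The last call shows why some marker is needed today: without the second level, "j" is not found in the loaded bag's schema. If every bag schema used the tuple-wrapped layout, resolve could always take the first branch and the flag would be redundant, which is roughly the direction the message above argues for.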


-----Original Message-----
From: Dmitriy Ryaboy [mailto:dvryaboy@gmail.com] 
Sent: Tuesday, November 03, 2009 12:30 PM
To: pig-dev@hadoop.apache.org
Subject: Re: two-level access problem?

Thanks Pradeep,
I saw that comment. I guess my question is, given the solution this
comment describes, what are you referring to in the Load/Store
redesign doc when you say "we must fix the two level access issues
with schema of bags in current schema before we make these changes,
otherwise that same contagion will afflict us here"?


On Tue, Nov 3, 2009 at 2:10 PM, Pradeep Kamath <pradeepk@yahoo-inc.com> wrote:
> From comments in Schema.java:
>    // In bags which have a schema with a tuple which contains
>    // the fields present in it, if we access the second field (say)
>    // we are actually trying to access the second field in the
>    // tuple in the bag. This is currently true for two cases:
>    // 1) bag constants - the schema of bag constant has a tuple
>    // which internally has the actual elements
>    // 2) When bags are loaded from input data, if the user
>    // specifies a schema with the "bag" type, he has to specify
>    // the bag as containing a tuple with the actual elements in
>    // the schema declaration. However in both the cases above,
>    // the user can still say b.i where b is the bag and i is
>    // an element in the bag's tuple schema. So in these cases,
>    // the access should translate to a lookup for "i" in the
>    // tuple schema present in the bag. To indicate this, the
>    // flag below is used. It is false by default because,
>    // currently we use bag as the type for relations. However
>    // the schema of a relation does NOT have a tuple fieldschema
>    // with items in it. Instead, the schema directly has the
>    // field schema of the items. So for a relation "b", the
>    // above b.i access would be a direct single level access
>    // of i in b's schema. This is treated as the "default" case
>    private boolean twoLevelAccessRequired = false;
>
> -----Original Message-----
> From: Dmitriy Ryaboy [mailto:dvryaboy@gmail.com]
> Sent: Monday, November 02, 2009 5:33 PM
> To: pig-dev@hadoop.apache.org
> Subject: two-level access problem?
> Could someone explain the nature of the "two-level access problem"
> referred to in the Load/Store redesign wiki and in the DataType code?
> Thanks,
> -D
