pig-dev mailing list archives

From "pi song" <pi.so...@gmail.com>
Subject Re: More streamlined schema definition syntax ?
Date Wed, 28 May 2008 07:32:15 GMT
1) If we want to allow bags of anything in the future, there are things like


which is already accessing tuple content directly. I'm not sure if there are
also others.

2) For the perl bit, I think "optionally" omitting the bag keyword is not a
problem. We already have different brackets meaning different things: ( ),
{ }, [ ]. That's why I think forcing TUPLE, BAG, MAP is redundant.
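
To make point 2 concrete, here is a rough sketch of what "the bracket already
tells you the type" could look like. It is plain Python rather than Pig parser
code, and the names BRACKET_TYPES, CLOSERS and infer_complex_type are
hypothetical, invented only for this illustration. It assumes ( ) means tuple,
{ } means bag and [ ] means map, and it says nothing about the
bag-of-anything question in point 1.

```python
# Hypothetical illustration only; this is not how the Pig parser is written.
# Assumption: ( ) denotes a tuple, { } a bag, [ ] a map.
BRACKET_TYPES = {"(": "tuple", "{": "bag", "[": "map"}
CLOSERS = {"(": ")", "{": "}", "[": "]"}


def infer_complex_type(fragment):
    """Return 'tuple', 'bag' or 'map' for a schema fragment.

    The keyword in front of the bracket is optional. If it is present,
    it must agree with the bracket that follows it.
    """
    text = fragment.strip()

    # Strip an optional leading keyword such as "bag" or "TUPLE".
    keyword = None
    for name in ("tuple", "bag", "map"):
        if text.lower().startswith(name):
            keyword = name
            text = text[len(name):].lstrip()
            break

    if not text or text[0] not in BRACKET_TYPES:
        raise ValueError("expected (, { or [ in: %r" % fragment)
    if text[-1] != CLOSERS[text[0]]:
        raise ValueError("unbalanced brackets in: %r" % fragment)

    inferred = BRACKET_TYPES[text[0]]
    if keyword is not None and keyword != inferred:
        raise ValueError(
            "keyword %r does not match bracket type %r" % (keyword, inferred)
        )
    return inferred


if __name__ == "__main__":
    for sample in ("{a:int,b:long}", "bag {a:int}", "(a:int)", "[ ]"):
        print(sample, "->", infer_complex_type(sample))
```

In this sketch the keyword carries no extra information: it is either absent
or it has to repeat what the bracket already says, which is why I call it
redundant.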


On 5/28/08, Alan Gates <gates@yahoo-inc.com> wrote:
> A couple of thoughts:
> The issue with removing the tuple keyword from bag definition, so we can
> have bag: {a: int} instead of bag: {tuple: (a: int)}, is we had discussed
> allowing bags to be bags of anything, instead of bags of tuples.  We aren't
> doing anything about that now, but we might in the future.  We would have to
> change the semantics on bag type declaration if we made that change.
> Otherwise we would not know whether bag {a: int} meant that we had a bag of
> tuples of one element or a bag of ints.
> As for letting {} alone mean bag, I'm concerned pig latin will end up like
> perl, where different brackets mean different things and it's hard to read
> the code.  The other extreme is ending up like sql where it takes way too
> many keywords to do something.  I'm open to others' views on this.
> Alan.
> pi song wrote:
>> Here is what I know:-
>> Tuple Schema = schema associated with "a" tuple
>> Bag Schema = schema of all tuples contained in a bag
>> Then, here is the current way to specify schema in PigType branch:-
>> A = LOAD 'file1' AS (fieldA: bag
>> {tuple1:tuple(a:int,b:long,c:float,d:double)}, fieldB: Int)
>> Isn't this inefficient? Since we have already agreed that a bag only
>> contains tuples, not datum, I think it would be better if users can do
>> just:-
>> A = LOAD 'file1' AS (fieldA: bag {a:int,b:long,c:float,d:double}, fieldB:
>> Int)
>> Or even better, due to the fact that the curly braces already indicate Bag
>> data type:-
>> A = LOAD 'file1' AS (fieldA: {a:int,b:long,c:float,d:double}, fieldB: Int)
>> So potentially I think the keyword "Bag" should be optional for
>> convenience.
>> This is the same as when we specify a tuple schema, which is already
>> indicated by round brackets.
>> Any opinion? It's now time to make it easy for users.
>> Pi
>> PS. I'm willing to make the change if everybody is too busy.
