hadoop-pig-dev mailing list archives

From "pi song" <pi.so...@gmail.com>
Subject More streamlined schema definition syntax ?
Date Sun, 25 May 2008 10:07:22 GMT
Here is my understanding:

Tuple Schema = schema associated with "a" tuple
Bag Schema = schema of all tuples contained in a bag

Given that, here is the current way to specify a schema in the PigType branch:

A = LOAD 'file1' AS (fieldA: bag
{tuple1:tuple(a:int,b:long,c:float,d:double)}, fieldB: Int)

Isn't this needlessly verbose? Since we have already agreed that a bag
contains only tuples, not arbitrary datums, I think it would be better if
users could write:

A = LOAD 'file1' AS (fieldA: bag {a:int,b:long,c:float,d:double}, fieldB: Int)

Or, even better, since the curly braces already indicate the Bag data type:

A = LOAD 'file1' AS (fieldA: {a:int,b:long,c:float,d:double}, fieldB: Int)

So I think the keyword "Bag" should be optional, for convenience. This is
consistent with how we specify a tuple schema, which is already indicated
by round brackets.
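
To illustrate the proposal, here is a minimal Python sketch of how the
shorthand forms could be rewritten to the explicit bag{tuple(...)} form. This
is not how Pig's parser works; the function name expand_bag_shorthand and the
regex-based approach are assumptions made purely for illustration, and nested
bags are not handled.

```python
import re

# Optional "bag" keyword followed by a non-nested {...} body.
# Hypothetical helper for illustration only; not part of Pig.
_BAG = re.compile(r"(?:\bbag\s*)?\{([^{}]*)\}", re.IGNORECASE)

# Detects a body that already spells out the tuple, e.g. "t:tuple(...)".
_EXPLICIT_TUPLE = re.compile(r"(?:\w+\s*:\s*)?tuple\s*\(", re.IGNORECASE)


def expand_bag_shorthand(schema: str) -> str:
    """Rewrite shorthand bag schemas to the explicit bag{tuple(...)} form."""

    def _expand(match: re.Match) -> str:
        body = match.group(1).strip()
        if _EXPLICIT_TUPLE.match(body):
            # Already explicit: only normalise the "bag" keyword.
            return "bag{" + body + "}"
        # Shorthand: the braces hold the tuple's fields directly.
        return "bag{tuple(" + body + ")}"

    return _BAG.sub(_expand, schema)


if __name__ == "__main__":
    print(expand_bag_shorthand("(fieldA: {a:int,b:long}, fieldB: int)"))
    print(expand_bag_shorthand("(fieldA: bag {a:int,b:long}, fieldB: int)"))
```

Under this sketch, both shorthand forms map to the same explicit schema, so
the longer syntax would remain valid while the shorter ones become aliases
for it.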

Any opinions? It's now time to make this easy for users.


PS. I'm willing to make the change if everybody is too busy.
