hadoop-pig-dev mailing list archives

From Craig Macdonald <cra...@dcs.gla.ac.uk>
Subject Re: yahoo-specific pig - improvements to syntax and .pigrc
Date Fri, 15 Feb 2008 14:20:36 GMT
Hi Alan et al,

Comments inline below.

> Here's my thinking on this, though I don't speak for all of the 
> committers.  Pig should have 3 ways to pick up configuration:
>
> 1) from .pigrc, as it does now
agreed
> 2) when embedded in another java program, the caller should be able to 
> set values in PigContext, as I referred to in my response to 
> Benjamin's email.
agreed
> 3) From the pig script, we should be able to something like:  set 
> conf.x = y (I'm not necessarily suggesting syntax here).
OK, the start of a patch for this is attached.
I wasn't sure whether the "conf." prefix is required, so the patch 
leaves it as a comment.
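For discussion, here is a rough sketch of how a "set" line might be turned into a property. It is not the attached patch: the class name SetCommandSketch, the "set key = value" form and the choice to strip an optional "conf." prefix are all assumptions, since the syntax is still open.

```java
import java.util.Properties;

// Hypothetical sketch only; names and syntax are assumptions, not Pig's API.
class SetCommandSketch {
    // Applies a line of the form "set key = value;" to conf.
    // Returns true if the line was a set command and a property was stored.
    static boolean apply(String line, Properties conf) {
        String text = line.trim();
        if (text.endsWith(";")) {
            text = text.substring(0, text.length() - 1).trim();
        }
        if (!text.startsWith("set ")) {
            return false; // not a set command, leave it to the normal parser
        }
        String rest = text.substring("set ".length()).trim();
        int eq = rest.indexOf('=');
        if (eq < 0) {
            return false; // no "=", so no value to store
        }
        String key = rest.substring(0, eq).trim();
        String value = rest.substring(eq + 1).trim();
        // Assumption: the "conf." prefix is optional and dropped if present.
        if (key.startsWith("conf.")) {
            key = key.substring("conf.".length());
        }
        if (key.isEmpty()) {
            return false;
        }
        conf.setProperty(key, value);
        return true;
    }
}
```

Values set this way could land in the same Properties object that .pigrc populates, so all three configuration routes would end up in one place.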
>
>> * Should .pigrc evolve into a place for Pig aliases and properties, 
>> and even scripts? (similar to .bashrc etc)
> Right now you can store pig properties here.  It's not clear it needs 
> to grow beyond that.  What use case do you see for storing aliases or 
> scripts here?
(motivations below)
>> * Should new commands be added: import, include, sharedFS etc?
> I'm guessing this is the same things as I'm saying in 3 above.  If 
> not, please elaborate on what these new commands would do.
I'm trying to envisage how to allow reusability in Pig.
These are mostly from my original email:
>>>
>>> 2. Extensions to Pig syntax
>>> (a) "set" command sets all system properties
>>> (b) "include" includes and parses another pig script
>>> (c) "import" adds a package namespace to the search path
>>>
>>> 3. Change so that ~/.pigrc into a pig script that is parsed on 
>>> startup of Grunt/PigServer?

(1) Why should a user have to supply the fully qualified name of his 
user-defined function if all the functions he ever uses are in the same 
package? Obviously, he shouldn't have to, which is why PigContext 
includes this line:
    packageImportList.add("com.yahoo.pig.yst.sds.ULT.");

I'm asking to add a command that allows me to do something like:
    import uk.ac.gla.terrier.pig
and have that package searched for any functions. Yahoo users have this 
ability (com.yahoo.pig.yst.sds.ULT. is searched by default), why not 
everyone else ;-)
[Somewhat similar to the define keyword.]

These could be properties instead.
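To make the idea concrete, here is a minimal sketch of the kind of lookup I have in mind. It is not Pig's actual resolver: the class ImportListSketch and its methods are hypothetical, and only the idea of a packageImportList of prefixes comes from PigContext.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; this is not Pig's real function resolver.
class ImportListSketch {
    // Prefixes tried in order. The empty prefix means "use the name as given",
    // so fully qualified names keep working.
    private final List<String> packageImportList = new ArrayList<String>();

    ImportListSketch() {
        packageImportList.add("");
    }

    // What an "import <package>" command might do: register one more prefix.
    void addImport(String packageName) {
        String prefix = packageName.endsWith(".") ? packageName : packageName + ".";
        if (!packageImportList.contains(prefix)) {
            packageImportList.add(prefix);
        }
    }

    // Try each prefix until a class loads; return null if none matches.
    Class<?> resolve(String functionName) {
        for (String prefix : packageImportList) {
            try {
                return Class.forName(prefix + functionName);
            } catch (ClassNotFoundException e) {
                // Not in this package, try the next prefix.
            }
        }
        return null;
    }
}
```

With something like this, "import uk.ac.gla.terrier.pig" would just call addImport, and a property holding a list of packages could feed the same method.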

(2) Include other Pig files, just to allow commonly used imports, 
configuration, defines, etc. to be loaded easily. How often do you 
register the same jar files, time in, time out, for every Pig script 
that you write?
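As a rough illustration of what "include" could do, the sketch below splices included scripts in place before parsing. Everything in it is an assumption: the class IncludeSketch, the "include <name>;" form, and the in-memory map standing in for files on disk. The jar and script names in the usage note are made up as well.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; "include" is a proposed command, not existing syntax.
class IncludeSketch {
    // Returns the lines of the named script with every "include <name>;"
    // line replaced by the (recursively expanded) lines of that script.
    static List<String> expand(String name, Map<String, List<String>> scripts) {
        List<String> out = new ArrayList<String>();
        expand(name, scripts, new ArrayDeque<String>(), out);
        return out;
    }

    private static void expand(String name, Map<String, List<String>> scripts,
                               Deque<String> active, List<String> out) {
        if (active.contains(name)) {
            throw new IllegalStateException("include cycle at " + name);
        }
        List<String> lines = scripts.get(name);
        if (lines == null) {
            throw new IllegalArgumentException("no such script: " + name);
        }
        active.push(name); // scripts currently being expanded, for cycle detection
        for (String line : lines) {
            String text = line.trim();
            if (text.startsWith("include ")) {
                String target = text.substring("include ".length())
                                    .replace(";", "").trim();
                expand(target, scripts, active, out);
            } else {
                out.add(line);
            }
        }
        active.pop();
    }
}
```

A shared file holding the usual register and import lines could then be pulled into every script with a single include line, and an include that loops back on itself fails fast instead of recursing forever.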

(3) sharedFS - see PIG-102 - equally could be a property too.

(4) pigrc as a script - similar to (2).
This is like your Unix shell rc file, e.g. .bashrc.
Mine is full of single-character aliases for commands I use all the 
time, etc.

C
