hadoop-pig-dev mailing list archives

From "Stefan Groschupf (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-111) Configuration of Pig
Date Wed, 27 Feb 2008 00:02:51 GMT

     [ https://issues.apache.org/jira/browse/PIG-111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Groschupf updated PIG-111:

    Attachment: PIG-111_v_3_sg.patch

Here is my suggestion. I'm sorry it got a little bigger than expected. I know it is painful
to review and discuss larger patches, but I think configuration is a very important part.
Hadoop had big problems in the beginning because some mistakes were made early (the configuration
was static). In general I'm very much in favor of Inversion of Control and constructor-based
injection of concrete configuration values, but I understand that we can't do that if we want
to switch Execution Engine implementations easily.

I noticed that Pig actually only works because all configuration values are set as system properties
and then read back as system properties (System.getProperty). I have had very bad experiences
using system properties in production environments, since it is not clear to the user what
the values are. Then you run a job taking a week on the wrong cluster and all services are

From my point of view these are the important points:
+ the configuration object itself has no dependencies - java.util.Properties would be the best
choice from my point of view
+ the configuration is not static, so we pass a Properties instance around and do not use
system properties at all
+ each Execution Engine implementation has to take care of converting the properties itself
into a format the underlying technology understands (e.g. properties to a Hadoop configuration)
+ a default properties configuration file is part of our distribution (PIG_HOME/conf) and
contains all possible configuration values (for documentation), but perhaps sets only the required
values by default
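
As an illustration of the points above (not code from the attached patch), here is a minimal sketch. It assumes a hypothetical ExecutionEngine interface and made-up property names ("cluster", "exectype"), and a plain Map stands in for a backend-specific configuration object such as a Hadoop configuration. The defaults text stands in for a file under PIG_HOME/conf.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class ConfigSketch {

    /** Hypothetical engine interface: configuration is passed in explicitly. */
    public interface ExecutionEngine {
        void init(Properties properties);
    }

    /** Hypothetical engine that converts Properties into its own format. */
    public static class MapBackedEngine implements ExecutionEngine {
        // Stand-in for a backend-specific configuration object.
        public final Map<String, String> nativeConf = new HashMap<>();

        @Override
        public void init(Properties properties) {
            // stringPropertyNames() also includes keys from the defaults layer.
            for (String name : properties.stringPropertyNames()) {
                nativeConf.put(name, properties.getProperty(name));
            }
        }
    }

    /** Layers user overrides on top of defaults; System properties are never read. */
    public static Properties load(String defaultsText, Properties overrides) {
        Properties defaults = new Properties();
        try {
            defaults.load(new StringReader(defaultsText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Properties merged = new Properties(defaults);
        merged.putAll(overrides);
        return merged;
    }

    public static void main(String[] args) {
        // Stand-in for the contents of a default properties file.
        String defaultsText = "cluster=local\nexectype=local\n";

        Properties overrides = new Properties();
        overrides.setProperty("cluster", "example-host:9001");

        Properties conf = load(defaultsText, overrides);
        MapBackedEngine engine = new MapBackedEngine();
        engine.init(conf);

        System.out.println("cluster  = " + engine.nativeConf.get("cluster"));
        System.out.println("exectype = " + engine.nativeConf.get("exectype"));
    }
}
```

Because the Properties instance is handed to the engine rather than looked up globally, the caller can always inspect exactly which values a job will run with.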

The attached patch implements those points. I had to change some APIs. I'm very sorry, but I
personally think it is cleaner now. I also had to adjust the tests.
I suggest applying the patch and reviewing the changed sources instead of reading the patch file.
For sure this is just the starting point and we need further improvements in the sources -
e.g. I suggest Grunt should allow setting all kinds of properties, not just known ones.
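
To make the Grunt suggestion concrete, here is a minimal sketch of a "set" command that accepts any key rather than a fixed list of known ones. The class and method names are hypothetical and this is not Grunt's actual parser. It only shows the idea of writing arbitrary key/value pairs into the Properties instance that is passed around.

```java
import java.util.Properties;

public class SetCommandSketch {

    /**
     * Applies a line of the form "set <key> <value>" to the given properties.
     * Any key is accepted. Returns true if the line was a set command.
     */
    public static boolean applySet(String line, Properties properties) {
        // Split into at most three parts so the value may contain spaces.
        String[] parts = line.trim().split("\\s+", 3);
        if (parts.length != 3 || !parts[0].equals("set")) {
            return false;
        }
        properties.setProperty(parts[1], parts[2]);
        return true;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        applySet("set my.custom.key some value", conf);
        System.out.println(conf.getProperty("my.custom.key"));
    }
}
```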

The patch is based on the patches done before for this issue.
Patch is against r631358. At least on my box the test suite passes.

> Configuration of Pig
> --------------------
>                 Key: PIG-111
>                 URL: https://issues.apache.org/jira/browse/PIG-111
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Craig Macdonald
>         Attachments: after.png, before.png, config.patch.1502, PIG-111_v_3_sg.patch, PIG-93-v01.patch, PIG-93-v02.patch
> This JIRA discusses issues relating to the configuration of Pig.
> Use cases:
> 1. I want to configure Pig programmatically from Java
>  Motivation: Pig can be embedded in another Java program, and the client code should be able to set the configuration
> 2. I want to configure Pig from the command line
> 3. I want to configure Pig from the Pig shell (Grunt)
> 4. I want Pig to remember my configuration for every Pig session
>  Motivation: to save me typing in some configuration stuff every time.
> 5. I want Pig to remember my configuration for this script.
>  Motivation: I must use a common configuration for 50% of my Pig scripts - can I share this configuration between scripts?
> Current Status: 
>  * Pig uses System properties for some configuration
>  * A configuration properties object in PigContext is not used.
>  * pigrc can contain properties
>  * Configuration properties cannot be set from Grunt
> Proposed solutions to use cases:
> 1. Configuration should be set in PigContext, and accessible from client code.
> 2. System properties are copied to PigContext, or can be specified on the command line (duplication with System properties)
> 3. Allow configuration properties to be set using the "set" command in Grunt
> 4. Pigrc can contain properties. Is this enough, or can other configuration stuff be set, e.g. aliases, imports, etc.?
> 5. Add an include directive to pig, to allow a shared configuration/Pig script to be
> Connections to Shell scripting: 
>  * The source command in Bash allows another bash script file to be included - this allows shared variables to be set in one file shared between a set of scripts.
>  * Aliases can be set, according to user preferences, etc.
>  * All this can be done in your .bashrc file
> Issues: 
>  * What happens when you change a property after the property has been read?
>  * Can Grunt read a pigrc containing various statements etc before the PigServer is completely

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
