hadoop-pig-dev mailing list archives

From "Pi Song (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-111) Configuration of Pig
Date Thu, 06 Mar 2008 10:18:58 GMT

    [ https://issues.apache.org/jira/browse/PIG-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12575645#action_12575645
] 

Pi Song commented on PIG-111:
-----------------------------

Regarding this bit,
{quote}
*Alan Gates:*
Also, I was pondering how to include hadoop specific data here. Right now pig attaches to
a cluster by reading a hadoop config file. Obviously we don't want this in our general config
file. But maybe the file to be referenced should be referred to in this config file. Or maybe
it's ok to just pick the hadoop-site.xml up off the classpath, as is done now. The modification
would then be that we only do it if we're in mapreduce mode. Thoughts on this?
{quote}

Just throwing out an idea here:
I propose "Convention over Configuration".

*General Config*
- Pig looks for the general config file on the classpath (the default name is hardcoded)
- The user can override the filename/path on the startup command line
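
A minimal sketch of that lookup, assuming java.util.Properties as the file format. The class name, the default file name "pig.properties", and the idea that the override path arrives as a plain string from a command-line option are all placeholders for illustration, not anything already in the patch:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

class GeneralConfigLoader {
    // Hypothetical hardcoded default name; nothing in this thread fixes one.
    static final String DEFAULT_NAME = "pig.properties";

    /**
     * Loads the general config. If overridePath is non-null (for example,
     * taken from a startup command-line option), that file is read instead
     * of the default. Otherwise the default name is looked up on the
     * classpath; if it is not found, empty properties are returned.
     */
    static Properties load(String overridePath) throws IOException {
        Properties props = new Properties();
        if (overridePath != null) {
            try (InputStream in = new FileInputStream(overridePath)) {
                props.load(in);
            }
            return props;
        }
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        // A null resource is allowed in try-with-resources; close() is skipped.
        try (InputStream in = cl.getResourceAsStream(DEFAULT_NAME)) {
            if (in != null) {
                props.load(in);
            }
        }
        return props;
    }
}
```

With this shape a user who follows the convention configures nothing, and only a user who wants a different file has to say so.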

*Backend Specific Config*
- The backend looks for its default config file on the classpath (the default name is hardcoded)
- The user can override the filename/path in the general config file or on the startup command
line.
- Of course, we pass the general config key-values to the backend so the backend can also derive some
of its own key-values from the general config
- (Obvious case) If the user specifies config keys that do not apply to the running backend, we just ignore them
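
The last two points could look roughly like the sketch below. It assumes each backend can report the set of keys it understands, and that backend-specific values win over general ones when both define the same key. Both are my assumptions, and the class and method names are made up for illustration:

```java
import java.util.Properties;
import java.util.Set;

class BackendConfig {
    /**
     * Builds the effective config for the running backend from the general
     * config and the backend-specific config. Keys the backend does not
     * recognize are silently ignored.
     */
    static Properties forBackend(Properties general,
                                 Properties backendSpecific,
                                 Set<String> knownKeys) {
        Properties result = new Properties();
        // General values go in first so the backend can build on them.
        copyKnown(general, result, knownKeys);
        // Assumed precedence: backend-specific values override general ones.
        copyKnown(backendSpecific, result, knownKeys);
        return result;
    }

    private static void copyKnown(Properties from, Properties to,
                                  Set<String> knownKeys) {
        for (String key : from.stringPropertyNames()) {
            if (knownKeys.contains(key)) {
                to.setProperty(key, from.getProperty(key));
            }
        }
    }
}
```

Ignoring unknown keys means one general config file can carry settings for several backends without any of them failing on the others' keys.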

Moreover, I think the interaction between Commons Logging and Log4J is exactly the same as
what we're trying to achieve. We might just look at it and apply the same concept.

BTW Stefan, thanks for your hard work. I really appreciate that (because I had the same experience
fixing a big patch over and over again before).



> Configuration of Pig
> --------------------
>
>                 Key: PIG-111
>                 URL: https://issues.apache.org/jira/browse/PIG-111
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Craig Macdonald
>            Assignee: Stefan Groschupf
>         Attachments: after.png, before.png, config.patch.1502, PIG-111-v04.patch, PIG-111-v05.patch,
PIG-111-v06.patch, PIG-111_v_3_sg.patch, PIG-111_v_7_r633244M.patch, PIG-111_v_8_r633244M.patch,
PIG-93-v01.patch, PIG-93-v02.patch
>
>
> This JIRA discusses issues relating to the configuration of Pig.
> Use cases:
>  
> 1. I want to configure Pig programmatically from Java
>  Motivation: Pig can be embedded in another Java program, and the configuration should be settable by the client code
> 2. I want to configure Pig from the command line
> 3. I want to configure Pig from the Pig shell (Grunt)
> 4. I want Pig to remember my configuration for every Pig session
>  Motivation: to save me typing in some configuration stuff every time.
> 5. I want Pig to remember my configuration for this script.
>  Motivation: I must use a common configuration for 50% of my Pig scripts - can I share this configuration between scripts?
> Current Status: 
>  * Pig uses System properties for some configuration
>  * A configuration properties object in PigContext is not used.
>  * pigrc can contain properties
>  * Configuration properties cannot be set from Grunt
> Proposed solutions to use cases:
> 1. Configuration should be set in PigContext, and accessible from client code.
> 2. System properties are copied to PigContext, or can be specified on the command line
(duplication with System properties)
> 3. Allow configuration properties to be set using the "set" command in Grunt
> 4. Pigrc can contain properties. Is this enough, or can other configuration stuff be set, e.g. aliases, imports, etc.?
> 5. Add an include directive to pig, to allow a shared configuration/Pig script to be
included.
> Connections to Shell scripting: 
>  * The source command in Bash allows another bash script file to be included - this allows
shared variables to be set in one file shared between a set of scripts.
>  * Aliases can be set, according to user preferences, etc.
>  * All this can be done in your .bashrc file
> Issues: 
>  * What happens when you change a property after the property has been read?
>  * Can Grunt read a pigrc containing various statements etc before the PigServer is completely
configured?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

