pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-58) parameterized Pig scripts
Date Tue, 05 Feb 2008 23:29:08 GMT

    [ https://issues.apache.org/jira/browse/PIG-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565959#action_12565959 ]

Olga Natkovich commented on PIG-58:

Comments from a user:

1.  The parameter usage notation seems too arcane.
      Why both %% and $?   Why both %% and ``?
      It looks like you need %..% only to be able to put a command inside them -- they (%..%)
do not seem necessary for variable substitutions.
      Similarly, `..` with this meaning can only be right inside %...%
      So why don't you simplify it to the following:
Example 1: parameter name specification
A = load '/data/mydata/$date';
Example 2: command specification
A = load '/data/mydata/%generate_date%';
Example 3: command taking a parameter
A = load '/data/mydata/%generate_name $date%';
Example 4: command is passed as a parameter:
A = load '/data/mydata/%$cmd $date%';

I would not want to write
B = filter A by $0>'%$N%';
(assuming I have N=5 in the command line...)
I'd prefer
B = filter A by $0>$N;
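
A minimal sketch of how the simplified $name notation could be substituted, written in Python purely for illustration. The function name substitute and the rule that a parameter name must start with a letter or underscore are assumptions, not part of the proposal; under that assumed rule, positional references such as $0 pass through untouched.

```python
import re

# Assumed rule: a parameter name starts with a letter or underscore,
# so Pig positional references such as $0 are never treated as parameters.
PARAM = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")

def substitute(line, params):
    """Replace each $name with its value; leave unknown names unchanged."""
    def repl(match):
        return params.get(match.group(1), match.group(0))
    return PARAM.sub(repl, line)

print(substitute("B = filter A by $0>$N;", {"N": "5"}))
# prints: B = filter A by $0>5;
print(substitute("A = load '/data/mydata/$date';", {"date": "20080110"}))
# prints: A = load '/data/mydata/20080110';
```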

2. (repeated from another message)
"-param" key seems redundant and will cause confusion.
     Why not assume that any command line argument that has = is a parameter specification?
 Indeed, we can get rid of the -option syntax altogether.
     Another way to think about this:  "X=Y" and "-X Y" are two ways to specify parameters.
 Use one.
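
As a rough illustration of the "any argument with = is a parameter" idea, here is a Python sketch. The helper name split_args and the extra check that the part before = is a plain identifier are assumptions added for the example; the real command-line handling in Pig may look quite different.

```python
def split_args(argv):
    """Separate NAME=VALUE arguments from everything else (assumed rule)."""
    params, rest = {}, []
    for arg in argv:
        name, sep, value = arg.partition("=")
        # Treat the argument as a parameter only if '=' is present
        # and the part before it is a plain identifier.
        if sep and name.isidentifier():
            params[name] = value
        else:
            rest.append(arg)
    return params, rest

params, rest = split_args(["date=20080110", "N=5", "myscript.pig"])
print(params)  # the two parameters, keyed by name
print(rest)    # everything else, here just the script name
```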

3. We need an ability to specify "DFS working directory" for the whole Pig job.
   This is the directory where all the relative Record Set path names (in "load" and "store")
are rooted.
   By default, it is the user's "home DFS directory".
   It is very convenient to be able to specify it when a script is run.

4. Need to specify the precedence rules: if 3 values are specified for the same parameter name
-- on the command line, in "declare", and in the parameters file -- which one wins?  (Probably the
order is command line, parameters file, declare?)
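
The guessed order (command line, then parameters file, then declare) could be expressed as a lookup chain. This is only a Python sketch of that guess; the function name resolve and the sample values are made up for the example.

```python
from collections import ChainMap

def resolve(command_line, params_file, declared):
    """Merge the three sources; earlier arguments take precedence."""
    # ChainMap searches its maps left to right, so the first hit wins.
    return dict(ChainMap(command_line, params_file, declared))

merged = resolve(
    {"date": "20080110"},                # command line
    {"date": "20080101", "N": "5"},      # parameters file
    {"N": "1", "out": "/tmp/out"},       # declare statements
)
print(merged["date"], merged["N"], merged["out"])
# prints: 20080110 5 /tmp/out
```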

5.  Can parameters be used in RHS of declare?  The document implicitly says "NO",  but it is
a very convenient feature.   (I could give examples, but you can come up with plenty on your own.)
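
One possible reading of "parameters in RHS of declare", sketched in Python: declare lines are evaluated in order, and each value may refer to names defined earlier. The $name notation, the exact "declare name=value" line format, and the function name process_declares are all assumptions for the sake of the example.

```python
import re

PARAM = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")

def process_declares(declare_lines):
    """Evaluate 'declare name=value' lines in order.

    Assumes every line starts with the literal prefix 'declare '.
    A value may use parameters defined by earlier lines.
    """
    params = {}
    for line in declare_lines:
        name, _, value = line[len("declare "):].partition("=")
        # Expand references to parameters that are already defined.
        value = PARAM.sub(lambda m: params.get(m.group(1), m.group(0)), value.strip())
        params[name.strip()] = value
    return params

print(process_declares([
    "declare root=/data/mydata",
    "declare input=$root/20080110",
]))
# prints: {'root': '/data/mydata', 'input': '/data/mydata/20080110'}
```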

6. Typo:
The fault parameter values can be specified

> parameterized Pig scripts
> -------------------------
>                 Key: PIG-58
>                 URL: https://issues.apache.org/jira/browse/PIG-58
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Olga Natkovich
> This feature has been requested by several users and would be very useful in conjunction
> with streaming. The feature would allow a Pig script to include parameters that are replaced
> at run time. For instance, if your script needs to run on a daily basis over the data of the
> previous day, you would be able to use the script and provide a date as a run-time parameter
> to it.
> Example:
> =======
> Pig script myscript.pig:
> A = load '/data/mydata/%date%';
> B = filter A by $0>'5';
> .....
> Pig command line:
> pig -param date='20080110' myscript.pig
> Proposed interface and implementation:
> Interface:
> =======
> (0) Substitution will be only supported with pig script files.
> (1) Parameters are specified on the command line via -param <param>=<val>
> construct. Multiple parameters can be specified. They are applied to the script in the order
> they are specified on the command line.
> (2) Default values for the parameters can be specified within the script via declare statement:
> declare <param>=<value>
> (3) Within the script the parameter will be enclosed in %%. \% can be used to escape.
> Implementation:
> ============
> Use preprocessor to do the substitution. The preprocessor would be invoked by Main before
> grunt is instantiated and do the following:
> - create a new file in temp location
> - build a hash of parameters from command line and declare statement
> - for each line in the original script
>   if this is a declare line, skip it
>   else for each unescaped pattern %<identifier>% look for a match in the hash. Replace,
>   if found.  Write the line to the temp file.
> - pass the temp file to grunt.
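
The preprocessor steps quoted above can be sketched as follows. This is an illustrative Python version, not the actual implementation (which would presumably live in Pig's Java code). It works on a list of lines instead of a temp file, assumes command-line values override declare defaults, and leaves escaped \% sequences as they are; the function name preprocess and the declare line format are assumptions.

```python
import re

# An unescaped %identifier%; a % preceded by a backslash is not a delimiter.
PATTERN = re.compile(r"(?<!\\)%([A-Za-z_][A-Za-z0-9_]*)%")
# Assumed form of a declare line: declare <param>=<value>
DECLARE = re.compile(r"^\s*declare\s+(\w+)\s*=\s*(.*?)\s*$")

def preprocess(script_lines, cli_params):
    """Return the script with declare lines dropped and parameters replaced."""
    # Build the hash: declare statements give defaults, command line overrides.
    params = {}
    for line in script_lines:
        m = DECLARE.match(line)
        if m:
            params[m.group(1)] = m.group(2)
    params.update(cli_params)

    output = []
    for line in script_lines:
        if DECLARE.match(line):
            continue  # declare lines are not passed on to grunt
        # Replace each known parameter; unknown names are left unchanged.
        output.append(PATTERN.sub(lambda m: params.get(m.group(1), m.group(0)), line))
    return output

script = [
    "declare date=20080101",
    "A = load '/data/mydata/%date%';",
    "B = filter A by $0>'5';",
]
for line in preprocess(script, {"date": "20080110"}):
    print(line)
```

With the command-line value date=20080110, the load line refers to /data/mydata/20080110; calling preprocess(script, {}) would fall back to the declared default instead.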
