pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-58) parameterized Pig scripts
Date Tue, 08 Apr 2008 16:41:24 GMT

    [ https://issues.apache.org/jira/browse/PIG-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12586876#action_12586876 ]

Alan Gates commented on PIG-58:


1) Dryrun isn't a good name for the command line option.  It's far too generic.
Knowing nothing about parameter substitution and looking at the usage
statement of pig, I'd assume that dryrun meant that pig was going to parse my
query but not run it.  I would suggest a name like preproc or preproconly.

2) Why did we choose to put all of the data for the unit tests in separate
files in a test/data directory?  To date the approach has been to have the
tests themselves generate the data they need on the fly.  There are pros and
cons to switching, but I think we should discuss and have a policy of how unit
tests handle their data before we start adding a directory with a lot of files
in it.

3) A number of the public functions do not have javadoc comments.

4) In general, more comments throughout the code on what it is doing would be
helpful.  For example, in UtilFunctions.substitute, it is totally non-obvious
what the line replaced_line = replaced_line.replaceAll("\\\\\\$","\\$"); does.
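
As an illustration of the kind of comment that would help, here is a minimal
standalone sketch of what that call does when read by Java's string and regex
escaping rules. The class and method names are hypothetical and are not taken
from the patch; only the replaceAll call itself is copied from the quoted line.

```java
public class UnescapeDemo {
    // Hypothetical wrapper around the call quoted from UtilFunctions.substitute.
    static String unescape(String line) {
        // The Java literal "\\\\\\$" is the regex \\\$ which matches one literal
        // backslash followed by a literal '$'.
        // The Java literal "\\$" is the replacement \$ which inserts a literal '$'
        // (an unescaped '$' in a replacement would start a group reference).
        // Net effect: every escaped "\$" in the line becomes a plain "$".
        return line.replaceAll("\\\\\\$", "\\$");
    }

    public static void main(String[] args) {
        String line = "cost is \\$5";        // actual characters: cost is \$5
        System.out.println(unescape(line));  // prints: cost is $5
    }
}
```

A two-line comment along those lines next to the call would probably be enough.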

5) PigFileParser.jj doesn't skip over commented lines in the pig code.  It
should ignore anything on a line after "--".
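
The real fix belongs in the JavaCC grammar, but the intended behavior can be
sketched in plain Java. This is only a sketch under assumptions: the class and
method names are hypothetical, and it assumes a "--" inside a single-quoted
string should be kept and that quotes are never escaped.

```java
public class CommentStripper {
    // Returns the line with a trailing "--" comment removed.
    // A "--" that appears inside a single-quoted string is left alone.
    // Assumption: quotes are not escaped within strings.
    static String stripComment(String line) {
        boolean inQuote = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\'') {
                inQuote = !inQuote;
            } else if (!inQuote && c == '-'
                    && i + 1 < line.length() && line.charAt(i + 1) == '-') {
                return line.substring(0, i);
            }
        }
        return line;
    }

    public static void main(String[] args) {
        System.out.println(stripComment("A = load '/data/%date%'; -- yesterday's data"));
        System.out.println(stripComment("B = load '/data/a--b';"));
    }
}
```

With this behavior a parameter that appears only in a comment is never looked
up or substituted.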

> parameterized Pig scripts
> -------------------------
>                 Key: PIG-58
>                 URL: https://issues.apache.org/jira/browse/PIG-58
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Olga Natkovich
>         Attachments: PIG-58_v1.patch, PIG-58_v2
> This feature has been requested by several users and would be very useful in conjunction
> with streaming. The feature would allow a pig script to include parameters that are replaced
> at run time. For instance, if your script needs to run on a daily basis over the data of the
> previous day, you would be able to use the same script and provide the date as a run-time
> parameter to it.
> Example:
> =======
> Pig script myscript.pig:
> A = load '/data/mydata/%date%';
> B = filter A by $0>'5';
> .....
> Pig command line:
> pig -param date='20080110' myscript.pig
> Proposed interface and implementation:
> Interface:
> =======
> (0) Substitution will be supported only with pig script files.
> (1) Parameters are specified on the command line via the -param <param>=<val>
> construct. Multiple parameters can be specified. They are applied to the script in the order
> they are specified on the command line.
> (2) Default values for the parameters can be specified within the script via a declare statement:
> declare <param>=<value>
> (3) Within the script the parameter will be enclosed in %%. \% can be used to escape.
> Implementation:
> ============
> Use a preprocessor to do the substitution. The preprocessor would be invoked by Main before
> grunt is instantiated and do the following:
> - create a new file in temp location
> - build a hash of parameters from command line and declare statement
> - for each line in the original script
>   if this is a declare line, skip it
>   else for each unescaped pattern %<identifier>% look for a match in the hash. Replace,
>   if found.  Write the line to the temp file.
> - pass the temp file to grunt.
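
The steps in the quoted proposal can be sketched roughly as follows. This is
not the code from the attached patches. The class name, method name and
regular expressions are hypothetical, the sketch works on in-memory lines
instead of a temp file, and it assumes that a command line value overrides a
declare default and that an unknown parameter is left in the script unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParamPreprocessor {
    // declare <param>=<value>
    private static final Pattern DECLARE = Pattern.compile("^\\s*declare\\s+(\\w+)=(.*)$");
    // %<identifier>% where the opening % is not preceded by a backslash
    private static final Pattern PARAM = Pattern.compile("(?<!\\\\)%(\\w+)%");

    static List<String> preprocess(List<String> script, Map<String, String> cmdLineParams) {
        // Build the hash: command line values first, declare lines fill in defaults.
        Map<String, String> params = new LinkedHashMap<>(cmdLineParams);
        for (String line : script) {
            Matcher d = DECLARE.matcher(line);
            if (d.matches()) {
                params.putIfAbsent(d.group(1), d.group(2).trim());
            }
        }
        List<String> out = new ArrayList<>();
        for (String line : script) {
            if (DECLARE.matcher(line).matches()) {
                continue; // declare lines are skipped, not written out
            }
            Matcher m = PARAM.matcher(line);
            StringBuffer sb = new StringBuffer();
            while (m.find()) {
                String value = params.get(m.group(1));
                // Replace if found, otherwise keep the original text.
                String replacement = (value != null) ? value : m.group();
                m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
            }
            m.appendTail(sb);
            // Finally turn the escape \% into a plain %.
            out.add(sb.toString().replace("\\%", "%"));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> script = Arrays.asList(
            "declare limit=5",                      // hypothetical default value
            "A = load '/data/mydata/%date%';",
            "B = filter A by $0>'%limit%';");
        Map<String, String> cmdLine = Collections.singletonMap("date", "20080110");
        for (String line : preprocess(script, cmdLine)) {
            System.out.println(line);
        }
    }
}
```

Run with date=20080110, the sketch prints the load line with
/data/mydata/20080110 and the filter line with '5'. The real preprocessor
would write these lines to the temp file and hand that file to grunt.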

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.