pig-commits mailing list archives

From o...@apache.org
Subject svn commit: r1050082 [3/6] - in /pig/trunk: ./ src/docs/src/documentation/content/xdocs/
Date Thu, 16 Dec 2010 18:10:59 GMT
Added: pig/trunk/src/docs/src/documentation/content/xdocs/cmds.xml
URL: http://svn.apache.org/viewvc/pig/trunk/src/docs/src/documentation/content/xdocs/cmds.xml?rev=1050082&view=auto
==============================================================================
--- pig/trunk/src/docs/src/documentation/content/xdocs/cmds.xml (added)
+++ pig/trunk/src/docs/src/documentation/content/xdocs/cmds.xml Thu Dec 16 18:10:59 2010
@@ -0,0 +1,634 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
+
+<document>
+  <header>
+    <title>Shell and Utility Commands</title>
+  </header>
+  <body>
+
+<!-- ====================================================================== -->
+<!-- Shell COMMANDS-->
+   <section>
+   <title>Shell Commands</title>
+   
+      <section>
+   <title>fs</title>
+   <p>Invokes any FSShell command from within a Pig script or the Grunt shell.</p>
+   
+   <section>
+   <title>Syntax </title>
+   <table>
+       <tr>
+            <td>
+               <p>fs subcommand subcommand_parameters </p>
+            </td>
+         </tr> 
+   </table></section>
+   
+   <section>
+   <title>Terms</title>
+   <table>
+      <tr>
+            <td>
+               <p>subcommand</p>
+            </td>
+            <td>
+               <p>The FSShell command.</p>
+            </td>
+         </tr>
+               <tr>
+            <td>
+               <p>subcommand_parameters</p>
+            </td>
+            <td>
+               <p>The FSShell command parameters.</p>
+            </td>
+         </tr>
+   </table>
+   
+   </section>
+   
+   <section>
+   <title>Usage</title>
+   <p>Use the fs command to invoke any FSShell command from within a Pig script or the Grunt shell.
+   The fs command greatly extends the set of supported file system commands and the capabilities
+   supported for existing commands such as ls, which now supports globbing. For a complete list of
+   FSShell commands, see the
+   <a href="http://hadoop.apache.org/common/docs/current/file_system_shell.html">File System Shell Guide</a>.</p>
+   </section>
+   
+   <section>
+   <title>Examples</title>
+   <p>In these examples a directory is created, a file is copied, and a file is listed.</p>
+<source>
+fs -mkdir /tmp
+fs -copyFromLocal file-x file-y
+fs -ls file-y
+</source>
+   </section>
+       </section>  
+     <section>
+   <title>sh</title>
+   <p>Invokes any sh shell command from within a Pig script or the Grunt shell.</p>
+   
+   <section>
+   <title>Syntax </title>
+   <table>
+       <tr>
+            <td>
+               <p>sh subcommand subcommand_parameters </p>
+            </td>
+         </tr> 
+   </table></section>
+   
+   <section>
+   <title>Terms</title>
+   <table>
+      <tr>
+            <td>
+               <p>subcommand</p>
+            </td>
+            <td>
+               <p>The sh shell command.</p>
+            </td>
+         </tr>
+               <tr>
+            <td>
+               <p>subcommand_parameters</p>
+            </td>
+            <td>
+               <p>The sh shell command parameters.</p>
+            </td>
+         </tr>
+   </table>
+   
+   </section>
+   
+   <section>
+   <title>Usage</title>
+   <p>Use the sh command to invoke any sh shell command from within a Pig script or the Grunt shell.</p>
+   
+<p> 
+ Note that only real programs can be run from the sh command. Commands such as cd are not programs
+ but part of the shell environment and as such cannot be executed unless the user invokes the shell explicitly, like "bash cd".
+</p>
+   </section>
+   
+   <section>
+   <title>Example</title>
+   <p>In this example the ls command is invoked.</p>
+<source>
+grunt> sh ls 
+bigdata.conf 
+nightly.conf 
+..... 
+grunt> 
+</source>
+   </section>
+ 
+    </section>
+        </section>
+ 
+ <!-- ======================================================== -->         
+        
+   <section>
+   <title>Utility Commands</title>
+   
+  <section>
+   <title>exec</title>
+   <p>Run a Pig script.</p>
+   
+   <section>
+   <title>Syntax</title>
+   <table>
+       <tr>
+            <td>
+               <p>exec [-param param_name = param_value] [-param_file file_name] script</p>
+            </td>
+         </tr> 
+   </table></section>
+   
+   <section>
+   <title>Terms</title>
+   <table>
+    
+        <tr>
+            <td>
+               <p>-param param_name = param_value</p>
+            </td>
+            <td>
+               <p>See <a href="cont.html#Parameter+Substitution">Parameter Substitution</a>.</p>
+            </td>
+        </tr>
+
+        <tr>
+            <td>
+               <p>-param_file file_name</p>
+            </td>
+            <td>
+               <p>See <a href="cont.html#Parameter+Substitution">Parameter Substitution</a>.
</p>
+            </td>
+        </tr>
+   
+      <tr>
+            <td>
+               <p>script</p>
+            </td>
+            <td>
+               <p>The name of a Pig script.</p>
+            </td>
+         </tr>
+         
+    
+   </table></section>
+   
+   <section>
+   <title>Usage</title>
+   <p>Use the exec command to run a Pig script with no interaction between the script
and the Grunt shell (batch mode). Aliases defined in the script are not available to the shell;
however, the files produced as the output of the script and stored on the system are visible
after the script is run. Aliases defined via the shell are not available to the script. </p>
+   <p>With the exec command, store statements will not trigger execution; rather, the entire script is parsed before execution starts. Unlike the run command, exec does not change the command history or remember the handles used inside the script. Exec without any parameters can be used in scripts to force execution up to the point in the script where the exec occurs.</p>
+   <p>For comparison, see the run command. Both the exec and run commands are useful
for debugging because you can modify a Pig script in an editor and then rerun the script in
the Grunt shell without leaving the shell. Also, both commands promote Pig script modularity
as they allow you to reuse existing components.</p>
+   </section>
+   
+   <section>
+   <title>Examples</title>
+   <p>In this example the script is displayed and run.</p>
+
+<source>
+grunt&gt; cat myscript.pig
+a = LOAD 'student' AS (name, age, gpa);
+b = LIMIT a 3;
+DUMP b;
+
+grunt&gt; exec myscript.pig
+(alice,20,2.47)
+(luke,18,4.00)
+(holly,24,3.27)
+</source>
+
+   <p>In this example parameter substitution is used with the exec command.</p>
+<source>
+grunt&gt; cat myscript.pig
+a = LOAD 'student' AS (name, age, gpa);
+b = ORDER a BY name;
+
+STORE b into '$out';
+
+grunt&gt; exec -param out=myoutput myscript.pig
+</source>
+
+      <p>In this example multiple parameters are specified.</p>
+<source>
+grunt&gt; exec -param p1=myparam1 -param p2=myparam2 myscript.pig
+</source>
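+
+   <p>As a minimal sketch of the last point in the Usage section, an exec with no parameters can be placed inside a script to force the statements above it to run before the script continues. The names 'student' and 'tmp_sorted' are illustrative assumptions, not taken from a real job.</p>
+<source>
+a = LOAD 'student' AS (name, age, gpa);
+b = ORDER a BY name;
+STORE b INTO 'tmp_sorted';
+
+-- force everything above this line to execute before continuing
+exec;
+
+-- 'tmp_sorted' has been written at this point and can be read back
+c = LOAD 'tmp_sorted' AS (name, age, gpa);
+d = LIMIT c 3;
+DUMP d;
+</source>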
+
+   </section>
+   
+   </section>   
+   
+   
+   <section>
+   <title>help</title>
+   <p>Prints a list of Pig commands or properties.</p>
+   
+   <section>
+   <title>Syntax</title>
+   <table>
+       <tr>
+            <td>
+               <p>-help [properties]  </p>
+            </td>
+         </tr> 
+   </table></section>
+   
+   <section>
+   <title>Terms</title>
+   <table>
+       <tr>
+            <td>
+               <p>properties</p>
+            </td>
+            <td>
+               <p>List Pig properties.</p>
+            </td>
+         </tr> 
+   </table></section>
+   
+   <section>
+   <title>Usage</title>
+   <p>The help command prints a list of Pig commands or properties.</p></section>
+   
+   <section>
+   <title>Example</title>
+   <p>Use "-help" to get a list of commands.</p>
+<source>
+$ pig -help
+
+Apache Pig version 0.8.0-dev (r987348)
+compiled Aug 19 2010, 16:38:44
+
+USAGE: Pig [options] [-] : Run interactively in grunt shell.
+       Pig [options] -e[xecute] cmd [cmd ...] : Run cmd(s).
+       Pig [options] [-f[ile]] file : Run cmds found in file.
+  options include:
+    -4, -log4jconf - Log4j configuration file, overrides log conf
+    -b, -brief - Brief logging (no timestamps)
+    -c, -check - Syntax check
+<em>etc …</em></source>
+
+<p>Use "-help properties" to get a list of properties.</p>
+<source>
+$ pig -help properties
+
+The following properties are supported:
+    Logging:
+        verbose=true|false; default is false. This property is the same as -v switch
+        brief=true|false; default is false. This property is the same as -b switch
+        debug=OFF|ERROR|WARN|INFO|DEBUG; default is INFO. This property is the same as -d
switch
+        aggregate.warning=true|false; default is true. If true, prints count of warnings
+            of each type rather than logging each warning.
+<em>etc …</em></source>
+
+   </section>
+   </section>
+   
+   <section>
+   <title>kill</title>
+   <p>Kills a job.</p>
+   
+   <section>
+   <title>Syntax</title>
+   <table>
+       <tr>
+            <td>
+               <p>kill jobid</p>
+            </td>
+         </tr> 
+   </table></section>
+   
+   <section>
+   <title>Terms</title>
+   <table>
+       <tr>
+            <td>
+               <p>jobid</p>
+            </td>
+            <td>
+               <p>The job id.</p>
+            </td>
+         </tr> 
+   </table></section>
+   
+   <section>
+   <title>Usage</title>
+   <p>The kill command enables you to kill a job based on a job id.</p></section>
+   
+   <section>
+   <title>Example</title>
+   <p>In this example the job with id job_0001 is killed.</p>
+<source>
+grunt&gt; kill job_0001
+</source>
+   </section></section>
+   
+   <section>
+   <title>quit</title>
+   <p>Quits from the Pig grunt shell.</p>
+   
+   <section>
+   <title>Syntax</title>
+   <table>
+       <tr>
+            <td>
+               <p>quit</p>
+            </td>
+         </tr> 
+   </table></section>
+   
+   <section>
+   <title>Terms</title>
+   <table>
+       <tr>
+            <td>
+               <p>none</p>
+            </td>
+            <td>
+               <p>no parameters</p>
+            </td>
+         </tr> 
+   </table></section>
+   
+   <section>
+   <title>Usage</title>
+   <p>The quit command enables you to quit or exit the Pig grunt shell.</p></section>
+   
+   <section>
+   <title>Example</title>
+   <p>In this example the quit command exits the Pig grunt shell.</p>
+<source>
+grunt&gt; quit
+</source>
+   </section></section>
+   
+   
+   <section>
+   <title>run</title>
+   <p>Run a Pig script.</p>
+   
+   <section>
+   <title>Syntax</title>
+   <table>
+       <tr>
+            <td>
+               <p>run [-param param_name = param_value] [-param_file file_name] script</p>
+            </td>
+         </tr> 
+   </table></section>
+   
+   <section>
+   <title>Terms</title>
+   <table>
+    
+         <tr>
+            <td>
+               <p>-param param_name = param_value</p>
+            </td>
+            <td>
+               <p>See <a href="cont.html#Parameter+Substitution">Parameter Substitution</a>.</p>
+            </td>
+         </tr>
+
+         <tr>
+            <td>
+               <p>-param_file file_name</p>
+            </td>
+            <td>
+               <p>See <a href="cont.html#Parameter+Substitution">Parameter Substitution</a>.
</p>
+            </td>
+         </tr>
+      <tr>
+            <td>
+               <p>script</p>
+            </td>
+            <td>
+               <p>The name of a Pig script.</p>
+            </td>
+         </tr>
+         
+    
+   </table></section>
+   
+   <section>
+   <title>Usage</title>
+   <p>Use the run command to run a Pig script that can interact with the Grunt shell
(interactive mode). The script has access to aliases defined externally via the Grunt shell.
The Grunt shell has access to aliases defined within the script. All commands from the script
are visible in the command history. </p>   
+	<p>With the run command, every store triggers execution. The statements from the script
are put into the command history and all the aliases defined in the script can be referenced
in subsequent statements after the run command has completed. Issuing a run command on the
grunt command line has basically the same effect as typing the statements manually. </p>
  
+   <p>For comparison, see the exec command. Both the run and exec commands are useful
for debugging because you can modify a Pig script in an editor and then rerun the script in
the Grunt shell without leaving the shell. Also, both commands promote Pig script modularity
as they allow you to reuse existing components.</p>
+  </section>
+   
+   <section>
+   <title>Example</title>
+   <p>In this example the script interacts with the results of commands issued via
the Grunt shell.</p>
+<source>
+grunt&gt; cat myscript.pig
+b = ORDER a BY name;
+c = LIMIT b 10;
+
+grunt&gt; a = LOAD 'student' AS (name, age, gpa);
+
+grunt&gt; run myscript.pig
+
+grunt&gt; d = LIMIT c 3;
+
+grunt&gt; DUMP d;
+(alice,20,2.47)
+(alice,27,1.95)
+(alice,36,2.27)
+</source>
+   
+   
+   <p>In this example parameter substitution is used with the run command.</p>
+<source>
+grunt&gt; a = LOAD 'student' AS (name, age, gpa);
+
+grunt&gt; cat myscript.pig
+b = ORDER a BY name;
+STORE b into '$out';
+
+grunt&gt; run -param out=myoutput myscript.pig
+</source>
+   
+   </section></section>   
+   
+
+   <section>
+   <title>set</title>
+   <p>Assigns values to keys used in Pig.</p>
+   
+   <section>
+   <title>Syntax</title>
+   <table>
+       <tr>
+            <td>
+               <p>set key 'value'</p>
+            </td>
+         </tr> 
+   </table></section>
+   
+   <section>
+   <title>Terms</title>
+   <table>
+       <tr>
+            <td>
+               <p>key</p>
+            </td>
+            <td>
+               <p>Key (see table). Case sensitive.</p>
+            </td>
+         </tr>
+         <tr>
+            <td>
+               <p>value</p>
+            </td>
+            <td>
+               <p>Value for key (see table). Case sensitive.</p>
+            </td>
+         </tr> 
+   </table>
+   </section>
+   
+   <section>
+   <title>Usage</title>
+   <p>Use the set command to assign values to keys, as shown in the table. All keys
and their corresponding values (for Pig and Hadoop) are case sensitive.  </p>
+
+   <table>
+       
+      <tr>
+            <td>
+               <p>Key </p>
+            </td>
+            <td>
+               <p>Value </p>
+            </td>
+            <td>
+               <p>Description </p>
+            </td>
+         </tr>
+                     <tr>
+            <td>
+               <p>default_parallel</p>
+            </td>
+            <td>
+               <p>a whole number </p>
+            </td>
+            <td>
+               <p>Sets the number of reducers for all MapReduce jobs generated by Pig

+              (see  <a href="perf.html#Use+the+Parallel+Features">Use the Parallel
Features</a>).</p>
+            </td>
+         </tr>
+         <tr>
+            <td>
+               <p>debug </p>
+            </td>
+            <td>
+               <p>on/off </p>
+            </td>
+            <td>
+               <p>Turns debug-level logging on or off. </p>
+            </td>
+         </tr>
+         <tr>
+            <td>
+               <p>job.name </p>
+            </td>
+            <td>
+               <p>Single-quoted string that contains the job name.</p>
+            </td>
+            <td>
+               <p>Sets user-specified name for the job </p>
+            </td>
+            </tr>
+
+         <tr>
+            <td>
+               <p>job.priority </p>
+            </td>
+            <td>
+               <p>Acceptable values (case insensitive): very_low, low, normal, high,
very_high </p>
+            </td>
+            <td>
+               <p>Sets the priority of a Pig job.</p>
+            </td>
+            </tr>
+
+         <tr>
+            <td>
+               <p>stream.skippath</p>
+            </td>
+            <td>
+               <p>String that contains the path.</p>
+            </td>
+            <td>
+               <p>For streaming, sets the path from which not to ship data (see <a
href="basic.html#DEFINE">DEFINE</a> and <a href="basic.html#autoship"> About
Auto-Ship</a>).</p>
+            </td>
+            </tr>
+
+          
+   </table>
+   <p></p>
+   
+   <p>
+All Pig and Hadoop properties can be set, either in the Pig script or via the Grunt command
line.
+   </p>
+   </section>
+   
+   <section>
+   <title>Examples</title>
+   <p>In this example key value pairs are set at the command line.</p>
+<source>
+grunt&gt; SET debug 'on'
+grunt&gt; SET job.name 'my job'
+grunt&gt; SET default_parallel 100
+</source>
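+
+<p>The job.priority key can be set the same way. A minimal sketch, using one of the acceptable values listed in the table above and the quoted form shown in the Syntax section:</p>
+<source>
+grunt&gt; SET job.priority 'high'
+</source>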
+
+<p>In this example default_parallel is set in the Pig script; all MapReduce jobs that
get launched will use 20 reducers.</p>
+<source>
+SET default_parallel 20;
+A = LOAD 'myfile.txt' USING PigStorage() AS (t, u, v);
+B = GROUP A BY t;
+C = FOREACH B GENERATE group, COUNT(A.t) as mycount;
+D = ORDER C BY mycount;
+STORE D INTO 'mysortedcount' USING PigStorage();
+</source>
+
+
+<p>In this example multiple key value pairs are set in the Pig script. These key value
pairs are put in job-conf by Pig (making the pairs available to Pig and Hadoop). This is a
script-wide setting; if a key value is defined multiple times in the script the last value
will take effect and will be set for all jobs generated by the script. </p>
+<source>
+...
+SET mapred.map.tasks.speculative.execution false; 
+SET pig.logfile mylogfile.log; 
+SET my.arbitrary.key my.arbitrary.value; 
+...
+</source>
+</section>
+</section>
+</section>
+
+
+  </body>
+</document>

Added: pig/trunk/src/docs/src/documentation/content/xdocs/cont.xml
URL: http://svn.apache.org/viewvc/pig/trunk/src/docs/src/documentation/content/xdocs/cont.xml?rev=1050082&view=auto
==============================================================================
--- pig/trunk/src/docs/src/documentation/content/xdocs/cont.xml (added)
+++ pig/trunk/src/docs/src/documentation/content/xdocs/cont.xml Thu Dec 16 18:10:59 2010
@@ -0,0 +1,341 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
+<document>
+  <header>
+    <title>Control Structures</title>
+  </header>
+  <body>
+ <!-- ++++++++++++++++++++++++++++++++++ -->    
+   <section>
+   <title>Parameter Substitution</title>
+   <section>
+   <title>Description</title>
+   <p>Substitute values for parameters at run time.</p>
+   
+   <section>
+   <title>Syntax: Specifying parameters using the Pig command line</title>
+   <table>
+      <tr>
+            <td>
+               <p>pig {-param param_name = param_value | -param_file file_name} [-debug | -dryrun] script</p>
+            </td>
+         </tr>
+   </table>
+   </section>
+   
+   <section>
+   <title>Syntax: Specifying parameters using preprocessor statements in a Pig script</title>
+   <table>
+      <tr>
+            <td>
+               <p>{%declare | %default} param_name param_value</p>
+            </td>
+         </tr>
+   </table>
+   </section>
+   
+   <section>
+   <title>Terms</title>
+   <table>
+      <tr>
+            <td>
+               <p>pig</p>
+            </td>
+            <td>
+               <p>Keyword</p>
+               <p>Note: exec, run, and explain also support parameter substitution.</p>
+            </td>
+         </tr>
+         <tr>
+            <td>
+               <p>-param</p>
+            </td>
+            <td>
+               <p>Flag. Use this option when the parameter is included in the command
line.</p>
+               <p>Multiple parameters can be specified. If the same parameter is specified
multiple times, the last value will be used and a warning will be generated.</p>
+               <p>Command line parameters and parameter files can be combined with
command line parameters taking precedence. </p>
+            </td>
+         </tr>
+         <tr>
+            <td>
+               <p>param_name</p>
+            </td>
+            <td>
+               <p>The name of the parameter.</p>
+               <p>The parameter name has the structure of a standard language identifier:
it must start with a letter or underscore followed by any number of letters, digits, and underscores.
</p>
+               <p>Parameter names are case insensitive. </p>
+               <p>If you pass a parameter to a script that the script does not use,
this parameter is silently ignored. If the script has a parameter and no value is supplied
or substituted, an error will result.</p>
+            </td>
+         </tr>
+         <tr>
+            <td>
+               <p>param_value</p>
+            </td>
+            <td>
+               <p>The value of the parameter. </p>
+               <p>A parameter value can take two forms:</p>
+               <ul>
+                  <li>
+                     <p>A sequence of characters enclosed in single or double quotes.
In this case the unquoted version of the value is used during substitution. Quotes within
the value can be escaped with the backslash character ( \ ). Single word values that don't
use special characters such as % or = don't have to be quoted. </p>
+                  </li>
+                  <li>
+                     <p>A command enclosed in back ticks. </p>
+                  </li>
+               </ul>
+               <p>The value of a parameter, in either form, can be expressed in terms
of other parameters as long as the values of the dependent parameters are already defined.</p>
+               <p>There are no hard limits on the size except that parameters need
to fit into memory.</p>
+            </td>
+         </tr>
+         <tr>
+            <td>
+               <p>-param_file</p>
+            </td>
+            <td>
+               <p>Flag. Use this option when the parameter is included in a file. </p>
+               <p>Multiple files can be specified. If the same parameter is present multiple times in the file, the last value will be used and a warning will be generated. If a parameter is present in multiple files, the value from the last file will be used and a warning will be generated.</p>
+               <p>Command line parameters and parameter files can be combined with
command line parameters taking precedence. </p>
+            </td>
+         </tr>
+         <tr>
+            <td>
+               <p>file_name</p>
+            </td>
+            <td>
+               <p>The name of a file containing one or more parameters.</p>
+               <p>A parameter file will contain one line per parameter. Empty lines
are allowed. Perl-style (#) comment lines are also allowed. Comments must take a full line
and # must be the first character on the line. Each parameter line will be of the form: param_name
= param_value. White spaces around = are allowed but are optional.</p>
+            </td>
+         </tr>
+         <tr>
+            <td>
+               <p>-debug</p>
+            </td>
+            <td>
+               <p>Flag. With this option, the script is run and a fully substituted Pig script is produced in the current working directory, named original_script_name.substituted.</p>
+            </td>
+         </tr>
+         <tr>
+            <td>
+               <p>-dryrun</p>
+            </td>
+            <td>
+               <p>Flag. With this option, the script is not run and a fully substituted Pig script is produced in the current working directory, named original_script_name.substituted.</p>
+            </td>
+         </tr>
+         <tr>
+            <td>
+               <p>script</p>
+            </td>
+            <td>
+               <p>A pig script. The pig script must be the last element in the Pig
command line.</p>
+               <ul>
+                  <li>
+                     <p>If parameters are specified in the Pig command line or in a parameter file, the script should include a $param_name for each param_name included in the command line or parameter file.</p>
+                  </li>
+                  <li>
+                     <p>If parameters are specified using the preprocessor statements,
the script should include either %declare or %default.</p>
+                  </li>
+                  <li>
+                     <p>In the script, parameter names can be escaped with the backslash
character ( \ ) in which case substitution does not take place.</p>
+                  </li>
+               </ul>
+            </td>
+         </tr>
+         <tr>
+            <td>
+               <p>%declare</p>
+            </td>
+            <td>
+               <p>Preprocessor statement included in a Pig script.</p>
+               <p>Use to describe one parameter in terms of other parameters.</p>
+               <p>The declare statement is processed prior to running the Pig script.
</p>
+               <p>The scope of a parameter value defined using declare is all the lines
following the declare statement until the next declare statement that defines the same parameter
is encountered.</p>
+            </td>
+         </tr>
+         <tr>
+            <td>
+               <p>%default</p>
+            </td>
+            <td>
+               <p>Preprocessor statement included in a Pig script.</p>
+               <p>Use to provide a default value for a parameter. The default value
has the lowest priority and is used if a parameter value has not been defined by other means.</p>
+               <p>The default statement is processed prior to running the Pig script.
</p>
+               <p>The scope is the same as for %declare.</p>
+            </td>
+         </tr>
+   </table>
+   </section>
+   
+   <section>
+   <title>Usage</title>
+   <p>Parameter substitution enables you to write Pig scripts that include parameters
and to supply values for these parameters at run time. For instance, suppose you have a job
that needs to run every day using the current day's data. You can create a Pig script that
includes a parameter for the date. Then, when you run this script you can specify or supply
a value for the date parameter using one of the supported methods. </p>
+   
+   <section>
+   <title>Specifying Parameters </title>
+   <p>You can specify parameter names and parameter values as follows:</p>
+   <ul>
+      <li>
+         <p>As part of a command line.</p>
+      </li>
+      <li>
+         <p>In a parameter file, as part of a command line.</p>
+      </li>
+      <li>
+         <p>With the declare statement, as part of a Pig script.</p>
+      </li>
+      <li>
+         <p>With the default statement, as part of a Pig script.</p>
+      </li>
+   </ul>
+   </section>
+   
+   <section>
+   <title>Precedence</title>
+   <p>Precedence for parameters is as follows:</p>
+   <ul>
+      <li>
+         <p>Highest - parameters defined using the declare statement</p>
+      </li>
+      <li>
+         <p>Next - parameters defined in the command line</p>
+      </li>
+      <li>
+         <p>Lowest - parameters defined in a script</p>
+      </li>
+   </ul>
+   </section>
+   
+   <section>
+   <title>Processing Order and Precedence</title>
+   <p>Parameters are processed as follows:</p>
+   <ul>
+      <li>
+         <p>Command line parameters are scanned in the order they are specified on
the command line. </p>
+      </li>
+      <li>
+         <p>Parameter files are scanned in the order they are specified on the command
line. Within each file, the parameters are processed in the order they are listed. </p>
+      </li>
+      <li>
+         <p>Declare and default preprocessor statements are processed in the order they appear in the Pig script. </p>
+      </li>
+   </ul>
+   </section></section>
+   
+   <section>
+   <title>Example: Specifying parameters in the command line</title>
+   <p>Suppose we have a data file called 'mydata' and a pig script called 'myscript.pig'.</p>
+
+<p>mydata </p>
+<source>
+1       2       3
+4       2       1
+8       3       4
+</source>
+ 
+ <p>myscript.pig</p>
+<source>
+A = LOAD '$data' USING PigStorage() AS (f1:int, f2:int, f3:int);
+DUMP A;
+</source>
+
+<p>In this example the parameter (data) and the parameter value (mydata) are specified
in the command line. If the parameter name in the command line (data) and the parameter name
in the script ($data) do not match, the script will not run. If the value for the parameter
(mydata) is not found, an error is generated.</p>
+<source>
+$ pig -param data=mydata myscript.pig
+
+(1,2,3)
+(4,2,1)
+(8,3,4)
+</source>
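+
+<p>The -dryrun flag described in the Terms section can be added to the same call to inspect the substituted script without running it. This is a sketch that reuses the files from this example; no console output is shown because it is not given in this document.</p>
+<source>
+$ pig -param data=mydata -dryrun myscript.pig
+</source>
+<p>Per the flag description, this should leave a file named myscript.pig.substituted in the current working directory, in which $data has been replaced by mydata.</p>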
+   
+   </section>
+   
+   <section>
+   <title>Example: Specifying parameters using a parameter file</title>
+   <p>Suppose we have a parameter file called 'myparams.'</p>
+<source>
+# my parameters
+data1 = mydata1
+cmd = `generate_name`
+</source>
+
+   
+   <p>In this example the parameters and values are passed to the script using the
parameter file.</p>
+<source>
+$ pig -param_file myparams script2.pig
+</source>
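+
+   <p>Parameter files and command line parameters can be combined, with the command line value taking precedence. As a sketch, assuming the same myparams file, the following call would override the value of data1 from the file; otherdata is a hypothetical value chosen for illustration.</p>
+<source>
+$ pig -param_file myparams -param data1=otherdata script2.pig
+</source>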
+   
+   </section>
+   
+   <section>
+   <title>Example: Specifying parameters using the declare statement</title>
+   <p>In this example the command is executed and its stdout is used as the parameter
value.</p>
+<source>
+%declare CMD `generate_date`;
+A = LOAD '/data/mydata/$CMD';
+B = FILTER A BY $0>'5';
+
+<em>etc ... </em>
+</source>
+   
+   </section>
+   
+   <section>
+   <title>Example: Specifying parameters using the default statement</title>
+   <p>In this example the parameter (DATE) and value ('20090101') are specified in
the Pig script using the default statement. If a value for DATE is not specified elsewhere,
the default value 20090101 is used.</p>
+<source>
+%default DATE '20090101';
+A = load '/data/mydata/$DATE';
+
+<em>etc ... </em>
+</source>
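+
+   <p>Because the default value has the lowest priority, a value supplied on the command line replaces it. A sketch, assuming the statements above are saved as myscript.pig (a file name chosen for illustration):</p>
+<source>
+$ pig -param DATE=20100101 myscript.pig
+</source>
+   <p>Under that assumption the load path becomes /data/mydata/20100101 rather than /data/mydata/20090101.</p>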
+
+   </section>
+   
+   <section>
+   <title>Examples: Specifying parameter values as a sequence of characters</title>
+   <p>In this example the characters (in this case, Joe's URL) can be enclosed in single
or double quotes, and quotes within the sequence of characters can be escaped. </p>
+<source>
+%declare DES 'Joe\'s URL';
+A = LOAD 'data' AS (name, description, url);
+B = FILTER A BY description == '$DES';
+ 
+<em>etc ... </em>
+</source>
+   
+   <p>In this example single word values that don't use special characters (in this
case, mydata) don't have to be enclosed in quotes.</p>
+<source>
+$ pig -param data=mydata myscript.pig
+</source>   
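+
+   <p>A quoted value can also refer to a parameter that is already defined, as noted in the Terms section. A minimal sketch; the names ROOT and INPUT and the path are hypothetical, chosen only for illustration. Under these assumptions the load path would resolve to /data/mydata/20090101.</p>
+<source>
+%default ROOT '/data/mydata';
+%declare INPUT '$ROOT/20090101';
+A = LOAD '$INPUT';
+
+<em>etc ... </em>
+</source>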
+</section>
+   
+   <section>
+   <title>Example: Specifying parameter values as a command</title>
+   <p>In this example the command is enclosed in back ticks. First, the parameters
mycmd and date are substituted when the declare statement is encountered. Then the resulting
command is executed and its stdout is placed in the path before the load statement is run.</p>
+<source>
+%declare CMD `$mycmd $date`;
+A = LOAD '/data/mydata/$CMD';
+B = FILTER A BY $0>'5';
+ 
+<em>etc ... </em>
+</source>
+   </section>
+   </section>
+   </section>
+
+</body>
+</document>


