Reply-To: pig-dev@hadoop.apache.org
Subject: svn commit: r896134 - in /hadoop/pig/trunk: CHANGES.txt src/docs/src/documentation/content/xdocs/piglatin_reference.xml src/docs/src/documentation/content/xdocs/piglatin_users.xml
Date: Tue, 05 Jan 2010 17:18:51 -0000
To: pig-commits@incubator.apache.org
From: olga@apache.org

Author: olga
Date: Tue Jan 5 17:18:51 2010
New Revision: 896134
URL:
http://svn.apache.org/viewvc?rev=896134&view=rev Log: PIG-1175: Pig 0.6 Docs - Store v. Dump (chandec via olgan) Modified: hadoop/pig/trunk/CHANGES.txt hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_reference.xml hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_users.xml Modified: hadoop/pig/trunk/CHANGES.txt URL: http://svn.apache.org/viewvc/hadoop/pig/trunk/CHANGES.txt?rev=896134&r1=896133&r2=896134&view=diff ============================================================================== --- hadoop/pig/trunk/CHANGES.txt (original) +++ hadoop/pig/trunk/CHANGES.txt Tue Jan 5 17:18:51 2010 @@ -24,6 +24,8 @@ IMPROVEMENTS +PIG-1175: Pig 0.6 Docs - Store v. Dump (chandec via olgan) + PIG-1102: Collect number of spills per job (sriranjan via olgan) PIG-1149: Allow instantiation of SampleLoaders with parametrized LoadFuncs Modified: hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_reference.xml URL: http://svn.apache.org/viewvc/hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_reference.xml?rev=896134&r1=896133&r2=896134&view=diff ============================================================================== --- hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_reference.xml (original) +++ hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_reference.xml Tue Jan 5 17:18:51 2010 @@ -4919,58 +4919,7 @@ -
- DUMP - Displays the contents of a relation. - -
- Syntax - - - - DUMP alias;        - - -
- -
- Terms - - - - alias - - - The name of a relation. - - -
- -
- Usage - Use the DUMP operator to run (execute) a Pig Latin statement and to display the contents of an alias. You can use DUMP as a debugging device to make sure the results you are expecting are being generated.
- -
- Example - In this example a dump is performed after each statement. - -A = LOAD 'student' AS (name:chararray, age:int, gpa:float); - -DUMP A; -(John,18,4.0F) -(Mary,19,3.7F) -(Bill,20,3.9F) -(Joe,22,3.8F) -(Jill,20,4.0F) - -B = FILTER A BY name matches 'J.+'; - -DUMP B; -(John,18,4.0F) -(Joe,22,3.8F) -(Jill,20,4.0F) - -
+
FILTER @@ -6521,7 +6470,7 @@
STORE - Stores data to the file system. + Stores or saves results to the file system.
Syntax @@ -6591,7 +6540,10 @@
Usage - Use the STORE operator to run (execute) Pig Latin statements and to store data on the file system.
+ Use the STORE operator to run (execute) Pig Latin statements and save (persist) results to the file system. Use STORE for production scripts and batch mode processing. + + Note: To debug scripts during development, you can use DUMP to check intermediate results. +
Examples @@ -6962,6 +6914,68 @@
+ +
+ DUMP + Dumps or displays results to screen. + +
+ Syntax + + + + DUMP alias;        + + +
+ +
+ Terms + + + + alias + + + The name of a relation. + + +
+ +
+ Usage + Use the DUMP operator to run (execute) Pig Latin statements and display the results to your screen. DUMP is meant for interactive mode; statements are executed immediately and the results are not saved (persisted). You can use DUMP as a debugging device to make sure that the results you are expecting are actually generated. + + + Note that production scripts should not use DUMP as it will disable multi-query optimizations and is likely to slow down execution + (see Store vs. Dump). + +
+ +
+ Example + In this example a dump is performed after each statement. + +A = LOAD 'student' AS (name:chararray, age:int, gpa:float); + +DUMP A; +(John,18,4.0F) +(Mary,19,3.7F) +(Bill,20,3.9F) +(Joe,22,3.8F) +(Jill,20,4.0F) + +B = FILTER A BY name matches 'J.+'; + +DUMP B; +(John,18,4.0F) +(Joe,22,3.8F) +(Jill,20,4.0F) + +
+ + +
EXPLAIN Displays execution plans. Modified: hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_users.xml URL: http://svn.apache.org/viewvc/hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_users.xml?rev=896134&r1=896133&r2=896134&view=diff ============================================================================== --- hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_users.xml (original) +++ hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_users.xml Tue Jan 5 17:18:51 2010 @@ -54,7 +54,7 @@
Running Pig Latin -

You can execute Pig Latin statements interactively or in batch mode using Pig scripts (see the EXEC and RUN operators).

+

You can execute Pig Latin statements interactively or in batch mode using Pig scripts (see the exec and run commands).

Grunt Shell, Interactive or Batch Mode

@@ -228,15 +228,12 @@
Multi-Query Execution -

With multi-query execution Pig processes an entire script or a batch of statements at once -(as opposed to processing statements when a DUMP or STORE is encountered).

- - +

With multi-query execution Pig processes an entire script or a batch of statements at once.

Turning Multi-Query Execution On or Off

Multi-query execution is turned on by default. - To turn it off and revert to Pi'gs "execute-on-dump/store" behavior, use the "-M" or "-no_multiquery" options.

+ To turn it off and revert to Pig's "execute-on-dump/store" behavior, use the "-M" or "-no_multiquery" options.

To run script "myscript.pig" without the optimization, execute Pig as follows:

$ pig -M myscript.pig @@ -253,7 +250,8 @@
  • For batch mode execution, the entire script is first parsed to determine if intermediate tasks can be combined to reduce the overall amount of work that needs to be done; execution starts only after the parsing is completed -(see the EXPLAIN operator and the EXEC and RUN commands).

    +(see the EXPLAIN operator and the exec and run commands).

    +
  • Two run scenarios are optimized, as explained below: explicit and implicit splits, and storing intermediate results.

    @@ -316,7 +314,32 @@
  • +
    + Store vs. Dump +

    With multi-query execution, you want to use STORE to save (persist) your results. + You do not want to use DUMP as it will disable multi-query execution and is likely to slow down execution. (If you have included DUMP statements in your scripts for debugging purposes, you should remove them.)

    + +

    DUMP Example: In this script, because the DUMP command is interactive, the multi-query execution will be disabled and two separate jobs will be created to execute this script. The first job will execute A > B > DUMP while the second job will execute A > B > C > STORE.

    + + +A = LOAD 'input' AS (x, y, z); +B = FILTER A BY x > 5; +DUMP B; +C = FOREACH B GENERATE y, z; +STORE C INTO 'output'; + + +

    STORE Example: In this script, multi-query optimization will kick in allowing the entire script to be executed as a single job. Two outputs are produced: output1 and output2.

    + + +A = LOAD 'input' AS (x, y, z); +B = FILTER A BY x > 5; +STORE B INTO 'output1'; +C = FOREACH B GENERATE y, z; +STORE C INTO 'output2'; + +
    Error Handling

    With multi-query execution Pig processes an entire script or a batch of statements at once. @@ -352,10 +375,10 @@ Backward Compatibility

    Most existing Pig scripts will produce the same result with or without the multi-query execution. - There are cases though were this is not true. Path names and schemes are discussed here.

    + There are cases though where this is not true. Path names and schemes are discussed here.

    Any script is parsed in its entirety before it is sent to execution. Since the current directory can change - throughout the script any path used in load or store is translated to a fully qualified and absolute path.

    + throughout the script any path used in a LOAD or STORE statement is translated to a fully qualified and absolute path.

    In map-reduce mode, the following script will load from "hdfs://<host>:<port>/data1" and store into "hdfs://<host>:<port>/tmp/out1".

    @@ -375,7 +398,7 @@
  • Specify a custom scheme for the LoadFunc/Slicer

  • -

    Arguments used in a load statement that have a scheme other than "hdfs" or "file" will not be expanded and passed to the LoadFunc/Slicer unchanged.

    +

    Arguments used in a LOAD statement that have a scheme other than "hdfs" or "file" will not be expanded and will be passed to the LoadFunc/Slicer unchanged.

    In the SQL case, the SQLLoader function is invoked with "sql://mytable".

    @@ -416,7 +439,7 @@
    Example -

    In this script, the store/load operators have different file paths; however, the load operator depends on the store operator.

    +

    In this script, the STORE/LOAD operators have different file paths; however, the LOAD operator depends on the STORE operator.

    A = LOAD '/user/xxx/firstinput' USING PigStorage(); B = group ....