Subject: svn commit: r1601429 - in /pig/branches/branch-0.13: ./ src/docs/src/documentation/content/xdocs/
Date: Mon, 09 Jun 2014 16:23:30 -0000
To: commits@pig.apache.org
From: cheolsoo@apache.org

Author: cheolsoo
Date: Mon Jun 9 16:23:29 2014
New Revision: 1601429

URL: http://svn.apache.org/r1601429
Log:
PIG-3998: Documentation fix: invalid page links, wrong Groovy udf example (lbendig via cheolsoo)

Modified:
    pig/branches/branch-0.13/CHANGES.txt
    pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/basic.xml
    pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/cmds.xml
    pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/cont.xml
    pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/func.xml
    pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/perf.xml
    pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/start.xml
    pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/test.xml
    pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/udf.xml

Modified: pig/branches/branch-0.13/CHANGES.txt
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.13/CHANGES.txt?rev=1601429&r1=1601428&r2=1601429&view=diff
==============================================================================
--- pig/branches/branch-0.13/CHANGES.txt (original)
+++ pig/branches/branch-0.13/CHANGES.txt Mon Jun 9 16:23:29 2014
@@ -151,6 +151,8 @@ PIG-3882: Multiquery off mode execution
 BUG FIXES

+PIG-3998: Documentation fix: invalid page links, wrong Groovy udf example (lbendig via cheolsoo)
+
 PIG-4000: Minor documentation fix for PIG-3642 (lbendig via cheolsoo)

 PIG-3991: TestErrorHandling.tesNegative7 is broken in trunk/branch-0.13 (cheolsoo)

Modified: pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/basic.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/basic.xml?rev=1601429&r1=1601428&r2=1601429&view=diff
==============================================================================
--- pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/basic.xml (original)
+++ pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/basic.xml Mon Jun 9 16:23:29 2014
@@ -311,7 +311,7 @@ A!B
Relations, Bags, Tuples, Fields -

Pig Latin statements work with relations. A relation can be defined as follows:

+

Pig Latin statements work with relations. A relation can be defined as follows:

  • A relation is a bag (more specifically, an outer bag).

    @@ -1633,7 +1633,7 @@ A = load 'input' as (x, y, z); B = foreach A generate x+y; -

    If you do DESCRIBE on B, you will see a single column of type double. This is because Pig makes the safest choice and uses the largest numeric type when the schema is not known. In practice, the input data could contain integer values; however, Pig will cast the data to double and make sure that a double result is returned.

    +

    If you do DESCRIBE on B, you will see a single column of type double. This is because Pig makes the safest choice and uses the largest numeric type when the schema is not known. In practice, the input data could contain integer values; however, Pig will cast the data to double and make sure that a double result is returned.

    If the schema of a relation can’t be inferred, Pig will just use the runtime data as is and propagate it through the pipeline.

    @@ -5767,7 +5767,7 @@ ASSERT A by a0 > 0, 'a0 should be greate

    Use this feature to specify the Hadoop Partitioner. The partitioner controls the partitioning of the keys of the intermediate map-outputs.

@@ -5906,7 +5906,7 @@ DUMP X;

Increase the parallelism of a job by specifying the number of reduce tasks, n.

-

For more information, see Use the Parallel Features.

+

For more information, see Use the Parallel Features.
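    As a sketch of the PARALLEL clause discussed above (relation and field names are hypothetical):

    ```
    A = LOAD 'data' AS (t, u, v);
    B = GROUP A BY t PARALLEL 18;  -- the reduce phase of this job runs with 18 reduce tasks
    DUMP B;
    ```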

@@ -6058,7 +6058,7 @@ state: chararray,city: chararray,sales:

Use this feature to specify the Hadoop Partitioner. The partitioner controls the partitioning of the keys of the intermediate map-outputs.

  • -

    For more details, see http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/Partitioner.html

    +

    For more details, see http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/Partitioner.html

  • For usage, see Example: PARTITION BY.
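    A minimal sketch of the PARTITION BY clause, assuming a custom partitioner class (such as the SimpleCustomPartitioner referenced later in this section) is on the classpath:

    ```
    A = LOAD 'input_data' AS (f1:int, f2:int);
    -- route the keys of the intermediate map-outputs through the custom Hadoop Partitioner
    B = GROUP A BY f1 PARTITION BY org.apache.pig.test.utils.SimpleCustomPartitioner PARALLEL 2;
    DUMP B;
    ```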

    @@ -6073,7 +6073,7 @@ state: chararray,city: chararray,sales:

    Increase the parallelism of a job by specifying the number of reduce tasks, n.

    -

    For more information, see Use the Parallel Features.

    +

    For more information, see Use the Parallel Features.

    @@ -6700,7 +6700,7 @@ DUMP X;

    Use this feature to specify the Hadoop Partitioner. The partitioner controls the partitioning of the keys of the intermediate map-outputs.

    • -

      For more details, see http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/Partitioner.html

      +

      For more details, see http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/Partitioner.html

    • For usage, see Example: PARTITION BY

      @@ -6968,7 +6968,7 @@ public class SimpleCustomPartitioner ext

      'replicated'

      -

      Use to perform replicated joins (see Replicated Joins).

      +

      Use to perform replicated joins (see Replicated Joins).

      @@ -6977,7 +6977,7 @@ public class SimpleCustomPartitioner ext

      'skewed'

      -

      Use to perform skewed joins (see Skewed Joins).

      +

      Use to perform skewed joins (see Skewed Joins).

      @@ -6986,7 +6986,7 @@ public class SimpleCustomPartitioner ext

      'merge'

      -

      Use to perform merge joins (see Merge Joins).

      +

      Use to perform merge joins (see Merge Joins).

      @@ -6995,7 +6995,7 @@ public class SimpleCustomPartitioner ext

      'merge-sparse'

      -

      Use to perform merge-sparse joins (see Merge-Sparse Joins).

      +

      Use to perform merge-sparse joins (see Merge-Sparse Joins).

      @@ -7007,8 +7007,7 @@ public class SimpleCustomPartitioner ext

      Use this feature to specify the Hadoop Partitioner. The partitioner controls the partitioning of the keys of the intermediate map-outputs.

      • -

        For more details, see http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/Partitioner.html

        -
      • +

        For more details, see http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/Partitioner.html

      • For usage, see Example: PARTITION BY

      • @@ -7024,7 +7023,7 @@ public class SimpleCustomPartitioner ext

        Increase the parallelism of a job by specifying the number of reduce tasks, n.

        -

        For more information, see Use the Parallel Features.

        +

        For more information, see Use the Parallel Features.

        @@ -7194,7 +7193,7 @@ DUMP X;

        'replicated'

        -

        Use to perform replicated joins (see Replicated Joins).

        +

        Use to perform replicated joins (see Replicated Joins).

        Only left outer join is supported for replicated joins.

        @@ -7204,7 +7203,7 @@ DUMP X;

        'skewed'

        -

        Use to perform skewed joins (see Skewed Joins).

        +

        Use to perform skewed joins (see Skewed Joins).

        @@ -7213,7 +7212,7 @@ DUMP X;

        'merge'

        -

        Use to perform merge joins (see Merge Joins).

        +

        Use to perform merge joins (see Merge Joins).

        @@ -7226,7 +7225,7 @@ DUMP X;

        Use this feature to specify the Hadoop Partitioner. The partitioner controls the partitioning of the keys of the intermediate map-outputs.

        • -

          For more details, see http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/Partitioner.html

          +

          For more details, see http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/Partitioner.html

        • For usage, see Example: PARTITION BY

          @@ -7243,7 +7242,7 @@ DUMP X;

          Increase the parallelism of a job by specifying the number of reduce tasks, n.

          -

          For more information, see Use the Parallel Features.

          +

          For more information, see Use the Parallel Features.

          @@ -7463,7 +7462,7 @@ DUMP X;
        • -

    You can use a built-in function (see Load/Store Functions). PigStorage is the default load function and does not need to be specified (simply omit the USING clause).

          +

    You can use a built-in function (see Load/Store Functions). PigStorage is the default load function and does not need to be specified (simply omit the USING clause).

        • You can write your own load function @@ -8954,7 +8953,7 @@ B = FOREACH A GENERATE myFunc($0); Usage

          Pig Scripts

          -

          Use the REGISTER statement inside a Pig script to specify a JAR file or a Python/JavaScript module. Pig supports JAR files and modules stored in local file systems as well as remote, distributed file systems such as HDFS and Amazon S3 (see Pig Scripts).

          +

          Use the REGISTER statement inside a Pig script to specify a JAR file or a Python/JavaScript module. Pig supports JAR files and modules stored in local file systems as well as remote, distributed file systems such as HDFS and Amazon S3 (see Pig Scripts).

          Additionally, JAR files stored in local file systems can be specified as a glob pattern using “*”. Pig will search for matching jars in the local file system, either the relative path (relative to your working directory) or the absolute path. Pig will pick up all JARs that match the glob.
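    A sketch of the REGISTER forms described above (all paths are hypothetical; the HDFS host follows the example used elsewhere in these docs):

    ```
    REGISTER /local/udfs/myfunc.jar;                      -- jar in the local file system
    REGISTER hdfs://nn.mydomain.com:9020/jars/myfunc.jar; -- jar in a distributed file system
    REGISTER /local/udfs/*.jar;                           -- glob: picks up every matching jar
    ```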

          Modified: pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/cmds.xml
          URL: http://svn.apache.org/viewvc/pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/cmds.xml?rev=1601429&r1=1601428&r2=1601429&view=diff
          ==============================================================================
          --- pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/cmds.xml (original)
          +++ pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/cmds.xml Mon Jun 9 16:23:29 2014
          @@ -72,7 +72,7 @@ The fs command greatly extends the set of supported file system commands and the capabilities supported for existing commands such as ls that will now support globbing. For a complete list of FsShell commands, see - File System Shell Guide

          + File System Shell Guide
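    For illustration, a few FsShell commands invoked through fs in the Grunt shell (paths are hypothetical):

    ```
    grunt> fs -ls                                -- listing, with globbing support
    grunt> fs -mkdir /tmp/demo
    grunt> fs -copyFromLocal data /tmp/demo/data
    ```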

          @@ -652,7 +652,7 @@ grunt> run –param out=myoutput m

          Sets the number of reducers for all MapReduce jobs generated by Pig - (see Use the Parallel Features).

          + (see Use the Parallel Features).

          @@ -698,7 +698,7 @@ grunt> run –param out=myoutput m

          String that contains the path.

          -

          For streaming, sets the path from which not to ship data (see DEFINE (UDFs, streaming) and About Auto-Ship).

          +

          For streaming, sets the path from which not to ship data (see DEFINE (UDFs, streaming) and About Auto-Ship).

          Modified: pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/cont.xml URL: http://svn.apache.org/viewvc/pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/cont.xml?rev=1601429&r1=1601428&r2=1601429&view=diff ============================================================================== --- pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/cont.xml (original) +++ pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/cont.xml Mon Jun 9 16:23:29 2014 @@ -28,7 +28,7 @@

          To enable control flow, you can embed Pig Latin statements and Pig commands in the Python, JavaScript and Groovy scripting languages using a JDBC-like compile, bind, run model. For Python, make sure the Jython jar is included in your class path. For JavaScript, make sure the Rhino jar is included in your classpath. For Groovy, make sure the groovy-all jar is included in your classpath.

          -

          Note that host languages and the languages of UDFs (included as part of the embedded Pig) are completely orthogonal. For example, a Pig Latin statement that registers a Python UDF may be embedded in Python, JavaScript, or Java. The exception to this rule is "combined" scripts – here the languages must match (see the Advanced Topics for Python, Advanced Topics for JavaScript and Advanced Topics for Groovy).

          +

          Note that host languages and the languages of UDFs (included as part of the embedded Pig) are completely orthogonal. For example, a Pig Latin statement that registers a Python UDF may be embedded in Python, JavaScript, or Java. The exception to this rule is "combined" scripts – here the languages must match (see the Advanced Topics for Python, Advanced Topics for JavaScript and Advanced Topics for Groovy).

          @@ -818,11 +818,11 @@ public interface PigProgressNotification

          To enable control flow, you can embed Pig Latin statements and Pig commands in the Java programming language.

          -

          Note that host languages and the languages of UDFs (included as part of the embedded Pig) are completely orthogonal. For example, a Pig Latin statement that registers a Java UDF may be embedded in Python, JavaScript, Groovy, or Java. The exception to this rule is "combined" scripts – here the languages must match (see the Advanced Topics for Python, Advanced Topics for JavaScript and Advanced Topics for Groovy).

          +

          Note that host languages and the languages of UDFs (included as part of the embedded Pig) are completely orthogonal. For example, a Pig Latin statement that registers a Java UDF may be embedded in Python, JavaScript, Groovy, or Java. The exception to this rule is "combined" scripts – here the languages must match (see the Advanced Topics for Python, Advanced Topics for JavaScript and Advanced Topics for Groovy).

          PigServer Interface -

          Currently, PigServer is the main interface for embedding Pig in Java. PigServer can now be instantiated from multiple threads. (In the past, PigServer contained references to static data that prevented multiple instances of the object from being created from different threads within your application.) Please note that PigServer is NOT thread safe; the same object can't be shared across multiple threads.

          +

          Currently, PigServer is the main interface for embedding Pig in Java. PigServer can now be instantiated from multiple threads. (In the past, PigServer contained references to static data that prevented multiple instances of the object from being created from different threads within your application.) Please note that PigServer is NOT thread safe; the same object can't be shared across multiple threads.

          Modified: pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/func.xml URL: http://svn.apache.org/viewvc/pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/func.xml?rev=1601429&r1=1601428&r2=1601429&view=diff ============================================================================== --- pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/func.xml (original) +++ pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/func.xml Mon Jun 9 16:23:29 2014 @@ -102,7 +102,7 @@ decoded_strings = FOREACH encoded_string
          Example -

          In this example the average GPA for each student is computed (see the GROUP operator for information about the field names in relation B).

          +

          In this example the average GPA for each student is computed (see the GROUP operator for information about the field names in relation B).

          A = LOAD 'student.txt' AS (name:chararray, term:chararray, gpa:float); @@ -401,7 +401,7 @@ DUMP X;
          Example -

          In this example the tuples in the bag are counted (see the GROUP operator for information about the field names in relation B).

          +

          In this example the tuples in the bag are counted (see the GROUP operator for information about the field names in relation B).

          A = LOAD 'data' AS (f1:int,f2:int,f3:int); Modified: pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/perf.xml URL: http://svn.apache.org/viewvc/pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/perf.xml?rev=1601429&r1=1601428&r2=1601429&view=diff ============================================================================== --- pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/perf.xml (original) +++ pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/perf.xml Mon Jun 9 16:23:29 2014 @@ -61,7 +61,7 @@ explain C;

        -

        You can check if the combiner is used for your query by running EXPLAIN on the FOREACH alias as shown above. You should see the combine section in the MapReduce part of the plan:

        +

        You can check if the combiner is used for your query by running EXPLAIN on the FOREACH alias as shown above. You should see the combine section in the MapReduce part of the plan:

        @@ -89,7 +89,7 @@ B: Local Rearrange[tuple]{bytearray}(fal

        The combiner is also used with a nested foreach as long as the only nested operation used is DISTINCT -(see FOREACH and Example: Nested Block). +(see FOREACH and Example: Nested Block).

        @@ -226,7 +226,7 @@ $ pig -no_multiquery myscript.pig
      • For batch mode execution, the entire script is first parsed to determine if intermediate tasks can be combined to reduce the overall amount of work that needs to be done; execution starts only after the parsing is completed -(see the EXPLAIN operator and the run and exec commands).

        +(see the EXPLAIN operator and the run and exec commands).

      • @@ -294,8 +294,8 @@ With multi-query execution, the script w
        Store vs. Dump -

        With multi-query execution, you want to use STORE to save (persist) your results. - You do not want to use DUMP as it will disable multi-query execution and is likely to slow down execution. (If you have included DUMP statements in your scripts for debugging purposes, you should remove them.)

        +

        With multi-query execution, you want to use STORE to save (persist) your results. + You do not want to use DUMP as it will disable multi-query execution and is likely to slow down execution. (If you have included DUMP statements in your scripts for debugging purposes, you should remove them.)

        DUMP Example: In this script, because the DUMP command is interactive, the multi-query execution will be disabled and two separate jobs will be created to execute this script. The first job will execute A > B > DUMP while the second job will execute A > B > C > STORE.

        @@ -699,7 +699,7 @@ B = GROUP A all PARALLEL 10;
        Use Optimization -

        Pig supports various optimization rules which are turned on by default. +

        Pig supports various optimization rules which are turned on by default. Become familiar with these rules.

        @@ -833,7 +833,7 @@ C = foreach B generate group, MyUDF(A);
        Use the Accumulator Interface

        -If your UDF can't be made Algebraic but is able to deal with getting input in chunks rather than all at once, consider implementing the Accumulator interface to reduce the amount of memory used by your script. If your function is Algebraic and can be used in conjunction with Accumulator functions, you will need to implement the Accumulator interface as well as the Algebraic interface. For more information, see Accumulator Interface.

        +If your UDF can't be made Algebraic but is able to deal with getting input in chunks rather than all at once, consider implementing the Accumulator interface to reduce the amount of memory used by your script. If your function is Algebraic and can be used in conjunction with Accumulator functions, you will need to implement the Accumulator interface as well as the Algebraic interface. For more information, see Accumulator Interface.

        Note: Pig automatically chooses the interface that it expects to provide the best performance: Algebraic > Accumulator > Default.

        @@ -889,7 +889,7 @@ C = join small by t, large by x;

        Specialized Join Optimizations

        Optimization can also be achieved using fragment replicate joins, skewed joins, and merge joins. -For more information see Specialized Joins.

        +For more information see Specialized Joins.

        @@ -906,13 +906,13 @@ For more information see COGROUP, -CROSS, -DISTINCT, -GROUP, -JOIN (inner), -JOIN (outer), and -ORDER BY. +COGROUP, +CROSS, +DISTINCT, +GROUP, +JOIN (inner), +JOIN (outer), and +ORDER BY.

        The number of reducers you need for a particular construct in Pig that forms a MapReduce boundary depends entirely on (1) your data and the number of intermediate keys you are generating in your mappers and (2) the partitioner and distribution of map (combiner) output keys. In the best cases we have seen that a reducer processing about 1 GB of data behaves efficiently.

        @@ -1038,7 +1038,7 @@ java -cp $PIG_HOME/pig.jar

      -

      This feature works with PigStorage. However, if you are using a custom loader, please note the following:

      +

      This feature works with PigStorage. However, if you are using a custom loader, please note the following:

      • If your loader implementation makes use of the PigSplit object passed through the prepareToRead method, then you may need to rebuild the loader since the definition of PigSplit has been modified.
      • @@ -1130,7 +1130,7 @@ don't, the process fails and an error is
        Usage -

        Perform a replicated join with the USING clause (see JOIN (inner) and JOIN (outer)). +

        Perform a replicated join with the USING clause (see JOIN (inner) and JOIN (outer)). In this example, a large relation is joined with two smaller relations. Note that the large relation comes first followed by the smaller relations; and, all small relations together must fit into main memory, otherwise an error is generated.
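    A sketch of the replicated join form described above (relation names are hypothetical):

    ```
    big  = LOAD 'big_data'  AS (b1, b2, b3);
    tiny = LOAD 'tiny_data' AS (t1, t2, t3);
    mini = LOAD 'mini_data' AS (m1, m2, m3);
    -- the large relation comes first; 'tiny' and 'mini' must together fit in main memory
    C = JOIN big BY b1, tiny BY t1, mini BY m1 USING 'replicated';
    ```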

        @@ -1176,7 +1176,7 @@ associated with a given key is too large
        Usage -

        Perform a skewed join with the USING clause (see JOIN (inner) and JOIN (outer)).

        +

        Perform a skewed join with the USING clause (see JOIN (inner) and JOIN (outer)).

        A = LOAD 'skewed_data' AS (a1,a2,a3); B = LOAD 'data' AS (b1,b2,b3); @@ -1233,7 +1233,7 @@ and the right input of the join to be th
        Usage -

        Perform a merge join with the USING clause (see JOIN (inner) and JOIN (outer)).

        +

        Perform a merge join with the USING clause (see JOIN (inner) and JOIN (outer)).

        C = JOIN A BY a1, B BY b1, C BY c1 USING 'merge'; @@ -1286,7 +1286,7 @@ C = JOIN A BY a1, B BY b1, C BY c1 USING
        Usage -

        Perform a merge-sparse join with the USING clause (see JOIN (inner)).

        +

        Perform a merge-sparse join with the USING clause (see JOIN (inner)).

        a = load 'sorted_input1' using org.apache.pig.piggybank.storage.IndexedStorage('\t', '0'); b = load 'sorted_input2' using org.apache.pig.piggybank.storage.IndexedStorage('\t', '0'); Modified: pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/start.xml URL: http://svn.apache.org/viewvc/pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/start.xml?rev=1601429&r1=1601428&r2=1601429&view=diff ============================================================================== --- pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/start.xml (original) +++ pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/start.xml Mon Jun 9 16:23:29 2014 @@ -270,7 +270,7 @@ DUMP B; -- retrieving results

        Scripts and Distributed File Systems

        -

        Pig supports running scripts (and Jar files) that are stored in HDFS, Amazon S3, and other distributed file systems. The script's full location URI is required (see REGISTER for information about Jar files). For example, to run a Pig script on HDFS, do the following:

        +

        Pig supports running scripts (and Jar files) that are stored in HDFS, Amazon S3, and other distributed file systems. The script's full location URI is required (see REGISTER for information about Jar files). For example, to run a Pig script on HDFS, do the following:

        $ pig hdfs://nn.mydomain.com:9020/myscripts/script.pig @@ -286,7 +286,7 @@ $ pig hdfs://nn.mydomain.com:9020/myscri

        Pig Latin statements are the basic constructs you use to process data using Pig. A Pig Latin statement is an operator that takes a relation as input and produces another relation as output. (This definition applies to all Pig Latin operators except LOAD and STORE which read data from and write data to the file system.) - Pig Latin statements may include expressions and schemas. + Pig Latin statements may include expressions and schemas. Pig Latin statements can span multiple lines and must end with a semi-colon ( ; ). By default, Pig Latin statements are processed using multi-query execution.

        @@ -330,7 +330,7 @@ DUMP B;
        Loading Data -

        Use the LOAD operator and the load/store functions to read data into Pig (PigStorage is the default load function).

        +

        Use the LOAD operator and the load/store functions to read data into Pig (PigStorage is the default load function).

        @@ -339,19 +339,19 @@ DUMP B;

        Pig allows you to transform data in many ways. As a starting point, become familiar with these operators:

        • -

          Use the FILTER operator to work with tuples or rows of data. - Use the FOREACH operator to work with columns of data.

          +

          Use the FILTER operator to work with tuples or rows of data. + Use the FOREACH operator to work with columns of data.

        • -

          Use the GROUP operator to group data in a single relation. - Use the COGROUP, +

          Use the GROUP operator to group data in a single relation. + Use the COGROUP, inner JOIN, and outer JOIN operators to group or join data in two or more relations.

        • -

          Use the UNION operator to merge the contents of two or more relations. - Use the SPLIT operator to partition the contents of a relation into multiple relations.

          +

          Use the UNION operator to merge the contents of two or more relations. + Use the SPLIT operator to partition the contents of a relation into multiple relations.
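    The operators listed above can be sketched in one short pipeline (schema borrowed from the student example elsewhere in these docs):

    ```
    A = LOAD 'student.txt' AS (name:chararray, term:chararray, gpa:float);
    B = FILTER A BY gpa >= 3.0;        -- FILTER works on tuples (rows) of data
    C = FOREACH B GENERATE name, gpa;  -- FOREACH works on columns of data
    D = GROUP C BY name;               -- GROUP groups data in a single relation
    STORE D INTO 'honor_roll';
    ```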

        @@ -368,10 +368,10 @@ DUMP B;
        Storing Final Results -

        Use the STORE operator and the load/store functions +

        Use the STORE operator and the load/store functions to write results to the file system (PigStorage is the default store function).

        Note: During the testing/debugging phase of your implementation, you can use DUMP to display results to your terminal screen. -However, in a production environment you always want to use the STORE operator to save your results (see Store vs. Dump).

        +However, in a production environment you always want to use the STORE operator to save your results (see Store vs. Dump).

        Modified: pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/test.xml URL: http://svn.apache.org/viewvc/pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/test.xml?rev=1601429&r1=1601428&r2=1601429&view=diff ============================================================================== --- pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/test.xml (original) +++ pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/test.xml Mon Jun 9 16:23:29 2014 @@ -159,7 +159,7 @@ D: {age: bytearray}

        Note that production scripts SHOULD NOT use DUMP as it will disable multi-query optimizations and is likely to slow down execution - (see Store vs. Dump). + (see Store vs. Dump).

        @@ -846,7 +846,7 @@ $pig_trunk ant pigunit-jar

        The example included here computes the top N of the most common queries. The Pig script, top_queries.pig, is similar to the - Query Phrase Popularity + Query Phrase Popularity in the Pig tutorial. It expects as input a file of queries and a parameter n (n is 2 in our case in order to do a top 2).

        Modified: pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/udf.xml URL: http://svn.apache.org/viewvc/pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/udf.xml?rev=1601429&r1=1601428&r2=1601429&view=diff ============================================================================== --- pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/udf.xml (original) +++ pig/branches/branch-0.13/src/docs/src/documentation/content/xdocs/udf.xml Mon Jun 9 16:23:29 2014 @@ -1817,9 +1817,9 @@ outputSchema "t:(m:[], t:(name:chararray

        @OutputSchemaFunction annotation - Defines the name of a function which will return the schema at runtime according to the input schema.

-import org.apache.pig.scripting.groovy.OutputSchemaFunction;",
+import org.apache.pig.scripting.groovy.OutputSchemaFunction;
-class GroovyUDFs {",
+class GroovyUDFs {
 @OutputSchemaFunction('squareSchema')
 public static square(x) {
   return x * x;