Subject: svn commit: r1050207 - in /pig/trunk: CHANGES.txt src/docs/src/documentation/content/xdocs/basic.xml src/docs/src/documentation/content/xdocs/test.xml
Date: Thu, 16 Dec 2010 22:49:43 -0000
To: commits@pig.apache.org
From: olga@apache.org
Reply-To: dev@pig.apache.org
Message-Id: <20101216224943.83D212388A64@eris.apache.org>

Author: olga
Date: Thu Dec 16 22:49:43 2010
New Revision: 1050207

URL: http://svn.apache.org/viewvc?rev=1050207&view=rev
Log:
PIG-1768: 09 docs: illustrate (changec via olgan)

Modified:
    pig/trunk/CHANGES.txt
pig/trunk/src/docs/src/documentation/content/xdocs/basic.xml pig/trunk/src/docs/src/documentation/content/xdocs/test.xml Modified: pig/trunk/CHANGES.txt URL: http://svn.apache.org/viewvc/pig/trunk/CHANGES.txt?rev=1050207&r1=1050206&r2=1050207&view=diff ============================================================================== --- pig/trunk/CHANGES.txt (original) +++ pig/trunk/CHANGES.txt Thu Dec 16 22:49:43 2010 @@ -24,6 +24,8 @@ INCOMPATIBLE CHANGES IMPROVEMENTS +PIG-1768: 09 docs: illustrate (changec via olgan) + PIG-1768: docs reorg (changec via olgan) PIG-1712: ILLUSTRATE rework (yanz) Modified: pig/trunk/src/docs/src/documentation/content/xdocs/basic.xml URL: http://svn.apache.org/viewvc/pig/trunk/src/docs/src/documentation/content/xdocs/basic.xml?rev=1050207&r1=1050206&r2=1050207&view=diff ============================================================================== --- pig/trunk/src/docs/src/documentation/content/xdocs/basic.xml (original) +++ pig/trunk/src/docs/src/documentation/content/xdocs/basic.xml Thu Dec 16 22:49:43 2010 @@ -284,12 +284,36 @@ grunt> C = FOREACH B GENERATE COUNT ($0) grunt> DUMP C; - - + +
Data Types and More + +
+Identifiers +

Identifiers include the names of relations (aliases), fields, variables, and so on. +In Pig, identifiers start with a letter and can be followed by any number of letters, digits, or underscores.

+ +

Valid identifiers:

+ +A +A123 +abc_123_BeX_ + +

+

Invalid identifiers:

+ +_abc +abc_$ +A!B + + + +
+ +
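As a quick check of the identifier rule added to basic.xml above (a letter first, then any number of letters, digits, or underscores), the rule can be written as a regular expression. This is a minimal Python sketch, not Pig code: the helper name `is_valid_identifier` is hypothetical, and restricting "letter" to ASCII is an assumption the text does not spell out.

```python
import re

# Assumed reading of the rule: one ASCII letter, then any number of
# letters, digits, or underscores. Not taken from Pig's own grammar.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_identifier(name: str) -> bool:
    """Return True if `name` satisfies the identifier rule described above."""
    return _IDENTIFIER.fullmatch(name) is not None
```

Under this reading, the three valid samples (A, A123, abc_123_BeX_) are accepted and the three invalid ones (_abc, abc_$, A!B) are rejected.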
Relations, Bags, Tuples, Fields @@ -1830,7 +1854,7 @@ DUMP A; ([open#apache]) ([apache#hadoop]) -
+
Schemas for Multiple Types Modified: pig/trunk/src/docs/src/documentation/content/xdocs/test.xml URL: http://svn.apache.org/viewvc/pig/trunk/src/docs/src/documentation/content/xdocs/test.xml?rev=1050207&r1=1050206&r2=1050207&view=diff ============================================================================== --- pig/trunk/src/docs/src/documentation/content/xdocs/test.xml (original) +++ pig/trunk/src/docs/src/documentation/content/xdocs/test.xml Thu Dec 16 22:49:43 2010 @@ -341,7 +341,7 @@ Local Rearrange[tuple]{chararray}(false) -
+
ILLUSTRATE

Displays a step-by-step execution of a sequence of statements.

@@ -372,7 +372,7 @@ Local Rearrange[tuple]{chararray}(false)

-script scriptfile

-

The script keyword followed by the name of a Pig script file (for example, myscript.pig).

+

The script keyword followed by the name of a Pig script (for example, myscript.pig).

The script file should not contain an ILLUSTRATE statement.

@@ -380,92 +380,128 @@ Local Rearrange[tuple]{chararray}(false)
Usage -

Use the ILLUSTRATE operator to review how data is transformed through a sequence of Pig Latin statements. - You can run ILLUSTRATE with a relation or a Pig script.

+

Use the ILLUSTRATE operator to review how data is transformed through a sequence of Pig Latin statements. + ILLUSTRATE allows you to test your programs on small datasets and get faster turnaround times.

+

ILLUSTRATE is based on an example generator +(see Generating Example Data for Dataflow Programs). -

ILLUSTRATE accesses the ExampleGenerator algorithm which can select an appropriate and concise set of example data automatically. It does a better job than random sampling would do; for example, random sampling suffers from the drawback that selective operations such as filters or joins can eliminate all the sampled data, giving you empty results which will not help with debugging.

+The algorithm works by retrieving a small sample of the input data and then propagating this data through the pipeline. However, some operators, such as JOIN or FILTER, can eliminate tuples from the data, and this could result in no data flowing through the pipeline. To address this issue, the algorithm automatically generates example data in near real-time. Thus, you might see data propagating through the pipeline that was not found in the original input data, but this data changes nothing and ensures that you will be able to examine the semantics of your Pig Latin statements.
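The sample-then-synthesize idea in the paragraph above can be sketched for a single FILTER step. This is a toy Python illustration under stated assumptions, not Pig's example generator: `illustrate_filter`, its parameters, and the caller-supplied `synthesize` function are all hypothetical names introduced here.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def illustrate_filter(
    data: List[Record],
    predicate: Callable[[Record], bool],
    synthesize: Callable[[Record], Record],
    sample_size: int = 2,
) -> Dict[str, List[Record]]:
    """Toy sample-then-synthesize loop for one FILTER step (hypothetical helper)."""
    sample = data[:sample_size]  # small sample of the input
    passed = [r for r in sample if predicate(r)]
    if not passed and sample:
        # Every sampled record was filtered out: fabricate one that passes and
        # add it to the displayed input so the example stays consistent.
        fake = synthesize(sample[0])
        sample = sample + [fake]
        passed = [fake]
    return {"input": sample, "output": passed}


# Usage: both sampled visits predate the cutoff, so one record is synthesized.
visits = [
    {"user": "Amy", "url": "yahoo.com", "timestamp": "19990421"},
    {"user": "Fred", "url": "harvard.edu", "timestamp": "19991104"},
]
result = illustrate_filter(
    visits,
    predicate=lambda r: r["timestamp"] >= "20071201",
    synthesize=lambda r: {**r, "timestamp": "20071201"},
)
```

Here `result["input"]` holds the two sampled records plus one synthesized record, and `result["output"]` holds only the synthesized one, which is why illustrated data may not appear in the original input.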

-

With the ILLUSTRATE operator you can test your programs on small datasets and get faster turnaround times. The ExampleGenerator algorithm uses Pig's local mode (rather than Pig's mapreduce mode) which means that illustrative example data is generated in near real-time.

- -
+

As shown in the examples below, you can use ILLUSTRATE to review a relation or an entire Pig script.

+
Example - Relation

This example demonstrates how to use ILLUSTRATE with a relation. Note that the LOAD statement must include a schema (the AS clause).

-
-visits = LOAD 'visits.txt' AS (user:chararray, url:chararray, timestamp:chararray);
-
-DUMP visits;
-(Amy,cnn.com,20080218)
-(Fred,harvard.edu,20081204)
-(Amy,bbc.com,20081205)
-(Fred,stanford.edu,20081206)
-
-recent_visits = FILTER visits BY timestamp >= '20081201';
+grunt> visits = LOAD 'visits.txt' AS (user:chararray, url:chararray, timestamp:chararray);
+grunt> DUMP visits;

-user_visits = GROUP recent_visits BY user;
+(Amy,yahoo.com,19990421)
+(Fred,harvard.edu,19991104)
+(Amy,cnn.com,20070218)
+(Frank,nba.com,20070305)
+(Fred,berkeley.edu,20071204)
+(Fred,stanford.edu,20071206)
+
+grunt> recent_visits = FILTER visits BY timestamp >= '20071201';
+grunt> user_visits = GROUP recent_visits BY user;
+grunt> num_user_visits = FOREACH user_visits GENERATE group, COUNT(recent_visits);
+grunt> DUMP num_user_visits;

-num_user_visits = FOREACH user_visits GENERATE group, COUNT(recent_visits);
+(Fred,2)

-DUMP num_user_visits;
-(1L)
-(2L)
-
-ILLUSTRATE num_user_visits;
-------------------------------------------------------------------------
-| visits     | user: bytearray | ulr: bytearray | timestamp: bytearray |
-------------------------------------------------------------------------
-|            | Amy             | cnn.com        | 20080218             |
-|            | Fred            | harvard.edu    | 20081204             |
-|            | Amy             | bbc.com        | 20081205             |
-|            | Fred            | stanford.edu   | 20081206             |
+grunt> ILLUSTRATE num_user_visits;
 ------------------------------------------------------------------------
-
-------------------------------------------------------------------------
-| visits     | user: chararray | ulr: chararray | timestamp: chararray |
+| visits     | user: chararray | url: chararray | timestamp: chararray |
 ------------------------------------------------------------------------
-|            | Amy             | cnn.com        | 20080218             |
-|            | Fred            | harvard.edu    | 20081204             |
-|            | Amy             | bbc.com        | 20081205             |
-|            | Fred            | stanford.edu   | 20081206             |
+|            | Fred            | berkeley.edu   | 20071204             |
+|            | Fred            | stanford.edu   | 20071206             |
+|            | Frank           | nba.com        | 20070305             |
 ------------------------------------------------------------------------
-
 -------------------------------------------------------------------------------
-| recent_visits     | user: chararray | ulr: chararray | timestamp: chararray |
+| recent_visits     | user: chararray | url: chararray | timestamp: chararray |
 -------------------------------------------------------------------------------
-|                   | Fred            | harvard.edu    | 20081204             |
-|                   | Amy             | bbc.com        | 20081205             |
-|                   | Fred            | stanford.edu   | 20081206             |
+|                   | Fred            | berkeley.edu   | 20071204             |
+|                   | Fred            | stanford.edu   | 20071206             |
 -------------------------------------------------------------------------------
-
 ------------------------------------------------------------------------------------------------------------------
-| user_visits     | group: chararray | recent_visits: bag({user: chararray,ulr: chararray,timestamp: chararray}) |
+| user_visits     | group: chararray | recent_visits: bag({user: chararray,url: chararray,timestamp: chararray}) |
 ------------------------------------------------------------------------------------------------------------------
-|                 | Amy              | {(Amy, bbc.com, 20081205)}                                                |
-|                 | Fred             | {(Fred, harvard.edu, 20081204), (Fred, stanford.edu, 20081206)}           |
+|                 | Fred             | {(Fred, berkeley.edu, 20071204), (Fred, stanford.edu, 20071206)}          |
 ------------------------------------------------------------------------------------------------------------------
-
--------------------------------
-| num_user_visits     | long  |
-------------------------------
-|                     | 1     |
-|                     | 2     |
--------------------------------
+--------------------------------------------------
+| num_user_visits     | group: chararray | long  |
+--------------------------------------------------
+|                     | Fred             | 2     |
+--------------------------------------------------
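The (Fred,2) result in the revised relation example can be cross-checked outside Pig. The Python sketch below replays the FILTER, GROUP, and COUNT steps on the six sample visits. It assumes the timestamps are fixed-width yyyymmdd strings compared as strings, as the chararray schema suggests; the variable names simply mirror the Pig aliases.

```python
from collections import Counter

# The six sample rows shown by DUMP visits in the revised example.
visits = [
    ("Amy", "yahoo.com", "19990421"),
    ("Fred", "harvard.edu", "19991104"),
    ("Amy", "cnn.com", "20070218"),
    ("Frank", "nba.com", "20070305"),
    ("Fred", "berkeley.edu", "20071204"),
    ("Fred", "stanford.edu", "20071206"),
]

# FILTER visits BY timestamp >= '20071201' (string comparison)
recent_visits = [v for v in visits if v[2] >= "20071201"]

# GROUP recent_visits BY user, then COUNT per group
num_user_visits = sorted(Counter(user for user, _, _ in recent_visits).items())

print(num_user_visits)  # [('Fred', 2)]
```

Only Fred's two December 2007 visits pass the filter, which matches the single-row tables in the ILLUSTRATE output.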
Example - Script -

This example demonstrates how to use ILLUSTRATE with a script. Note that the script itself should not contain an ILLUSTRATE statement.

+

This example demonstrates how to use ILLUSTRATE with a Pig script. Note that the script itself should not contain an ILLUSTRATE statement.

+grunt> cat visits.txt
+Amy yahoo.com 19990421
+Fred harvard.edu 19991104
+Amy cnn.com 20070218
+Frank nba.com 20070305
+Fred berkeley.edu 20071204
+Fred stanford.edu 20071206
+
+grunt> cat visits.pig
+visits = LOAD 'visits.txt' AS (user, url, timestamp);
+recent_visits = FILTER visits BY timestamp >= '20071201';
+historical_visits = FILTER visits BY timestamp <= '20000101';
+DUMP recent_visits;
+DUMP historical_visits;
+STORE recent_visits INTO 'recent';
+STORE historical_visits INTO 'historical';
+
+grunt> exec visits.pig
+
+(Fred,berkeley.edu,20071204)
+(Fred,stanford.edu,20071206)
+(Amy,yahoo.com,19990421)
+(Fred,harvard.edu,19991104)
+
+grunt> illustrate -script visits.pig
+
+------------------------------------------------------------------------
+| visits     | user: bytearray | url: bytearray | timestamp: bytearray |
+------------------------------------------------------------------------
+|            | Amy             | yahoo.com      | 19990421             |
+|            | Fred            | stanford.edu   | 20071206             |
+------------------------------------------------------------------------
+-------------------------------------------------------------------------------
+| recent_visits     | user: bytearray | url: bytearray | timestamp: bytearray |
+-------------------------------------------------------------------------------
+|                   | Fred            | stanford.edu   | 20071206             |
+-------------------------------------------------------------------------------
+---------------------------------------------------------------------------------------
+| Store : recent_visits     | user: bytearray | url: bytearray | timestamp: bytearray |
+---------------------------------------------------------------------------------------
+|                           | Fred            | stanford.edu   | 20071206             |
+---------------------------------------------------------------------------------------
+-----------------------------------------------------------------------------------
+| historical_visits     | user: bytearray | url: bytearray | timestamp: bytearray |
+-----------------------------------------------------------------------------------
+|                       | Amy             | yahoo.com      | 19990421             |
+-----------------------------------------------------------------------------------
+-------------------------------------------------------------------------------------------
+| Store : historical_visits     | user: bytearray | url: bytearray | timestamp: bytearray |
+-------------------------------------------------------------------------------------------
+|                               | Amy             | yahoo.com      | 19990421             |
+-------------------------------------------------------------------------------------------
+