pig-commits mailing list archives

From o...@apache.org
Subject svn commit: r1043169 - in /pig/branches/branch-0.8: ./ src/docs/src/documentation/content/xdocs/
Date Tue, 07 Dec 2010 19:23:09 GMT
Author: olga
Date: Tue Dec  7 19:23:09 2010
New Revision: 1043169

URL: http://svn.apache.org/viewvc?rev=1043169&view=rev
Log:
PIG-1756: doc updates (chandec via olgan)

Modified:
    pig/branches/branch-0.8/CHANGES.txt
    pig/branches/branch-0.8/RELEASE_NOTES.txt
    pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/piglatin_ref1.xml
    pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/piglatin_ref2.xml
    pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/pigunit.xml
    pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/setup.xml

Modified: pig/branches/branch-0.8/CHANGES.txt
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.8/CHANGES.txt?rev=1043169&r1=1043168&r2=1043169&view=diff
==============================================================================
--- pig/branches/branch-0.8/CHANGES.txt (original)
+++ pig/branches/branch-0.8/CHANGES.txt Tue Dec  7 19:23:09 2010
@@ -26,6 +26,8 @@ PIG-1249: Safe-guards against misconfigu
 
 IMPROVEMENTS
 
+PIG-1756: doc updates (chandec via olgan)
+
 PIG-1707: Allow pig build to pull from alternate maven repo to enable building
 against newer hadoop versions (pradeepkth)
 

Modified: pig/branches/branch-0.8/RELEASE_NOTES.txt
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.8/RELEASE_NOTES.txt?rev=1043169&r1=1043168&r2=1043169&view=diff
==============================================================================
--- pig/branches/branch-0.8/RELEASE_NOTES.txt (original)
+++ pig/branches/branch-0.8/RELEASE_NOTES.txt Tue Dec  7 19:23:09 2010
@@ -1,4 +1,4 @@
-These notes are for Pig 0.3.0 release.
+These notes are for Pig 0.8.0 release.
 
 Highlights
 ==========

Modified: pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/piglatin_ref1.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/piglatin_ref1.xml?rev=1043169&r1=1043168&r2=1043169&view=diff
==============================================================================
--- pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/piglatin_ref1.xml (original)
+++ pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/piglatin_ref1.xml Tue Dec  7 19:23:09 2010
@@ -1128,7 +1128,10 @@ copying cost. We have seen good performa
 in the range 0.1 - 0.4. However, note that this is hardly an accurate range. Its value depends on the amount of heap available for the operation, the number of columns in the input and the skew. An appropriate value is best obtained by conducting experiments to achieve 
-a good performance. The default value is =0.5=. </li>
+a good performance. The default value is 0.5. </li>
+<li>Skewed join does not address (balance) uneven data distribution across reducers.
+However, in most cases, skewed join ensures that the join will finish (however slowly) rather than fail. 
+</li>
 </ul>
 </section>
 </section><!-- END SKEWED JOINS-->
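For context on the skewed-join behavior this hunk documents, a minimal Pig Latin sketch (input paths, aliases, and schemas are illustrative, not from the commit):

```pig
-- Illustrative only: file names and schemas are made up.
A = LOAD 'page_views' AS (user:chararray, url:chararray);
B = LOAD 'users' AS (user:chararray, zip:chararray);
-- 'skewed' samples A's keys and splits heavily populated keys across
-- reducers, so the join finishes (however slowly) rather than fail.
-- The pig.skewedjoin.reduce.memusage property mentioned above (default 0.5)
-- controls how much heap the operation may use.
C = JOIN A BY user, B BY user USING 'skewed';
```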

Modified: pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/piglatin_ref2.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/piglatin_ref2.xml?rev=1043169&r1=1043168&r2=1043169&view=diff
==============================================================================
--- pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/piglatin_ref2.xml (original)
+++ pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/piglatin_ref2.xml Tue Dec  7 19:23:09 2010
@@ -5020,7 +5020,8 @@ DUMP X;
    
    <section>
    <title>Usage</title>
-   <para>Use the DISTINCT operator to remove duplicate tuples in a relation. DISTINCT does not preserve the original order of the contents (to eliminate duplicates, Pig must first sort the data). You cannot use DISTINCT on a subset of fields. To do this, use FOREACH…GENERATE to select the fields, and then use DISTINCT.</para></section>
+   <para>Use the DISTINCT operator to remove duplicate tuples in a relation. DISTINCT does not preserve the original order of the contents (to eliminate duplicates, Pig must first sort the data). You cannot use DISTINCT on a subset of fields. To do this, use FOREACH…GENERATE to select the fields, and then use DISTINCT (see <ulink url="#nestedblock">Example: Nested Block</ulink>).</para>
+   </section>
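The FOREACH…GENERATE-then-DISTINCT pattern described in this hunk can be sketched in Pig Latin (path and schema are illustrative):

```pig
-- Illustrative: DISTINCT de-duplicates whole tuples, not individual fields.
A = LOAD 'data' AS (a1:int, a2:int, a3:int);
-- To de-duplicate on a subset of fields, project the fields first...
B = FOREACH A GENERATE a1, a2;
-- ...then apply DISTINCT to the projected relation.
C = DISTINCT B;
DUMP C;
```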
    
    <section>
    <title>Example</title>
@@ -6293,14 +6294,14 @@ C = JOIN A BY name FULL, B BY name USING
                <para>n</para>
             </entry>
             <entry>
-               <para>The number of tuples.</para>
+               <para>The number of output tuples (a constant).</para>
             </entry>
          </row></tbody></tgroup>
    </informaltable></section>
    
    <section>
    <title>Usage</title>
-   <para>Use the LIMIT operator to limit the number of output tuples. If the specified number of output tuples is equal to or exceeds the number of tuples in the relation, the output will include all tuples in the relation.</para>
+   <para>Use the LIMIT operator to limit the number of output tuples. If the specified number of output tuples is equal to or exceeds the number of tuples in the relation, all tuples in the relation are returned.</para>
   <para>There is no guarantee which tuples will be returned, and the tuples that are returned can change from one run to the next. A particular set of tuples can be requested using the ORDER operator followed by LIMIT.</para>
   <para>Note: The LIMIT operator allows Pig to avoid processing all tuples in a relation. In most cases a query that uses LIMIT will run more efficiently than an identical query that does not use LIMIT. It is always a good idea to use limit if you can.</para>
    </section>
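The ORDER-then-LIMIT pattern mentioned above can be sketched as follows (path, aliases, and field names are illustrative):

```pig
-- Illustrative: without ORDER, which tuples LIMIT returns is not guaranteed.
A = LOAD 'data' AS (name:chararray, score:int);
B = ORDER A BY score DESC;
-- With ORDER first, LIMIT 3 requests the three highest-scoring tuples.
C = LIMIT B 3;
DUMP C;
```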
@@ -6490,7 +6491,7 @@ ILLUSTRATE A;
       <informaltable frame="all">
       <tgroup cols="1"><tbody><row>
             <entry>
-               <para>alias1 = MAPREDUCE 'mr1.jar' [('mr2.jar', ...)] STORE alias2 INTO 
+               <para>alias1 = MAPREDUCE 'mr1.jar' STORE alias2 INTO 
 'inputLocation' USING storeFunc LOAD 'outputLocation' USING loadFunc AS schema [`params, ... `];</para>
             </entry>
          </row></tbody></tgroup>
@@ -6514,7 +6515,9 @@ ILLUSTRATE A;
                <para>mr.jar</para>
             </entry>
             <entry>
-               <para>Any MapReduce jar file which can be run through "hadoop jar mymr.jar params" command. Thus, the contract for inputLocation and outputLocation is typically managed through params. </para>
+            <para>The MapReduce jar file (enclosed in single quotes).</para>
+               <para>You can specify any MapReduce jar file that can be run through the "hadoop jar mymr.jar params" command. </para>
+               <para>The values for inputLocation and outputLocation can be passed in the params. </para>
             </entry>
      </row>
 
@@ -6544,7 +6547,7 @@ ILLUSTRATE A;
                <para>'params, ...'</para>
             </entry>
             <entry>
-               <para>Extra parameters required for native MapReduce job. </para>
+               <para>Extra parameters required for the native MapReduce job (enclosed in backticks). </para>
             </entry>
      </row>
       </tbody></tgroup>
@@ -6554,15 +6557,17 @@ ILLUSTRATE A;
 <section>
 <title>Usage</title>
 <para>Use the MAPREDUCE operator to run native MapReduce jobs from inside a Pig script.</para>
+<para>The input and output locations for the MapReduce program are conveyed to Pig using the STORE/LOAD clauses. Pig, however, does not pass this information (nor require that this information be passed) to the MapReduce program. If you want to pass the input and output locations to the MapReduce program you can use the params clause or you can hardcode the locations in the MapReduce program.</para>
 </section>
 
 <section>
 <title>Example</title>
-<para>This example shows howto run the wordcount MapReduce progam from Pig.
+<para>This example demonstrates how to run the wordcount MapReduce program from Pig.
 Note that the files specified as input and output locations in the MAPREDUCE statement will NOT be deleted by Pig automatically. You will need to delete them manually. </para>
 <programlisting>
 A = LOAD 'WordcountInput.txt';
-B = MAPREDUCE wordcount.jar STOE A INTO 'inputDir' LOAD 'outputDir' AS (word:chararray, count: int) `org.myorg.WordCount inputDir outputDir`;
+B = MAPREDUCE 'wordcount.jar' STORE A INTO 'inputDir' LOAD 'outputDir' 
+    AS (word:chararray, count: int) `org.myorg.WordCount inputDir outputDir`;
 </programlisting>
 </section>
 

Modified: pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/pigunit.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/pigunit.xml?rev=1043169&r1=1043168&r2=1043169&view=diff
==============================================================================
--- pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/pigunit.xml (original)
+++ pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/pigunit.xml Tue Dec  7 19:23:09 2010
@@ -140,36 +140,27 @@ junit.framework.ComparisonFailure: null 
     </section>
 
     <section>
-      <title>Running in Local Mode</title>
+      <title>Running PigUnit</title>
+      <section>
+			<title>Local Mode</title>	
       <p>
-        Pig runs in local mode by default.
-        Local mode is fast and enables you to use your local file
-        system as the HDFS cluster.
-        Local mode does not require a real cluster but a new local one is
-        created each time. 
+        PigUnit runs in Pig's local mode by default.
+        Local mode is fast and enables you to use your local file system as the HDFS cluster.
+        Local mode does not require a real cluster but a new local one is created each time. 
       </p>
     </section>
 
-    <section>
-      <title>Running in Mapreduce Mode</title>
-      <p>Pig also runs in mapreduce mode.
-        This mode requires you to use a Hadoop cluster.
-        The cluster
-        you select must be specified in the CLASSPATH
-        (similar to the HADOOP_CONF_DIR variable).
-      </p>
+      <section>
+			<title>Mapreduce Mode</title>
+      <p>PigUnit also runs in Pig's mapreduce mode.
+        Mapreduce mode requires you to use a Hadoop cluster and HDFS installation.
+        It is enabled when the Java system property pigunit.exectype.cluster is set to any value:
+      e.g. -Dpigunit.exectype.cluster=true or System.getProperties().setProperty("pigunit.exectype.cluster", "true").
 
-      <p>Notice that PigUnit comes with a standalone MiniCluster that
-        can be started
-        externally with:
+        The cluster you select must be specified in the CLASSPATH (similar to the HADOOP_CONF_DIR variable).
       </p>
 
-      <source>
-java -cp .../pig.jar:.../pigunit.jar org.apache.pig.pigunit.MiniClusterRunner
-</source>
-      <p>This is useful when doing some prototyping in order to have a test cluster
-        ready.
-     </p>
+    </section>
     </section>
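As a sketch of the property switch documented in this hunk (the class name is hypothetical; the property name comes from the text above):

```java
// Hypothetical harness showing the programmatic form, per the docs above,
// of flipping PigUnit from local mode into mapreduce mode.
public class PigUnitModeExample {
    public static void main(String[] args) {
        // Equivalent to passing -Dpigunit.exectype.cluster=true on the
        // JVM command line; any value enables mapreduce mode.
        System.getProperties().setProperty("pigunit.exectype.cluster", "true");
        System.out.println(System.getProperty("pigunit.exectype.cluster"));
    }
}
```

In a real test suite this would run in a setup method before PigUnit is initialized, so the exectype is decided before any Pig script executes.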
 
     <section>

Modified: pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/setup.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/setup.xml?rev=1043169&r1=1043168&r2=1043169&view=diff
==============================================================================
--- pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/setup.xml (original)
+++ pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/setup.xml Tue Dec  7 19:23:09 2010
@@ -62,10 +62,10 @@ $ pig 
 		<title>Run Modes</title>
 	        <p>Pig has two run modes or exectypes:  </p>
     <ul>
-      <li><p> Local Mode - To run Pig in local mode, you need access to a single machine.  </p></li>
-      <li><p> Mapreduce Mode - To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation. 
-      Pig will automatically allocate and deallocate a 15-node cluster.</p></li>
+      <li>Local Mode - To run Pig in local mode, you need access to a single machine. </li>
+      <li>Mapreduce Mode - To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation.</li>
     </ul>
+    <p></p>
    <p>You can run the Grunt shell, Pig scripts, or embedded programs using either mode.</p>
     </section> 
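The two exectypes above are selected with pig's -x flag; illustrative invocations (the script name is made up, and these require a Pig installation, so they are a usage sketch rather than a runnable example):

```shell
# Local mode: single machine, local file system.
pig -x local myscript.pig
# Mapreduce mode (the default): Hadoop cluster and HDFS installation.
pig -x mapreduce myscript.pig
```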
 


