pig-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Pig Wiki] Trivial Update of "PigTutorial" by CorinneC
Date Tue, 10 Jun 2008 22:07:06 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by CorinneC:
http://wiki.apache.org/pig/PigTutorial

------------------------------------------------------------------------------
  
   1. Install Java.
   1. Install Pig (using the Pig JAR file).
-  1. Install and run the Pig scripts (using the Pig tutorial file).
+  1. Install and run the Pig scripts (using the Pig tutorial file) - in either local mode
or on a Hadoop cluster.
  
  == Java Installation ==
  Make sure your run-time environment includes the following:
-  1. Java 1.5.x (perferably from Sun)
+  1. Java 1.5.x (preferably from Sun)
   1. The JAVA_HOME environment variable is set to the root of your Java installation. 
  
  
@@ -30, +30 @@

   1. Define an environment variable with the location of the Pig JAR file. For example: export
PIGDIR=/home/me/pig (bash, sh) or setenv PIGDIR /home/me/pig (tcsh, csh).
  
  
- == Pig Script Installation: Local Mode ==
+ == Pig Scripts: Local Mode ==
  To install and run the Pig scripts in local mode, do the following:
  
   1. Create a temporary directory. For example: /home/me/tmp
   1. Download and unzip the Pig tutorial file in the temporary directory (''... not available
yet'').
   1. Review the contents of the [#Pig_Tutorial_File Pig Tutorial File].
-  1. Review the [#Tutorial_Pig_Script Tutorial Pig Script] and the[#Tutorial_Join_Pig_Script
Tutorial-Join Pig Script].
+  1. Review the scripts: [#Pig_Script_1 Pig Script 1] and [#Pig_Script_2 Pig Script 2].
-  1. Execute the following command (using either tutorial-local.pig or tutorial-join-local.pig).
+  1. Execute the following command (using either script1-local.pig or script2-local.pig).
  {{{
- $ java -cp $PIGDIR/pig.jar org.apache.pig.Main -x local tutorial-local.pig
+ $ java -cp $PIGDIR/pig.jar org.apache.pig.Main -x local script1-local.pig
  }}}
  
-  1.#6 Review the results (either the tutorial-local-results.txt or tutorial-join-local-results.txt
file in your local directory):
+  1.#6 Review the result file (either script1-local-results.txt or script2-local-results.txt):
  {{{
- $ ls -l tutorial-local-results.txt
+ $ ls -l script1-local-results.txt
  }}}
  
  
- == Pig Script Installation: Hadoop Cluster ==
+ == Pig Scripts: Hadoop Cluster ==
  To install and run the Pig scripts on a Hadoop cluster, do the following:
  
   1. Create a temporary directory. For example: /home/me/tmp
   1. Download and unzip the Pig tutorial file in the temporary directory (''... not available
yet'').
   1. Review the contents of the [#Pig_Tutorial_File Pig Tutorial File].
-  1. Review the [#Tutorial_Pig_Script Tutorial Pig Script] and the[#Tutorial_Join_Pig_Script
Tutorial-Join Pig Script].
+  1. Review the scripts: [#Pig_Script_1 Pig Script 1] and [#Pig_Script_2 Pig Script 2].
-  1. Copy the exite.log file from your local directory to your DFS directory. View the file
in your DFS directory.
+  1. Copy the excite.log file from the temporary directory to the DFS directory. View the
file in your DFS directory.
  {{{
  $ hadoop dfs -copyFromLocal excite.log .
  $ hadoop dfs -ls
  }}}
   1.#6 Set the HADOOPSITEPATH environment variable to the location of your hadoop-site.xml
file.
-  1. Execute the following command (using either tutorial.pig or tutorial-join.pig):
+  1. Execute the following command (using either script1-hadoop.pig or script2-hadoop.pig):
  {{{
- $ java -cp $PIGDIR/pig.jar:$HADOOPSITEPATH org.apache.pig.Main tutorial.pig
+ $ java -cp $PIGDIR/pig.jar:$HADOOPSITEPATH org.apache.pig.Main script1-hadoop.pig
  }}}
-  1.#8 Review the results (the files are located in either your tutorial-results or tutorial-join-results
DFS directory):
+  1.#8 Review the result files (located in either the script1-hadoop-results or script2-hadoop-results
DFS directory):
  {{{
  $ hadoop dfs -ls tutorial-results
  }}}
@@ -75, +75 @@

  The contents of the Pig tutorial file (*.gz) are described here.
  || '''File''' || '''Description'''||
  || tutorial.jar|| User-defined functions (UDFs) ||
+ || script1-local.pig || Query Phrase Popularity Pig script (local mode) ||
+ || script1-hadoop.pig ||Query Phrase Popularity Pig script (Hadoop cluster) ||
+ || script2-local.pig || Temporal Query Phrase Popularity (local mode)||
+ || script2-hadoop.pig || Temporal Query Phrase Popularity (Hadoop cluster) ||
- || tutorial.pig || Tutorial pig script (Hadoop) > creates tutorial-results ||
- || tutorial-local.pig ||Tutorail pig script (local mode) > creates tutorial-local-results.txt
||
- || tutorial-join.pig || Tutorial-join pig script (Hadoop) > creates tutorial-join-results
||
- || tutorial-join-local.pig || Tutorial-join pig script (local mode) > creates tutorial-join-local-results.txt
||
- || excite.log || Data file (Hadoop) ||
  || excite-small.log || Data file (local mode) ||
+ || excite.log || Data file (Hadoop cluster) ||
  || pornwords || Data file (porn keywords) ||
  
  The user-defined functions (UDFs) are described here.
@@ -95, +95 @@

  || !TutorialUtil || Divides the query string into a set of words.||
  
  
- [[Anchor(Tutorial_Pig_Script)]]
+ [[Anchor(Pig_Script_1)]]
- == Tutorial Pig Script ==
+ == Pig Script 1: Query Phrase Popularity ==
  
- The tutorial pig script (tutorial.pig or tutorial-local.pig) does the following:
+ The Query Phrase Popularity script (script1-local.pig or script1-hadoop.pig) processes a
search query log file from the Excite search engine and finds search phrases that occur with
particularly high frequency during certain times of the day.
+ 
+ 
+ The script is shown here:
  
   * Register the tutorial JAR file so that the included UDFs can be called in the script.
  {{{ 
@@ -180, +183 @@

  STORE ordered_uniq_frequency INTO '/tmp/tutorial-results' USING PigStorage(); 
  }}}
  
- [[Anchor(Tutorial_Join_Pig_Script)]]
- == Tutorial-Join Pig Script ==
+ [[Anchor(Pig_Script_2)]]
+ == Pig Script 2: Temporal Query Phrase Popularity ==
+ The Temporal Query Phrase Popularity script (script2-local.pig or script2-hadoop.pig) processes
a search query log file from the Excite search engine and compares the frequency of occurrence
of search phrases across two time periods separated by twelve hours.
  
- The tutorial-join pig script (tutorial-join.pig or tutorial-join-local.pig) does the following:
+ The script is shown here:
  
   * Register the tutorial JAR file so that the user-defined functions (UDFs) can be called
in the script.
  {{{
