pig-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Pig Wiki] Update of "PigFunctions" by UtkarshSrivastava
Date Tue, 27 Nov 2007 22:56:42 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by UtkarshSrivastava:
http://wiki.apache.org/pig/PigFunctions

------------------------------------------------------------------------------
        * ''"Reduce" behavior:'' Recall that in the Pig data model, a tuple may contain fields
of type ''bag''. Hence an Eval Function may perform aggregation or "reducing" by iterating
over a bag of tuples nested within the input tuple. This is how the built-in aggregation function
SUM(...) works, for example.   
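
As a rough illustration of the "Reduce" behavior described above, the sketch below sums one field across a bag nested inside an input tuple. It is a minimal sketch in plain Java and does not use Pig's actual classes: the tuple is modeled as a `List<Object>`, the bag as a list of such tuples, and the names `BagSumSketch` and `sumBag` are hypothetical. The real SUM(...) may be implemented quite differently.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for the Pig data model (an assumption, NOT the Pig API):
// a tuple is a List<Object>, and a bag is a List of such tuples.
final class BagSumSketch {

    // Sums the first field of every tuple in the bag stored at
    // position bagIndex of the input tuple.
    @SuppressWarnings("unchecked")
    static double sumBag(List<Object> input, int bagIndex) {
        List<List<Object>> bag = (List<List<Object>>) input.get(bagIndex);
        double total = 0.0;
        for (List<Object> nested : bag) {
            total += ((Number) nested.get(0)).doubleValue();
        }
        return total;
    }

    public static void main(String[] args) {
        // A bag of three single-field tuples.
        List<Object> bag = Arrays.<Object>asList(
            Arrays.<Object>asList(5.0),
            Arrays.<Object>asList(7.5),
            Arrays.<Object>asList(2.5));
        // Input tuple: (group key, bag of prices).
        List<Object> input = Arrays.<Object>asList("widgets", bag);
        System.out.println(sumBag(input, 1)); // prints 15.0
    }
}
```

The point of the sketch is only that the aggregation happens inside a single function call: the function receives one tuple and loops over the bag it contains, rather than being invoked once per nested tuple.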
     
  The other types of functions are:
-    * '''Filter Function:''' evaluates to True or False when given a tuple; used to eliminate
unwanted tuples from a relation or bag
-    * '''Group Function:''' assigns tuples to group(s) 
     * '''Load Function:''' controls reading of tuples from files
     * '''Store Function:''' controls storing of tuples to files
  
  [[Anchor(Example)]]
  ==== Example ====
- The following example uses each of the five types of functions. It computes the set of unique
IP addresses associated with "good" products drawn from a list of products found on the web.
+ The following example uses each of the types of functions. It computes the set of unique
IP addresses associated with "good" products drawn from a list of products found on the web.
  
  {{{
  register myFunctions.jar
  products = LOAD '/productlist.txt' USING MyListStorage() AS (name, price, description, url);
- goodProducts = FILTER products BY (price <= '19.99' AND MyFilter(description));
+ goodProducts = FILTER products BY (price <= '19.99');
  hostnames = FOREACH goodProducts GENERATE MyHostExtractor(url) AS hostname;
  uniqueIPs = FOREACH (GROUP hostnames BY MyIPLookup(hostname)) GENERATE group AS ipAddress;
  STORE uniqueIPs INTO '/iplist.txt' USING MyListStorage();
  }}}
  
- In the above example, !MyListStorage() serves as a load function as well as a store function;
!MyFilter() is a filter function; !MyHostExtractor() is an eval function; MyIPLookup() is
a group function. `myFunctions.jar` is a jar file that contains the classes for the user-defined
functions.
+ In the above example, !MyListStorage() serves as a load function as well as a store function;
!MyHostExtractor() and !MyIPLookup() are eval functions. `myFunctions.jar` is a jar file that
contains the classes for the user-defined functions.
  
  [[Anchor(How_to_write_functions)]]
  === How to write functions ===
@@ -41, +39 @@

  
  Click below to learn how to build your own:
     * EvalFunction
-    * FilterFunction
-    * GroupFunction
-    * StorageFunction (These are the most difficult to write, and usually, the inbuilt ones
should be enough)
+    * Load/Store Function (These are the most difficult to write, and usually, the inbuilt
ones should be enough)
  
  [[Anchor(Ok,_I_have_written_my_function,_how_to_use_it?)]]
  === Ok, I have written my function, how to use it? ===
