pig-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Pig Wiki] Update of "UDFsUsingScriptingLanguages" by Aniket Mokashi
Date Fri, 13 Aug 2010 19:05:48 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The "UDFsUsingScriptingLanguages" page has been changed by Aniket Mokashi.
http://wiki.apache.org/pig/UDFsUsingScriptingLanguages?action=diff&rev1=2&rev2=3

--------------------------------------------------

  {{{
  Register 'test.py' using jython as myfuncs;
  }}}
- This uses org.apache.pig.scripting.jython.JythonScriptEngine to interpret the python script.
Users can use custom script engines to support multiple languages and ways to interpret them.
Currently, pig identifies jython as a keyword and ships the required scriptengine (jython)
to interpret it.
+ This uses org.apache.pig.scripting.jython.JythonScriptEngine to interpret the python script.
Users can develop and use custom script engines to support multiple programming languages
and ways to interpret them. Currently, pig identifies jython as a keyword and ships the required
scriptengine (jython) to interpret it.
  
  The following syntax is also supported -
  {{{
@@ -52, +52 @@

  }}}
  Registering test.py with pig under the myfuncs namespace makes the functions myfuncs.helloworld(),
myfuncs.complex(2) and myfuncs.square(2.0) available as UDFs. These UDFs can be used with
  {{{
- b = foreach a generate myfuncs.helloworld, myfuncs.square(3);
+ b = foreach a generate myfuncs.helloworld(), myfuncs.square(3);
  }}}
  
  === Decorators and Schemas ===
- For annotating python script so that pig can identify their return types, we use decorators
to define output schema for a script UDF. 
+ For annotating python script so that pig can identify their return types, we use python
decorators to define output schema for a script UDF.
   '''outputSchema''' defines schema for a script udf in a format that pig understands and
is able to parse. 
   
   '''outputSchemaFunction''' names a script delegate function that computes the schema for this
function from its input type. This is needed for functions that can accept generic
types and perform generic operations on these types. A simple example is ''square'', which
can accept multiple types. The schema function for ''square'' is a simple identity function (same
schema as input).
   
   '''schemaFunction''' marks a function as a schema delegate; such a function is not registered with pig as a UDF.
- 
   
- When no decorator is specified, pig assumes the output datatype as bytearray and converts
the output generated by script function to bytearray. This is consistent with pig's behavior
in other cases. 
+ When no decorator is specified, pig assumes the output datatype as bytearray and converts
the output generated by script function to bytearray. This is consistent with pig's behavior
in case of Java UDFs.
- 
- ''Sample Schema String'' - y:{t:(word:chararray,num:long)}, variable names are not used
anywhere they are just to make syntax consistent.
+ ''Sample Schema String'' - y:{t:(word:chararray,num:long)}, variable names inside schema
string are not used anywhere, they are used just to make syntax identifiable to the parser.
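
The decorator rules above can be sketched in plain Python. This is only an illustration under stated assumptions, not Pig's implementation: the three decorators here are hypothetical stand-ins that record metadata on the function object, and resolve_schema, UDFS and the schema string given to helloworld are names invented for the example.

```python
# Hypothetical stand-ins for the decorators that the script engine provides.
# Each one only attaches metadata to the function and returns it unchanged.
def outputSchema(schema_str):
    def wrap(func):
        func.outputSchema = schema_str
        return func
    return wrap

def outputSchemaFunction(delegate_name):
    def wrap(func):
        func.outputSchemaFunction = delegate_name
        return func
    return wrap

def schemaFunction(name):
    def wrap(func):
        func.schemaFunction = name
        return func
    return wrap

@outputSchema("word:chararray")  # illustrative schema string
def helloworld():
    return 'Hello, World'

@outputSchemaFunction("squareSchema")
def square(num):
    return num * num

@schemaFunction("squareSchema")
def squareSchema(input_schema):
    # Identity delegate: the output schema is the same as the input schema
    return input_schema

def undecorated(value):
    return value

# Delegates looked up by name (an assumption made for this sketch)
UDFS = {'squareSchema': squareSchema}

def resolve_schema(func, input_schema):
    """Fixed schema if declared, delegate result if named, else bytearray."""
    if hasattr(func, 'outputSchema'):
        return func.outputSchema
    if hasattr(func, 'outputSchemaFunction'):
        return UDFS[func.outputSchemaFunction](input_schema)
    return 'bytearray'
```

Here resolve_schema(square, 'num:long') yields the input schema back, while a function without any decorator falls through to bytearray, matching the default described above.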
  
  == Inline Scripts ==
+ As of today, Pig doesn't support UDFs using inline scripts. This feature is being tracked
at [[#ref4|PIG-1471]].
+ 
+ == Sample Script UDFs ==
+ Simple tasks like string manipulation, mathematical computations, reorganizing data types
can be easily done using python scripts without having to develop long and complex UDFs in
Java. The overall overhead of using scripting language is much less and development cost is
almost negligible. Following are a few examples of UDFs developed in python that can be used
with Pig.
+ {{{
+ mySampleLib.py
+ ---------------------
+ #!/usr/bin/python
+ 
+ ##################
+ # Math functions #
+ ##################
+ #Square - Square of a number of any data type
+ @outputSchemaFunction("squareSchema")
+ def square(num):
+   return ((num)*(num))
+ @schemaFunction("squareSchema")
+ def squareSchema(input):
+   return input
+ 
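+ #Percent - Percentage of num relative to total
+ @outputSchema("t:(percent:double)")
+ def percent(num, total):
+   # 100.0 forces float division; with Python 2 semantics int / int truncates
+   return num * 100.0 / total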
+ 
+ #CommaFormat - Format a number with comma separators
+ @outputSchema("t:(numformat:chararray)")
+ def commaFormat(num):
+   return '{:,}'.format(num)
+ 
+ ####################
+ # String Functions #
+ ####################
+ 
+ 
+ #######################
+ # Data Type Functions #
+ #######################
+ 
+ 
+ }}}
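
Two details of the sample library depend on the Python version the Jython runtime implements: with Python 2 semantics, '/' between two integers truncates, and the ',' format specifier was only added in Python 2.7 (str.format itself in 2.6). The sketch below shows variants that avoid both dependencies. It assumes integer input for the formatter, leaves out the pig decorators so that it runs outside Pig, and uses the hypothetical name comma_format.

```python
def percent(num, total):
    # float() forces true division regardless of interpreter version
    return float(num) * 100 / total

def comma_format(num):
    # Group the integer digits in threes by hand, so str.format is not needed.
    # Any fractional part is dropped by int().
    sign = '-' if num < 0 else ''
    digits = str(abs(int(num)))
    groups = []
    while len(digits) > 3:
        groups.insert(0, digits[-3:])
        digits = digits[:-3]
    groups.insert(0, digits)
    return sign + ','.join(groups)
```

For example, percent(1, 4) gives 25.0 and comma_format(1234567) gives '1,234,567'.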
  
  == Performance ==
  === Jython ===
@@ -78, +117 @@

   1. <<Anchor(ref1)>> PIG-928, "UDFs in scripting languages", https://issues.apache.org/jira/browse/PIG-928
   2. <<Anchor(ref2)>> Jython, "The jython project", http://www.jython.org/
   3. <<Anchor(ref3)>> Jruby, "100% pure-java implementation of ruby programming
language", http://jruby.org/
+  4. <<Anchor(ref4)>> PIG-1471, "inline UDFs in scripting languages", https://issues.apache.org/jira/browse/PIG-1471
  
