tajo-commits mailing list archives

From jihoon...@apache.org
Subject svn commit: r1676659 - in /tajo/site/docs/devel: _sources/functions/python.txt functions/python.html
Date Wed, 29 Apr 2015 02:50:11 GMT
Author: jihoonson
Date: Wed Apr 29 02:50:11 2015
New Revision: 1676659

URL: http://svn.apache.org/r1676659
Log:
Add missing files

Added:
    tajo/site/docs/devel/_sources/functions/python.txt
    tajo/site/docs/devel/functions/python.html

Added: tajo/site/docs/devel/_sources/functions/python.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/functions/python.txt?rev=1676659&view=auto
==============================================================================
--- tajo/site/docs/devel/_sources/functions/python.txt (added)
+++ tajo/site/docs/devel/_sources/functions/python.txt Wed Apr 29 02:50:11 2015
@@ -0,0 +1,159 @@
+******************************
+Python Functions
+******************************
+
+=======================
+User-defined Functions
+=======================
+
+-----------------------
+Function registration
+-----------------------
+
+To register Python UDFs, you must install script files on all cluster nodes.
+After that, you can register your functions by specifying the paths to those script files
in ``tajo-site.xml``. Here is an example of the configuration.
+
+.. code-block:: xml
+
+  <property>
+    <name>tajo.function.python.code-dir</name>
+    <value>/path/to/script1.py,/path/to/script2.py</value>
+  </property>
+
+Please note that you can specify multiple paths with ``','`` as a delimiter. Each file can
contain multiple functions. Here is a typical example of a script file.
+
+.. code-block:: python
+
+  # /path/to/udf1.py
+
+  @output_type('int4')
+  def return_one():
+    return 1
+
+  @output_type("text")
+  def helloworld():
+    return 'Hello, World'
+
+  # No decorator - blob
+  def concat_py(str):
+    return str+str
+
+  @output_type('int4')
+  def sum_py(a,b):
+    return a+b
+
+If the configuration is set properly, every function in the script files is registered when the Tajo cluster starts up.
+
+-----------------------
+Decorators and types
+-----------------------
+
+By default, every function has a return type of ``BLOB``.
+You can use Python decorators to define output types for the script functions. Tajo can figure
out return types from the annotations of the Python script.
+
+* ``output_type``: Defines the return data type for a script UDF in a format that Tajo can
understand. The defined type must be one of the types supported by Tajo. For supported types,
please refer to :doc:`/sql_language/data_model`.
+
+-----------------------
+Query example
+-----------------------
+
+Once the Python UDFs are successfully registered, you can use them like any other built-in function.
+
+.. code-block:: sql
+
+  default> select concat_py(n_name)::text from nation where sum_py(n_regionkey,1) > 2;
+
+==============================================
+User-defined Aggregation Functions
+==============================================
+
+-----------------------
+Function registration
+-----------------------
+
+To define Python aggregation functions, write one Python class per function.
+The following are typical examples of Python UDAFs.
+
+.. code-block:: python
+
+  # /path/to/udaf1.py
+
+  class AvgPy:
+    sum = 0
+    cnt = 0
+
+    def __init__(self):
+        self.reset()
+
+    def reset(self):
+        self.sum = 0
+        self.cnt = 0
+
+    # eval at the first stage
+    def eval(self, item):
+        self.sum += item
+        self.cnt += 1
+
+    # get intermediate result
+    def get_partial_result(self):
+        return [self.sum, self.cnt]
+
+    # merge intermediate results
+    def merge(self, list):
+        self.sum += list[0]
+        self.cnt += list[1]
+
+    # get final result
+    @output_type('float8')
+    def get_final_result(self):
+        return self.sum / float(self.cnt)
+
+
+  class CountPy:
+    cnt = 0
+
+    def __init__(self):
+        self.reset()
+
+    def reset(self):
+        self.cnt = 0
+
+    # eval at the first stage
+    def eval(self):
+        self.cnt += 1
+
+    # get intermediate result
+    def get_partial_result(self):
+        return self.cnt
+
+    # merge intermediate results
+    def merge(self, cnt):
+        self.cnt += cnt
+
+    # get final result
+    @output_type('int4')
+    def get_final_result(self):
+        return self.cnt
+
+
+These classes must provide ``reset()``, ``eval()``, ``merge()``, ``get_partial_result()``,
and ``get_final_result()`` functions.
+
+* ``reset()`` resets the aggregation state.
+* ``eval()`` evaluates input tuples in the first stage.
+* ``merge()`` merges intermediate results of the first stage.
+* ``get_partial_result()`` returns intermediate results of the first stage. Its output type must be the same as the input type of ``merge()``.
+* ``get_final_result()`` returns the final aggregation result.
+
+-----------------------
+Query example
+-----------------------
+
+Once the Python UDAFs are successfully registered, you can use them like any other built-in aggregation function.
+
+.. code-block:: sql
+
+  default> select avgpy(n_nationkey), countpy() from nation;
+
+.. warning::
+
+  Currently, Python UDAFs cannot be used as window functions.
\ No newline at end of file
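
Note on the UDF examples in python.txt above: the scripts use an ``output_type`` decorator that they never define, so Tajo presumably supplies it when it loads the script. The following is a minimal sketch for smoke-testing such a script outside Tajo. The stand-in decorator, the attribute name ``tajo_output_type``, and the helper ``declared_type`` are assumptions made for illustration and are not part of Tajo.

```python
# Hypothetical stand-in for the decorator that Tajo is assumed to provide
# when it loads a UDF script. It only records the declared type name.
def output_type(type_name):
    def decorate(func):
        func.tajo_output_type = type_name  # attribute name is an assumption
        return func
    return decorate


@output_type('int4')
def sum_py(a, b):
    return a + b


@output_type('text')
def helloworld():
    return 'Hello, World'


# No decorator: per the documentation, the return type defaults to BLOB.
def concat_py(value):
    return value + value


def declared_type(func):
    """Return the declared output type, or the documented default."""
    return getattr(func, 'tajo_output_type', 'blob')


if __name__ == '__main__':
    for func in (sum_py, helloworld, concat_py):
        print(func.__name__, declared_type(func))
```

Because the stand-in returns the original function unchanged, the UDF bodies can be called directly in a local test before the script is copied to the cluster nodes.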

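Note on the UDAF section in python.txt above: the method list describes a two-stage flow (``eval()`` and ``get_partial_result()`` in the first stage, then ``merge()`` and ``get_final_result()``). The following is a rough sketch of that flow for local testing. How Tajo actually schedules the stages is not described in the documentation, so the driver ``run_two_stage``, the per-partition instances, and the stand-in ``output_type`` decorator are all assumptions.

```python
def output_type(type_name):
    # Hypothetical stand-in for the decorator Tajo is assumed to provide.
    def decorate(func):
        func.tajo_output_type = type_name
        return func
    return decorate


class AvgPy:
    def __init__(self):
        self.reset()

    def reset(self):
        self.sum = 0
        self.cnt = 0

    # first stage: consume one input value
    def eval(self, item):
        self.sum += item
        self.cnt += 1

    # first stage: emit the intermediate state
    def get_partial_result(self):
        return [self.sum, self.cnt]

    # second stage: fold in one intermediate state
    def merge(self, partial):
        self.sum += partial[0]
        self.cnt += partial[1]

    @output_type('float8')
    def get_final_result(self):
        return self.sum / float(self.cnt)


def run_two_stage(udaf_class, partitions):
    """Hypothetical driver: one first-stage instance per partition, then merge."""
    partials = []
    for rows in partitions:
        first_stage = udaf_class()
        for row in rows:
            first_stage.eval(row)
        partials.append(first_stage.get_partial_result())

    final_stage = udaf_class()
    for partial in partials:
        final_stage.merge(partial)
    return final_stage.get_final_result()


if __name__ == '__main__':
    print(run_two_stage(AvgPy, [[1, 2, 3], [4, 5]]))
```

The sketch also illustrates the documented constraint that the value returned by ``get_partial_result()`` must match what ``merge()`` accepts: here both use a two-element list of sum and count.
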
Added: tajo/site/docs/devel/functions/python.html
URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/functions/python.html?rev=1676659&view=auto
==============================================================================
--- tajo/site/docs/devel/functions/python.html (added)
+++ tajo/site/docs/devel/functions/python.html Wed Apr 29 02:50:11 2015
@@ -0,0 +1,404 @@
+
+
+<!DOCTYPE html>
+<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
+<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
+<head>
+  <meta charset="utf-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  
+  <title>Python Functions &mdash; Apache Tajo 0.11.0 documentation</title>
+  
+
+  
+  
+
+  
+  <link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700'
rel='stylesheet' type='text/css'>
+
+  
+  
+    
+
+  
+
+  
+  
+    <link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
+  
+
+  
+    <link rel="top" title="Apache Tajo 0.11.0 documentation" href="../index.html"/>
+        <link rel="up" title="Functions" href="../functions.html"/>
+        <link rel="next" title="Table Management" href="../table_management.html"/>
+        <link rel="prev" title="JSON Functions" href="json_func.html"/> 
+
+  
+  <script src="https://cdnjs.cloudflare.com/ajax/libs/modernizr/2.6.2/modernizr.min.js"></script>
+
+</head>
+
+<body class="wy-body-for-nav" role="document">
+
+  <div class="wy-grid-for-nav">
+
+    
+    <nav data-toggle="wy-nav-shift" class="wy-nav-side">
+      <div class="wy-side-nav-search">
+        <a href="../index.html" class="fa fa-home"> Apache Tajo</a>
+        <div role="search">
+  <form id ="rtd-search-form" class="wy-form" action="../search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" />
+    <input type="hidden" name="check_keywords" value="yes" />
+    <input type="hidden" name="area" value="default" />
+  </form>
+</div>
+      </div>
+
+      <div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main
navigation">
+        
+        
+            <ul class="current">
+<li class="toctree-l1"><a class="reference internal" href="../introduction.html">Introduction</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../getting_started.html">Getting
Started</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../getting_started.html#prerequisites">Prerequisites</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../getting_started.html#dowload-and-unpack-the-source-code">Dowload
and unpack the source code</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../getting_started.html#build-source-code">Build
source code</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../getting_started.html#setting-up-a-local-tajo-cluster">Setting
up a local Tajo cluster</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../getting_started.html#first-query-execution">First
query execution</a></li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal" href="../configuration.html">Configuration</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../configuration/preliminary.html">Preliminary</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../configuration/cluster_setup.html">Cluster
Setup</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../configuration/tajo_master_configuration.html">Tajo
Master Configuration</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../configuration/worker_configuration.html">Worker
Configuration</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../configuration/catalog_configuration.html">Catalog
Configuration</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../configuration/ha_configuration.html">High
Availability for TajoMaster</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../configuration/service_config_defaults.html">Cluster
Service Configuration Defaults</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../configuration/tajo-site-xml.html">The
tajo-site.xml File</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../configuration/catalog-site-xml.html">The
catalog-site.xml File</a></li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal" href="../tsql.html">Tajo
Shell (TSQL)</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../tsql/meta_command.html">Meta
Commands</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../tsql/dfs_command.html">Executing
HDFS commands</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../tsql/variables.html">Session
Variables</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../tsql/admin_command.html">Administration
Commands</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../tsql/intro.html">Introducing
to TSQL</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../tsql/single_command.html">Executing
a single command</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../tsql/execute_file.html">Executing
Queries from Files</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../tsql/background_command.html">Executing
as background process</a></li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal" href="../sql_language.html">SQL
Language</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../sql_language/data_model.html">Data
Model</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../sql_language/ddl.html">Data
Definition Language</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../sql_language/insert.html">INSERT
(OVERWRITE) INTO</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../sql_language/queries.html">Queries</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../sql_language/sql_expression.html">SQL
Expressions</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../sql_language/predicates.html">Predicates</a></li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal" href="../time_zone.html">Time
Zone</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../time_zone.html#server-cluster-time-zone">Server
Cluster Time Zone</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../time_zone.html#table-time-zone">Table
Time Zone</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../time_zone.html#client-time-zone">Client
Time Zone</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../time_zone.html#time-zone-id">Time
Zone ID</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../time_zone.html#examples-of-time-zone">Examples
of Time Zone</a></li>
+</ul>
+</li>
+<li class="toctree-l1 current"><a class="reference internal" href="../functions.html">Functions</a><ul
class="current">
+<li class="toctree-l2"><a class="reference internal" href="../functions.html#built-in-functions">Built-in
Functions</a></li>
+<li class="toctree-l2 current"><a class="reference internal" href="../functions.html#user-defined-functions">User-defined
Functions</a></li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal" href="../table_management.html">Table
Management</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../table_management/table_overview.html">Overview
of Tajo Tables</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../table_management/file_formats.html">File
Formats</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../table_management/compression.html">Compression</a></li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal" href="../table_partitioning.html">Table
Partitioning</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../partitioning/intro_to_partitioning.html">Introduction
to Partitioning</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../partitioning/column_partitioning.html">Column
Partitioning</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../partitioning/range_partitioning.html">Range
Partitioning</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../partitioning/hash_partitioning.html">Hash
Partitioning</a></li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal" href="../index_overview.html">Index
(Experimental Feature)</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../index/types.html">Index
Types</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../index/how_to_use.html">How
to use index?</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../index/future_work.html">Future
Works</a></li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal" href="../backup_and_restore.html">Backup
and Restore</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../backup_and_restore/catalog.html">Backup
and Restore Catalog</a></li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal" href="../hive_integration.html">Hive
Integration</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../hbase_integration.html">HBase
Integration</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../hbase_integration.html#create-table">CREATE
TABLE</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../hbase_integration.html#drop-table">DROP
TABLE</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../hbase_integration.html#insert-overwrite-into">INSERT
(OVERWRITE) INTO</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../hbase_integration.html#usage">Usage</a></li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal" href="../swift_integration.html">OpenStack
Swift Integration</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../swift_integration.html#swift-configuration">Swift
configuration</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../swift_integration.html#hadoop-configurations">Hadoop
configurations</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../swift_integration.html#tajo-configuration">Tajo
configuration</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../swift_integration.html#querying-on-swift">Querying
on Swift</a></li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal" href="../jdbc_driver.html">Tajo
JDBC Driver</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../jdbc_driver.html#how-to-get-jdbc-driver">How
to get JDBC driver</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../jdbc_driver.html#setting-the-classpath">Setting
the CLASSPATH</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../jdbc_driver.html#an-example-jdbc-client">An
Example JDBC Client</a></li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal" href="../tajo_client_api.html">Tajo
Client API</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../faq.html">FAQ</a></li>
+</ul>
+
+        
+      </div>
+      &nbsp;
+    </nav>
+
+    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
+
+      
+      <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
+        <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
+        <a href="../index.html">Apache Tajo</a>
+      </nav>
+
+
+      
+      <div class="wy-nav-content">
+        <div class="rst-content">
+          <div role="navigation" aria-label="breadcrumbs navigation">
+  <ul class="wy-breadcrumbs">
+    <li><a href="../index.html">Docs</a> &raquo;</li>
+      
+          <li><a href="../functions.html">Functions</a> &raquo;</li>
+      
+    <li>Python Functions</li>
+      <li class="wy-breadcrumbs-aside">
+        
+          <a href="../_sources/functions/python.txt" rel="nofollow"> View page source</a>
+        
+      </li>
+  </ul>
+  <hr/>
+</div>
+          <div role="main">
+            
+  <div class="section" id="python-functions">
+<h1>Python Functions<a class="headerlink" href="#python-functions" title="Permalink
to this headline">¶</a></h1>
+<div class="section" id="user-defined-functions">
+<h2>User-defined Functions<a class="headerlink" href="#user-defined-functions" title="Permalink
to this headline">¶</a></h2>
+<div class="section" id="function-registration">
+<h3>Function registration<a class="headerlink" href="#function-registration" title="Permalink
to this headline">¶</a></h3>
+<p>To register Python UDFs, you must install script files on all cluster nodes.
+After that, you can register your functions by specifying the paths to those script files
in <tt class="docutils literal"><span class="pre">tajo-site.xml</span></tt>.
Here is an example of the configuration.</p>
+<div class="highlight-xml"><div class="highlight"><pre><span class="nt">&lt;property&gt;</span>
+  <span class="nt">&lt;name&gt;</span>tajo.function.python.code-dir<span
class="nt">&lt;/name&gt;</span>
+  <span class="nt">&lt;value&gt;</span>/path/to/script1.py,/path/to/script2.py<span
class="nt">&lt;/value&gt;</span>
+<span class="nt">&lt;/property&gt;</span>
+</pre></div>
+</div>
+<p>Please note that you can specify multiple paths with <tt class="docutils literal"><span
class="pre">','</span></tt> as a delimiter. Each file can contain multiple
functions. Here is a typical example of a script file.</p>
+<div class="highlight-python"><div class="highlight"><pre><span class="c">#
/path/to/udf1.py</span>
+
+<span class="nd">@output_type</span><span class="p">(</span><span
class="s">&#39;int4&#39;</span><span class="p">)</span>
+<span class="k">def</span> <span class="nf">return_one</span><span
class="p">():</span>
+  <span class="k">return</span> <span class="mi">1</span>
+
+<span class="nd">@output_type</span><span class="p">(</span><span
class="s">&quot;text&quot;</span><span class="p">)</span>
+<span class="k">def</span> <span class="nf">helloworld</span><span
class="p">():</span>
+  <span class="k">return</span> <span class="s">&#39;Hello, World&#39;</span>
+
+<span class="c"># No decorator - blob</span>
+<span class="k">def</span> <span class="nf">concat_py</span><span
class="p">(</span><span class="nb">str</span><span class="p">):</span>
+  <span class="k">return</span> <span class="nb">str</span><span
class="o">+</span><span class="nb">str</span>
+
+<span class="nd">@output_type</span><span class="p">(</span><span
class="s">&#39;int4&#39;</span><span class="p">)</span>
+<span class="k">def</span> <span class="nf">sum_py</span><span
class="p">(</span><span class="n">a</span><span class="p">,</span><span
class="n">b</span><span class="p">):</span>
+  <span class="k">return</span> <span class="n">a</span><span
class="o">+</span><span class="n">b</span>
+</pre></div>
+</div>
+<p>If the configuration is set properly, every function in the script files is registered when the Tajo cluster starts up.</p>
+</div>
+<div class="section" id="decorators-and-types">
+<h3>Decorators and types<a class="headerlink" href="#decorators-and-types" title="Permalink
to this headline">¶</a></h3>
+<p>By default, every function has a return type of <tt class="docutils literal"><span
class="pre">BLOB</span></tt>.
+You can use Python decorators to define output types for the script functions. Tajo can figure
out return types from the annotations of the Python script.</p>
+<ul class="simple">
+<li><tt class="docutils literal"><span class="pre">output_type</span></tt>:
Defines the return data type for a script UDF in a format that Tajo can understand. The defined
type must be one of the types supported by Tajo. For supported types, please refer to <a
class="reference internal" href="../sql_language/data_model.html"><em>Data Model</em></a>.</li>
+</ul>
+</div>
+<div class="section" id="query-example">
+<h3>Query example<a class="headerlink" href="#query-example" title="Permalink to
this headline">¶</a></h3>
+<p>Once the Python UDFs are successfully registered, you can use them like any other built-in function.</p>
+<div class="highlight-sql"><div class="highlight"><pre><span class="k">default</span><span
class="o">&gt;</span> <span class="k">select</span> <span class="n">concat_py</span><span
class="p">(</span><span class="n">n_name</span><span class="p">)::</span><span
class="nb">text</span> <span class="k">from</span> <span class="n">nation</span>
<span class="k">where</span> <span class="n">sum_py</span><span
class="p">(</span><span class="n">n_regionkey</span><span class="p">,</span><span
class="mi">1</span><span class="p">)</span> <span class="o">&gt;</span>
<span class="mi">2</span><span class="p">;</span>
+</pre></div>
+</div>
+</div>
+</div>
+<div class="section" id="user-defined-aggregation-functions">
+<h2>User-defined Aggregation Functions<a class="headerlink" href="#user-defined-aggregation-functions"
title="Permalink to this headline">¶</a></h2>
+<div class="section" id="id1">
+<h3>Function registration<a class="headerlink" href="#id1" title="Permalink to this
headline">¶</a></h3>
+<p>To define Python aggregation functions, write one Python class per function.
+The following are typical examples of Python UDAFs.</p>
+<div class="highlight-python"><div class="highlight"><pre><span class="c">#
/path/to/udaf1.py</span>
+
+<span class="k">class</span> <span class="nc">AvgPy</span><span
class="p">:</span>
+  <span class="nb">sum</span> <span class="o">=</span> <span class="mi">0</span>
+  <span class="n">cnt</span> <span class="o">=</span> <span class="mi">0</span>
+
+  <span class="k">def</span> <span class="nf">__init__</span><span
class="p">(</span><span class="bp">self</span><span class="p">):</span>
+      <span class="bp">self</span><span class="o">.</span><span
class="n">reset</span><span class="p">()</span>
+
+  <span class="k">def</span> <span class="nf">reset</span><span
class="p">(</span><span class="bp">self</span><span class="p">):</span>
+      <span class="bp">self</span><span class="o">.</span><span
class="n">sum</span> <span class="o">=</span> <span class="mi">0</span>
+      <span class="bp">self</span><span class="o">.</span><span
class="n">cnt</span> <span class="o">=</span> <span class="mi">0</span>
+
+  <span class="c"># eval at the first stage</span>
+  <span class="k">def</span> <span class="nf">eval</span><span
class="p">(</span><span class="bp">self</span><span class="p">,</span>
<span class="n">item</span><span class="p">):</span>
+      <span class="bp">self</span><span class="o">.</span><span
class="n">sum</span> <span class="o">+=</span> <span class="n">item</span>
+      <span class="bp">self</span><span class="o">.</span><span
class="n">cnt</span> <span class="o">+=</span> <span class="mi">1</span>
+
+  <span class="c"># get intermediate result</span>
+  <span class="k">def</span> <span class="nf">get_partial_result</span><span
class="p">(</span><span class="bp">self</span><span class="p">):</span>
+      <span class="k">return</span> <span class="p">[</span><span
class="bp">self</span><span class="o">.</span><span class="n">sum</span><span
class="p">,</span> <span class="bp">self</span><span class="o">.</span><span
class="n">cnt</span><span class="p">]</span>
+
+  <span class="c"># merge intermediate results</span>
+  <span class="k">def</span> <span class="nf">merge</span><span
class="p">(</span><span class="bp">self</span><span class="p">,</span>
<span class="nb">list</span><span class="p">):</span>
+      <span class="bp">self</span><span class="o">.</span><span
class="n">sum</span> <span class="o">+=</span> <span class="nb">list</span><span
class="p">[</span><span class="mi">0</span><span class="p">]</span>
+      <span class="bp">self</span><span class="o">.</span><span
class="n">cnt</span> <span class="o">+=</span> <span class="nb">list</span><span
class="p">[</span><span class="mi">1</span><span class="p">]</span>
+
+  <span class="c"># get final result</span>
+  <span class="nd">@output_type</span><span class="p">(</span><span
class="s">&#39;float8&#39;</span><span class="p">)</span>
+  <span class="k">def</span> <span class="nf">get_final_result</span><span
class="p">(</span><span class="bp">self</span><span class="p">):</span>
+      <span class="k">return</span> <span class="bp">self</span><span
class="o">.</span><span class="n">sum</span> <span class="o">/</span>
<span class="nb">float</span><span class="p">(</span><span class="bp">self</span><span
class="o">.</span><span class="n">cnt</span><span class="p">)</span>
+
+
+<span class="k">class</span> <span class="nc">CountPy</span><span
class="p">:</span>
+  <span class="n">cnt</span> <span class="o">=</span> <span class="mi">0</span>
+
+  <span class="k">def</span> <span class="nf">__init__</span><span
class="p">(</span><span class="bp">self</span><span class="p">):</span>
+      <span class="bp">self</span><span class="o">.</span><span
class="n">reset</span><span class="p">()</span>
+
+  <span class="k">def</span> <span class="nf">reset</span><span
class="p">(</span><span class="bp">self</span><span class="p">):</span>
+      <span class="bp">self</span><span class="o">.</span><span
class="n">cnt</span> <span class="o">=</span> <span class="mi">0</span>
+
+  <span class="c"># eval at the first stage</span>
+  <span class="k">def</span> <span class="nf">eval</span><span
class="p">(</span><span class="bp">self</span><span class="p">):</span>
+      <span class="bp">self</span><span class="o">.</span><span
class="n">cnt</span> <span class="o">+=</span> <span class="mi">1</span>
+
+  <span class="c"># get intermediate result</span>
+  <span class="k">def</span> <span class="nf">get_partial_result</span><span
class="p">(</span><span class="bp">self</span><span class="p">):</span>
+      <span class="k">return</span> <span class="bp">self</span><span
class="o">.</span><span class="n">cnt</span>
+
+  <span class="c"># merge intermediate results</span>
+  <span class="k">def</span> <span class="nf">merge</span><span
class="p">(</span><span class="bp">self</span><span class="p">,</span>
<span class="n">cnt</span><span class="p">):</span>
+      <span class="bp">self</span><span class="o">.</span><span
class="n">cnt</span> <span class="o">+=</span> <span class="n">cnt</span>
+
+  <span class="c"># get final result</span>
+  <span class="nd">@output_type</span><span class="p">(</span><span
class="s">&#39;int4&#39;</span><span class="p">)</span>
+  <span class="k">def</span> <span class="nf">get_final_result</span><span
class="p">(</span><span class="bp">self</span><span class="p">):</span>
+      <span class="k">return</span> <span class="bp">self</span><span
class="o">.</span><span class="n">cnt</span>
+</pre></div>
+</div>
+<p>These classes must provide <tt class="docutils literal"><span class="pre">reset()</span></tt>,
<tt class="docutils literal"><span class="pre">eval()</span></tt>,
<tt class="docutils literal"><span class="pre">merge()</span></tt>,
<tt class="docutils literal"><span class="pre">get_partial_result()</span></tt>,
and <tt class="docutils literal"><span class="pre">get_final_result()</span></tt>
functions.</p>
+<ul class="simple">
+<li><tt class="docutils literal"><span class="pre">reset()</span></tt>
resets the aggregation state.</li>
+<li><tt class="docutils literal"><span class="pre">eval()</span></tt>
evaluates input tuples in the first stage.</li>
+<li><tt class="docutils literal"><span class="pre">merge()</span></tt>
merges intermediate results of the first stage.</li>
+<li><tt class="docutils literal"><span class="pre">get_partial_result()</span></tt> returns intermediate results of the first stage. Its output type must be the same as the input type of <tt class="docutils literal"><span class="pre">merge()</span></tt>.</li>
+<li><tt class="docutils literal"><span class="pre">get_final_result()</span></tt>
returns the final aggregation result.</li>
+</ul>
+</div>
+<div class="section" id="id2">
+<h3>Query example<a class="headerlink" href="#id2" title="Permalink to this headline">¶</a></h3>
+<p>Once the Python UDAFs are successfully registered, you can use them like any other built-in aggregation function.</p>
+<div class="highlight-sql"><div class="highlight"><pre><span class="k">default</span><span
class="o">&gt;</span> <span class="k">select</span> <span class="n">avgpy</span><span
class="p">(</span><span class="n">n_nationkey</span><span class="p">),</span>
<span class="n">countpy</span><span class="p">()</span> <span class="k">from</span>
<span class="n">nation</span><span class="p">;</span>
+</pre></div>
+</div>
+<div class="admonition warning">
+<p class="first admonition-title">Warning</p>
+<p class="last">Currently, Python UDAFs cannot be used as window functions.</p>
+</div>
+</div>
+</div>
+</div>
+
+
+          </div>
+          <footer>
+  
+    <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
+      
+        <a href="../table_management.html" class="btn btn-neutral float-right" title="Table
Management"/>Next <span class="fa fa-arrow-circle-right"></span></a>
+      
+      
+        <a href="json_func.html" class="btn btn-neutral" title="JSON Functions"><span
class="fa fa-arrow-circle-left"></span> Previous</a>
+      
+    </div>
+  
+
+  <hr/>
+
+  <div role="contentinfo">
+    <p>
+        &copy; Copyright 2014, Apache Tajo Team.
+    </p>
+  </div>
+
+  <a href="https://github.com/snide/sphinx_rtd_theme">Sphinx theme</a> provided
by <a href="https://readthedocs.org">Read the Docs</a>
+</footer>
+        </div>
+      </div>
+
+    </section>
+
+  </div>
+  
+
+
+  
+
+    <script type="text/javascript">
+        var DOCUMENTATION_OPTIONS = {
+            URL_ROOT:'../',
+            VERSION:'0.11.0',
+            COLLAPSE_INDEX:false,
+            FILE_SUFFIX:'.html',
+            HAS_SOURCE:  true
+        };
+    </script>
+      <script type="text/javascript" src="../_static/jquery.js"></script>
+      <script type="text/javascript" src="../_static/underscore.js"></script>
+      <script type="text/javascript" src="../_static/doctools.js"></script>
+
+  
+
+  
+  
+    <script type="text/javascript" src="../_static/js/theme.js"></script>
+  
+
+  
+  
+  <script type="text/javascript">
+      jQuery(function () {
+          SphinxRtdTheme.StickyNav.enable();
+      });
+  </script>
+   
+
+</body>
+</html>
\ No newline at end of file


