hawq-commits mailing list archives

From yo...@apache.org
Subject incubator-hawq-docs git commit: HAWQ-1216 - clean up plpython docs (Closes #77)
Date Thu, 05 Jan 2017 23:38:38 GMT
Repository: incubator-hawq-docs
Updated Branches:
  refs/heads/develop 87ff8368c -> c9cb72d6a


HAWQ-1216 - clean up plpython docs (Closes #77)


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/c9cb72d6
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/c9cb72d6
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/c9cb72d6

Branch: refs/heads/develop
Commit: c9cb72d6a14aefbae9a94e26783d8da6e2a7c1e6
Parents: 87ff836
Author: Lisa Owen <lowen@pivotal.io>
Authored: Thu Jan 5 15:38:30 2017 -0800
Committer: David Yozie <yozie@apache.org>
Committed: Thu Jan 5 15:38:30 2017 -0800

----------------------------------------------------------------------
 plext/using_plpython.html.md.erb | 914 +++++++++++++++++++++-------------
 1 file changed, 564 insertions(+), 350 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/c9cb72d6/plext/using_plpython.html.md.erb
----------------------------------------------------------------------
diff --git a/plext/using_plpython.html.md.erb b/plext/using_plpython.html.md.erb
index ff7b527..063509a 100644
--- a/plext/using_plpython.html.md.erb
+++ b/plext/using_plpython.html.md.erb
@@ -2,374 +2,608 @@
 title: Using PL/Python in HAWQ
 ---
 
-This section contains an overview of the HAWQ PL/Python language extension.
+This section provides an overview of the HAWQ PL/Python procedural language extension.
 
 ## <a id="abouthawqplpython"></a>About HAWQ PL/Python 
 
-PL/Python is a loadable procedural language. With the HAWQ PL/Python extension, you can write HAWQ user-defined functions in Python that take advantage of Python features and modules to quickly build robust database applications.
+PL/Python is embedded in your HAWQ product distribution or within your HAWQ build if you chose to enable it as a build option. 
+
+With the HAWQ PL/Python extension, you can write user-defined functions in Python that take advantage of Python features and modules, enabling you to quickly build robust HAWQ database applications.
 
 HAWQ uses the system Python installation.
 
 ### <a id="hawqlimitations"></a>HAWQ PL/Python Limitations 
 
-- HAWQ does not support PL/Python triggers.
+- HAWQ does not support PL/Python trigger functions.
 - PL/Python is available only as a HAWQ untrusted language.
  
 ## <a id="enableplpython"></a>Enabling and Removing PL/Python Support 
 
-To use PL/Python in HAWQ, you must either use a pre-compiled version of HAWQ that includes PL/Python or specify PL/Python as a build option when compiling HAWQ.
+To use PL/Python in HAWQ, you must either install a binary version of HAWQ that includes PL/Python or specify PL/Python as a build option when you compile HAWQ from source.
+
+You must register the PL/Python language with a database before you can create and execute a PL/Python UDF on that database. You must be a database superuser to register and remove new languages in HAWQ databases.
+
+On every database to which you want to install and enable PL/Python:
+
+1. Connect to the database using the `psql` client:
+
+    ``` shell
+    gpadmin@hawq-node$ psql -d <dbname>
+    ```
+
+    Replace \<dbname\> with the name of the target database.
+
+2. Run the following SQL command to register the PL/Python procedural language:
+
+    ``` sql
+    dbname=# CREATE LANGUAGE plpythonu;
+    ```
 
-To create and run a PL/Python user-defined function (UDF) in a database, you must register the PL/Python language with the database. On every database where you want to install and enable PL/Python, connect to the database using the `psql` client.
+    **Note**: `plpythonu` is installed as an *untrusted* language; it offers no way of restricting what you can program in UDFs created with the language. Only database superusers, and other database users who have been explicitly `GRANT`ed the permissions, can create and execute PL/Python UDFs.
 
-```shell
-$ psql -d <dbname>
+To remove support for `plpythonu` from a database, run the following SQL command; you must be a database superuser to remove a registered procedural language:
+
+``` sql
+dbname=# DROP LANGUAGE plpythonu;
 ```
 
-Replace \<dbname\> with the name of the target database.
+## <a id="developfunctions"></a>Developing Functions with PL/Python 
+
+PL/Python functions are defined using the standard SQL [CREATE FUNCTION](../reference/sql/CREATE-FUNCTION.html) syntax.
+
+The body of a PL/Python user-defined function is a Python script. When the function is called, its arguments are passed as elements of the array `args[]`. You can also pass named arguments as ordinary variables to the Python script. 
 
-Then, run the following SQL command:
+PL/Python function results are returned with a `return` statement, or a `yield` statement in the case of a result-set statement.
 
-```shell
-psql# CREATE LANGUAGE plpythonu;
+The following PL/Python function computes and returns the maximum of two integers:
+
+``` sql
+=# CREATE FUNCTION mypymax (a integer, b integer)
+     RETURNS integer
+   AS $$
+     if (a is None) or (b is None):
+       return None
+     if a > b:
+       return a
+     return b
+   $$ LANGUAGE plpythonu;
 ```
 
-Note that `plpythonu` is installed as an “untrusted” language, meaning it does not offer any way of restricting what users can do in it.
+To execute the `mypymax` function:
 
-To remove support for `plpythonu` from a database, run the following SQL command:
+``` sql
+=# SELECT mypymax(5, 7);
+ mypymax 
+---------
+       7
+(1 row)
+```
+
+Adding the `STRICT` keyword to the `LANGUAGE` subclause instructs HAWQ to return null when any of the input arguments are null. When created as `STRICT`, the function itself need not perform null checks.
 
-```shell
-psql# DROP LANGUAGE plpythonu;
+The following example uses an unnamed argument, the built-in Python `max()` function, and the `STRICT` keyword to create a UDF named `mypymax2`:
+
+``` sql
+=# CREATE FUNCTION mypymax2 (a integer, integer)
+     RETURNS integer AS $$ 
+   return max(a, args[0]) 
+   $$ LANGUAGE plpythonu STRICT;
+=# SELECT mypymax2(5, 3);
+ mypymax2
+----------
+        5
+(1 row)
+=# SELECT mypymax2(5, null);
+ mypymax2
+----------
+       
+(1 row)
 ```
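
The effect of `STRICT` described above can be approximated outside the database. The following is a minimal plain-Python sketch, not HAWQ code: `strict` is a hypothetical decorator standing in for the `STRICT` keyword, and Python `None` stands in for SQL null.

```python
import functools


def strict(func):
    """Hypothetical stand-in for STRICT: return None if any argument is None."""
    @functools.wraps(func)
    def wrapper(*args):
        if any(arg is None for arg in args):
            return None
        return func(*args)
    return wrapper


def mypymax(a, b):
    # Non-STRICT version: the body performs its own null checks.
    if (a is None) or (b is None):
        return None
    if a > b:
        return a
    return b


@strict
def mypymax2(a, b):
    # STRICT-style version: the body can assume both arguments are present.
    return max(a, b)


print(mypymax(5, 7))
print(mypymax2(5, 3))
print(mypymax2(5, None))
```

In both versions a null input yields a null result; the difference is only where the check lives.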
 
-## <a id="developfunctions"></a>Developing Functions with PL/Python 
+## <a id="example_createtbl"></a>Creating the Sample Data
 
-The body of a PL/Python user-defined function is a Python script. When the function is called, its arguments are passed as elements of the array `args[]`. Named arguments are also passed as ordinary variables to the Python script. The result is returned from the PL/Python function with return statement, or yield statement in case of a result-set statement.
+Perform the following steps to create, and insert data into, a simple table. This table will be used in later exercises.
 
-The HAWQ PL/Python language module imports the Python module `plpy`. The module `plpy` implements these functions:
+1. Create a database named `testdb`:
 
-- Functions to execute SQL queries and prepare execution plans for queries.
-   - `plpy.execute`
-   - `plpy.prepare`
-   
-- Functions to manage errors and messages.
-   - `plpy.debug`
-   - `plpy.log`
-   - `plpy.info`
-   - `plpy.notice`
-   - `plpy.warning`
-   - `plpy.error`
-   - `plpy.fatal`
-   - `plpy.debug`
+    ``` shell
+    gpadmin@hawq-node$ createdb testdb
+    ```
+
+1. Create a table named `sales`:
+
+    ``` shell
+    gpadmin@hawq-node$ psql -d testdb
+    ```
+    ``` sql
+    testdb=> CREATE TABLE sales (id int, year int, qtr int, day int, region text)
+               DISTRIBUTED BY (id);
+    ```
+
+2. Insert data into the table:
+
+    ``` sql
+    testdb=> INSERT INTO sales VALUES
+     (1, 2014, 1,1, 'usa'),
+     (2, 2002, 2,2, 'europe'),
+     (3, 2014, 3,3, 'asia'),
+     (4, 2014, 4,4, 'usa'),
+     (5, 2014, 1,5, 'europe'),
+     (6, 2014, 2,6, 'asia'),
+     (7, 2002, 3,7, 'usa') ;
+    ```
+
+## <a id="pymod_intro"></a>Python Modules 
+A Python module is a text file containing Python statements and definitions. Python modules are named, with the file name for a module following the `<python-module-name>.py` naming convention.
+
+Should you need to build a Python module, ensure that the appropriate software is installed on the build system. Also be sure that you are building for the correct deployment architecture, for example, 64-bit.
+
+### <a id="pymod_intro_hawq"></a>HAWQ Considerations 
+
+When installing a Python module in HAWQ, you must add the module to all segment nodes in the cluster. You must also add all Python modules to any new segment hosts when you expand your HAWQ cluster.
+
+PL/Python supports the built-in HAWQ Python module named `plpy`.  You can also install 3rd party Python modules.
+
+
+## <a id="modules_plpy"></a>plpy Module 
+
+The HAWQ PL/Python procedural language extension automatically imports the Python module `plpy`. `plpy` implements functions to execute SQL queries and prepare execution plans for queries.  The `plpy` module also includes functions to manage errors and messages.
    
-## <a id="executepreparesql"></a>Executing and Preparing SQL Queries 
+### <a id="executepreparesql"></a>Executing and Preparing SQL Queries 
 
-The PL/Python `plpy` module provides two Python functions to execute an SQL query and prepare an execution plan for a query, `plpy.execute` and `plpy.prepare`. Preparing the execution plan for a query is useful if you run the query from multiple Python functions.
+Use the PL/Python `plpy` module `plpy.execute()` function to execute a SQL query. Use the `plpy.prepare()` function to prepare an execution plan for a query. Preparing the execution plan for a query is useful if you want to run the query from multiple Python functions.
 
-### <a id="plpyexecute"></a>plpy.execute 
+#### <a id="plpyexecute"></a>plpy.execute() 
 
-Calling `plpy.execute` with a query string and an optional limit argument causes the query to be run and the result to be returned in a Python result object. The result object emulates a list or dictionary object. The rows returned in the result object can be accessed by row number and column name. The result set row numbering starts with 0 (zero). The result object can be modified. The result object has these additional methods:
+Invoking `plpy.execute()` with a query string and an optional limit argument runs the query, returning the result in a Python result object. This result object:
 
-- `nrows` that returns the number of rows returned by the query.
-- `status` which is the `SPI_execute()` return value.
+- emulates a list or dictionary object
+- returns rows that can be accessed by row number and column name; row numbering starts with 0 (zero)
+- can be modified
+- includes an `nrows()` method that returns the number of rows returned by the query
+- includes a `status()` method that returns the `SPI_execute()` return value
 
-For example, this Python statement in a PL/Python user-defined function executes a query.
+For example, the following Python statement, when present in a PL/Python user-defined function, executes a `SELECT * FROM my_table` query:
 
-```python
-rv = plpy.execute("SELECT * FROM my_table", 5)
+``` python
+rv = plpy.execute("SELECT * FROM my_table", 3)
 ```
 
-The `plpy.execute` function returns up to 5 rows from `my_table`. The result set is stored in the `rv` object. If `my_table` has a column `my_column`, it would be accessed as:
+As instructed by the limit argument `3`, the `plpy.execute` function will return up to 3 rows from `my_table`. The result set is stored in the `rv` object.
+
+Access specific columns in the table by name. For example, if `my_table` has a column named `my_column`:
 
-```python
+``` python
 my_col_data = rv[i]["my_column"]
 ```
 
-Since the function returns a maximum of 5 rows, the index `i` can be an integer between 0 and 4.
+You specified that the function return a maximum of 3 rows in the `plpy.execute()` command above. As such, the index `i` used to access the result value `rv` must specify an integer between 0 and 2, inclusive.
+
+##### <a id="plpyexecute_example"></a>Example: plpy.execute()
+
+Example: Use `plpy.execute()` to run a similar query on the `sales` table you created in an earlier section:
+
+1. Define a PL/Python UDF that executes a query to return at most 5 rows from the `sales` table:
+
+    ``` sql
+    =# CREATE OR REPLACE FUNCTION mypytest(a integer) 
+         RETURNS text 
+       AS $$ 
+         rv = plpy.execute("SELECT * FROM sales ORDER BY id", 5)
+         region = rv[a-1]["region"]
+         return region
+       $$ LANGUAGE plpythonu;
+    ```
+
+    When executed, this UDF returns the `region` value from the row whose `id` equals the input value `a`. Since row numbering of the result set starts at 0, you must access the result set with index `a - 1`. 
+    
+    Specifying the `ORDER BY id` clause in the `SELECT` statement ensures that subsequent invocations of `mypytest` with the same input argument will return identical result sets.
+
+3. Run `mypytest` with an argument identifying `id` `3`:
+
+    ```sql
+    =# SELECT mypytest(3);
+     mypytest 
+    ----------
+     asia
+    (1 row)
+    ```
+    
+    Recall that row numbering starts from 0 in a Python result set. Because `mypytest` accesses the result set with index `a - 1` and the query returns at most 5 rows, the valid input argument for the `mypytest` function is an integer between 1 and 5, inclusive.
+
+    The query returns the `region` from the row with `id = 3`, `asia`.
+    
+Note: This example demonstrates some of the concepts discussed previously. It may not be the ideal way to return a specific column value.
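
The indexing used by `mypytest` can be tried without a database. This is a minimal sketch under stated assumptions: `fake_execute` is a hypothetical stand-in for `plpy.execute("SELECT * FROM sales ORDER BY id", limit)`, and the result object is modelled as a plain list of dictionaries built from the sample `sales` rows.

```python
# Sample rows from the sales table: (id, year, qtr, day, region).
SALES = [
    (1, 2014, 1, 1, "usa"),
    (2, 2002, 2, 2, "europe"),
    (3, 2014, 3, 3, "asia"),
    (4, 2014, 4, 4, "usa"),
    (5, 2014, 1, 5, "europe"),
    (6, 2014, 2, 6, "asia"),
    (7, 2002, 3, 7, "usa"),
]
COLUMNS = ("id", "year", "qtr", "day", "region")


def fake_execute(limit):
    """Hypothetical stand-in for plpy.execute(): rows ordered by id, cut at limit."""
    ordered = sorted(SALES, key=lambda row: row[0])
    return [dict(zip(COLUMNS, row)) for row in ordered[:limit]]


def mypytest(a):
    rv = fake_execute(5)
    # Rows are numbered from 0, so id `a` sits at index a - 1.
    if not 1 <= a <= len(rv):
        raise IndexError("a must be between 1 and %d" % len(rv))
    return rv[a - 1]["region"]


print(mypytest(3))
```

The explicit range check is an addition for illustration; the UDF in the example performs no such check.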
 
-### <a id="plpyprepare"></a>plpy.prepare 
+#### <a id="plpyprepare"></a>plpy.prepare() 
 
-The function `plpy.prepare` prepares the execution plan for a query. It is called with a query string and a list of parameter types, if you have parameter references in the query. For example, this statement can be in a PL/Python user-defined function:
+The function `plpy.prepare()` prepares the execution plan for a query. Preparing the execution plan for a query is useful if you plan to run the query from multiple Python functions.
 
-```python
-plan = plpy.prepare("SELECT last_name FROM my_users WHERE 
-  first_name = $1", [ "text" ])
+You invoke `plpy.prepare()` with a query string. Also include a list of parameter types if you are using parameter references in the query. For example, the following statement in a PL/Python user-defined function returns the execution plan for a query:
+
+``` python
+plan = plpy.prepare("SELECT * FROM sales WHERE region = $1 ORDER BY id",
+  [ "text" ])
 ```
 
-The string text is the data type of the variable that is passed for the variable `$1`. After preparing a statement, you use the function `plpy.execute` to run it:
+The string `text` identifies the data type of the variable `$1`. 
+
+After preparing an execution plan, you use the function `plpy.execute()` to run it.  For example:
 
-```python
-rv = plpy.execute(plan, [ "Fred" ], 5)
+``` python
+rv = plpy.execute(plan, [ "usa" ])
 ```
 
-The third argument is the limit for the number of rows returned and is optional.
+After this statement runs, `rv` includes all rows in the `sales` table where `region = 'usa'`.
+
+Read on for a description of how one passes data between PL/Python function calls.
+
+##### <a id="plpyprepare_dictionaries"></a>Saving Execution Plans
 
-When you prepare an execution plan using the PL/Python module the plan is automatically saved. See the Postgres Server Programming Interface (SPI) documentation for information about the execution plans [http://www.postgresql.org/docs/8.2/static/spi.html](http://www.postgresql.org/docs/8.2/static/spi.html).
+When you prepare an execution plan using the PL/Python module, the plan is automatically saved. See the [Postgres Server Programming Interface (SPI)](http://www.postgresql.org/docs/8.2/static/spi.html) documentation for information about execution plans.
 
-To make effective use of saved plans across function calls you use one of the Python persistent storage dictionaries SD or GD.
+To make effective use of saved plans across function calls, you use one of the Python persistent storage dictionaries, SD or GD.
 
-The global dictionary SD is available to store data between function calls. This variable is private static data. The global dictionary GD is public data, available to all Python functions within a session. Use GD with care.
+The global dictionary SD is available to store data between function calls. This variable is private static data. The global dictionary GD is public data, and is available to all Python functions within a session. *Use GD with care*.
 
-Each function gets its own execution environment in the Python interpreter, so that global data and function arguments from myfunc are not available to `myfunc2`. The exception is the data in the GD dictionary, as mentioned previously.
+Each function gets its own execution environment in the Python interpreter, so that global data and function arguments from `myfunc1` are not available to `myfunc2`. The exception is the data in the GD dictionary, as mentioned previously.
 
-This example uses the SD dictionary:
+This example saves an execution plan to the SD dictionary and then executes the plan:
 
 ```sql
-CREATE FUNCTION usesavedplan() RETURNS trigger AS $$
-  if SD.has_key("plan"):
-    plan = SD["plan"]
-  else:
-    plan = plpy.prepare("SELECT 1")
-    SD["plan"] = plan
+=# CREATE FUNCTION usesavedplan() RETURNS text AS $$
+     select1plan = plpy.prepare("SELECT region FROM sales WHERE id=1")
+     SD["s1plan"] = select1plan
+     # other function processing
+     # execute the saved plan
+     rv = plpy.execute(SD["s1plan"])
+     return rv[0]["region"]
+   $$ LANGUAGE plpythonu;
+=# SELECT usesavedplan();
+```
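
The function above prepares the plan on every call before storing it. A common variation prepares the plan only when its key is not yet in the dictionary, so later calls reuse the saved plan. The following is a minimal sketch of that pattern in plain Python: `SD` is modelled as an ordinary dictionary and `prepare` is a hypothetical stand-in for `plpy.prepare()`.

```python
SD = {}            # stand-in for the per-function SD dictionary
prepare_calls = 0  # counts how often the stand-in planner runs


def prepare(query):
    """Hypothetical stand-in for plpy.prepare(); returns an opaque plan object."""
    global prepare_calls
    prepare_calls += 1
    return ("plan", query)


def usesavedplan():
    # Prepare only on the first call; reuse the saved plan afterwards.
    if "s1plan" not in SD:
        SD["s1plan"] = prepare("SELECT region FROM sales WHERE id=1")
    return SD["s1plan"]


first = usesavedplan()
second = usesavedplan()
print(prepare_calls)
```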
+
+##### <a id="plpyprepare_example"></a>Example: plpy.prepare()
+
+Example: Use `plpy.prepare()` and `plpy.execute()` to prepare and run an execution plan using the GD dictionary:
+
+1. Define a PL/Python UDF that prepares an execution plan, saves it to the GD, and returns the name (key) of the plan:
+
+    ``` sql
+    =# CREATE OR REPLACE FUNCTION mypy_prepplan() 
+         RETURNS text 
+       AS $$ 
+         plan = plpy.prepare("SELECT * FROM sales WHERE region = $1 ORDER BY id", [ "text" ])
+         GD["getregionplan"] = plan
+         return "getregionplan"
+       $$ LANGUAGE plpythonu;
+    ```
+
+    This UDF, when run, will return the name (key) of the execution plan generated from the `plpy.prepare()` call.
+
+1. Define a PL/Python UDF to run the execution plan; this function takes the plan name and `region` name as inputs:
 
-  # rest of function
+    ``` sql
+    =# CREATE OR REPLACE FUNCTION mypy_execplan(planname text, regionname text)
+         RETURNS integer 
+       AS $$ 
+         rv = plpy.execute(GD[planname], [ regionname ], 5)
+         year = rv[0]["year"]
+         return year
+       $$ LANGUAGE plpythonu STRICT;
+    ```
 
-$$ LANGUAGE plpythonu;
-```
+    This UDF executes the `planname` plan that was previously saved to the GD. You will call `mypy_execplan()` with the `planname` returned from the `mypy_prepplan()` call.
 
-## <a id="pythonerrors"></a>Handling Python Errors and Messages 
+3. Execute the `mypy_prepplan()` and `mypy_execplan()` UDFs, passing `region` `usa`:
 
-The message functions `plpy.error` and `plpy.fatal` raise a Python exception which, if uncaught, propagates out to the calling query, causing the current transaction or subtransaction to be aborted. The functions raise `plpy.ERROR(msg)` and raise `plpy.FATAL(msg)` are equivalent to calling `plpy.error` and `plpy.fatal`, respectively. The other message functions only generate messages of different priority levels.
+    ``` sql
+    =# SELECT mypy_execplan( mypy_prepplan(), 'usa' );
+     mypy_execplan
+    ---------------
+         2014
+    (1 row)
+    ```
 
-Whether messages of a particular priority are reported to the client, written to the server log, or both is controlled by the HAWQ server configuration parameters `log_min_messages` and `client_min_messages`. For information about the parameters, see the [Server Configuration Parameter Reference](../reference/HAWQSiteConfig.html).
+### <a id="pythonerrors"></a>Handling Python Errors and Messages 
 
-## <a id="dictionarygd"></a>Using the Dictionary GD to Improve PL/Python Performance 
+The `plpy` module implements the following message- and error-related functions, each of which takes a message string as an argument:
 
-In terms of performance, importing a Python module is an expensive operation and can affect performance. If you are importing the same module frequently, you can use Python global variables to load the module on the first invocation and not require importing the module on subsequent calls. The following PL/Python function uses the GD persistent storage dictionary to avoid importing a module if it has already been imported and is in the GD.
+- `plpy.debug(msg)`
+- `plpy.log(msg)`
+- `plpy.info(msg)`
+- `plpy.notice(msg)`
+- `plpy.warning(msg)`
+- `plpy.error(msg)`
+- `plpy.fatal(msg)`
 
-```sql
-psql=#
-   CREATE FUNCTION pytest() RETURNS text AS $$ 
-      if 'mymodule' not in GD:
-        import mymodule
-        GD['mymodule'] = mymodule
-    return GD['mymodule'].sumd([1,2,3])
-$$ LANGUAGE plpythonu;
-```
+`plpy.error()` and `plpy.fatal()` raise a Python exception which, if uncaught, propagates out to the calling query, possibly aborting the current transaction or subtransaction. `raise plpy.ERROR(msg)` and `raise plpy.FATAL(msg)` are equivalent to calling `plpy.error()` and `plpy.fatal()`, respectively. Use the other message functions to generate messages of different priority levels.
 
-## <a id="installpythonmodules"></a>Installing Python Modules 
+Messages may be reported to the client and/or written to the HAWQ server log file.  The HAWQ server configuration parameters [`log_min_messages`](../reference/guc/parameter_definitions.html#log_min_messages) and [`client_min_messages`](../reference/guc/parameter_definitions.html#client_min_messages) control where messages are reported.
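
How a minimum-level setting filters messages can be sketched in a few lines. This is an illustration only: it assumes the PostgreSQL severity ranking for `client_min_messages`, leaves out `INFO` (which PostgreSQL handles separately), and uses the hypothetical helper name `reaches_client`. The `DEBUG2` level is the one shown for `plpy.debug()` output in the example later in this section.

```python
# Severity ranking assumed for client_min_messages, lowest to highest.
CLIENT_LEVELS = ["DEBUG5", "DEBUG4", "DEBUG3", "DEBUG2", "DEBUG1",
                 "LOG", "NOTICE", "WARNING", "ERROR"]


def reaches_client(message_level, client_min_messages):
    """Return True if a message at message_level passes the client threshold."""
    return (CLIENT_LEVELS.index(message_level)
            >= CLIENT_LEVELS.index(client_min_messages))


# With the threshold at NOTICE, debug messages are filtered out.
print(reaches_client("DEBUG2", "NOTICE"))
# Lowering the threshold to DEBUG2 lets them through.
print(reaches_client("DEBUG2", "DEBUG2"))
```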
 
-HAWQ is configured to use the system Python. When you install a Python module on HAWQ, you must add the module to all segment hosts in the cluster. When expanding HAWQ, you must add Python modules to the new segment hosts. You can use the HAWQ utilities `hawq ssh` and `hawq scp` to run commands on HAWQ hosts and copy files to the hosts. For information about the utilities, see the [HAWQ Management Tools Reference](../reference/cli/management_tools.html).
- 
-If you are building a Python module, you must ensure that the build creates the correct executable. For example on a Linux system, the build should create a 64-bit executable.
+#### <a id="plpymessages_example"></a>Example: Generating Messages
 
-Before building a Python module prior to installation, ensure that the appropriate software to build the module is installed and properly configured. The build environment is required only on the host where you build the module.
+In this example, you will create a PL/Python UDF that includes some debug log messages. You will also configure your `psql` session to enable debug-level client logging.
 
-These are examples of installing and testing Python modules:
+1. Define a PL/Python UDF that executes a query that will return at most 5 rows from the `sales` table. Invoke the `plpy.debug()` method to display some additional information:
 
-- Simple Python Module Installation Example (setuptools)
-- Complex Python Installation Example (NumPy)
-- Testing Installed Python Modules
+    ``` sql
+    =# CREATE OR REPLACE FUNCTION mypytest_debug(a integer) 
+         RETURNS text 
+       AS $$ 
+         plpy.debug('mypytest_debug executing query:  SELECT * FROM sales ORDER BY id')
+         rv = plpy.execute("SELECT * FROM sales ORDER BY id", 5)
+         plpy.debug('mypytest_debug: query returned ' + str(rv.nrows()) + ' rows')
+         region = rv[a]["region"]
+         return region
+       $$ LANGUAGE plpythonu;
+    ```
 
-### <a id="simpleinstall"></a>Simple Python Module Installation Example (setuptools) 
+2. Execute the `mypytest_debug()` UDF, passing the integer `2` as an argument:
 
-This example manually installs the Python `setuptools` module from the Python Package Index repository. The module lets you easily download, build, install, upgrade, and uninstall Python packages.
+    ```sql
+    =# SELECT mypytest_debug(2);
+     mypytest_debug 
+    ----------------
+     asia
+    (1 row)
+    ```
 
-This example first builds the module from a package and installs the module on a single host. Then the module is built and installed on segment hosts.
+3. Enable `DEBUG2` level client logging:
 
-Get the module package from the Python Package Index site. For example, run this `wget` command on a HAWQ host as the gpadmin user to get the tar.gz file.
+    ``` sql
+    =# SET client_min_messages=DEBUG2;
+    ```
+    
+2. Execute the `mypytest_debug()` UDF again:
 
-```bash
-$ wget --no-check-certificate https://pypi.python.org/packages/source/s/setuptools/setuptools-18.4.tar.gz
-```
+    ```sql
+    =# SELECT mypytest_debug(2);
+    ...
+    DEBUG2:  mypytest_debug executing query:  SELECT * FROM sales ORDER BY id
+    ...
+    DEBUG2:  mypytest_debug: query returned 5 rows
+    ...
+    ```
 
-Extract the files from the tar.gz file.
+    Debug output is very verbose. You will parse a lot of output to find the `mypytest_debug` messages. *Hint*: look both near the start and end of the output.
+    
+6. Turn off client-level debug logging:
 
-```bash
-$ tar -xzvf setuptools-18.4.tar.gz
-```
+    ```sql
+    =# SET client_min_messages=NOTICE;
+    ```
 
-Go to the directory that contains the package files, and run the Python scripts to build and install the Python package.
+## <a id="pythonmodules-3rdparty"></a>3rd-Party Python Modules 
 
-```bash
-$ cd setuptools-18.4
-$ python setup.py build && python setup.py install
-```
+PL/Python supports installation and use of 3rd-party Python Modules. This section includes examples for installing the `setuptools` and NumPy Python modules.
 
-The following Python command returns no errors if the module is available to Python.
+**Note**: You must have superuser privileges to install Python modules to the system Python directories.
 
-```bash
-$ python -c "import setuptools"
-```
+### <a id="simpleinstall"></a>Example: Installing setuptools 
 
-Copy the package to the HAWQ hosts with the `hawq scp` utility. For example, this command copies the tar.gz file from the current host to the host systems listed in the file `hawq-hosts`.
+In this example, you will manually install the Python `setuptools` module from the Python Package Index repository. `setuptools` enables you to easily download, build, install, upgrade, and uninstall Python packages.
 
-```bash
-$ hawq scp -f hawq-hosts setuptools-18.4.tar.gz =:/home/gpadmin
-```
+You will first build the module from the downloaded package, installing it on a single host. You will then build and install the module on all segment nodes in your HAWQ cluster.
 
-Run the commands to build, install, and test the package with `hawq ssh` utility on the hosts listed in the file `hawq-hosts`. The file `hawq-hosts` lists all the remote HAWQ segment hosts:
+1. Download the `setuptools` module package from the Python Package Index site. For example, run this `wget` command on a HAWQ node as the `gpadmin` user:
 
-```bash
-$ hawq ssh -f hawq-hosts
->>> tar -xzvf setuptools-18.4.tar.gz
->>> cd setuptools-18.4
->>> python setup.py build && python setup.py install
->>> python -c "import setuptools"
->>> exit
-```
+    ``` shell
+    $ ssh gpadmin@<hawq-node>
+    gpadmin@hawq-node$ . /usr/local/hawq/greenplum_path.sh
+    gpadmin@hawq-node$ mkdir plpython_pkgs
+    gpadmin@hawq-node$ cd plpython_pkgs
+    gpadmin@hawq-node$ export PLPYPKGDIR=`pwd`
+    gpadmin@hawq-node$ wget --no-check-certificate https://pypi.python.org/packages/source/s/setuptools/setuptools-18.4.tar.gz
+    ```
 
-The `setuptools` package installs the `easy_install` utility that lets you install Python packages from the Python Package Index repository. For example, this command installs Python PIP utility from the Python Package Index site.
+2. Extract the files from the `tar.gz` package:
 
-```shell
-$ cd setuptools-18.4
-$ easy_install pip
-```
+    ``` shell
+    gpadmin@hawq-node$ tar -xzvf setuptools-18.4.tar.gz
+    ```
 
-You can use the `hawq ssh` utility to run the `easy_install` command on all the HAWQ segment hosts.
+3. Run the Python scripts to build and install the Python package; you must have superuser privileges to install Python modules to the system Python installation:
 
-### <a id="complexinstall"></a>Complex Python Installation Example (NumPy) 
+    ``` shell
+    gpadmin@hawq-node$ cd setuptools-18.4
+    gpadmin@hawq-node$ python setup.py build 
+    gpadmin@hawq-node$ sudo python setup.py install
+    ```
 
-This example builds and installs the Python module NumPy. NumPy is a module for scientific computing with Python. For information about NumPy, see [http://www.numpy.org/](http://www.numpy.org/).
+4. Run the following command to verify the module is available to Python:
 
-Building the NumPy package requires this software:
-- OpenBLAS libraries, an open source implementation of BLAS (Basic Linear Algebra Subprograms).
-- The gcc compilers: gcc, gcc-gfortran, and gcc-c++. The compilers are required to build the OpenBLAS libraries. See [OpenBLAS Prerequisites](#openblasprereq).
+    ``` shell
+    gpadmin@hawq-node$ python -c "import setuptools"
+    ```
+    
+    If no error is returned, the `setuptools` module was successfully imported.
 
-This example process assumes `yum` is installed on all HAWQ segment hosts and the `gpadmin` user is a member of `sudoers` with `root` privileges on the hosts.
+5. The `setuptools` package installs the `easy_install` utility. This utility enables you to install Python packages from the Python Package Index repository. For example, this command installs the Python `pip` utility from the Python Package Index site:
 
-Download the OpenBLAS and NumPy source files. For example, these `wget` commands download tar.gz files into the directory packages:
+    ``` shell
+    gpadmin@hawq-node$ sudo easy_install pip
+    ```
 
-```bash
-$ wget --directory-prefix=packages http://github.com/xianyi/OpenBLAS/tarball/v0.2.8
-$ wget --directory-prefix=packages http://sourceforge.net/projects/numpy/files/NumPy/1.8.0/numpy-1.8.0.tar.gz/download
-```
+5. Copy the `setuptools` package to all HAWQ nodes in your cluster. For example, this command copies the `tar.gz` file from the current host to the host systems listed in the file `hawq-hosts`:
 
-Distribute the software to the HAWQ hosts. For example, if you download the software to `/home/gpadmin/packages`, these commands create the directory on the hosts and copies the software to hosts for the hosts listed in the `hawq-hosts` file.
+    ``` shell
+    gpadmin@hawq-node$ cd $PLPYPKGDIR
+    gpadmin@hawq-node$ hawq scp -f hawq-hosts setuptools-18.4.tar.gz =:/home/gpadmin
+    ```
 
-```bash
-$ hawq ssh -f hawq-hosts mkdir packages 
-$ hawq scp -f hawq-hosts packages/* =:/home/gpadmin/packages
-```
+6. Run the commands to build, install, and test the `setuptools` package you just copied to all hosts in your HAWQ cluster. For example:
 
-#### <a id="openblasprereq"></a>OpenBLAS Prerequisites 
+    ``` shell
+    gpadmin@hawq-node$ hawq ssh -f hawq-hosts
+    >>> mkdir plpython_pkgs
+    >>> cd plpython_pkgs
+    >>> tar -xzvf ../setuptools-18.4.tar.gz
+    >>> cd setuptools-18.4
+    >>> python setup.py build 
+    >>> sudo python setup.py install
+    >>> python -c "import setuptools"
+    >>> exit
+    ```
 
-1. If needed, use `yum` to install gcc compilers from system repositories. The compilers are required on all hosts where you compile OpenBLAS:
+### <a id="complexinstall"></a>Example: Installing NumPy 
 
-	```bash
-	$ sudo yum -y install gcc gcc-gfortran gcc-c++
-	```
+In this example, you will build and install the Python module NumPy. NumPy is a module for scientific computing with Python. For additional information about NumPy, refer to [http://www.numpy.org/](http://www.numpy.org/).
 
-	**Note:** If you cannot install the correct compiler versions with `yum`, you can download the gcc compilers, including gfortran, from source and install them.
+This example assumes `yum` is installed on all HAWQ segment nodes and that the `gpadmin` user is a member of `sudoers` with `root` privileges on the nodes.
 
-	These two commands download and install the compilers:
+#### <a id="complexinstall_prereq"></a>Prerequisites
+Building the NumPy package requires the following software:
 
-	```bash
-	$ wget http://gfortran.com/download/x86_64/snapshots/gcc-4.4.tar.xz
-	$ tar xf gcc-4.4.tar.xz -C /usr/local/
-	```
+- OpenBLAS libraries - an open source implementation of BLAS (Basic Linear Algebra Subprograms)
+- Python development packages - python-devel
+- gcc compilers - gcc, gcc-gfortran, and gcc-c++
+
+Perform the following steps to set up the OpenBLAS compilation environment on each HAWQ node:
 
-	If you installed `gcc` manually from a tar file, add the new `gcc` binaries to `PATH` and `LD_LIBRARY_PATH`:
+1. Use `yum` to install gcc compilers from system repositories. The compilers are required on all hosts where you compile OpenBLAS.  For example:
 
-	```bash
-	$ export PATH=$PATH:/usr/local/gcc-4.4/bin
-	$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/gcc-4.4/lib
+	``` shell
+	root@hawq-node$ yum -y install gcc gcc-gfortran gcc-c++ python-devel
 	```
 
-2. Create a symbolic link to `g++` and call it `gxx`:
+2. (Optional) If you cannot install the correct compiler versions with `yum`, you can download the gcc compiler sources, including `gfortran`, and build and install them manually. Refer to [Building gfortran from Source](https://gcc.gnu.org/wiki/GFortranBinaries#FromSource) for `gfortran` build and install information.
 
-	```bash
-	$ sudo ln -s /usr/bin/g++ /usr/bin/gxx
+2. Create a symbolic link to `g++`, naming it `gxx`:
+
+	``` bash
+	root@hawq-node$ ln -s /usr/bin/g++ /usr/bin/gxx
 	```
 
-3. You might also need to create symbolic links to any libraries that have different versions available for example `libppl_c.so.4` to `libppl_c.so.2`.
+3. You may also need to create symbolic links to any libraries that have different versions available; for example, linking `libppl_c.so.4` to `libppl_c.so.2`.
 
-4. If needed, you can use the `hawq scp` utility to copy files to HAWQ hosts and the `hawq ssh` utility to run commands on the hosts.
+4. You can use the `hawq scp` utility to copy files to HAWQ hosts and the `hawq ssh` utility to run commands on those hosts.
 
-#### <a id="buildopenblas"></a>Build and Install OpenBLAS Libraries 
 
-Before build and install the NumPy module, you install the OpenBLAS libraries. This section describes how to build and install the libraries on a single host.
+#### <a id="complexinstall_downdist"></a>Obtaining Packages
 
-1. Extract the OpenBLAS files from the file. These commands extract the files from the OpenBLAS tar file and simplify the directory name that contains the OpenBLAS files.
+Perform the following steps to download and distribute the OpenBLAS and NumPy source packages:
 
-	```bash
-	$ tar -xzf packages/v0.2.8 -C /home/gpadmin/packages
-	$ mv /home/gpadmin/packages/xianyi-OpenBLAS-9c51cdf /home/gpadmin/packages/	OpenBLAS
-	```
+1. Download the OpenBLAS and NumPy source files. For example, these `wget` commands download `tar.gz` files into a `packages` directory in the current working directory:
 
-2. Compile OpenBLAS. These commands set the LIBRARY_PATH environment variable and run the make command to build OpenBLAS libraries.
+    ``` shell
+    $ ssh gpadmin@<hawq-node>
+    gpadmin@hawq-node$ wget --directory-prefix=packages http://github.com/xianyi/OpenBLAS/tarball/v0.2.8
+    gpadmin@hawq-node$ wget --directory-prefix=packages http://sourceforge.net/projects/numpy/files/NumPy/1.8.0/numpy-1.8.0.tar.gz/download
+    ```
 
-	```bash
-	$ cd /home/gpadmin/packages/OpenBLAS
-	$ export LIBRARY_PATH=$LD_LIBRARY_PATH
-	$ make FC=gfortran USE_THREAD=0
-	```
+2. Distribute the software to all nodes in your HAWQ cluster. For example, if you downloaded the software to `/home/gpadmin/packages`, these commands create the `packages` directory on all nodes and copy the software to the nodes listed in the `hawq-hosts` file:
 
-3. Use these commands to install the OpenBLAS libraries in `/usr/local` as `root`, and then change the owner of the files to `gpadmin`.
+    ``` shell
+    gpadmin@hawq-node$ hawq ssh -f hawq-hosts mkdir packages 
+    gpadmin@hawq-node$ hawq scp -f hawq-hosts packages/* =:/home/gpadmin/packages
+    ```
+
+#### <a id="buildopenblas"></a>Build and Install OpenBLAS Libraries 
 
-	```bash
-	$ cd /home/gpadmin/packages/OpenBLAS/
-	$ sudo make PREFIX=/usr/local install
-	$ sudo ldconfig
-	$ sudo chown -R gpadmin /usr/local/lib
+Before building and installing the NumPy module, you must first build and install the OpenBLAS libraries. This section describes how to build and install the libraries on a single HAWQ node.
+
+1. Extract the OpenBLAS files from the downloaded tar file:
+
+	``` shell
+	$ ssh gpadmin@<hawq-node>
+	gpadmin@hawq-node$ cd packages
+	gpadmin@hawq-node$ tar xzf v0.2.8 -C /home/gpadmin/packages
+	gpadmin@hawq-node$ mv /home/gpadmin/packages/xianyi-OpenBLAS-9c51cdf /home/gpadmin/packages/OpenBLAS
 	```
+	
+	These commands extract the OpenBLAS tar file and simplify the unpacked directory name.
 
-	The following libraries are installed, along with symbolic links:
+2. Compile OpenBLAS. You must set the `LIBRARY_PATH` environment variable to the current `$LD_LIBRARY_PATH`. For example:
 
-	```bash
-	libopenblas.a -> libopenblas_sandybridge-r0.2.8.a
-	libopenblas_sandybridge-r0.2.8.a
-	libopenblas_sandybridge-r0.2.8.so
-	libopenblas.so -> libopenblas_sandybridge-r0.2.8.so
-	libopenblas.so.0 -> libopenblas_sandybridge-r0.2.8.so
+	``` shell
+	gpadmin@hawq-node$ cd OpenBLAS
+	gpadmin@hawq-node$ export LIBRARY_PATH=$LD_LIBRARY_PATH
+	gpadmin@hawq-node$ make FC=gfortran USE_THREAD=0 TARGET=SANDYBRIDGE
 	```
+	
+	Replace the `TARGET` argument with the target appropriate for your hardware. The `TargetList.txt` file identifies the list of supported OpenBLAS targets.
+	
+	Compiling OpenBLAS may take some time.
 
-4. You can use the `hawq ssh` utility to build and install the OpenBLAS libraries on multiple hosts.
+3. Install the OpenBLAS libraries in `/usr/local` and then change the owner of the files to `gpadmin`. You must have `root` privileges. For example:
 
-	All HAWQ hosts (master and segment hosts) have identical configurations. You can copy the OpenBLAS libraries from the system where they were built instead of building the OpenBlas libraries on all the hosts. For example, these `hawq ssh` and `hawq scp` commands copy and install the OpenBLAS libraries on the hosts listed in the `hawq-hosts` file.
+	``` shell
+	gpadmin@hawq-node$ sudo make PREFIX=/usr/local install
+	gpadmin@hawq-node$ sudo ldconfig
+	gpadmin@hawq-node$ sudo chown -R gpadmin /usr/local/lib
+	```
 
-```bash
-$ hawq ssh -f hawq-hosts -e 'sudo yum -y install gcc gcc-gfortran gcc-c++'
-$ hawq ssh -f hawq-hosts -e 'ln -s /usr/bin/g++ /usr/bin/gxx'
-$ hawq ssh -f hawq-hosts -e sudo chown gpadmin /usr/local/lib
-$ hawq scp -f hawq-hosts /usr/local/lib/libopen*sandy* =:/usr/local/lib
-```
-```bash
-$ hawq ssh -f hawq-hosts
->>> cd /usr/local/lib
->>> ln -s libopenblas_sandybridge-r0.2.8.a libopenblas.a
->>> ln -s libopenblas_sandybridge-r0.2.8.so libopenblas.so
->>> ln -s libopenblas_sandybridge-r0.2.8.so libopenblas.so.0
->>> sudo ldconfig
-```
+	The following libraries are installed to `/usr/local/lib`, along with symbolic links:
+
+	``` shell
+	gpadmin@hawq-node$ ls -l /usr/local/lib
+	    ...
+	    libopenblas.a -> libopenblas_sandybridge-r0.2.8.a
+	    libopenblas_sandybridge-r0.2.8.a
+	    libopenblas_sandybridge-r0.2.8.so
+	    libopenblas.so -> libopenblas_sandybridge-r0.2.8.so
+	    libopenblas.so.0 -> libopenblas_sandybridge-r0.2.8.so
+	    ...
+	```
+
+4. Install the OpenBLAS libraries on all nodes in your HAWQ cluster. You can use the `hawq ssh` utility to similarly build and install the OpenBLAS libraries on each of the nodes. 
+
+    Or, you may choose to copy the OpenBLAS libraries you just built to all of the HAWQ cluster nodes. For example, these `hawq ssh` and `hawq scp` commands install prerequisite packages, and copy and install the OpenBLAS libraries on the hosts listed in the `hawq-hosts` file.
+
+    ``` shell
+    $ hawq ssh -f hawq-hosts -e 'sudo yum -y install gcc gcc-gfortran gcc-c++ python-devel'
+    $ hawq ssh -f hawq-hosts -e 'sudo ln -s /usr/bin/g++ /usr/bin/gxx'
+    $ hawq ssh -f hawq-hosts -e 'sudo chown gpadmin /usr/local/lib'
+    $ hawq scp -f hawq-hosts /usr/local/lib/libopen*sandy* =:/usr/local/lib
+    ```
+    ``` shell
+    $ hawq ssh -f hawq-hosts
+    >>> cd /usr/local/lib
+    >>> ln -s libopenblas_sandybridge-r0.2.8.a libopenblas.a
+    >>> ln -s libopenblas_sandybridge-r0.2.8.so libopenblas.so
+    >>> ln -s libopenblas_sandybridge-r0.2.8.so libopenblas.so.0
+    >>> sudo ldconfig
+    ```
 
 #### Build and Install NumPy <a name="buildinstallnumpy"></a>
 
 After you have installed the OpenBLAS libraries, you can build and install the NumPy module. These steps install the NumPy module on a single host. You can use the `hawq ssh` utility to build and install the NumPy module on multiple hosts.
 
-1. Go to the packages subdirectory and get the NumPy module source and extract the files.
+1. Extract the NumPy module source files:
 
-	```bash
-	$ cd /home/gpadmin/packages
-	$ tar -xzf numpy-1.8.0.tar.gz
+	``` shell
+	gpadmin@hawq-node$ cd /home/gpadmin/packages
+	gpadmin@hawq-node$ tar xzf numpy-1.8.0.tar.gz
 	```
+	
+	Unpacking the `numpy-1.8.0.tar.gz` file creates a directory named `numpy-1.8.0` in the current directory.
 
-2. Set up the environment for building and installing NumPy.
+2. Set up the environment for building and installing NumPy:
 
-	```bash
-	$ export BLAS=/usr/local/lib/libopenblas.a
-	$ export LAPACK=/usr/local/lib/libopenblas.a
-	$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/
-	$ export LIBRARY_PATH=$LD_LIBRARY_PATH
+	``` shell
+	gpadmin@hawq-node$ export BLAS=/usr/local/lib/libopenblas.a
+	gpadmin@hawq-node$ export LAPACK=/usr/local/lib/libopenblas.a
+	gpadmin@hawq-node$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/
+	gpadmin@hawq-node$ export LIBRARY_PATH=$LD_LIBRARY_PATH
 	```
 
-3. Go to the NumPy directory and build and install NumPy. Building the NumPy package might take some time.
+3. Build and install NumPy. (Building the NumPy package might take some time.)
 
-	```bash
-	$ cd numpy-1.8.0
-	$ python setup.py build
-	$ python setup.py install
+	``` shell
+	gpadmin@hawq-node$ cd numpy-1.8.0
+	gpadmin@hawq-node$ python setup.py build
+	gpadmin@hawq-node$ sudo python setup.py install
 	```
 
-	**Note:** If the NumPy module did not successfully build, the NumPy build process might need a site.cfg that specifies the location of the OpenBLAS libraries. Create the file `site.cfg` in the NumPy package directory:
+	**Note:** If the NumPy module did not successfully build, the NumPy build process might need a `site.cfg` file that specifies the location of the OpenBLAS libraries. Create the `site.cfg` file in the NumPy package directory:
 
-	```bash
-	$ cd ~/packages/numpy-1.8.0
-	$ touch site.cfg
+	``` shell
+	gpadmin@hawq-node$ touch site.cfg
 	```
 
 	Add the following to the `site.cfg` file and run the NumPy build command again:
 
-	<pre>
+	``` pre
 	[default]
 	library_dirs = /usr/local/lib
 
@@ -386,19 +620,22 @@ After you have installed the OpenBLAS libraries, you can build and install NumPy
 	libraries = openblas
 	library_dirs = /usr/local/lib
 	include_dirs = /usr/local/include
-	</pre>
+	```
 
-4. The following Python command ensures that the module is available for import by Python on a host system.
+4. Verify that the NumPy module is available for import by Python:
 
-	```bash
-	$ python -c "import numpy"
+	``` shell
+	gpadmin@hawq-node$ cd $HOME
+	gpadmin@hawq-node$ python -c "import numpy"
 	```
+	
+	If no error is returned, the NumPy module was successfully imported.
 
-5. Similar to the simple module installation, use the `hawq ssh` utility to build, install, and test the module on HAWQ segment hosts.
+5. As performed in the `setuptools` Python module installation, use the `hawq ssh` utility to build, install, and test the NumPy module on all HAWQ nodes.
 
-5. The environment variables that are require to build the NumPy module are also required in the gpadmin user environment when running Python NumPy functions. You can use the `hawq ssh` utility with the `echo` command to add the environment variables to the `.bashrc` file. For example, these echo commands add the environment variables to the `.bashrc` file in the user home directory.
+5. The environment variables that were required to build the NumPy module are also required in the `gpadmin` runtime environment to run Python NumPy functions. You can use the `echo` command to add the environment variables to `gpadmin`'s `.bashrc` file. For example, the following `echo` commands add the environment variables to the `.bashrc` file in `gpadmin`'s home directory:
 
-	```bash
+	``` shell
 	$ echo -e '\n#Needed for NumPy' >> ~/.bashrc
 	$ echo -e 'export BLAS=/usr/local/lib/libopenblas.a' >> ~/.bashrc
 	$ echo -e 'export LAPACK=/usr/local/lib/libopenblas.a' >> ~/.bashrc
@@ -406,136 +643,111 @@ After you have installed the OpenBLAS libraries, you can build and install NumPy
 	$ echo -e 'export LIBRARY_PATH=$LD_LIBRARY_PATH' >> ~/.bashrc
 	```
 
-## <a id="testingpythonmodules"></a>Testing Installed Python Modules 
+    You can use the `hawq ssh` utility with these `echo` commands to add the environment variables to the `.bashrc` file on all nodes in your HAWQ cluster.
 
-You can create a simple PL/Python user-defined function (UDF) to validate that Python a module is available in HAWQ. This example tests the NumPy module.
+### <a id="testingpythonmodules"></a>Testing Installed Python Modules 
 
-This PL/Python UDF imports the NumPy module. The function returns SUCCESS if the module is imported, and FAILURE if an import error occurs.
+You can create a simple PL/Python user-defined function (UDF) to validate that a Python module is available in HAWQ. This example tests the NumPy module.
 
-```sql
-CREATE OR REPLACE FUNCTION plpy_test(x int)
-RETURNS text
-AS $$
-  try:
-      from numpy import *
-      return 'SUCCESS'
-  except ImportError, e:
-      return 'FAILURE'
-$$ language plpythonu;
-```
+1. Create a PL/Python UDF that imports the NumPy module:
 
-Create a table that contains data on each HAWQ segment instance. Depending on the size of your HAWQ installation, you might need to generate more data to ensure data is distributed to all segment instances.
+    ``` shell
+    gpadmin@hawq-node$ psql -d testdb
+    ```
+    ``` sql
+    =# CREATE OR REPLACE FUNCTION test_importnumpy(x int)
+       RETURNS text
+       AS $$
+         try:
+             from numpy import *
+             return 'SUCCESS'
+         except ImportError, e:
+             return 'FAILURE'
+       $$ LANGUAGE plpythonu;
+    ```
 
-```sql
-CREATE TABLE DIST AS (SELECT x FROM generate_series(1,50) x ) DISTRIBUTED RANDOMLY ;
-```
+    The function returns SUCCESS if the module is imported, and FAILURE if an import error occurs.
 
-This SELECT command runs the UDF on the segment hosts where data is stored in the primary segment instances.
+2. Create a table that loads data on each HAWQ segment instance:
 
-```sql
-SELECT gp_segment_id, plpy_test(x) AS status
-  FROM dist
-  GROUP BY gp_segment_id, status
-  ORDER BY gp_segment_id, status;
-```
+    ``` sql
+    =# CREATE TABLE disttbl AS (SELECT x FROM generate_series(1,50) x ) DISTRIBUTED BY (x);
+    ```
+    
+    Depending upon the size of your HAWQ installation, you may need to generate a larger series to ensure data is distributed to all segment instances.
 
-The SELECT command returns SUCCESS if the UDF imported the Python module on the HAWQ segment instance. If the SELECT command returns FAILURE, you can find the segment host of the segment instance host. The HAWQ system table `gp_segment_configuration` contains information about segment configuration. This command returns the host name for a segment ID.
+3. Run the UDF on the segment nodes where data is stored in the primary segment instances.
 
-```sql
-SELECT hostname, content AS seg_ID FROM gp_segment_configuration
-  WHERE content = seg_id ;
-```
+    ``` sql
+    =# SELECT gp_segment_id, test_importnumpy(1) AS status
+         FROM disttbl
+         GROUP BY gp_segment_id, status
+         ORDER BY gp_segment_id, status;
+    ```
 
-If FAILURE is returned, these are some possible causes:
+    The `SELECT` command returns SUCCESS if the UDF imported the Python module on the HAWQ segment instance. FAILURE is returned if the Python module could not be imported.
+   
+
+#### <a id="testingpythonmodules"></a>Troubleshooting Python Module Import Failures
+
+Possible causes of a Python module import failure include:
 
 - A problem accessing required libraries. For the NumPy example, HAWQ might have a problem accessing the OpenBLAS libraries or the Python libraries on a segment host.
 
-	Make sure you get no errors when running command on the segment host as the gpadmin user. This hawq ssh command tests importing the NumPy module on the segment host mdw1.
+	*Try*: Test importing the module on the segment host. This `hawq ssh` command tests importing the NumPy module on the segment host named mdw1.
 
-	```shell
-	$ hawq ssh -h mdw1 python -c "import numpy"
+	``` shell
+	gpadmin@hawq-node$ hawq ssh -h mdw1 python -c "import numpy"
 	```
 
-- If the Python import command does not return an error, environment variables might not be configured in the HAWQ environment. For example, the variables are not in the `.bashrc` file, or HAWQ might not have been restarted after adding the environment variables to the `.bashrc` file.
+- Environment variables may not be configured in the HAWQ environment. The Python import command may not return an error in this case.
 
-	Ensure sure that the environment variables are properly set and then restart HAWQ. For the NumPy example, ensure the environment variables listed at the end of the section [Build and Install NumPy](#buildinstallnumpy) are defined in the `.bashrc` file for the gpadmin user on the master and segment hosts.
+	*Try*: Ensure that the environment variables are properly set. For the NumPy example, ensure that the environment variables listed at the end of the section [Build and Install NumPy](#buildinstallnumpy) are defined in the `.bashrc` file for the `gpadmin` user on the master and all segment nodes.
+	
+	**Note:** The `.bashrc` file for the `gpadmin` user on the HAWQ master and all segment nodes must source the `greenplum_path.sh` file.
 
-	**Note:** On HAWQ master and segment hosts, the `.bashrc` file for the gpadmin user must source the file `$GPHOME/greenplum_path.sh`.
+	
+- HAWQ might not have been restarted after adding environment variable settings to the `.bashrc` file. Again, the Python import command may not return an error in this case.
 
-## <a id="examples"></a>Examples 
+	*Try*: Ensure that you have restarted HAWQ.
+	
+	``` shell
+	gpadmin@master$ hawq restart cluster
+	```
 
-This PL/Python UDF returns the maximum of two integers:
+## <a id="dictionarygd"></a>Using the GD Dictionary to Improve PL/Python Performance 
 
-```sql
-CREATE FUNCTION pymax (a integer, b integer)
-  RETURNS integer
-AS $$
-  if (a is None) or (b is None):
-      return None
-  if a > b:
-     return a
-  return b
-$$ LANGUAGE plpythonu;
-```
+Importing a Python module is an expensive operation that can adversely affect performance. If you import the same module frequently, you can use Python global variables to import the module on the first invocation and skip the import on subsequent invocations.
 
-You can use the STRICT property to perform the null handling instead of using the two conditional statements.
+The following PL/Python function uses the GD persistent storage dictionary to avoid importing the NumPy module if it is already stored in the GD. The UDF includes a call to `plpy.notice()` to display a message when importing the module.
 
-```sql
-CREATE FUNCTION pymax (a integer, b integer) 
-  RETURNS integer AS $$ 
-return max(a,b) 
-$$ LANGUAGE plpythonu STRICT ;
+``` sql
+=# CREATE FUNCTION mypy_import2gd() RETURNS text AS $$ 
+     if 'numpy' not in GD:
+       plpy.notice('mypy_import2gd: importing module numpy')
+       import numpy
+       GD['numpy'] = numpy
+     return 'numpy'
+   $$ LANGUAGE plpythonu;
 ```
-
-You can run the user-defined function pymax with SELECT command. This example runs the UDF and shows the output.
-
-```sql
-SELECT ( pymax(123, 43));
-column1
----------
-     123
+``` sql
+=# SELECT mypy_import2gd();
+NOTICE:  mypy_import2gd: importing module numpy
+CONTEXT:  PL/Python function "mypy_import2gd"
+ mypy_import2gd 
+----------------
+ numpy
 (1 row)
 ```
-
-This example that returns data from an SQL query that is run against a table. These two commands create a simple table and add data to the table.
-
-```sql
-CREATE TABLE sales (id int, year int, qtr int, day int, region text)
-  DISTRIBUTED BY (id) ;
-
-INSERT INTO sales VALUES
- (1, 2014, 1,1, 'usa'),
- (2, 2002, 2,2, 'europe'),
- (3, 2014, 3,3, 'asia'),
- (4, 2014, 4,4, 'usa'),
- (5, 2014, 1,5, 'europe'),
- (6, 2014, 2,6, 'asia'),
- (7, 2002, 3,7, 'usa') ;
-```
-
-This PL/Python UDF executes a SELECT command that returns 5 rows from the table. The Python function returns the REGION value from the row specified by the input value. In the Python function, the row numbering starts from 0. Valid input for the function is an integer between 0 and 4.
-
-```sql
-CREATE OR REPLACE FUNCTION mypytest(a integer) 
-  RETURNS text 
-AS $$ 
-  rv = plpy.execute("SELECT * FROM sales ORDER BY id", 5)
-  region = rv[a]["region"]
-  return region
-$$ language plpythonu;
-```
-
-Running this SELECT statement returns the REGION column value from the third row of the result set.
-
-```sql
-SELECT mypytest(2) ;
+``` sql
+=# SELECT mypy_import2gd();
+ mypy_import2gd 
+----------------
+ numpy
+(1 row)
 ```
 
-This command deletes the UDF from the database.
-
-```sql
-DROP FUNCTION mypytest(integer) ;
-```
+The second `SELECT` call does not include the `NOTICE` message, indicating that the module was obtained from the GD.
 
 ## <a id="references"></a>References 
 
@@ -543,29 +755,31 @@ This section lists references for using PL/Python.
 
 ### <a id="technicalreferences"></a>Technical References 
 
-For information about PL/Python see the PostgreSQL documentation at [http://www.postgresql.org/docs/8.2/static/plpython.html](http://www.postgresql.org/docs/8.2/static/plpython.html).
+For information about PL/Python in HAWQ, see the [PL/Python - Python Procedural Language](http://www.postgresql.org/docs/8.2/static/plpython.html) PostgreSQL documentation.
 
-For information about Python Package Index (PyPI), see [https://pypi.python.org/pypi](https://pypi.python.org/pypi).
+For information about the Python Package Index (PyPI), refer to [PyPI - the Python Package Index](https://pypi.python.org/pypi).
 
-These are some Python modules that can be downloaded:
+The following Python modules may be of interest:
 
-- SciPy library provides user-friendly and efficient numerical routines such as routines for numerical integration and optimization [http://www.scipy.org/scipylib/index.html](http://www.scipy.org/scipylib/index.html). This wget command downloads the SciPy package tar file.
+- [SciPy library](http://www.scipy.org/scipylib/index.html) provides user-friendly and efficient numerical routines including those for numerical integration and optimization. To download the SciPy package tar file:
 
- ```shell
-$ wget http://sourceforge.net/projects/scipy/files/scipy/0.10.1/ scipy-0.10.1.tar.gz/download
-```
+    ``` shell
+    hawq-node$ wget http://sourceforge.net/projects/scipy/files/scipy/0.10.1/scipy-0.10.1.tar.gz
+    ```
 
-- Natural Language Toolkit (nltk) is a platform for building Python programs to work with human language data http://www.nltk.org/. This wget command downloads the nltk package tar file.
+- [Natural Language Toolkit](http://www.nltk.org/) (`nltk`) is a platform for building Python programs to work with human language data. 
 
- ```shell
-$ wget http://pypi.python.org/packages/source/n/nltk/nltk-2.0.2.tar.gz#md5=6e714ff74c3398e88be084748df4e657
- ```
+    The Python [`distribute`](https://pypi.python.org/pypi/distribute/0.6.21) package is required for `nltk`. The `distribute` package should be installed before installing `nltk`. To download the `distribute` package tar file:
 
- **Note:** The Python package Distribute [https://pypi.python.org/pypi/](https://pypi.python.org/pypi/) distribute is required for `nltk`. The Distribute module should be installed before the `ntlk` package. This wget command downloads the Distribute package tar file.
+    ``` shell
+    hawq-node$ wget http://pypi.python.org/packages/source/d/distribute/distribute-0.6.21.tar.gz
+    ```
 
-```shell
-$ wget http://pypi.python.org/packages/source/d/distribute/distribute-0.6.21.tar.gz
-```
+    To download the `nltk` package tar file:
+
+    ``` shell
+    hawq-node$ wget http://pypi.python.org/packages/source/n/nltk/nltk-2.0.2.tar.gz#md5=6e714ff74c3398e88be084748df4e657
+    ```
 
 ### <a id="usefulreading"></a>Useful Reading 
 


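The GD caching pattern that the diff above adds to the docs can be tried outside the database. The following is a minimal sketch, not the committed documentation: `GD` here is an ordinary dictionary standing in for PL/Python's persistent dictionary, `get_cached_module` is a hypothetical helper name, and the standard-library `json` module is used in place of NumPy so the sketch runs without extra packages.

```python
import importlib

# Stand-in for PL/Python's persistent GD dictionary (an assumption made so
# this sketch can run outside HAWQ; inside a UDF, GD is provided for you).
GD = {}


def get_cached_module(name, cache=GD):
    """Import `name` on first use and return the cached module afterwards."""
    if name not in cache:
        # Comparable to the plpy.notice() call in the UDF: reports only
        # when a real import takes place.
        print("importing module %s" % name)
        cache[name] = importlib.import_module(name)
    return cache[name]


first = get_cached_module("json")   # performs the import
second = get_cached_module("json")  # served from the cache
```

The second call returns the object stored by the first, which is why the second `SELECT` in the documented example shows no `NOTICE`.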
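The import test performed by the `test_importnumpy` UDF in the diff above can also be run as plain Python on a host before wrapping it in a UDF. This is a hedged sketch: `check_import` is a hypothetical helper name, and the `except ImportError:` form is used because it is accepted by both Python 2 and Python 3, unlike the Python 2-only `except ImportError, e` spelling in the UDF.

```python
import importlib


def check_import(module_name):
    """Return 'SUCCESS' if the module imports, 'FAILURE' otherwise."""
    try:
        importlib.import_module(module_name)
        return "SUCCESS"
    except ImportError:
        return "FAILURE"


# Example: check a standard-library module and a name that does not exist.
# Substitute "numpy" to test the module discussed in the documentation.
results = {name: check_import(name) for name in ("json", "no_such_module_xyz")}
```

Running the check directly on a segment host separates Python-level installation problems from HAWQ environment problems such as missing `.bashrc` settings.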