hawq-commits mailing list archives

From yo...@apache.org
Subject [21/36] incubator-hawq-docs git commit: moving book configuration to new 'book' branch, for HAWQ-1027
Date Mon, 29 Aug 2016 16:46:56 GMT
http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/overview/TableDistributionStorage.html.md.erb
----------------------------------------------------------------------
diff --git a/overview/TableDistributionStorage.html.md.erb b/overview/TableDistributionStorage.html.md.erb
new file mode 100755
index 0000000..8bf6542
--- /dev/null
+++ b/overview/TableDistributionStorage.html.md.erb
@@ -0,0 +1,41 @@
+---
+title: Table Distribution and Storage
+---
+
+HAWQ stores all table data, except for system tables, in HDFS. When a user creates a table, the metadata is stored on the master's local file system and the table content is stored in HDFS.
+
+To simplify table data management, all data for one relation is saved under a single HDFS folder.
+
+For both HAWQ table storage formats, AO \(Append-Only\) and Parquet, the data files are splittable, so HAWQ can assign multiple virtual segments to consume one data file concurrently. This increases the degree of query parallelism.
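+
+As an illustration, the storage format is chosen at table creation time. The following is a sketch using HAWQ `WITH` clause storage options; the table and column names are hypothetical:
+
+```sql
+-- Append-Only, row-oriented storage
+CREATE TABLE sales_ao (id int, amount float)
+  WITH (appendonly=true, orientation=row);
+
+-- Append-Only, Parquet storage
+CREATE TABLE sales_parquet (id int, amount float)
+  WITH (appendonly=true, orientation=parquet);
+```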
+
+## Table Distribution Policy
+
+The default table distribution policy in HAWQ is random.
+
+Randomly distributed tables have some benefits over hash distributed tables. For example, after a cluster expansion, HAWQ can use the additional resources automatically, without redistributing table data. Redistributing a huge table is very expensive. In addition, data locality for randomly distributed tables is better after the underlying HDFS redistributes its data during a rebalance or after DataNode failures, which is quite common when the cluster is large.
+
+On the other hand, for some queries, hash distributed tables are faster than randomly distributed tables. For example, hash distributed tables have some performance benefits for some TPC-H queries. You should choose the distribution policy that is best suited for your application's scenario.
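+
+For example, the two policies are declared as follows (a sketch; table and column names are hypothetical):
+
+```sql
+-- Random distribution (the HAWQ default)
+CREATE TABLE events (id int, payload text) DISTRIBUTED RANDOMLY;
+
+-- Hash distribution on a column
+CREATE TABLE orders (order_id int, total float) DISTRIBUTED BY (order_id);
+```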
+
+See [Choosing the Table Distribution Policy](/20/ddl/ddl-table.html) for more details.
+
+## Data Locality
+
+Data is distributed across HDFS DataNodes. Since remote read involves network I/O, a data locality algorithm improves the local read ratio. HAWQ considers three aspects when allocating data blocks to virtual segments:
+
+-   Ratio of local read
+-   Continuity of file read
+-   Data balance among virtual segments
+
+## External Data Access
+
+HAWQ can access data in external files using the HAWQ Extension Framework (PXF).
+PXF is an extensible framework that allows HAWQ to access data in external
+sources as readable or writable HAWQ tables. PXF has built-in connectors for
+accessing data inside HDFS files, Hive tables, and HBase tables. PXF also
+integrates with HCatalog to query Hive tables directly. See [Working with PXF
+and External Data](/20/pxf/HawqExtensionFrameworkPXF.html) for more
+details.
+
+Users can create custom PXF connectors to access other parallel data stores or
+processing engines. Connectors are Java plug-ins that use the PXF API. For more
+information see [PXF External Tables and API](/20/pxf/PXFExternalTableandAPIReference.html).
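+
+As a sketch of what a PXF external table definition can look like (the host, port, path, and profile name here are illustrative, not taken from this document):
+
+```sql
+CREATE EXTERNAL TABLE ext_sales (id int, amount float)
+  LOCATION ('pxf://namenode:51200/data/sales?PROFILE=HdfsTextSimple')
+  FORMAT 'TEXT' (DELIMITER ',');
+```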

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/overview/system-overview.html.md.erb
----------------------------------------------------------------------
diff --git a/overview/system-overview.html.md.erb b/overview/system-overview.html.md.erb
new file mode 100644
index 0000000..9fc1c53
--- /dev/null
+++ b/overview/system-overview.html.md.erb
@@ -0,0 +1,11 @@
+---
+title: Apache HAWQ (Incubating) System Overview
+---
+* <a href="./HAWQOverview.html" class="subnav">What is HAWQ?</a>
+* <a href="./HAWQArchitecture.html" class="subnav">HAWQ Architecture</a>
+* <a href="./TableDistributionStorage.html" class="subnav">Table Distribution and Storage</a>
+* <a href="./ElasticSegments.html" class="subnav">Elastic Virtual Segment Allocation</a>
+* <a href="./ResourceManagement.html" class="subnav">Resource Management</a>
+* <a href="./HDFSCatalogCache.html" class="subnav">HDFS Catalog Cache</a>
+* <a href="./ManagementTools.html" class="subnav">Management Tools</a>
+* <a href="./RedundancyFailover.html" class="subnav">Redundancy and Fault Tolerance</a>

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/plext/UsingProceduralLanguages.html.md.erb
----------------------------------------------------------------------
diff --git a/plext/UsingProceduralLanguages.html.md.erb b/plext/UsingProceduralLanguages.html.md.erb
new file mode 100644
index 0000000..3ffba2c
--- /dev/null
+++ b/plext/UsingProceduralLanguages.html.md.erb
@@ -0,0 +1,20 @@
+---
+title: Using Procedural Languages and Extensions in HAWQ
+---
+
+HAWQ allows user-defined functions to be written in other languages besides SQL and C. These other languages are generically called *procedural languages* (PLs).
+
+For a function written in a procedural language, the database server has no built-in knowledge about how to interpret the function's source text. Instead, the task is passed to a special handler that knows the details of the language. The handler could either do all the work of parsing, syntax analysis, execution, and so on itself, or it could serve as "glue" between HAWQ and an existing implementation of a programming language. The handler itself is a C language function compiled into a shared object and loaded on demand, just like any other C function.
+
+This chapter describes the following:
+
+-   <a href="using_pljava.html">Using PL/Java</a>
+-   <a href="using_plperl.html">Using PL/Perl</a>
+-   <a href="using_plpgsql.html">Using PL/pgSQL</a>
+-   <a href="using_plpython.html">Using PL/Python</a>
+-   <a href="using_plr.html">Using PL/R</a>
+-   <a href="using_pgcrypto.html">Using pgcrypto</a>
+
+
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/plext/using_pgcrypto.html.md.erb
----------------------------------------------------------------------
diff --git a/plext/using_pgcrypto.html.md.erb b/plext/using_pgcrypto.html.md.erb
new file mode 100644
index 0000000..e3e9225
--- /dev/null
+++ b/plext/using_pgcrypto.html.md.erb
@@ -0,0 +1,32 @@
+---
+title: Enabling Cryptographic Functions for PostgreSQL (pgcrypto)
+---
+
+`pgcrypto` is a package extension included in your HAWQ distribution. You must explicitly enable the cryptographic functions to use this extension.
+
+## <a id="pgcryptoprereq"></a>Prerequisites 
+
+
+Before you enable the `pgcrypto` software package, make sure that your HAWQ database is running, you have sourced `greenplum_path.sh`, and that the `$GPHOME` environment variable is set.
+
+## <a id="enablepgcrypto"></a>Enable pgcrypto 
+
+On every database in which you want to enable `pgcrypto`, run the following command:
+
+``` shell
+$ psql -d <dbname> -f $GPHOME/share/postgresql/contrib/pgcrypto.sql
+```
+	
+Replace \<dbname\> with the name of the target database.
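+
+Once enabled, the `pgcrypto` functions can be called from SQL. A minimal sketch, assuming the functions were loaded into the current database:
+
+``` sql
+-- Hash arbitrary data
+SELECT digest('some data', 'sha256');
+
+-- Generate a salted password hash
+SELECT crypt('mypassword', gen_salt('md5'));
+```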
+	
+## <a id="uninstallpgcrypto"></a>Disable pgcrypto 
+
+The `uninstall_pgcrypto.sql` script removes `pgcrypto` objects from your database.  On each database in which you enabled `pgcrypto` support, execute the following:
+
+``` shell
+$ psql -d <dbname> -f $GPHOME/share/postgresql/contrib/uninstall_pgcrypto.sql
+```
+
+Replace \<dbname\> with the name of the target database.
+	
+**Note:**  This script does not remove dependent user-created objects.

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/plext/using_pljava.html.md.erb
----------------------------------------------------------------------
diff --git a/plext/using_pljava.html.md.erb b/plext/using_pljava.html.md.erb
new file mode 100644
index 0000000..3cce857
--- /dev/null
+++ b/plext/using_pljava.html.md.erb
@@ -0,0 +1,666 @@
+---
+title: Using PL/Java
+---
+
+This section contains an overview of the HAWQ PL/Java language. 
+
+
+## <a id="aboutpljava"></a>About PL/Java 
+
+With the HAWQ PL/Java extension, you can write Java methods using your favorite Java IDE and install the JAR files that implement the methods in your HAWQ cluster.
+
+**Note**: If building HAWQ from source, you must specify PL/Java as a build option when compiling HAWQ. To use PL/Java in a HAWQ deployment, you must explicitly enable the PL/Java extension in all desired databases.  
+
+The HAWQ PL/Java package is based on the open source PL/Java 1.4.0. HAWQ PL/Java provides the following features.
+
+- Ability to execute PL/Java functions with Java 1.6 or 1.7.
+- Standardized utilities (modeled after the SQL 2003 proposal) to install and maintain Java code in the database.
+- Standardized mappings of parameters and results. Complex types as well as sets are supported.
+- An embedded, high performance, JDBC driver utilizing the internal HAWQ Database SPI routines.
+- Metadata support for the JDBC driver. Both `DatabaseMetaData` and `ResultSetMetaData` are included.
+- The ability to return a `ResultSet` from a query as an alternative to building a ResultSet row by row.
+- Full support for savepoints and exception handling.
+- The ability to use IN, INOUT, and OUT parameters.
+- Two separate HAWQ languages:
+	- pljava, TRUSTED PL/Java language
+	- pljavau, UNTRUSTED PL/Java language
+- Transaction and Savepoint listeners enabling code execution when a transaction or savepoint is committed or rolled back.
+- Integration with GNU GCJ on selected platforms.
+
+An SQL function maps to a static method in a Java class. For the function to execute, the class must be available on the classpath specified by the HAWQ server configuration parameter `pljava_classpath`. The PL/Java extension adds a set of functions that help install and maintain the Java classes. Classes are stored in normal Java archives (JAR files). A JAR file can optionally contain a deployment descriptor that in turn contains SQL commands to be executed when the JAR is deployed or undeployed. The functions are modeled after the standards proposed for SQL 2003.
+
+PL/Java implements a standard way of passing parameters and return values. Complex types and sets are passed using the standard JDBC ResultSet class.
+
+A JDBC driver is included in PL/Java. This driver calls HAWQ internal SPI routines. The driver is essential since it is common for functions to make calls back to the database to fetch data. When PL/Java functions fetch data, they must use the same transactional boundaries that are used by the main function that entered PL/Java execution context.
+
+PL/Java is optimized for performance. The Java virtual machine executes within the same process as the backend, which minimizes call overhead. PL/Java is designed to bring the power of Java into the database itself, so that database-intensive business logic can execute as close to the actual data as possible.
+
+The standard Java Native Interface (JNI) is used when bridging calls between the backend and the Java VM.
+
+
+## <a id="abouthawqpljava"></a>About HAWQ PL/Java 
+
+There are a few key differences between the implementation of PL/Java in standard PostgreSQL and HAWQ.
+
+### <a id="pljavafunctions"></a>Functions 
+
+The following functions are not supported in HAWQ. The classpath is handled differently in a distributed HAWQ environment than in the PostgreSQL environment.
+
+- sqlj.install_jar
+- sqlj.replace_jar
+- sqlj.remove_jar
+- sqlj.get_classpath
+- sqlj.set_classpath
+
+HAWQ uses the `pljava_classpath` server configuration parameter in place of the `sqlj.set_classpath` function.
+
+### <a id="serverconfigparams"></a>Server Configuration Parameters 
+
+The following server configuration parameters are used by PL/Java in HAWQ. These parameters replace the `pljava.*` parameters that are used in the standard PostgreSQL PL/Java implementation.
+
+<p class="note"><b>Note:</b> See the <a href="/20/reference/hawq-reference.html">HAWQ Reference</a> for information about HAWQ server configuration parameters.</p>
+
+#### pljava\_classpath
+
+A colon (:) separated list of the jar files containing the Java classes used in any PL/Java functions. The jar files must be installed in the same locations on all HAWQ hosts. With the trusted PL/Java language handler, jar file paths must be relative to the `$GPHOME/lib/postgresql/java/` directory. With the untrusted language handler (javaU language tag), paths may be relative to `$GPHOME/lib/postgresql/java/` or absolute.
+
+#### pljava\_statement\_cache\_size
+
+Sets the size in KB of the Most Recently Used (MRU) cache for prepared statements.
+
+#### pljava\_release\_lingering\_savepoints
+
+If TRUE, lingering savepoints will be released on function exit. If FALSE, they will be rolled back.
+
+#### pljava\_vmoptions
+
+Defines the start up options for the Java VM.
+
+
+## <a id="enablepljava"></a>Enabling and Removing PL/Java Support 
+
+The PL/Java extension must be explicitly enabled on each database in which it will be used.
+
+
+### <a id="pljavaprereq"></a>Prerequisites 
+
+Before you enable PL/Java:
+
+1. Ensure that you have installed a supported Java runtime environment and that the `$JAVA_HOME` variable is set to the same path on the master and all segment nodes.
+
+2. Perform the following step on all machines to set up `ldconfig` for JDK:
+
+	``` shell
+	$ echo "$JAVA_HOME/jre/lib/amd64/server" > /etc/ld.so.conf.d/libjdk.conf
+	$ ldconfig
+	```
+3. Ensure that your HAWQ cluster is running, that you have sourced `greenplum_path.sh`, and that your `$GPHOME` environment variable is set.
+
+
+### <a id="enablepljava"></a>Enable PL/Java and Install JAR Files 
+
+To use PL/Java:
+
+1. Enable the language for each database.
+1. Install user-created JAR files containing Java methods on all HAWQ hosts.
+1. Add the name of the JAR file to the HAWQ `pljava_classpath` server configuration parameter in `hawq-site.xml`. This parameter value should contain a list of the installed JAR files.
+
+#### <a id="enablepljava"></a>Enable PL/Java and Install JAR Files 
+
+Perform the following steps as the `gpadmin` user:
+
+1. Enable PL/Java by running the `$GPHOME/share/postgresql/pljava/install.sql` SQL script in the databases that use PL/Java. The `install.sql` script registers both the trusted and untrusted PL/Java. For example, the following command enables PL/Java on a database named `testdb`:
+
+	``` shell
+	$ psql -d testdb -f $GPHOME/share/postgresql/pljava/install.sql
+	```
+	
+	To enable the PL/Java extension in all new HAWQ databases, run the script on the `template1` database: 
+
+    ``` shell
+    $ psql -d template1 -f $GPHOME/share/postgresql/pljava/install.sql
+    ```
+
+    Use this option *only* if you are certain you want to enable PL/Java in all new databases.
+	
+2. Copy your Java archives (JAR files) to `$GPHOME/lib/postgresql/java/` on all the HAWQ hosts. This example uses the `hawq scp` utility to copy the `myclasses.jar` file:
+
+	``` shell
+	$ hawq scp -f hawq_hosts myclasses.jar =:$GPHOME/lib/postgresql/java/
+	```
+	The `hawq_hosts` file contains a list of the HAWQ hosts.
+
+3. The JAR files must be added to the `pljava_classpath` configuration parameter. This parameter can be set at either the database session or global levels.  
+
+    To affect only the *current* database session, set the `pljava_classpath` configuration parameter at the `psql` prompt:
+	
+	 ``` sql
+	 psql> set pljava_classpath='myclasses.jar';
+	 ```
+
+    To affect *all* sessions, set the `pljava_classpath` server configuration parameter and restart the HAWQ cluster:
+
+	 ``` shell
+	 $ hawq config -c pljava_classpath -v \'examples.jar:myclasses.jar\' 
+	 $ hawq restart cluster
+	 ```
+
+4. (Optional) Your HAWQ installation includes an `examples.sql` file.  This script contains sample PL/Java functions that you can use for testing. Run the commands in this file to create and run test functions that use the Java classes in `examples.jar`:
+
+	``` shell
+	$ psql -f $GPHOME/share/postgresql/pljava/examples.sql
+	```
+
+#### Configuring PL/Java VM Options
+
+PL/Java JVM options can be configured via the `pljava_vmoptions` parameter in `hawq-site.xml`. For example, `pljava_vmoptions=-Xmx512M` sets the maximum heap size of the JVM. The default Xmx value is set to `-Xmx64M`.
+
+	
+### <a id="uninstallpljava"></a>Disable PL/Java 
+
+To disable PL/Java, you should:
+
+1. Remove PL/Java support from each database in which it was added.
+2. Uninstall the Java JAR files.
+
+#### <a id="uninstallpljavasupport"></a>Remove PL/Java Support from Databases 
+
+For a database that no longer requires the PL/Java language, remove support for PL/Java by running the `uninstall.sql` script as the `gpadmin` user. For example, the following command disables the PL/Java language in the specified database:
+
+``` shell
+$ psql -d <dbname> -f $GPHOME/share/postgresql/pljava/uninstall.sql
+```
+
+Replace \<dbname\> with the name of the target database.
+
+
+#### <a id="uninstallpljavapackage"></a>Uninstall the Java JAR files 
+
+When no databases have PL/Java as a registered language, remove the Java JAR files:
+
+1. Remove the `pljava_classpath` server configuration parameter in the `hawq-site.xml` file.
+
+1. Remove the JAR files from the `$GPHOME/lib/postgresql/java/` directory of each HAWQ host.
+
+1. Restart the HAWQ cluster:
+
+	``` shell
+	$ hawq restart cluster
+	```
+
+
+## <a id="writingpljavafunc"></a>Writing PL/Java Functions 
+
+This section provides information about writing functions with PL/Java.
+
+- [SQL Declaration](#sqldeclaration)
+- [Type Mapping](#typemapping)
+- [NULL Handling](#nullhandling)
+- [Complex Types](#complextypes)
+- [Returning Complex Types](#returningcomplextypes)
+- [Functions That Return Sets](#functionreturnsets)
+- [Returning a SETOF \<scalar type\>](#returnsetofscalar)
+- [Returning a SETOF \<complex type\>](#returnsetofcomplex)
+
+
+### <a id="sqldeclaration"></a>SQL Declaration 
+
+A Java function is declared with the name of a class and a static method on that class. The class will be resolved using the classpath that has been defined for the schema where the function is declared. If no classpath has been defined for that schema, the public schema is used. If no classpath is found there either, the class is resolved using the system classloader.
+
+The following function can be declared to access the static method getProperty on `java.lang.System` class:
+
+```sql
+CREATE FUNCTION getsysprop(VARCHAR)
+  RETURNS VARCHAR
+  AS 'java.lang.System.getProperty'
+  LANGUAGE java;
+```
+
+Run the following command to return the Java `user.home` property:
+
+```sql
+SELECT getsysprop('user.home');
+```
+
+### <a id="typemapping"></a>Type Mapping 
+
+Scalar types are mapped in a straightforward way. This table lists the current mappings.
+
+***Table 1: PL/Java data type mappings***
+
+| PostgreSQL | Java |
+|------------|------|
+| bool | boolean |
+| char | byte |
+| int2 | short |
+| int4 | int |
+| int8 | long |
+| varchar | java.lang.String |
+| text | java.lang.String |
+| bytea | byte[ ] |
+| date | java.sql.Date |
+| time | java.sql.Time (stored value treated as local time) |
+| timetz | java.sql.Time |
+| timestamp	| java.sql.Timestamp (stored value treated as local time) |
+| timestampz |	java.sql.Timestamp |
+| complex |	java.sql.ResultSet |
+| setof complex	| java.sql.ResultSet |
+
+All other types are mapped to `java.lang.String` and utilize the standard textin/textout routines registered for the respective type.
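+
+As an illustration of these mappings, a function declared with `int4` and `text` arguments resolves to a Java method taking `int` and `java.lang.String`. The function and method names below are hypothetical:
+
+```sql
+CREATE FUNCTION formatCount(int4, text)
+  RETURNS varchar
+  AS 'foo.fee.Fum.formatCount'
+  LANGUAGE java;
+```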
+
+### <a id="nullhandling"></a>NULL Handling 
+
+The scalar types that map to Java primitives cannot be passed as NULL values. To pass NULL values, those types can have an alternative mapping. You enable this mapping by explicitly denoting it in the method reference.
+
+```sql
+CREATE FUNCTION trueIfEvenOrNull(integer)
+  RETURNS bool
+  AS 'foo.fee.Fum.trueIfEvenOrNull(java.lang.Integer)'
+  LANGUAGE java;
+```
+
+The Java code would be similar to this:
+
+```java
+package foo.fee;
+public class Fum
+{
+  static boolean trueIfEvenOrNull(Integer value)
+  {
+    return (value == null)
+      ? true
+      : (value.intValue() % 2) == 0;
+  }
+}
+```
+
+The following two statements both yield true:
+
+```sql
+SELECT trueIfEvenOrNull(NULL);
+SELECT trueIfEvenOrNull(4);
+```
+
+In order to return NULL values from a Java method, you use the object type that corresponds to the primitive (for example, you return `java.lang.Integer` instead of `int`). The PL/Java resolve mechanism finds the method regardless. Since Java cannot have different return types for methods with the same name, this does not introduce any ambiguity.
+
+### <a id="complextypes"></a>Complex Types 
+
+A complex type will always be passed as a read-only `java.sql.ResultSet` with exactly one row. The `ResultSet` is positioned on its row so a call to `next()` should not be made. The values of the complex type are retrieved using the standard getter methods of the `ResultSet`.
+
+Example:
+
+```sql
+CREATE TYPE complexTest
+  AS(base integer, incbase integer, ctime timestamptz);
+CREATE FUNCTION useComplexTest(complexTest)
+  RETURNS VARCHAR
+  AS 'foo.fee.Fum.useComplexTest'
+  IMMUTABLE LANGUAGE java;
+```
+
+In the Java class `Fum`, we add the following static method:
+
+```java
+public static String useComplexTest(ResultSet complexTest)
+throws SQLException
+{
+  int base = complexTest.getInt(1);
+  int incbase = complexTest.getInt(2);
+  Timestamp ctime = complexTest.getTimestamp(3);
+  return "Base = \"" + base +
+    "\", incbase = \"" + incbase +
+    "\", ctime = \"" + ctime + "\"";
+}
+```
+
+### <a id="returningcomplextypes"></a>Returning Complex Types 
+
+Java does not stipulate any way to create a `ResultSet`. Hence, returning a `ResultSet` is not an option. The SQL-2003 draft suggests that a complex return value should be handled as an IN/OUT parameter, and PL/Java implements a `ResultSet` that way. If you declare a function that returns a complex type, you must use a Java method with a boolean return type and a last parameter of type `java.sql.ResultSet`. The parameter is initialized to an empty updatable `ResultSet` that contains exactly one row.
+
+Assume that the `complexTest` type in the previous section has been created.
+
+```sql
+CREATE FUNCTION createComplexTest(int, int)
+  RETURNS complexTest
+  AS 'foo.fee.Fum.createComplexTest'
+  IMMUTABLE LANGUAGE java;
+```
+
+The PL/Java method resolution will now find the following method in the `Fum` class:
+
+```java
+public static boolean createComplexTest(int base, int increment,
+  ResultSet receiver)
+throws SQLException
+{
+  receiver.updateInt(1, base);
+  receiver.updateInt(2, base + increment);
+  receiver.updateTimestamp(3, new 
+    Timestamp(System.currentTimeMillis()));
+  return true;
+}
+```
+
+The return value denotes whether the receiver should be considered a valid tuple (true) or NULL (false).
+
+### <a id="functionreturnsets"></a>Functions that Return Sets 
+
+When returning a result set, you should not build the entire result set before returning it, because building a large result set would consume a large amount of resources. It is better to produce one row at a time. Incidentally, that is what the HAWQ backend expects a function with a SETOF return type to do. You can return a SETOF of a scalar type such as `int`, `float`, or `varchar`, or a SETOF of a complex type.
+
+### <a id="returnsetofscalar"></a>Returning a SETOF \<scalar type\> 
+
+To return a set of a scalar type, you need to create a Java method that returns something that implements the `java.util.Iterator` interface. Here is an example of a function that returns a SETOF varchar:
+
+```sql
+CREATE FUNCTION javatest.getSystemProperties()
+  RETURNS SETOF varchar
+  AS 'foo.fee.Bar.getNames'
+  IMMUTABLE LANGUAGE java;
+```
+
+This simple Java method returns an iterator:
+
+```java
+package foo.fee;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+
+public class Bar
+{
+    public static Iterator<String> getNames()
+    {
+        ArrayList<String> names = new ArrayList<String>();
+        names.add("Lisa");
+        names.add("Bob");
+        names.add("Bill");
+        names.add("Sally");
+        return names.iterator();
+    }
+}
+```
+
+### <a id="returnsetofcomplex"></a>Returning a SETOF \<complex type\> 
+
+A method returning a SETOF <complex type> must use either the interface `org.postgresql.pljava.ResultSetProvider` or `org.postgresql.pljava.ResultSetHandle`. The reason for having two interfaces is that they cater to two distinct use cases. The former is for cases where you want to dynamically create each row that is to be returned from the SETOF function. The latter is for cases where you want to return the result of an executed query.
+
+#### Using the ResultSetProvider Interface
+
+This interface has two methods: `boolean assignRowValues(java.sql.ResultSet tupleBuilder, int rowNumber)` and `void close()`. The HAWQ query evaluator calls `assignRowValues` repeatedly until it returns false or until the evaluator decides that it does not need any more rows. Then it calls `close()`.
+
+You can use this interface the following way:
+
+```sql
+CREATE FUNCTION javatest.listComplexTests(int, int)
+  RETURNS SETOF complexTest
+  AS 'foo.fee.Fum.listComplexTest'
+  IMMUTABLE LANGUAGE java;
+```
+
+The function maps to a static java method that returns an instance that implements the `ResultSetProvider` interface.
+
+```java
+public class Fum implements ResultSetProvider
+{
+  private final int m_base;
+  private final int m_increment;
+  public Fum(int base, int increment)
+  {
+    m_base = base;
+    m_increment = increment;
+  }
+  public boolean assignRowValues(ResultSet receiver, int currentRow)
+  throws SQLException
+  {
+    // Stop when we reach 12 rows.
+    //
+    if(currentRow >= 12)
+      return false;
+    receiver.updateInt(1, m_base);
+    receiver.updateInt(2, m_base + m_increment * currentRow);
+    receiver.updateTimestamp(3, new Timestamp(System.currentTimeMillis()));
+    return true;
+  }
+  public void close()
+  {
+   // Nothing needed in this example
+  }
+  public static ResultSetProvider listComplexTests(int base, int increment)
+  throws SQLException
+  {
+    return new Fum(base, increment);
+  }
+}
+```
+
+The `listComplexTests` method is called once. It may return NULL if no results are available, or an instance of `ResultSetProvider`. Here the Java class `Fum` implements this interface, so it returns an instance of itself. The method `assignRowValues` will then be called repeatedly until it returns false. At that time, `close()` will be called.
+
+#### Using the ResultSetHandle Interface
+
+This interface is similar to the `ResultSetProvider` interface in that it has a `close()` method that will be called at the end. But instead of having the evaluator call a method that builds one row at a time, this interface has a method that returns a `ResultSet`. The query evaluator will iterate over this set and deliver the `ResultSet` contents, one tuple at a time, to the caller until a call to `next()` returns false or the evaluator decides that no more rows are needed.
+
+Here is an example that executes a query using a statement that it obtained using the default connection. The SQL suitable for the deployment descriptor looks like this:
+
+```sql
+CREATE FUNCTION javatest.listSupers()
+  RETURNS SETOF pg_user
+  AS 'org.postgresql.pljava.example.Users.listSupers'
+  LANGUAGE java;
+CREATE FUNCTION javatest.listNonSupers()
+  RETURNS SETOF pg_user
+  AS 'org.postgresql.pljava.example.Users.listNonSupers'
+  LANGUAGE java;
+```
+
+And in the Java package `org.postgresql.pljava.example` a class `Users` is added:
+
+```java
+public class Users implements ResultSetHandle
+{
+  private final String m_filter;
+  private Statement m_statement;
+  public Users(String filter)
+  {
+    m_filter = filter;
+  }
+  public ResultSet getResultSet()
+  throws SQLException
+  {
+    m_statement =
+      DriverManager.getConnection("jdbc:default:connection").createStatement();
+    return m_statement.executeQuery("SELECT * FROM pg_user WHERE " + m_filter);
+  }
+
+  public void close()
+  throws SQLException
+  {
+    m_statement.close();
+  }
+
+  public static ResultSetHandle listSupers()
+  {
+    return new Users("usesuper = true");
+  }
+
+  public static ResultSetHandle listNonSupers()
+  {
+    return new Users("usesuper = false");
+  }
+}
+```
+
+## <a id="usingjdbc"></a>Using JDBC 
+
+PL/Java contains a JDBC driver that maps to the PostgreSQL SPI functions. A connection that maps to the current transaction can be obtained using the following statement:
+
+```java
+Connection conn = DriverManager.getConnection("jdbc:default:connection");
+```
+
+After obtaining a connection, you can prepare and execute statements, similar to other JDBC connections. The PL/Java JDBC driver has the following limitations:
+
+- The transaction cannot be managed in any way. Thus, you cannot use methods on the connection such as:
+   - `commit()`
+   - `rollback()`
+   - `setAutoCommit()`
+   - `setTransactionIsolation()`
+- Savepoints are available with some restrictions. A savepoint cannot outlive the function in which it was set and it must be rolled back or released by that same function.
+- A `ResultSet` returned from `executeQuery()` is always `FETCH_FORWARD` and `CONCUR_READ_ONLY`.
+- Meta-data is only available in PL/Java 1.1 or higher.
+- `CallableStatement` (for stored procedures) is not implemented.
+- The `Clob` and `Blob` types are not completely implemented. The types `byte[]` and `String` can be used for `bytea` and `text`, respectively.
+
+## <a id="exceptionhandling"></a>Exception Handling 
+
+You can catch and handle an exception in the HAWQ backend just like any other exception. The backend `ErrorData` structure is exposed as a property in a class called `org.postgresql.pljava.ServerException` (derived from `java.sql.SQLException`) and the Java try/catch mechanism is synchronized with the backend mechanism.
+
+**Important:** When the backend has generated an exception, you will not be able to continue executing backend functions until your function has returned and the error has been propagated, unless you have used a savepoint. When a savepoint is rolled back, the exceptional condition is reset and you can continue your execution.
+
+## <a id="savepoints"></a>Savepoints 
+
+HAWQ savepoints are exposed using the `java.sql.Connection` interface. Two restrictions apply.
+
+- A savepoint must be rolled back or released in the function where it was set.
+- A savepoint must not outlive the function where it was set.
+
+## <a id="logging"></a>Logging 
+
+PL/Java uses the standard Java Logger. Hence, you can write things like:
+
+```java
+Logger.getAnonymousLogger().info("Time is " + new Date(System.currentTimeMillis()));
+```
+
+At present, the logger uses a handler that maps the current state of the HAWQ configuration setting `log_min_messages` to a valid Logger level and that outputs all messages using the HAWQ backend function `elog()`.
+
+**Note:** The `log_min_messages` setting is read from the database the first time a PL/Java function in a session is executed. On the Java side, the setting does not change after the first PL/Java function execution in a specific session until the HAWQ session that is working with PL/Java is restarted.
+
+The following mappings apply between the Logger levels and the HAWQ backend levels.
+
+***Table 2: PL/Java Logging Levels Mappings***
+
+| java.util.logging.Level | HAWQ Level |
+|-------------------------|------------|
+| SEVERE | ERROR |
+| WARNING |	WARNING |
+| CONFIG |	LOG |
+| INFO | INFO |
+| FINE | DEBUG1 |
+| FINER | DEBUG2 |
+| FINEST | DEBUG3 |
+
+## <a id="security"></a>Security 
+
+This section describes security aspects of using PL/Java.
+
+### <a id="installation"></a>Installation 
+
+Only a database superuser can install PL/Java. The PL/Java utility functions are installed using SECURITY DEFINER so that they execute with the access permissions that were granted to the creator of the functions.
+
+### <a id="trustedlang"></a>Trusted Language 
+
+PL/Java is a trusted language. The trusted PL/Java language has no access to the file system, as stipulated by the PostgreSQL definition of a trusted language. Any database user can create and access functions in a trusted language.
+
+PL/Java also installs a language handler for the language `javau`. This version is not trusted and only a superuser can create new functions that use it. Any user can call the functions.
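+
+As an illustrative sketch of the distinction (the method reference `Example.substring` is hypothetical here), compare a function created in the trusted language with one created in the untrusted language:
+
+```sql
+-- Trusted language: any database user may create this function.
+CREATE FUNCTION t_substring(varchar, int, int)
+RETURNS varchar AS 'Example.substring' LANGUAGE java;
+
+-- Untrusted language: only a superuser may create this function,
+-- but any user may call it once it exists.
+CREATE FUNCTION u_substring(varchar, int, int)
+RETURNS varchar AS 'Example.substring' LANGUAGE javau;
+```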
+
+
+## <a id="pljavaexample"></a>Example 
+
+The following simple Java example creates a JAR file that contains a single class with one method, and then runs the method.
+
+<p class="note"><b>Note:</b> The example requires Java SDK to compile the Java file.</p>
+
+The following method returns a substring.
+
+```java
+public class Example
+{
+    public static String substring(String text, int beginIndex,
+        int endIndex)
+    {
+        return text.substring(beginIndex, endIndex);
+    }
+}
+```
+
+Enter the Java code in a text file named `Example.java`.
+
+Contents of the file `manifest.txt`:
+
+```plaintext
+Manifest-Version: 1.0
+Main-Class: Example
+Specification-Title: "Example"
+Specification-Version: "1.0"
+Created-By: 1.6.0_35-b10-428-11M3811
+Build-Date: 01/20/2013 10:09 AM
+```
+
+Compile the Java code:
+
+```shell
+$ javac *.java
+```
+
+Create a JAR archive named `analytics.jar` that contains the class file and the manifest file.
+
+```shell
+$ jar cfm analytics.jar manifest.txt *.class
+```
+
+Upload the JAR file to the HAWQ master host.
+
+Run the `hawq scp` utility to copy the JAR file to the HAWQ Java directory. Use the `-f` option to specify the file that contains a list of the master and segment hosts.
+
+```shell
+$ hawq scp -f hawq_hosts analytics.jar =:/usr/local/hawq/lib/postgresql/java/
+```
+
+Use the `hawq config` utility to set the HAWQ `pljava_classpath` server configuration parameter. The parameter lists the installed JAR files.
+
+```shell
+$ hawq config -c pljava_classpath -v \'analytics.jar\'
+```
+
+Run the `hawq restart` utility to reload the configuration files.
+
+```shell
+$ hawq restart cluster
+```
+
+From the `psql` command line, run the following command to show the installed JAR files.
+
+```shell
+psql# show pljava_classpath;
+```
+
+The following SQL commands create a table and define a Java function to test the method in the JAR file:
+
+```sql
+CREATE TABLE temp (a varchar) DISTRIBUTED randomly; 
+INSERT INTO temp values ('my string'); 
+--Example function 
+CREATE OR REPLACE FUNCTION java_substring(varchar, int, int) 
+RETURNS varchar AS 'Example.substring' LANGUAGE java; 
+--Example execution 
+SELECT java_substring(a, 1, 5) FROM temp;
+```
+
+You can place the contents in a file named `mysample.sql` and run the command from the psql command line:
+
+```shell
+psql# \i mysample.sql 
+```
+
+The output is similar to this:
+
+```shell
+java_substring
+----------------
+ y st
+(1 row)
+```
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/plext/using_plperl.html.md.erb
----------------------------------------------------------------------
diff --git a/plext/using_plperl.html.md.erb b/plext/using_plperl.html.md.erb
new file mode 100644
index 0000000..d6ffa04
--- /dev/null
+++ b/plext/using_plperl.html.md.erb
@@ -0,0 +1,27 @@
+---
+title: Using PL/Perl
+---
+
+This section contains an overview of the HAWQ PL/Perl language extension.
+
+## <a id="enableplperl"></a>Enabling PL/Perl
+
+If PL/Perl was enabled at HAWQ build time, HAWQ installs the PL/Perl language extension automatically. To use PL/Perl, you must enable it in each database where you want to use it.
+
+On every database where you want to enable PL/Perl, connect to the database using the psql client.
+
+``` shell
+$ psql -d <dbname>
+```
+
+Replace \<dbname\> with the name of the target database.
+
+Then, run the following SQL command:
+
+``` shell
+psql# CREATE LANGUAGE plperl;
+```
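+
+Once the language is registered, you can write functions in Perl. As a minimal sketch (the function name is illustrative), a PL/Perl function that returns the greater of two integers might look like:
+
+```sql
+CREATE OR REPLACE FUNCTION perl_max (integer, integer) RETURNS integer AS $$
+    my ($x, $y) = @_;
+    return $x if $x >= $y;
+    return $y;
+$$ LANGUAGE plperl;
+
+SELECT perl_max(1, 2);
+```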
+
+## <a id="references"></a>References 
+
+For more information on using PL/Perl, see the PostgreSQL PL/Perl documentation at [https://www.postgresql.org/docs/8.2/static/plperl.html](https://www.postgresql.org/docs/8.2/static/plperl.html).
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/plext/using_plpgsql.html.md.erb
----------------------------------------------------------------------
diff --git a/plext/using_plpgsql.html.md.erb b/plext/using_plpgsql.html.md.erb
new file mode 100644
index 0000000..3661e9b
--- /dev/null
+++ b/plext/using_plpgsql.html.md.erb
@@ -0,0 +1,142 @@
+---
+title: Using PL/pgSQL in HAWQ
+---
+
+SQL is the language that most relational databases use as their query language. It is portable and easy to learn, but every SQL statement must be executed individually by the database server.
+
+PL/pgSQL is a loadable procedural language. PL/pgSQL can do the following:
+
+-   create functions
+-   add control structures to the SQL language
+-   perform complex computations
+-   inherit all user-defined types, functions, and operators
+-   be trusted by the server
+
+You can use functions created with PL/pgSQL with any database that supports built-in functions. For example, it is possible to create complex conditional computation functions and later use them to define operators or use them in index expressions.
+
+Every SQL statement must be executed individually by the database server. Your client application must send each query to the database server, wait for it to be processed, receive and process the results, do some computation, then send further queries to the server. This requires interprocess communication and incurs network overhead if your client is on a different machine than the database server.
+
+With PL/pgSQL, you can group a block of computation and a series of queries inside the database server, thus having the power of a procedural language and the ease of use of SQL, but with considerable savings of client/server communication overhead.
+
+-   Extra round trips between client and server are eliminated
+-   Intermediate results that the client does not need do not have to be marshaled or transferred between server and client
+-   Multiple rounds of query parsing can be avoided
+
+This can result in a considerable performance increase as compared to an application that does not use stored functions.
+
+PL/pgSQL supports all the data types, operators, and functions of SQL.
+
+**Note:**  PL/pgSQL is automatically installed and registered in all HAWQ databases.
+
+## <a id="supportedargumentandresultdatatypes"></a>Supported Data Types for Arguments and Results 
+
+Functions written in PL/pgSQL accept as arguments any scalar or array data type supported by the server, and they can return a result containing this data type. They can also accept or return any composite type (row type) specified by name. It is also possible to declare a PL/pgSQL function as returning record, which means that the result is a row type whose columns are determined by specification in the calling query. See <a href="#tablefunctions" class="xref">Table Functions</a>.
+
+PL/pgSQL functions can be declared to accept a variable number of arguments by using the VARIADIC marker. This works exactly the same way as for SQL functions. See <a href="#sqlfunctionswithvariablenumbersofarguments" class="xref">SQL Functions with Variable Numbers of Arguments</a>.
+
+PL/pgSQL functions can also be declared to accept and return the polymorphic types `anyelement`, `anyarray`, `anynonarray`, and `anyenum`. The actual data types handled by a polymorphic function can vary from call to call, as discussed in <a href="http://www.postgresql.org/docs/8.4/static/extend-type-system.html#EXTEND-TYPES-POLYMORPHIC" class="xref">Section 34.2.5</a>. An example is shown in <a href="http://www.postgresql.org/docs/8.4/static/plpgsql-declarations.html#PLPGSQL-DECLARATION-ALIASES" class="xref">Section 38.3.1</a>.
+
+PL/pgSQL functions can also be declared to return a "set" (or table) of any data type that can be returned as a single instance. Such a function generates its output by executing RETURN NEXT for each desired element of the result set, or by using RETURN QUERY to output the result of evaluating a query.
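+
+As a brief sketch (the function name is illustrative), a set-returning PL/pgSQL function using RETURN NEXT might look like:
+
+```sql
+CREATE FUNCTION get_squares(n int) RETURNS SETOF int AS $$
+BEGIN
+    FOR i IN 1..n LOOP
+        RETURN NEXT i * i;
+    END LOOP;
+    RETURN;
+END;
+$$ LANGUAGE plpgsql;
+
+SELECT * FROM get_squares(4);
+```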
+
+Finally, a PL/pgSQL function can be declared to return void if it has no useful return value.
+
+PL/pgSQL functions can also be declared with output parameters in place of an explicit specification of the return type. This does not add any fundamental capability to the language, but it is often convenient, especially for returning multiple values. The `RETURNS TABLE` notation can also be used in place of `RETURNS SETOF`.
+
+This topic describes the following PL/pgSQL concepts:
+
+-   [Table Functions](#tablefunctions)
+-   [SQL Functions with Variable Numbers of Arguments](#sqlfunctionswithvariablenumbersofarguments)
+-   [Polymorphic Types](#polymorphictypes)
+
+
+## <a id="tablefunctions"></a>Table Functions 
+
+
+Table functions are functions that produce a set of rows, made up of either base data types (scalar types) or composite data types (table rows). They are used like a table, view, or subquery in the FROM clause of a query. Columns returned by table functions can be included in SELECT, JOIN, or WHERE clauses in the same manner as a table, view, or subquery column.
+
+If a table function returns a base data type, the single result column name matches the function name. If the function returns a composite type, the result columns get the same names as the individual attributes of the type.
+
+A table function can be aliased in the FROM clause, but it also can be left unaliased. If a function is used in the FROM clause with no alias, the function name is used as the resulting table name.
+
+Some examples:
+
+```sql
+CREATE TABLE foo (fooid int, foosubid int, fooname text);
+
+CREATE FUNCTION getfoo(int) RETURNS SETOF foo AS $$
+    SELECT * FROM foo WHERE fooid = $1;
+$$ LANGUAGE SQL;
+
+SELECT * FROM getfoo(1) AS t1;
+
+SELECT * FROM foo
+    WHERE foosubid IN (
+                        SELECT foosubid
+                        FROM getfoo(foo.fooid) z
+                        WHERE z.fooid = foo.fooid
+                      );
+
+CREATE VIEW vw_getfoo AS SELECT * FROM getfoo(1);
+
+SELECT * FROM vw_getfoo;
+```
+
+In some cases, it is useful to define table functions that can return different column sets depending on how they are invoked. To support this, the table function can be declared as returning the pseudotype record. When such a function is used in a query, the expected row structure must be specified in the query itself, so that the system can know how to parse and plan the query. Consider this example:
+
+```sql
+SELECT *
+    FROM dblink('dbname=mydb', 'SELECT proname, prosrc FROM pg_proc')
+      AS t1(proname name, prosrc text)
+    WHERE proname LIKE 'bytea%';
+```
+
+The `dblink` function executes a remote query (see `contrib/dblink`). It is declared to return `record` since it might be used for any kind of query. The actual column set must be specified in the calling query so that the parser knows, for example, what `*` should expand to.
+
+
+## <a id="sqlfunctionswithvariablenumbersofarguments"></a>SQL Functions with Variable Numbers of Arguments 
+
+SQL functions can be declared to accept variable numbers of arguments, so long as all the "optional" arguments are of the same data type. The optional arguments will be passed to the function as an array. The function is declared by marking the last parameter as VARIADIC; this parameter must be declared as being of an array type. For example:
+
+```sql
+CREATE FUNCTION mleast(VARIADIC numeric[]) RETURNS numeric AS $$
+    SELECT min($1[i]) FROM generate_subscripts($1, 1) g(i);
+$$ LANGUAGE SQL;
+
+SELECT mleast(10, -1, 5, 4.4);
+ mleast 
+--------
+     -1
+(1 row)
+```
+
+Effectively, all the actual arguments at or beyond the VARIADIC position are gathered up into a one-dimensional array, as if you had written
+
+```sql
+SELECT mleast(ARRAY[10, -1, 5, 4.4]);    -- doesn't work
+```
+
+You can't actually write that, though; or at least, it will not match this function definition. A parameter marked VARIADIC matches one or more occurrences of its element type, not of its own type.
+
+Sometimes it is useful to be able to pass an already-constructed array to a variadic function; this is particularly handy when one variadic function wants to pass on its array parameter to another one. You can do that by specifying VARIADIC in the call:
+
+```sql
+SELECT mleast(VARIADIC ARRAY[10, -1, 5, 4.4]);
+```
+
+This prevents expansion of the function's variadic parameter into its element type, thereby allowing the array argument value to match normally. VARIADIC can only be attached to the last actual argument of a function call.
+
+
+
+## <a id="polymorphictypes"></a>Polymorphic Types 
+
+Four pseudo-types of special interest are `anyelement`, `anyarray`, `anynonarray`, and `anyenum`, which are collectively called *polymorphic types*. Any function declared using these types is said to be a *polymorphic function*. A polymorphic function can operate on many different data types, with the specific data type(s) determined by the data types actually passed to it in a particular call.
+
+Polymorphic arguments and results are tied to each other and are resolved to a specific data type when a query calling a polymorphic function is parsed. Each position (either argument or return value) declared as `anyelement` is allowed to have any specific actual data type, but in any given call they must all be the same actual type. Each position declared as `anyarray` can have any array data type, but similarly they must all be the same type. If there are positions declared `anyarray` and others declared `anyelement`, the actual array type in the `anyarray` positions must be an array whose elements are the same type appearing in the `anyelement` positions. `anynonarray` is treated exactly the same as `anyelement`, but adds the additional constraint that the actual type must not be an array type. `anyenum` is treated exactly the same as `anyelement`, but adds the additional constraint that the actual type must be an enum type.
+
+Thus, when more than one argument position is declared with a polymorphic type, the net effect is that only certain combinations of actual argument types are allowed. For example, a function declared as `equal(anyelement, anyelement)` will take any two input values, so long as they are of the same data type.
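+
+As a sketch of such a function (the name is illustrative):
+
+```sql
+CREATE FUNCTION equal(anyelement, anyelement) RETURNS boolean AS $$
+    SELECT $1 = $2;
+$$ LANGUAGE SQL;
+
+SELECT equal(1, 2);                  -- both arguments resolve to integer
+SELECT equal('a'::text, 'b'::text);  -- both arguments resolve to text
+-- SELECT equal(1, 'a'::text);       -- fails: argument types differ
+```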
+
+When the return value of a function is declared as a polymorphic type, there must be at least one argument position that is also polymorphic, and the actual data type supplied for the argument determines the actual result type for that call. For example, if there were not already an array subscripting mechanism, one could define a function that implements subscripting as `subscript(anyarray, integer) returns anyelement`. This declaration constrains the actual first argument to be an array type, and allows the parser to infer the correct result type from the actual first argument's type. Another example: a function declared as `f(anyarray) returns anyenum` will only accept arrays of enum types.
+
+Note that `anynonarray` and `anyenum` do not represent separate type variables; they are the same type as `anyelement`, just with an additional constraint. For example, declaring a function as `f(anyelement, anyenum)` is equivalent to declaring it as `f(anyenum, anyenum)`; both actual arguments have to be the same enum type.
+
+Variadic functions described in <a href="#sqlfunctionswithvariablenumbersofarguments" class="xref">SQL Functions with Variable Numbers of Arguments</a> can be polymorphic: this is accomplished by declaring its last parameter as `VARIADIC anyarray`. For purposes of argument matching and determining the actual result type, such a function behaves the same as if you had written the appropriate number of `anynonarray` parameters.

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/plext/using_plpython.html.md.erb
----------------------------------------------------------------------
diff --git a/plext/using_plpython.html.md.erb b/plext/using_plpython.html.md.erb
new file mode 100644
index 0000000..5a9123c
--- /dev/null
+++ b/plext/using_plpython.html.md.erb
@@ -0,0 +1,595 @@
+---
+title: Using PL/Python in HAWQ
+---
+
+This section contains an overview of the HAWQ PL/Python language extension.
+
+## <a id="abouthawqplpython"></a>About HAWQ PL/Python 
+
+PL/Python is a loadable procedural language. With the HAWQ PL/Python extension, you can write HAWQ user-defined functions in Python that take advantage of Python features and modules to quickly build robust database applications.
+
+If PL/Python is enabled during HAWQ build time, HAWQ includes both a version of Python and PL/Python when deployed. HAWQ uses the following Python installation:
+
+```shell
+$GPHOME/ext/python/
+```
+
+### <a id="hawqlimitations"></a>HAWQ PL/Python Limitations 
+
+- HAWQ does not support PL/Python triggers.
+- PL/Python is available only as a HAWQ untrusted language.
+ 
+## <a id="enableplpython"></a>Enabling and Removing PL/Python Support 
+
+If enabled as an option during HAWQ compilation, the PL/Python language is installed with HAWQ.
+
+**Note**: To use PL/Python in HAWQ, you must either use a pre-compiled version of HAWQ that includes PL/Python or specify PL/Python as a build option when compiling HAWQ.
+
+To create and run a PL/Python user-defined function (UDF) in a database, you must register the PL/Python language with the database. On every database where you want to install and enable PL/Python, connect to the database using the psql client.
+
+```shell
+$ psql -d <dbname>
+```
+
+Replace \<dbname\> with the name of the target database.
+
+Then, run the following SQL command:
+
+```shell
+psql# CREATE LANGUAGE plpythonu;
+```
+
+Note that `plpythonu` is installed as an “untrusted” language, meaning it does not offer any way of restricting what users can do in it.
+
+To remove support for plpythonu from a database, run the following SQL command:
+
+```shell
+psql# DROP LANGUAGE plpythonu;
+```
+
+## <a id="developfunctions"></a>Developing Functions with PL/Python 
+
+The body of a PL/Python user-defined function is a Python script. When the function is called, its arguments are passed as elements of the array `args[]`. Named arguments are also passed as ordinary variables to the Python script. The result is returned from the PL/Python function with a `return` statement, or with a `yield` statement in the case of a set-returning function.
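+
+As a minimal sketch (the function name is illustrative), a PL/Python function that uses named arguments might look like:
+
+```sql
+CREATE FUNCTION pymax(a integer, b integer) RETURNS integer AS $$
+  if a > b:
+    return a
+  return b
+$$ LANGUAGE plpythonu;
+
+SELECT pymax(1, 2);
+```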
+
+The HAWQ PL/Python language module imports the Python module `plpy`. The `plpy` module implements these functions:
+
+- Functions to execute SQL queries and prepare execution plans for queries.
+   - `plpy.execute`
+   - `plpy.prepare`
+   
+- Functions to manage errors and messages.
+   - `plpy.debug`
+   - `plpy.log`
+   - `plpy.info`
+   - `plpy.notice`
+   - `plpy.warning`
+   - `plpy.error`
+   - `plpy.fatal`
+   
+## <a id="executepreparesql"></a>Executing and Preparing SQL Queries 
+
+The PL/Python `plpy` module provides two Python functions to execute an SQL query and prepare an execution plan for a query, `plpy.execute` and `plpy.prepare`. Preparing the execution plan for a query is useful if you run the query from multiple Python functions.
+
+### <a id="plpyexecute"></a>plpy.execute 
+
+Calling `plpy.execute` with a query string and an optional limit argument causes the query to be run and the result to be returned in a Python result object. The result object emulates a list or dictionary object. The rows returned in the result object can be accessed by row number and column name. The result set row numbering starts with 0 (zero). The result object can be modified. The result object has these additional methods:
+
+- `nrows`, which returns the number of rows returned by the query.
+- `status`, which is the `SPI_execute()` return value.
+
+For example, this Python statement in a PL/Python user-defined function executes a query.
+
+```python
+rv = plpy.execute("SELECT * FROM my_table", 5)
+```
+
+The `plpy.execute` function returns up to 5 rows from `my_table`. The result set is stored in the `rv` object. If `my_table` has a column `my_column`, it would be accessed as:
+
+```python
+my_col_data = rv[i]["my_column"]
+```
+
+Since the function returns a maximum of 5 rows, the index `i` can be an integer between 0 and 4.
+
+### <a id="plpyprepare"></a>plpy.prepare 
+
+The function `plpy.prepare` prepares the execution plan for a query. It is called with a query string and a list of parameter types, if you have parameter references in the query. For example, this statement can be in a PL/Python user-defined function:
+
+```python
+plan = plpy.prepare("SELECT last_name FROM my_users WHERE first_name = $1",
+                    ["text"])
+```
+
+The string `text` is the data type of the value that is passed for the variable `$1`. After preparing a statement, you use the function `plpy.execute` to run it:
+
+```python
+rv = plpy.execute(plan, [ "Fred" ], 5)
+```
+
+The third argument is the limit for the number of rows returned and is optional.
+
+When you prepare an execution plan using the PL/Python module, the plan is automatically saved. See the PostgreSQL Server Programming Interface (SPI) documentation for information about execution plans: [http://www.postgresql.org/docs/8.2/static/spi.html](http://www.postgresql.org/docs/8.2/static/spi.html).
+
+To make effective use of saved plans across function calls, use one of the Python persistent storage dictionaries, `SD` or `GD`.
+
+The global dictionary `SD` is available to store data between function calls. This variable is private static data. The global dictionary `GD` is public data, available to all Python functions within a session. Use `GD` with care.
+
+Each function gets its own execution environment in the Python interpreter, so global data and function arguments from a function `myfunc` are not available to a function `myfunc2`. The exception is the data in the `GD` dictionary, as mentioned previously.
+
+This example uses the SD dictionary:
+
+```sql
+CREATE FUNCTION usesavedplan() RETURNS text AS $$
+  if SD.has_key("plan"):
+    plan = SD["plan"]
+  else:
+    plan = plpy.prepare("SELECT 1")
+    SD["plan"] = plan
+
+  # rest of function
+
+$$ LANGUAGE plpythonu;
+```
+
+## <a id="pythonerrors"></a>Handling Python Errors and Messages 
+
+The message functions `plpy.error` and `plpy.fatal` raise a Python exception which, if uncaught, propagates out to the calling query, causing the current transaction or subtransaction to be aborted. The statements `raise plpy.ERROR(msg)` and `raise plpy.FATAL(msg)` are equivalent to calling `plpy.error` and `plpy.fatal`, respectively. The other message functions only generate messages of different priority levels.
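+
+As a sketch (the function name is illustrative), a function that raises an error for invalid input might look like:
+
+```sql
+CREATE FUNCTION safe_divide(a float8, b float8) RETURNS float8 AS $$
+  if b == 0:
+    plpy.error("division by zero is not allowed")
+  return a / b
+$$ LANGUAGE plpythonu;
+```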
+
+Whether messages of a particular priority are reported to the client, written to the server log, or both is controlled by the HAWQ server configuration parameters `log_min_messages` and `client_min_messages`. For information about the parameters, see the [Server Configuration Parameter Reference](../reference/HAWQSiteConfig.html).
+
+## <a id="dictionarygd"></a>Using the Dictionary GD to Improve PL/Python Performance 
+
+Importing a Python module is an expensive operation that can affect performance. If you are importing the same module frequently, you can use Python global variables to load the module on the first invocation and avoid importing the module on subsequent calls. The following PL/Python function uses the GD persistent storage dictionary to avoid importing a module if it has already been imported and is in the GD.
+
+```sql
+CREATE FUNCTION pytest() RETURNS text AS $$
+  if 'mymodule' not in GD:
+    import mymodule
+    GD['mymodule'] = mymodule
+  return GD['mymodule'].sumd([1,2,3])
+$$ LANGUAGE plpythonu;
+```
+
+## <a id="installpythonmodules"></a>Installing Python Modules 
+
+When you install a Python module on HAWQ, the module must be added to the HAWQ Python environment on all segment hosts in the cluster. When expanding HAWQ, you must add the Python modules to the new segment hosts. You can use the HAWQ utilities `hawq ssh` and `hawq scp` to run commands on HAWQ hosts and to copy files to the hosts. For information about the utilities, see the [HAWQ Management Tools Reference](../reference/cli/management_tools.html).
+
+As part of the HAWQ installation, the `gpadmin` user environment is configured to use Python that is installed with HAWQ.
+
+To check which Python is being used in your environment, use the `which` command:
+
+```bash
+$ which python
+```
+
+The command returns the location of the Python installation. The Python installed with HAWQ is in the HAWQ `ext/python` directory.
+
+```bash
+$GPHOME/ext/python/bin/python
+```
+
+If you are building a Python module, you must ensure that the build creates the correct executable. For example on a Linux system, the build should create a 64-bit executable.
+
+Before building a Python module prior to installation, ensure that the appropriate software to build the module is installed and properly configured. The build environment is required only on the host where you build the module.
+
+These are examples of installing and testing Python modules:
+
+- Simple Python Module Installation Example (setuptools)
+- Complex Python Installation Example (NumPy)
+- Testing Installed Python Modules
+
+### <a id="simpleinstall"></a>Simple Python Module Installation Example (setuptools) 
+
+This example manually installs the Python `setuptools` module from the Python Package Index repository. The module lets you easily download, build, install, upgrade, and uninstall Python packages.
+
+This example first builds the module from a package and installs the module on a single host. Then the module is built and installed on segment hosts.
+
+Get the module package from the Python Package Index site. For example, run this `wget` command on a HAWQ host as the gpadmin user to get the tar.gz file.
+
+```bash
+$ wget --no-check-certificate https://pypi.python.org/packages/source/s/setuptools/setuptools-18.4.tar.gz
+```
+
+Extract the files from the tar.gz file.
+
+```bash
+$ tar -xzvf setuptools-18.4.tar.gz
+```
+
+Go to the directory that contains the package files, and run the Python scripts to build and install the Python package.
+
+```bash
+$ cd setuptools-18.4
+$ python setup.py build && python setup.py install
+```
+
+The following Python command returns no errors if the module is available to Python.
+
+```bash
+$ python -c "import setuptools"
+```
+
+Copy the package to the HAWQ hosts with the `hawq scp` utility. For example, this command copies the tar.gz file from the current host to the host systems listed in the file `hawq-hosts`.
+
+```bash
+$ hawq scp -f hawq-hosts setuptools-18.4.tar.gz =:/home/gpadmin
+```
+
+Run the commands to build, install, and test the package with `hawq ssh` utility on the hosts listed in the file `hawq-hosts`. The file `hawq-hosts` lists all the remote HAWQ segment hosts:
+
+```bash
+$ hawq ssh -f hawq-hosts
+>>> tar -xzvf setuptools-18.4.tar.gz
+>>> cd setuptools-18.4
+>>> python setup.py build && python setup.py install
+>>> python -c "import setuptools"
+>>> exit
+```
+
+The `setuptools` package installs the `easy_install` utility that lets you install Python packages from the Python Package Index repository. For example, this command installs Python PIP utility from the Python Package Index site.
+
+```shell
+$ cd setuptools-18.4
+$ easy_install pip
+```
+
+You can use the `hawq ssh` utility to run the `easy_install` command on all the HAWQ segment hosts.
+
+### <a id="complexinstall"></a>Complex Python Installation Example (NumPy) 
+
+This example builds and installs the Python module NumPy. NumPy is a module for scientific computing with Python. For information about NumPy, see [http://www.numpy.org/](http://www.numpy.org/).
+
+Building the NumPy package requires this software:
+
+- OpenBLAS libraries, an open source implementation of BLAS (Basic Linear Algebra Subprograms).
+- The gcc compilers: gcc, gcc-gfortran, and gcc-c++. The compilers are required to build the OpenBLAS libraries. See [OpenBLAS Prerequisites](#openblasprereq).
+
+This example process assumes `yum` is installed on all HAWQ segment hosts and the `gpadmin` user is a member of `sudoers` with `root` privileges on the hosts.
+
+Download the OpenBLAS and NumPy source files. For example, these `wget` commands download tar.gz files into the directory packages:
+
+```bash
+$ wget --directory-prefix=packages http://github.com/xianyi/OpenBLAS/tarball/v0.2.8
+$ wget --directory-prefix=packages http://sourceforge.net/projects/numpy/files/NumPy/1.8.0/numpy-1.8.0.tar.gz/download
+```
+
+Distribute the software to the HAWQ hosts. For example, if you download the software to `/home/gpadmin/packages`, these commands create that directory on the hosts and copy the software to the hosts listed in the `hawq-hosts` file.
+
+```bash
+$ hawq ssh -f hawq-hosts mkdir packages 
+$ hawq scp -f hawq-hosts packages/* =:/home/gpadmin/packages
+```
+
+#### <a id="openblasprereq"></a>OpenBLAS Prerequisites 
+
+1. If needed, use `yum` to install gcc compilers from system repositories. The compilers are required on all hosts where you compile OpenBLAS:
+
+	```bash
+	$ sudo yum -y install gcc gcc-gfortran gcc-c++
+	```
+
+	**Note:** If you cannot install the correct compiler versions with `yum`, you can download the gcc compilers, including gfortran, from source and install them.
+
+	These two commands download and install the compilers:
+
+	```bash
+	$ wget http://gfortran.com/download/x86_64/snapshots/gcc-4.4.tar.xz
+	$ tar xf gcc-4.4.tar.xz -C /usr/local/
+	```
+
+	If you installed `gcc` manually from a tar file, add the new `gcc` binaries to `PATH` and `LD_LIBRARY_PATH`:
+
+	```bash
+	$ export PATH=$PATH:/usr/local/gcc-4.4/bin
+	$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/gcc-4.4/lib
+	```
+
+2. Create a symbolic link to `g++` and call it `gxx`:
+
+	```bash
+	$ sudo ln -s /usr/bin/g++ /usr/bin/gxx
+	```
+
+3. You might also need to create symbolic links to any libraries that have different versions available; for example, `libppl_c.so.4` to `libppl_c.so.2`.
+
+4. If needed, you can use the `hawq scp` utility to copy files to HAWQ hosts and the `hawq ssh` utility to run commands on the hosts.
+
+#### <a id="buildopenblas"></a>Build and Install OpenBLAS Libraries 
+
+Before building and installing the NumPy module, install the OpenBLAS libraries. This section describes how to build and install the libraries on a single host.
+
+1. Extract the OpenBLAS files from the downloaded tar file. These commands extract the files from the OpenBLAS tar file and simplify the directory name that contains the OpenBLAS files.
+
+	```bash
+	$ tar -xzf packages/v0.2.8 -C /home/gpadmin/packages
+	$ mv /home/gpadmin/packages/xianyi-OpenBLAS-9c51cdf /home/gpadmin/packages/OpenBLAS
+	```
+
+2. Compile OpenBLAS. These commands set the `LIBRARY_PATH` environment variable and run `make` to build the OpenBLAS libraries.
+
+	```bash
+	$ cd /home/gpadmin/packages/OpenBLAS
+	$ export LIBRARY_PATH=$LD_LIBRARY_PATH
+	$ make FC=gfortran USE_THREAD=0
+	```
+
+3. Use these commands to install the OpenBLAS libraries in `/usr/local` as `root`, and then change the owner of the files to `gpadmin`.
+
+	```bash
+	$ cd /home/gpadmin/packages/OpenBLAS/
+	$ sudo make PREFIX=/usr/local install
+	$ sudo ldconfig
+	$ sudo chown -R gpadmin /usr/local/lib
+	```
+
+	The following libraries are installed, along with symbolic links:
+
+	```bash
+	libopenblas.a -> libopenblas_sandybridge-r0.2.8.a
+	libopenblas_sandybridge-r0.2.8.a
+	libopenblas_sandybridge-r0.2.8.so
+	libopenblas.so -> libopenblas_sandybridge-r0.2.8.so
+	libopenblas.so.0 -> libopenblas_sandybridge-r0.2.8.so
+	```
+
+4. You can use the `hawq ssh` utility to build and install the OpenBLAS libraries on multiple hosts.
+
+	All HAWQ hosts (master and segment hosts) have identical configurations. You can copy the OpenBLAS libraries from the system where they were built instead of building them on every host. For example, these `hawq ssh` and `hawq scp` commands copy and install the OpenBLAS libraries on the hosts listed in the `hawq-hosts` file.
+
+```bash
+$ hawq ssh -f hawq-hosts -e 'sudo yum -y install gcc gcc-gfortran gcc-c++'
+$ hawq ssh -f hawq-hosts -e 'ln -s /usr/bin/g++ /usr/bin/gxx'
+$ hawq ssh -f hawq-hosts -e 'sudo chown gpadmin /usr/local/lib'
+$ hawq scp -f hawq-hosts /usr/local/lib/libopen*sandy* =:/usr/local/lib
+```
+```bash
+$ hawq ssh -f hawq-hosts
+>>> cd /usr/local/lib
+>>> ln -s libopenblas_sandybridge-r0.2.8.a libopenblas.a
+>>> ln -s libopenblas_sandybridge-r0.2.8.so libopenblas.so
+>>> ln -s libopenblas_sandybridge-r0.2.8.so libopenblas.so.0
+>>> sudo ldconfig
+```
+
+#### <a id="buildinstallnumpy"></a>Build and Install NumPy 
+
+After you have installed the OpenBLAS libraries, you can build and install the NumPy module. These steps install the NumPy module on a single host. You can use the `hawq ssh` utility to build and install the NumPy module on multiple hosts.
+
+1. Go to the `packages` subdirectory and extract the NumPy module source files.
+
+	```bash
+	$ cd /home/gpadmin/packages
+	$ tar -xzf numpy-1.8.0.tar.gz
+	```
+
+2. Set up the environment for building and installing NumPy.
+
+	```bash
+	$ export BLAS=/usr/local/lib/libopenblas.a
+	$ export LAPACK=/usr/local/lib/libopenblas.a
+	$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/
+	$ export LIBRARY_PATH=$LD_LIBRARY_PATH
+	```
+
+3. Go to the NumPy directory and build and install NumPy. Building the NumPy package might take some time.
+
+	```bash
+	$ cd numpy-1.8.0
+	$ python setup.py build
+	$ python setup.py install
+	```
+
+	**Note:** If the NumPy module does not build successfully, the NumPy build process might need a `site.cfg` file that specifies the location of the OpenBLAS libraries. Create the file `site.cfg` in the NumPy package directory:
+
+	```bash
+	$ cd ~/packages/numpy-1.8.0
+	$ touch site.cfg
+	```
+
+	Add the following to the `site.cfg` file and run the NumPy build command again:
+
+	<pre>
+	[default]
+	library_dirs = /usr/local/lib
+
+	[atlas]
+	atlas_libs = openblas
+	library_dirs = /usr/local/lib
+
+	[lapack]
+	lapack_libs = openblas
+	library_dirs = /usr/local/lib
+
+	# added for scikit-learn 
+	[openblas]
+	libraries = openblas
+	library_dirs = /usr/local/lib
+	include_dirs = /usr/local/include
+	</pre>
+
+4. The following Python command verifies that the module can be imported by Python on a host system.
+
+	```bash
+	$ python -c "import numpy"
+	```
+
+5. Similar to the simple module installation, use the `hawq ssh` utility to build, install, and test the module on HAWQ segment hosts.
+
+5. The environment variables that are required to build the NumPy module are also required in the `gpadmin` user environment when running Python NumPy functions. You can use the `hawq ssh` utility with the `echo` command to add the environment variables to the `.bashrc` file. For example, these `echo` commands add the environment variables to the `.bashrc` file in the user home directory.
+
+	```bash
+	$ echo -e '\n#Needed for NumPy' >> ~/.bashrc
+	$ echo -e 'export BLAS=/usr/local/lib/libopenblas.a' >> ~/.bashrc
+	$ echo -e 'export LAPACK=/usr/local/lib/libopenblas.a' >> ~/.bashrc
+	$ echo -e 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib' >> ~/.bashrc
+	$ echo -e 'export LIBRARY_PATH=$LD_LIBRARY_PATH' >> ~/.bashrc
+	```
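
The troubleshooting steps later in this topic depend on these variables being visible in the runtime environment. As an illustrative sketch only (this helper is not part of the HAWQ utilities), a few lines of Python can report which of the variables are still missing:

```python
import os

# Variables needed to build NumPy and to run NumPy functions (from the steps above).
REQUIRED = ("BLAS", "LAPACK", "LD_LIBRARY_PATH", "LIBRARY_PATH")

def missing_numpy_vars(env=os.environ):
    """Return the NumPy-related variables not present in the given environment."""
    return [name for name in REQUIRED if name not in env]

# In a gpadmin shell, this lists whatever still needs to be exported:
print(missing_numpy_vars())
```

An empty list indicates that all four variables are exported, for example after sourcing the updated `.bashrc` file.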
+
+## <a id="testingpythonmodules"></a>Testing Installed Python Modules 
+
+You can create a simple PL/Python user-defined function (UDF) to validate that a Python module is available in HAWQ. This example tests the NumPy module.
+
+This PL/Python UDF imports the NumPy module. The function returns SUCCESS if the module is imported, and FAILURE if an import error occurs.
+
+```sql
+CREATE OR REPLACE FUNCTION plpy_test(x int)
+returns text
+as $$
+  try:
+      from numpy import *
+      return 'SUCCESS'
+  except ImportError, e:
+      return 'FAILURE'
+$$ language plpythonu;
+```
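
The same try/except import pattern can be exercised outside the database with plain Python. This is an illustrative sketch only (the UDF above runs under `plpythonu`, which is Python 2; the sketch below is version-agnostic):

```python
import importlib

def module_available(name):
    """Return True if the named module can be imported, mirroring the UDF's logic."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False

print(module_available("math"))  # a standard-library module imports successfully
```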
+
+Create a table that contains data on each HAWQ segment instance. Depending on the size of your HAWQ installation, you might need to generate more data to ensure data is distributed to all segment instances.
+
+```sql
+CREATE TABLE DIST AS (SELECT x FROM generate_series(1,50) x ) DISTRIBUTED RANDOMLY ;
+```
+
+This SELECT command runs the UDF on the segment hosts where data is stored in the primary segment instances.
+
+```sql
+SELECT gp_segment_id, plpy_test(x) AS status
+  FROM dist
+  GROUP BY gp_segment_id, status
+  ORDER BY gp_segment_id, status;
+```
+
+The SELECT command returns SUCCESS if the UDF imported the Python module on the HAWQ segment instance. If the SELECT command returns FAILURE, you can find the host of the failing segment instance. The HAWQ system table `gp_segment_configuration` contains information about segment configuration. This command returns the host name for a segment ID.
+
+```sql
+SELECT hostname, content AS seg_id FROM gp_segment_configuration
+  WHERE content = <seg_id> ;
+```
+
+If FAILURE is returned, these are some possible causes:
+
+- A problem accessing required libraries. For the NumPy example, HAWQ might have a problem accessing the OpenBLAS libraries or the Python libraries on a segment host.
+
+	Make sure that you get no errors when running the command on the segment host as the `gpadmin` user. This `hawq ssh` command tests importing the NumPy module on the segment host `mdw1`.
+
+	```shell
+	$ hawq ssh -h mdw1 python -c "import numpy"
+	```
+
+- If the Python import command does not return an error, environment variables might not be configured in the HAWQ environment. For example, the variables are not in the `.bashrc` file, or HAWQ might not have been restarted after adding the environment variables to the `.bashrc` file.
+
+	Ensure that the environment variables are properly set and then restart HAWQ. For the NumPy example, ensure the environment variables listed at the end of the section [Build and Install NumPy](#buildinstallnumpy) are defined in the `.bashrc` file for the `gpadmin` user on the master and segment hosts.
+
+	**Note:** On HAWQ master and segment hosts, the `.bashrc` file for the `gpadmin` user must source the file `$GPHOME/greenplum_path.sh`.
+
+## <a id="examples"></a>Examples 
+
+This PL/Python UDF returns the maximum of two integers:
+
+```sql
+CREATE FUNCTION pymax (a integer, b integer)
+  RETURNS integer
+AS $$
+  if (a is None) or (b is None):
+      return None
+  if a > b:
+     return a
+  return b
+$$ LANGUAGE plpythonu;
+```
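
Outside of HAWQ, the body of this UDF corresponds to the following plain Python function (an illustrative sketch, not HAWQ code):

```python
def pymax(a, b):
    """None-safe maximum, mirroring the PL/Python UDF body above."""
    if (a is None) or (b is None):
        return None
    if a > b:
        return a
    return b

print(pymax(123, 43))   # 123
print(pymax(None, 43))  # None
```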
+
+You can use the STRICT property to perform the null handling instead of using the two conditional statements.
+
+```sql
+CREATE FUNCTION pymax (a integer, b integer) 
+  RETURNS integer AS $$ 
+return max(a,b) 
+$$ LANGUAGE plpythonu STRICT ;
+```
+
+You can run the user-defined function `pymax` with a SELECT command. This example runs the UDF and shows the output.
+
+```sql
+SELECT ( pymax(123, 43));
+column1
+---------
+     123
+(1 row)
+```
+
+This example returns data from an SQL query that is run against a table. These two commands create a simple table and add data to the table.
+
+```sql
+CREATE TABLE sales (id int, year int, qtr int, day int, region text)
+  DISTRIBUTED BY (id) ;
+
+INSERT INTO sales VALUES
+ (1, 2014, 1,1, 'usa'),
+ (2, 2002, 2,2, 'europe'),
+ (3, 2014, 3,3, 'asia'),
+ (4, 2014, 4,4, 'usa'),
+ (5, 2014, 1,5, 'europe'),
+ (6, 2014, 2,6, 'asia'),
+ (7, 2002, 3,7, 'usa') ;
+```
+
+This PL/Python UDF executes a SELECT command that returns 5 rows from the table. The Python function returns the REGION value from the row specified by the input value. In the Python function, the row numbering starts from 0. Valid input for the function is an integer between 0 and 4.
+
+```sql
+CREATE OR REPLACE FUNCTION mypytest(a integer) 
+  RETURNS text 
+AS $$ 
+  rv = plpy.execute("SELECT * FROM sales ORDER BY id", 5)
+  region = rv[a]["region"]
+  return region
+$$ language plpythonu;
+```
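
The result object returned by `plpy.execute` can be indexed like a list of dictionaries keyed by column name. This hedged sketch simulates that lookup in plain Python with hard-coded rows (it does not query the database):

```python
# Simulated plpy.execute result: rows behave like dictionaries keyed by column name.
rv = [
    {"id": 1, "region": "usa"},
    {"id": 2, "region": "europe"},
    {"id": 3, "region": "asia"},
]

def mypytest(a):
    """Mirror the UDF: return the region value of row a (row numbering starts at 0)."""
    return rv[a]["region"]

print(mypytest(2))  # asia
```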
+
+Running this SELECT statement returns the REGION column value from the third row of the result set.
+
+```sql
+SELECT mypytest(2) ;
+```
+
+This command deletes the UDF from the database.
+
+```sql
+DROP FUNCTION mypytest(integer) ;
+```
+
+## <a id="references"></a>References 
+
+This section lists references for using PL/Python.
+
+### <a id="technicalreferences"></a>Technical References 
+
+For information about PL/Python, see the PostgreSQL documentation at [http://www.postgresql.org/docs/8.2/static/plpython.html](http://www.postgresql.org/docs/8.2/static/plpython.html).
+
+For information about Python Package Index (PyPI), see [https://pypi.python.org/pypi](https://pypi.python.org/pypi).
+
+These are some Python modules that can be downloaded:
+
+- The SciPy library provides user-friendly and efficient numerical routines, such as routines for numerical integration and optimization: [http://www.scipy.org/scipylib/index.html](http://www.scipy.org/scipylib/index.html). This wget command downloads the SciPy package tar file.
+
+	```shell
+	$ wget http://sourceforge.net/projects/scipy/files/scipy/0.10.1/scipy-0.10.1.tar.gz/download
+	```
+
+- The Natural Language Toolkit (nltk) is a platform for building Python programs to work with human language data: [http://www.nltk.org/](http://www.nltk.org/). This wget command downloads the nltk package tar file.
+
+	```shell
+	$ wget http://pypi.python.org/packages/source/n/nltk/nltk-2.0.2.tar.gz#md5=6e714ff74c3398e88be084748df4e657
+	```
+
+	**Note:** The Python package Distribute ([https://pypi.python.org/pypi/distribute](https://pypi.python.org/pypi/distribute)) is required for `nltk`. The Distribute module should be installed before the `nltk` package. This wget command downloads the Distribute package tar file.
+
+	```shell
+	$ wget http://pypi.python.org/packages/source/d/distribute/distribute-0.6.21.tar.gz
+	```
+
+### <a id="usefulreading"></a>Useful Reading 
+
+For information about the Python language, see [http://www.python.org/](http://www.python.org/).
+
+A set of slides from a talk about how the Pivotal Data Science team uses the PyData stack with Pivotal MPP databases and Pivotal Cloud Foundry is available at [http://www.slideshare.net/SrivatsanRamanujam/all-thingspythonpivotal](http://www.slideshare.net/SrivatsanRamanujam/all-thingspythonpivotal).
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/plext/using_plr.html.md.erb
----------------------------------------------------------------------
diff --git a/plext/using_plr.html.md.erb b/plext/using_plr.html.md.erb
new file mode 100644
index 0000000..49d207f
--- /dev/null
+++ b/plext/using_plr.html.md.erb
@@ -0,0 +1,229 @@
+---
+title: Using PL/R in HAWQ
+---
+
+PL/R is a procedural language. With the HAWQ PL/R extension, you can write database functions in the R programming language and use R packages that contain R functions and data sets.
+
+**Note**: To use PL/R in HAWQ, R must be installed on each node in your HAWQ cluster. Additionally, you must install the PL/R package on an existing HAWQ deployment or have specified PL/R as a build option when compiling HAWQ.
+
+## <a id="plrexamples"></a>PL/R Examples 
+
+This section contains simple PL/R examples.
+
+### <a id="example1"></a>Example 1: Using PL/R for Single Row Operators 
+
+This function generates an array of numbers with a normal distribution using the R function `rnorm()`.
+
+```sql
+CREATE OR REPLACE FUNCTION r_norm(n integer, mean float8, 
+  std_dev float8) RETURNS float8[ ] AS
+$$
+  x<-rnorm(n,mean,std_dev)
+  return(x)
+$$
+LANGUAGE 'plr';
+```
+
+The following `CREATE TABLE` command uses the `r_norm` function to populate the table. The `r_norm` function creates an array of 10 numbers.
+
+```sql
+CREATE TABLE test_norm_var
+  AS SELECT id, r_norm(10,0,1) as x
+  FROM (SELECT generate_series(1,30::bigint) AS id) foo
+  DISTRIBUTED BY (id);
+```
+
+### <a id="example2"></a>Example 2: Returning PL/R data.frames in Tabular Form 
+
+Assuming your PL/R function returns an R `data.frame` as its output \(unless you want to use arrays of arrays\), some work is required in order for HAWQ to see your PL/R `data.frame` as a simple SQL table:
+
+Create a TYPE in HAWQ with the same dimensions as your R `data.frame`:
+
+```sql
+CREATE TYPE t1 AS ...
+```
+
+Use this TYPE when defining your PL/R function:
+
+```sql
+... RETURNS SET OF t1 AS ...
+```
+
+Sample SQL for this situation is provided in the next example.
+
+### <a id="example3"></a>Example 3: Process Employee Information Using PL/R 
+
+The SQL below defines a TYPE and a function to process employee information with `data.frame` using PL/R:
+
+```sql
+-- Create type to store employee information
+DROP TYPE IF EXISTS emp_type CASCADE;
+CREATE TYPE emp_type AS (name text, age int, salary numeric(10,2));
+
+-- Create function to process employee information and return data.frame
+DROP FUNCTION IF EXISTS get_emps();
+CREATE OR REPLACE FUNCTION get_emps() RETURNS SETOF emp_type AS '
+    names <- c("Joe","Jim","Jon")
+    ages <- c(41,25,35)
+    salaries <- c(250000,120000,50000)
+    df <- data.frame(name = names, age = ages, salary = salaries)
+
+    return(df)
+' LANGUAGE 'plr';
+
+-- Call the function
+SELECT * FROM get_emps();
+```
+
+
+## <a id="downloadinstallplrlibraries"></a>Downloading and Installing R Packages 
+
+R packages are modules that contain R functions and data sets. You can install R packages to extend R and PL/R functionality in HAWQ.
+
+**Note**: If you expand HAWQ and add segment hosts, you must install the R packages in the R installation of *each* of the new hosts.
+
+1. For an R package, identify all dependent R packages and each package web URL. The information can be found by selecting the given package from the following navigation page:
+
+	[http://cran.r-project.org/web/packages/available_packages_by_name.html](http://cran.r-project.org/web/packages/available_packages_by_name.html)
+
+	As an example, the page for the R package `arm` indicates that the package requires the following R libraries: `Matrix`, `lattice`, `lme4`, `R2WinBUGS`, `coda`, `abind`, `foreign`, and `MASS`.
+	
+	You can also try installing the package with the `R CMD INSTALL` command to determine the dependent packages.
+	
+	For the R installation included with the HAWQ PL/R extension, the required R packages are installed with the PL/R extension. However, a newer version of the `Matrix` package is required.
+	
+1. From the command line, use the `wget` utility to download the tar.gz files for the `arm` package to the HAWQ master host:
+
+	```shell
+	$ wget http://cran.r-project.org/src/contrib/Archive/arm/arm_1.5-03.tar.gz
+	$ wget http://cran.r-project.org/src/contrib/Archive/Matrix/Matrix_0.9996875-1.tar.gz
+	```
+
+1. Use the `hawq scp` utility and the `hawq_hosts` file to copy the tar.gz files to the same directory on all nodes of the HAWQ cluster. The `hawq_hosts` file contains a list of all of the HAWQ segment hosts. You might require root access to do this.
+
+	```shell
+	$ hawq scp -f hawq_hosts Matrix_0.9996875-1.tar.gz =:/home/gpadmin 
+	$ hawq scp -f hawq_hosts arm_1.5-03.tar.gz =:/home/gpadmin
+	```
+
+1. Use the `hawq ssh` utility in interactive mode to log into each HAWQ segment host (`hawq ssh -f hawq_hosts`). Install the packages from the command prompt using the `R CMD INSTALL` command. Note that this may require root access. For example, this R install command installs the packages for the `arm` package.
+
+	```shell
+	$ R CMD INSTALL Matrix_0.9996875-1.tar.gz arm_1.5-03.tar.gz
+	```
+	**Note**: Some packages require compilation. Refer to the package documentation for possible build requirements.
+
+1. Ensure that the R package was installed in the `/usr/lib64/R/library` directory on all the segment hosts (you can use `hawq ssh` to check all hosts). For example, this `hawq ssh` command lists the contents of the R library directory.
+
+	```shell
+	$ hawq ssh -f hawq_hosts "ls /usr/lib64/R/library"
+	```
+	
+1. Verify the R package can be loaded.
+
+	This function performs a simple test to determine if an R package can be loaded:
+	
+	```sql
+	CREATE OR REPLACE FUNCTION R_test_require(fname text)
+	RETURNS boolean AS
+	$BODY$
+    	return(require(fname,character.only=T))
+	$BODY$
+	LANGUAGE 'plr';
+	```
+
+	This SQL command calls the previous function to determine if the R package `arm` can be loaded:
+	
+	```sql
+	SELECT R_test_require('arm');
+	```
+
+## <a id="rlibrarydisplay"></a>Displaying R Library Information 
+
+You can use the R command line to display information about the installed libraries and functions on the HAWQ host. You can also add and remove libraries from the R installation. To start the R command line on the host, log in to the host as the `gpadmin` user and run `R`.
+
+``` shell
+$ R
+```
+
+This R function lists the available R packages from the R command line:
+
+```r
+> library()
+```
+
+Display the documentation for a particular R package:
+
+```r
+> library(help="package_name")
+> help(package="package_name")
+```
+
+Display the help file for an R function:
+
+```r
+> help("function_name")
+> ?function_name
+```
+
+To see which packages are installed, use the R command `installed.packages()`. This returns a matrix with a row for each package that has been installed.
+
+```r
+> installed.packages()
+```
+
+Any package that does not appear in the installed packages matrix must be installed and loaded before its functions can be used.
+
+An R package can be installed with `install.packages()`:
+
+```r
+> install.packages("package_name") 
+> install.packages("mypkg", dependencies = TRUE, type="source")
+```
+
+Load a package from the R command line.
+
+```r
+> library(" package_name ") 
+```
+
+An R package can be removed with `remove.packages()`:
+
+```r
+> remove.packages("package_name")
+```
+
+You can use the R command `-e` option to run functions from the command line. For example, this command displays help on the R package named `MASS`.
+
+```shell
+$ R -e 'help("MASS")'
+```
+
+## <a id="plrreferences"></a>References 
+
+[http://www.r-project.org/](http://www.r-project.org/) - The R Project home page
+
+[https://github.com/pivotalsoftware/gp-r](https://github.com/pivotalsoftware/gp-r) - GitHub repository that contains information about using R.
+
+[https://github.com/pivotalsoftware/PivotalR](https://github.com/pivotalsoftware/PivotalR) - GitHub repository for PivotalR, a package that provides an R interface to operate on HAWQ tables and views that is similar to the R `data.frame`. PivotalR also supports using the machine learning package MADlib directly from R.
+
+R documentation is installed with the R package:
+
+```shell
+/usr/share/doc/R-N.N.N
+```
+
+where N.N.N corresponds to the version of R installed.
+
+### <a id="rfunctions"></a>R Functions and Arguments 
+
+See [http://www.joeconway.com/plr/doc/plr-funcs.html](http://www.joeconway.com/plr/doc/plr-funcs.html).
+
+### <a id="passdatavalues"></a>Passing Data Values in R 
+
+See [http://www.joeconway.com/plr/doc/plr-data.html](http://www.joeconway.com/plr/doc/plr-data.html).
+
+### <a id="aggregatefunctions"></a>Aggregate Functions in R 
+
+See [http://www.joeconway.com/plr/doc/plr-aggregate-funcs.html](http://www.joeconway.com/plr/doc/plr-aggregate-funcs.html).
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/pxf/ConfigurePXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/ConfigurePXF.html.md.erb b/pxf/ConfigurePXF.html.md.erb
new file mode 100644
index 0000000..087a89a
--- /dev/null
+++ b/pxf/ConfigurePXF.html.md.erb
@@ -0,0 +1,67 @@
+---
+title: Configuring PXF
+---
+
+This topic describes how to configure the PXF service.
+
+**Note:** After you make any changes to a PXF configuration file (such as `pxf-profiles.xml` for adding custom profiles), propagate the changes to all nodes with PXF installed, and then restart the PXF service on all nodes.
+
+## <a id="settingupthejavaclasspath"></a>Setting up the Java Classpath
+
+The classpath for the PXF service is set during the plug-in installation process. Administrators should only modify it when adding new PXF connectors. The classpath is defined in two files:
+
+1.  `/etc/pxf/conf/pxf-private.classpath` – contains all the required resources to run the PXF service, including pxf-hdfs, pxf-hbase, and pxf-hive plug-ins. This file must not be edited or removed.
+2.  `/etc/pxf/conf/pxf-public.classpath` – plug-in jar files and any dependent jar files for custom plug-ins and custom profiles should be added here. The classpath resources should be defined one per line. Wildcard characters can be used in the name of the resource, but not in the full path. See [Adding and Updating Profiles](ReadWritePXF.html#addingandupdatingprofiles) for information on adding custom profiles.
+
+After changing the classpath files, the PXF service must be restarted. 
+
+## <a id="settingupthejvmcommandlineoptionsforpxfservice"></a>Setting up the JVM Command Line Options for the PXF Service
+
+The PXF service JVM command line options can be added or modified for each pxf-service instance in the `/var/pxf/pxf-service/bin/setenv.sh` file:
+
+Currently the `JVM_OPTS` parameter is set with the following values for maximum Java heap size and thread stack size:
+
+``` shell
+JVM_OPTS="-Xmx512M -Xss256K"
+```
+
+After adding or modifying the JVM command line options, the PXF service must be restarted.
+
+## <a id="topic_i3f_hvm_ss"></a>Using PXF on a Secure HDFS Cluster
+
+You can use PXF on a secure HDFS cluster. Read, write, and analyze operations are supported for PXF tables on HDFS files. No changes to preexisting PXF tables from a previous version are required.
+
+### <a id="requirements"></a>Requirements
+
+-   Both HDFS and YARN principals are created and are properly configured.
+-   HAWQ is correctly configured to work in secure mode.
+
+Please refer to [Troubleshooting PXF](TroubleshootingPXF.html) for common errors related to PXF security and their meaning.
+
+## <a id="credentialsforremoteservices"></a>Credentials for Remote Services
+
+The credentials feature allows a PXF plug-in to access a remote service that requires credentials.
+
+### <a id="inhawq"></a>In HAWQ
+
+Two parameters for credentials are implemented in HAWQ:
+
+-   `pxf_remote_service_login` – a string of characters detailing login information (for example, a user name).
+-   `pxf_remote_service_secret` – a string of characters detailing information that is considered secret (for example, a password).
+
+Currently, the contents of the two parameters are stored in memory, without any security, for the duration of the session. The contents of the parameters are dropped when the session ends.
+
+**Important:** These parameters are temporary and could soon be deprecated, in favor of a complete solution for managing credentials for remote services in PXF.
+
+### <a id="inapxfplugin"></a>In a PXF Plug-in
+
+In a PXF plug-in, the contents of the two credential parameters are available through the following InputData API functions:
+
+``` java
+String getLogin()
+String getSecret()
+```
+
+Both functions return `null` if the corresponding HAWQ parameter was set to an empty string or was not set at all. 
+
+

