hawq-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From yo...@apache.org
Subject [30/57] [abbrv] [partial] incubator-hawq-docs git commit: HAWQ-1254 Fix/remove book branching on incubator-hawq-docs
Date Tue, 10 Jan 2017 23:54:21 GMT
http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/plext/using_plr.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/plext/using_plr.html.md.erb b/markdown/plext/using_plr.html.md.erb
new file mode 100644
index 0000000..367a1d0
--- /dev/null
+++ b/markdown/plext/using_plr.html.md.erb
@@ -0,0 +1,229 @@
+---
+title: Using PL/R in HAWQ
+---
+
+PL/R is a procedural language. With the HAWQ PL/R extension, you can write database functions in the R programming language and use R packages that contain R functions and data sets.
+
+**Note**: To use PL/R in HAWQ, R must be installed on each node in your HAWQ cluster. Additionally, you must install the PL/R package on an existing HAWQ deployment or have specified PL/R as a build option when compiling HAWQ.
+
+## <a id="plrexamples"></a>PL/R Examples 
+
+This section contains simple PL/R examples.
+
+### <a id="example1"></a>Example 1: Using PL/R for Single Row Operators 
+
+This function generates an array of numbers with a normal distribution using the R function `rnorm()`.
+
+```sql
+CREATE OR REPLACE FUNCTION r_norm(n integer, mean float8, 
+  std_dev float8) RETURNS float8[ ] AS
+$$
+  x<-rnorm(n,mean,std_dev)
+  return(x)
+$$
+LANGUAGE 'plr';
+```
+
+The following `CREATE TABLE` command uses the `r_norm` function to populate the table. The `r_norm` function creates an array of 10 numbers.
+
+```sql
+CREATE TABLE test_norm_var
+  AS SELECT id, r_norm(10,0,1) AS x
+  FROM (SELECT generate_series(1,30:: bigint) AS ID) foo
+  DISTRIBUTED BY (id);
+```
+
+### <a id="example2"></a>Example 2: Returning PL/R data.frames in Tabular Form 
+
+Assuming your PL/R function returns an R `data.frame` as its output \(unless you want to use arrays of arrays\), some work is required in order for HAWQ to see your PL/R `data.frame` as a simple SQL table:
+
+Create a TYPE in HAWQ with the same dimensions as your R `data.frame`:
+
+```sql
+CREATE TYPE t1 AS ...
+```
+
+Use this TYPE when defining your PL/R function:
+
+```sql
+... RETURNS SET OF t1 AS ...
+```
+
+Sample SQL for this situation is provided in the next example.
+
+### <a id="example3"></a>Example 3: Process Employee Information Using PL/R 
+
+The SQL below defines a TYPE and a function to process employee information with `data.frame` using PL/R:
+
+```sql
+-- Create type to store employee information
+DROP TYPE IF EXISTS emp_type CASCADE;
+CREATE TYPE emp_type AS (name text, age int, salary numeric(10,2));
+
+-- Create function to process employee information and return data.frame
+DROP FUNCTION IF EXISTS get_emps();
+CREATE OR REPLACE FUNCTION get_emps() RETURNS SETOF emp_type AS '
+    names <- c("Joe","Jim","Jon")
+    ages <- c(41,25,35)
+    salaries <- c(250000,120000,50000)
+    df <- data.frame(name = names, age = ages, salary = salaries)
+
+    return(df)
+' LANGUAGE 'plr';
+
+-- Call the function
+SELECT * FROM get_emps();
+```
+
+
+## <a id="downloadinstallplrlibraries"></a>Downloading and Installing R Packages 
+
+R packages are modules that contain R functions and data sets. You can install R packages to extend R and PL/R functionality in HAWQ.
+
+**Note**: If you expand HAWQ and add segment hosts, you must install the R packages in the R installation of *each* of the new hosts.</p>
+
+1. For an R package, identify all dependent R packages and each package web URL. The information can be found by selecting the given package from the following navigation page:
+
+	[http://cran.r-project.org/web/packages/available_packages_by_name.html](http://cran.r-project.org/web/packages/available_packages_by_name.html)
+
+	As an example, the page for the R package `arm` indicates that the package requires the following R libraries: `Matrix`, `lattice`, `lme4`, `R2WinBUGS`, `coda`, `abind`, `foreign`, and `MASS`.
+	
+	You can also try installing the package with `R CMD INSTALL` command to determine the dependent packages.
+	
+	For the R installation included with the HAWQ PL/R extension, the required R packages are installed with the PL/R extension. However, the Matrix package requires a newer version.
+	
+1. From the command line, use the `wget` utility to download the tar.gz files for the `arm` package to the HAWQ master host:
+
+	```shell
+	$ wget http://cran.r-project.org/src/contrib/Archive/arm/arm_1.5-03.tar.gz
+	$ wget http://cran.r-project.org/src/contrib/Archive/Matrix/Matrix_0.9996875-1.tar.gz
+	```
+
+1. Use the `hawq scp` utility and the `hawq_hosts` file to copy the tar.gz files to the same directory on all nodes of the HAWQ cluster. The `hawq_hosts` file contains a list of all of the HAWQ segment hosts. You might require root access to do this.
+
+	```shell
+	$ hawq scp -f hosts_all Matrix_0.9996875-1.tar.gz =:/home/gpadmin 
+	$ hawq scp -f hawq_hosts arm_1.5-03.tar.gz =:/home/gpadmin
+	```
+
+1. Use the `hawq ssh` utility in interactive mode to log into each HAWQ segment host (`hawq ssh -f hawq_hosts`). Install the packages from the command prompt using the `R CMD INSTALL` command. Note that this may require root access. For example, this R install command installs the packages for the `arm` package.
+
+	```shell
+	$ R CMD INSTALL Matrix_0.9996875-1.tar.gz arm_1.5-03.tar.gz
+	```
+	**Note**: Some packages require compilation. Refer to the package documentation for possible build requirements.
+
+1. Ensure that the R package was installed in the `/usr/lib64/R/library` directory on all the segments (`hawq ssh` can be used to install the package). For example, this `hawq ssh` command lists the contents of the R library directory.
+
+	```shell
+	$ hawq ssh -f hawq_hosts "ls /usr/lib64/R/library"
+	```
+	
+1. Verify the R package can be loaded.
+
+	This function performs a simple test to determine if an R package can be loaded:
+	
+	```sql
+	CREATE OR REPLACE FUNCTION R_test_require(fname text)
+	RETURNS boolean AS
+	$BODY$
+    	return(require(fname,character.only=T))
+	$BODY$
+	LANGUAGE 'plr';
+	```
+
+	This SQL command calls the previous function to determine if the R package `arm` can be loaded:
+	
+	```sql
+	SELECT R_test_require('arm');
+	```
+
+## <a id="rlibrarydisplay"></a>Displaying R Library Information 
+
+You can use the R command line to display information about the installed libraries and functions on the HAWQ host. You can also add and remove libraries from the R installation. To start the R command line on the host, log in to the host as the `gpadmin` user and run the script R.
+
+``` shell
+$ R
+```
+
+This R function lists the available R packages from the R command line:
+
+```r
+> library()
+```
+
+Display the documentation for a particular R package
+
+```r
+> library(help="package_name")
+> help(package="package_name")
+```
+
+Display the help file for an R function:
+
+```r
+> help("function_name")
+> ?function_name
+```
+
+To see what packages are installed, use the R command `installed.packages()`. This will return a matrix with a row for each package that has been installed. Below, we look at the first 5 rows of this matrix.
+
+```r
+> installed.packages()
+```
+
+Any package that does not appear in the installed packages matrix must be installed and loaded before its functions can be used.
+
+An R package can be installed with `install.packages()`:
+
+```r
+> install.packages("package_name") 
+> install.packages("mypkg", dependencies = TRUE, type="source")
+```
+
+Load a package from the R command line.
+
+```r
+> library(" package_name ") 
+```
+An R package can be removed with remove.packages
+
+```r
+> remove.packages("package_name")
+```
+
+You can use the R command `-e` option to run functions from the command line. For example, this command displays help on the R package named `MASS`.
+
+```shell
+$ R -e 'help("MASS")'
+```
+
+## <a id="plrreferences"></a>References 
+
+[http://www.r-project.org/](http://www.r-project.org/) - The R Project home page
+
+[https://github.com/pivotalsoftware/gp-r](https://github.com/pivotalsoftware/gp-r) - GitHub repository that contains information about using R.
+
+[https://github.com/pivotalsoftware/PivotalR](https://github.com/pivotalsoftware/PivotalR) - GitHub repository for PivotalR, a package that provides an R interface to operate on HAWQ tables and views that is similar to the R `data.frame`. PivotalR also supports using the machine learning package MADlib directly from R.
+
+R documentation is installed with the R package:
+
+```shell
+/usr/share/doc/R-N.N.N
+```
+
+where N.N.N corresponds to the version of R installed.
+
+### <a id="rfunctions"></a>R Functions and Arguments 
+
+See [http://www.joeconway.com/plr/doc/plr-funcs.html](http://www.joeconway.com/plr/doc/plr-funcs.html).
+
+### <a id="passdatavalues"></a>Passing Data Values in R 
+
+See [http://www.joeconway.com/plr/doc/plr-data.html](http://www.joeconway.com/plr/doc/plr-data.html).
+
+### <a id="aggregatefunctions"></a>Aggregate Functions in R 
+
+See [http://www.joeconway.com/plr/doc/plr-aggregate-funcs.html](http://www.joeconway.com/plr/doc/plr-aggregate-funcs.html).
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/pxf/ConfigurePXF.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/pxf/ConfigurePXF.html.md.erb b/markdown/pxf/ConfigurePXF.html.md.erb
new file mode 100644
index 0000000..fec6b27
--- /dev/null
+++ b/markdown/pxf/ConfigurePXF.html.md.erb
@@ -0,0 +1,69 @@
+---
+title: Configuring PXF
+---
+
+This topic describes how to configure the PXF service.
+
+**Note:** After you make any changes to a PXF configuration file (such as `pxf-profiles.xml` for adding custom profiles), propagate the changes to all nodes with PXF installed, and then restart the PXF service on all nodes.
+
+## <a id="settingupthejavaclasspath"></a>Setting up the Java Classpath
+
+The classpath for the PXF service is set during the plug-in installation process. Administrators should only modify it when adding new PXF connectors. The classpath is defined in two files:
+
+1.  `/etc/pxf/conf/pxf-private.classpath` – contains all the required resources to run the PXF service, including pxf-hdfs, pxf-hbase, and pxf-hive plug-ins. This file must not be edited or removed.
+2.  `/etc/pxf/conf/pxf-public.classpath` – plug-in jar files and any dependent jar files for custom plug-ins and custom profiles should be added here. The classpath resources should be defined one per line. Wildcard characters can be used in the name of the resource, but not in the full path. See [Adding and Updating Profiles](ReadWritePXF.html#addingandupdatingprofiles) for information on adding custom profiles.
+
+After changing the classpath files, the PXF service must be restarted. 
+
+## <a id="settingupthejvmcommandlineoptionsforpxfservice"></a>Setting up the JVM Command Line Options for the PXF Service
+
+The PXF service JVM command line options can be added or modified for each pxf-service instance in the `/var/pxf/pxf-service/bin/setenv.sh` file:
+
+Currently the `JVM_OPTS` parameter is set with the following values for maximum Java heap size and thread stack size:
+
+``` shell
+JVM_OPTS="-Xmx512M -Xss256K"
+```
+
+After adding or modifying the JVM command line options, the PXF service must be restarted.
+
+(Refer to [Addressing PXF Memory Issues](TroubleshootingPXF.html#pxf-memcfg) for a related discussion of the configuration options available to address memory issues in your PXF deployment.)
+
+## <a id="topic_i3f_hvm_ss"></a>Using PXF on a Secure HDFS Cluster
+
+You can use PXF on a secure HDFS cluster. Read, write, and analyze operations for PXF tables on HDFS files are enabled. It requires no changes to preexisting PXF tables from a previous version.
+
+### <a id="requirements"></a>Requirements
+
+-   Both HDFS and YARN principals are created and are properly configured.
+-   HAWQ is correctly configured to work in secure mode.
+
+Please refer to [Troubleshooting PXF](TroubleshootingPXF.html) for common errors related to PXF security and their meaning.
+
+## <a id="credentialsforremoteservices"></a>Credentials for Remote Services
+
+Credentials for remote services allows a PXF plug-in to access a remote service that requires credentials.
+
+### <a id="inhawq"></a>In HAWQ
+
+Two parameters for credentials are implemented in HAWQ:
+
+-   `pxf_remote_service_login` – a string of characters detailing information regarding login (i.e. user name).
+-   `pxf_remote_service_secret` – a string of characters detailing information that is considered secret (i.e. password).
+
+Currently, the contents of the two parameters are stored in memory, without any security, for the whole session. Leaving the session will insecurely drop the contents of the parameters.
+
+**Important:** These parameters are temporary and could soon be deprecated, in favor of a complete solution for managing credentials for remote services in PXF.
+
+### <a id="inapxfplugin"></a>In a PXF Plug-in
+
+In a PXF plug-in, the contents of the two credentials parameters is available through the following InputData API functions:
+
+``` java
+string getLogin()
+string getSecret()
+```
+
+Both functions return 'null' if the corresponding HAWQ parameter was set to an empty string or was not set at all. 
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/pxf/HBasePXF.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/pxf/HBasePXF.html.md.erb b/markdown/pxf/HBasePXF.html.md.erb
new file mode 100644
index 0000000..8b89730
--- /dev/null
+++ b/markdown/pxf/HBasePXF.html.md.erb
@@ -0,0 +1,105 @@
+---
+title: Accessing HBase Data
+---
+
+## <a id="installingthepxfhbaseplugin"></a>Prerequisites
+
+Before trying to access HBase data with PXF, verify the following:
+
+-   The `/etc/hbase/conf/hbase-env.sh` configuration file must reference the `pxf-hbase.jar`. For example, `/etc/hbase/conf/hbase-env.sh` should include the line:
+
+    ``` bash
+    export HBASE_CLASSPATH=${HBASE_CLASSPATH}:/usr/lib/pxf/pxf-hbase.jar
+    ```
+
+    **Note:** You must restart HBase after making any changes to the HBase configuration.
+
+-   PXF HBase plug-in is installed on all cluster nodes.
+-   HBase and ZooKeeper jars are installed on all cluster nodes.
+
+## <a id="syntax3"></a>Syntax
+
+To create an external HBase table, use the following syntax:
+
+``` sql
+CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
+    ( column_name data_type [, ...] | LIKE other_table )
+LOCATION ('pxf://namenode[:port]/hbase-table-name?Profile=HBase')
+FORMAT 'CUSTOM' (Formatter='pxfwritable_import');
+```
+
+The HBase profile is equivalent to the following PXF parameters:
+
+-   Fragmenter=org.apache.hawq.pxf.plugins.hbase.HBaseDataFragmenter
+-   Accessor=org.apache.hawq.pxf.plugins.hbase.HBaseAccessor
+-   Resolver=org.apache.hawq.pxf.plugins.hbase.HBaseResolver
+
+## <a id="columnmapping"></a>Column Mapping
+
+Most HAWQ external tables (PXF or others) require that the HAWQ table attributes match the source data record layout, and include all the available attributes. With HAWQ, however, you use the PXF HBase plug-in to specify the subset of HBase qualifiers that define the HAWQ PXF table. To set up a clear mapping between each attribute in the PXF table and a specific qualifier in the HBase table, you can use either direct mapping or indirect mapping. In addition, the HBase row key is handled in a special way.
+
+### <a id="rowkey"></a>Row Key
+
+You can use the HBase table row key in several ways. For example, you can see them using query results, or you can run a WHERE clause filter on a range of row key values. To use the row key in the HAWQ query, define the HAWQ table with the reserved PXF attribute `recordkey.` This attribute name tells PXF to return the record key in any key-value based system and in HBase.
+
+**Note:** Because HBase is byte and not character-based, you should define the recordkey as type bytea. This may result in better ability to filter data and increase performance.
+
+``` sql
+CREATE EXTERNAL TABLE <tname> (recordkey bytea, ... ) LOCATION ('pxf:// ...')
+```
+
+### <a id="directmapping"></a>Direct Mapping
+
+Use Direct Mapping to map HAWQ table attributes to HBase qualifiers. You can specify the HBase qualifier names of interest, with column family names included, as quoted values. 
+
+For example, you have defined an HBase table called `hbase_sales` with multiple column families and many qualifiers. To create a HAWQ table with these attributes:
+
+-   `rowkey`
+-   qualifier `saleid` in the column family `cf1`
+-   qualifier `comments` in the column family `cf8` 
+
+use the following `CREATE EXTERNAL TABLE` syntax:
+
+``` sql
+CREATE EXTERNAL TABLE hbase_sales (
+  recordkey bytea,
+  "cf1:saleid" int,
+  "cf8:comments" varchar
+) ...
+```
+
+The PXF HBase plug-in uses these attribute names as-is and returns the values of these HBase qualifiers.
+
+### <a id="indirectmappingvialookuptable"></a>Indirect Mapping (via Lookup Table)
+
+The direct mapping method is fast and intuitive, but using indirect mapping helps to reconcile HBase qualifier names with HAWQ behavior:
+
+-   HBase qualifier names may be longer than 32 characters. HAWQ has a 32-character limit on attribute name size.
+-   HBase qualifier names can be binary or non-printable. HAWQ attribute names are character based.
+
+In either case, Indirect Mapping uses a lookup table on HBase. You can create the lookup table to store all necessary lookup information. This works as a template for any future queries. The name of the lookup table must be `pxflookup` and must include the column family named `mapping`.
+
+Using the sales example in Direct Mapping, if our `rowkey` represents the HBase table name and the `mapping` column family includes the actual attribute mapping in the key value form of`<hawq attr name>=<hbase                             cf:qualifier>`.
+
+#### <a id="example5"></a>Example
+
+This example maps the `saleid` qualifier in the `cf1` column family to the HAWQ `id` column and the `comments` qualifier in the `cf8` family to the HAWQ `cmts` column.
+
+| (row key) | mapping           |
+|-----------|-------------------|
+| sales     | id=cf1:saleid     |
+| sales     | cmts=cf8:comments |
+
+The mapping assigned new names for each qualifier. You can use these names in your HAWQ table definition:
+
+``` sql
+CREATE EXTERNAL TABLE hbase_sales (
+  recordkey bytea
+  id int,
+  cmts varchar
+) ...
+```
+
+PXF automatically matches HAWQ to HBase column names when a `pxflookup` table exists in HBase.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/pxf/HDFSFileDataPXF.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/pxf/HDFSFileDataPXF.html.md.erb b/markdown/pxf/HDFSFileDataPXF.html.md.erb
new file mode 100644
index 0000000..2021565
--- /dev/null
+++ b/markdown/pxf/HDFSFileDataPXF.html.md.erb
@@ -0,0 +1,452 @@
+---
+title: Accessing HDFS File Data
+---
+
+HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in supports plain delimited and comma-separated-value format text files.  The HDFS plug-in also supports the Avro binary format.
+
+This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store.
+
+## <a id="hdfsplugin_prereq"></a>Prerequisites
+
+Before working with HDFS file data using HAWQ and PXF, ensure that:
+
+-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+-   All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users.
+
+## <a id="hdfsplugin_fileformats"></a>HDFS File Formats
+
+The PXF HDFS plug-in supports reading the following file formats:
+
+- Text File - comma-separated value (.csv) or delimited format plain text file
+- Avro - JSON-defined, schema-based data serialization format
+
+The PXF HDFS plug-in includes the following profiles to support the file formats listed above:
+
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
+
+If you find that the pre-defined PXF HDFS profiles do not meet your needs, you may choose to create a custom HDFS profile from the existing HDFS serialization and deserialization classes. Refer to [Adding and Updating Profiles](ReadWritePXF.html#addingandupdatingprofiles) for information on creating a custom profile.
+
+## <a id="hdfsplugin_cmdline"></a>HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  These tools support typical file system operations including copying and listing files, changing file permissions, and so forth.
+
+The HDFS file system command syntax is `hdfs dfs <options> [<file>]`. Invoked with no options, `hdfs dfs` lists the file system options supported by the tool.
+
+The user invoking the `hdfs dfs` command must have sufficient privileges to the HDFS data store to perform HDFS file system operations. Specifically, the user must have write permission to HDFS to create directories and files.
+
+`hdfs dfs` options used in this topic are:
+
+| Option  | Description |
+|-------|-------------------------------------|
+| `-cat`    | Display file contents. |
+| `-mkdir`    | Create directory in HDFS. |
+| `-put`    | Copy file from local file system to HDFS. |
+
+Examples:
+
+Create a directory in HDFS:
+
+``` shell
+$ hdfs dfs -mkdir -p /data/exampledir
+```
+
+Copy a text file to HDFS:
+
+``` shell
+$ hdfs dfs -put /tmp/example.txt /data/exampledir/
+```
+
+Display the contents of a text file in HDFS:
+
+``` shell
+$ hdfs dfs -cat /data/exampledir/example.txt
+```
+
+
+## <a id="hdfsplugin_queryextdata"></a>Querying External HDFS Data
+The PXF HDFS plug-in supports the `HdfsTextSimple`, `HdfsTextMulti`, and `Avro` profiles.
+
+Use the following syntax to create a HAWQ external table representing HDFS data: 
+
+``` sql
+CREATE EXTERNAL TABLE <table_name> 
+    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
+LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
+    ?PROFILE=HdfsTextSimple|HdfsTextMulti|Avro[&<custom-option>=<value>[...]]')
+FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
+```
+
+HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
+
+| Keyword  | Value |
+|-------|-------------------------------------|
+| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
+| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
+| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple`, `HdfsTextMulti`, or `Avro`. |
+| \<custom-option\>  | \<custom-option\> is profile-specific. Profile-specific options are discussed in the relevant profile topic later in this section.|
+| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\> references a plain text delimited file.  |
+| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` and `HdfsTextMulti` profiles when \<path-to-hdfs-file\> references a comma-separated value file.  |
+| FORMAT 'CUSTOM' | Use the`CUSTOM` `FORMAT` with  the `Avro` profile. The `Avro` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_import')` \<formatting-property\> |
+ \<formatting-properties\>    | \<formatting-properties\> are profile-specific. Profile-specific formatting options are discussed in the relevant profile topic later in this section. |
+
+*Note*: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
+
+## <a id="profile_hdfstextsimple"></a>HdfsTextSimple Profile
+
+Use the `HdfsTextSimple` profile when reading plain text delimited or .csv files where each row is a single record.
+
+\<formatting-properties\> supported by the `HdfsTextSimple` profile include:
+
+| Keyword  | Value |
+|-------|-------------------------------------|
+| delimiter    | The delimiter character in the file. Default value is a comma `,`.|
+
+### <a id="profile_hdfstextsimple_query"></a>Example: Using the HdfsTextSimple Profile
+
+Perform the following steps to create a sample data file, copy the file to HDFS, and use the `HdfsTextSimple` profile to create PXF external tables to query the data:
+
+1. Create an HDFS directory for PXF example data files:
+
+    ``` shell
+    $ hdfs dfs -mkdir -p /data/pxf_examples
+    ```
+
+2. Create a delimited plain text data file named `pxf_hdfs_simple.txt`:
+
+    ``` shell
+    $ echo 'Prague,Jan,101,4875.33
+Rome,Mar,87,1557.39
+Bangalore,May,317,8936.99
+Beijing,Jul,411,11600.67' > /tmp/pxf_hdfs_simple.txt
+    ```
+
+    Note the use of the comma `,` to separate the four data fields.
+
+4. Add the data file to HDFS:
+
+    ``` shell
+    $ hdfs dfs -put /tmp/pxf_hdfs_simple.txt /data/pxf_examples/
+    ```
+
+5. Display the contents of the `pxf_hdfs_simple.txt` file stored in HDFS:
+
+    ``` shell
+    $ hdfs dfs -cat /data/pxf_examples/pxf_hdfs_simple.txt
+    ```
+
+1. Use the `HdfsTextSimple` profile to create a queryable HAWQ external table from the `pxf_hdfs_simple.txt` file you previously created and added to HDFS:
+
+    ``` sql
+    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple(location text, month text, num_orders int, total_sales float8)
+                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
+              FORMAT 'TEXT' (delimiter=E',');
+    gpadmin=# SELECT * FROM pxf_hdfs_textsimple;          
+    ```
+
+    ``` pre
+       location    | month | num_orders | total_sales 
+    ---------------+-------+------------+-------------
+     Prague        | Jan   |        101 |     4875.33
+     Rome          | Mar   |         87 |     1557.39
+     Bangalore     | May   |        317 |     8936.99
+     Beijing       | Jul   |        411 |    11600.67
+    (4 rows)
+    ```
+
+2. Create a second external table from `pxf_hdfs_simple.txt`, this time using the `CSV` `FORMAT`:
+
+    ``` sql
+    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple_csv(location text, month text, num_orders int, total_sales float8)
+                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
+              FORMAT 'CSV';
+    gpadmin=# SELECT * FROM pxf_hdfs_textsimple_csv;          
+    ```
+
+    When specifying `FORMAT 'CSV'` for a comma-separated value file, no `delimiter` formatter option is required, as comma is the default.
+
+## <a id="profile_hdfstextmulti"></a>HdfsTextMulti Profile
+
+Use the `HdfsTextMulti` profile when reading plain text files with delimited single- or multi- line records that include embedded (quoted) linefeed characters.
+
+\<formatting-properties\> supported by the `HdfsTextMulti` profile include:
+
+| Keyword  | Value |
+|-------|-------------------------------------|
+| delimiter    | The delimiter character in the file. |
+
+### <a id="profile_hdfstextmulti_query"></a>Example: Using the HdfsTextMulti Profile
+
+Perform the following steps to create a sample data file, copy the file to HDFS, and use the `HdfsTextMulti` profile to create a PXF external table to query the data:
+
+1. Create a second delimited plain text file:
+
+    ``` shell
+    $ vi /tmp/pxf_hdfs_multi.txt
+    ```
+
+2. Copy/paste the following data into `pxf_hdfs_multi.txt`:
+
+    ``` pre
+    "4627 Star Rd.
+    San Francisco, CA  94107":Sept:2017
+    "113 Moon St.
+    San Diego, CA  92093":Jan:2018
+    "51 Belt Ct.
+    Denver, CO  90123":Dec:2016
+    "93114 Radial Rd.
+    Chicago, IL  60605":Jul:2017
+    "7301 Brookview Ave.
+    Columbus, OH  43213":Dec:2018
+    ```
+
+    Notice the use of the colon `:` to separate the three fields. Also notice the quotes around the first (address) field. This field includes an embedded line feed separating the street address from the city and state.
+
+3. Add the data file to HDFS:
+
+    ``` shell
+    $ hdfs dfs -put /tmp/pxf_hdfs_multi.txt /data/pxf_examples/
+    ```
+
+4. Use the `HdfsTextMulti` profile to create a queryable external table from the `pxf_hdfs_multi.txt` HDFS file, making sure to identify the `:` as the field separator:
+
+    ``` sql
+    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textmulti(address text, month text, year int)
+                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_multi.txt?PROFILE=HdfsTextMulti')
+              FORMAT 'CSV' (delimiter=E':');
+    ```
+    
+2. Query the `pxf_hdfs_textmulti` table:
+
+    ``` sql
+    gpadmin=# SELECT * FROM pxf_hdfs_textmulti;
+    ```
+
+    ``` pre
+             address          | month | year 
+    --------------------------+-------+------
+     4627 Star Rd.            | Sept  | 2017
+     San Francisco, CA  94107           
+     113 Moon St.             | Jan   | 2018
+     San Diego, CA  92093               
+     51 Belt Ct.              | Dec   | 2016
+     Denver, CO  90123                  
+     93114 Radial Rd.         | Jul   | 2017
+     Chicago, IL  60605                 
+     7301 Brookview Ave.      | Dec   | 2018
+     Columbus, OH  43213                
+    (5 rows)
+    ```
+
+## <a id="profile_hdfsavro"></a>Avro Profile
+
+Apache Avro is a data serialization framework where the data is serialized in a compact binary format. 
+
+Avro specifies that data types be defined in JSON. Avro format files have an independent schema, also defined in JSON. An Avro schema, together with its data, is fully self-describing.
+
+### <a id="profile_hdfsavrodatamap"></a>Data Type Mapping
+
+Avro supports both primitive and complex data types. 
+
+To represent Avro primitive data types in HAWQ, map data values to HAWQ columns of the same type. 
+
+Avro supports complex data types including arrays, maps, records, enumerations, and fixed types. Map top-level fields of these complex data types to the HAWQ `TEXT` type. While HAWQ does not natively support these types, you can create HAWQ functions or application code to extract or further process subcomponents of these complex data types.
+
+The following table summarizes external mapping rules for Avro data.
+
+<a id="topic_oy3_qwm_ss__table_j4s_h1n_ss"></a>
+
+| Avro Data Type                                                    | PXF/HAWQ Data Type                                                                                                                                                                                            |
+|-------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Primitive type (int, double, float, long, string, bytes, boolean) | Use the corresponding HAWQ built-in data type; see [Data Types](../reference/HAWQDataTypes.html). |
+| Complex type: Array, Map, Record, or Enum                         | TEXT, with delimiters inserted between collection items, mapped key-value pairs, and record data.                                                                                           |
+| Complex type: Fixed                                               | BYTEA                                                                                                                                                                                               |
+| Union                                                             | Follows the above conventions for primitive or complex data types, depending on the union; supports Null values.                                                                     |
+
+### <a id="profile_hdfsavroptipns"></a>Avro-Specific Custom Options
+
+For complex types, the PXF `Avro` profile inserts default delimiters between collection items and values. You can use non-default delimiter characters by identifying values for specific `Avro` custom options in the `CREATE EXTERNAL TABLE` call. 
+
+The `Avro` profile supports the following \<custom-options\>:
+
+| Option Name   | Description       
+|---------------|--------------------|                                                                                        
+| COLLECTION_DELIM | The delimiter character(s) to place between entries in a top-level array, map, or record field when PXF maps an Avro complex data type to a text column. The default is the comma `,` character. |
+| MAPKEY_DELIM | The delimiter character(s) to place between the key and value of a map entry when PXF maps an Avro complex data type to a text column. The default is the colon `:` character. |
+| RECORDKEY_DELIM | The delimiter character(s) to place between the field name and value of a record entry when PXF maps an Avro complex data type to a text column. The default is the colon `:` character. |
+
+
+### <a id="topic_tr3_dpg_ts__section_m2p_ztg_ts"></a>Avro Schemas and Data
+
+Avro schemas are defined using JSON, and composed of the same primitive and complex types identified in the data mapping section above. Avro schema files typically have a `.avsc` suffix.
+
+Fields in an Avro schema file are defined via an array of objects, each of which is specified by a name and a type.
+
+
+### <a id="topic_tr3_dpg_ts_example"></a>Example: Using the Avro Profile
+
+The examples in this section will operate on Avro data with the following record schema:
+
+- id - long
+- username - string
+- followers - array of string
+- fmap - map of long
+- address - record comprised of street number (int), street name (string), and city (string)
+- relationship - enumerated type
+
+
+#### <a id="topic_tr3_dpg_ts__section_m2p_ztg_ts_99"></a>Create Schema
+
+Perform the following operations to create an Avro schema to represent the example schema described above.
+
+1. Create a file named `avro_schema.avsc`:
+
+    ``` shell
+    $ vi /tmp/avro_schema.avsc
+    ```
+
+2. Copy and paste the following text into `avro_schema.avsc`:
+
+    ``` json
+    {
+    "type" : "record",
+      "name" : "example_schema",
+      "namespace" : "com.example",
+      "fields" : [ {
+        "name" : "id",
+        "type" : "long",
+        "doc" : "Id of the user account"
+      }, {
+        "name" : "username",
+        "type" : "string",
+        "doc" : "Name of the user account"
+      }, {
+        "name" : "followers",
+        "type" : {"type": "array", "items": "string"},
+        "doc" : "Users followers"
+      }, {
+        "name": "fmap",
+        "type": {"type": "map", "values": "long"}
+      }, {
+        "name": "relationship",
+        "type": {
+            "type": "enum",
+            "name": "relationshipEnum",
+            "symbols": ["MARRIED","LOVE","FRIEND","COLLEAGUE","STRANGER","ENEMY"]
+        }
+      }, {
+        "name": "address",
+        "type": {
+            "type": "record",
+            "name": "addressRecord",
+            "fields": [
+                {"name":"number", "type":"int"},
+                {"name":"street", "type":"string"},
+                {"name":"city", "type":"string"}]
+        }
+      } ],
+      "doc:" : "A basic schema for storing messages"
+    }
+    ```
+
+#### <a id="topic_tr3_dpgspk_15g_tsdata"></a>Create Avro Data File (JSON)
+
+Perform the following steps to create a sample Avro data file conforming to the above schema.
+
+1.  Create a text file named `pxf_hdfs_avro.txt`:
+
+    ``` shell
+    $ vi /tmp/pxf_hdfs_avro.txt
+    ```
+
+2. Enter the following data into `pxf_hdfs_avro.txt`:
+
+    ``` pre
+    {"id":1, "username":"john","followers":["kate", "santosh"], "relationship": "FRIEND", "fmap": {"kate":10,"santosh":4}, "address":{"number":1, "street":"renaissance drive", "city":"san jose"}}
+    
+    {"id":2, "username":"jim","followers":["john", "pam"], "relationship": "COLLEAGUE", "fmap": {"john":3,"pam":3}, "address":{"number":9, "street":"deer creek", "city":"palo alto"}}
+    ```
+
+    The sample data uses a comma `,` to separate top level records and a colon `:` to separate map/key values and record field name/values.
+
+3. Convert the text file to Avro format. There are various ways to perform the conversion, both programmatically and via the command line. In this example, we use the [Java Avro tools](http://avro.apache.org/releases.html); the jar file resides in the current directory:
+
+    ``` shell
+    $ java -jar ./avro-tools-1.8.1.jar fromjson --schema-file /tmp/avro_schema.avsc /tmp/pxf_hdfs_avro.txt > /tmp/pxf_hdfs_avro.avro
+    ```
+
+    The generated Avro binary data file is written to `/tmp/pxf_hdfs_avro.avro`. 
+    
+4. Copy the generated Avro file to HDFS:
+
+    ``` shell
+    $ hdfs dfs -put /tmp/pxf_hdfs_avro.avro /data/pxf_examples/
+    ```
+    
+#### <a id="topic_avro_querydata"></a>Query With Avro Profile
+
+Perform the following steps to create and query an external table accessing the `pxf_hdfs_avro.avro` file you added to HDFS in the previous section. When creating the table:
+
+-  Map the top-level primitive fields, `id` (type long) and `username` (type string), to their equivalent HAWQ types (bigint and text). 
+-  Map the remaining complex fields to type text.
+-  Explicitly set the record, map, and collection delimiters using the Avro profile custom options.
+
+
+1. Use the `Avro` profile to create a queryable external table from the `pxf_hdfs_avro.avro` file:
+
+    ``` sql
+    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_avro(id bigint, username text, followers text, fmap text, relationship text, address text)
+                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_avro.avro?PROFILE=Avro&COLLECTION_DELIM=,&MAPKEY_DELIM=:&RECORDKEY_DELIM=:')
+              FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
+    ```
+
+2. Perform a simple query of the `pxf_hdfs_avro` table:
+
+    ``` sql
+    gpadmin=# SELECT * FROM pxf_hdfs_avro;
+    ```
+
+    ``` pre
+     id | username |   followers    |        fmap         | relationship |                      address                      
+    ----+----------+----------------+--------------------+--------------+---------------------------------------------------
+      1 | john     | [kate,santosh] | {kate:10,santosh:4} | FRIEND       | {number:1,street:renaissance drive,city:san jose}
+      2 | jim      | [john,pam]     | {pam:3,john:3}      | COLLEAGUE    | {number:9,street:deer creek,city:palo alto}
+    (2 rows)
+    ```
+
+    The simple query of the external table shows the components of the complex type data separated with the delimiters identified in the `CREATE EXTERNAL TABLE` call.
+
+
+3. Process the delimited components in the text columns as necessary for your application. For example, the following command uses the HAWQ internal `string_to_array` function to convert entries in the `followers` field to a text array column in a new view.
+
+    ``` sql
+    gpadmin=# CREATE VIEW followers_view AS 
+  SELECT username, address, string_to_array(substring(followers FROM 2 FOR (char_length(followers) - 2)), ',')::text[] 
+        AS followers 
+      FROM pxf_hdfs_avro;
+    ```
+
+4. Query the view to filter rows based on whether a particular follower appears in the array:
+
+    ``` sql
+    gpadmin=# SELECT username, address FROM followers_view WHERE followers @> '{john}';
+    ```
+
+    ``` pre
+     username |                   address                   
+    ----------+---------------------------------------------
+     jim      | {number:9,street:deer creek,city:palo alto}
+    ```
+
+## <a id="accessdataonahavhdfscluster"></a>Accessing HDFS Data in a High Availability HDFS Cluster
+
+To access external HDFS data in a High Availability HDFS cluster, change the `CREATE EXTERNAL TABLE` `LOCATION` clause to use \<HA-nameservice\> rather than  \<host\>[:\<port\>].
+
+``` sql
+gpadmin=# CREATE EXTERNAL TABLE <table_name> ( <column_name> <data_type> [, ...] | LIKE <other_table> )
+            LOCATION ('pxf://<HA-nameservice>/<path-to-hdfs-file>?PROFILE=HdfsTextSimple|HdfsTextMulti|Avro[&<custom-option>=<value>[...]]')
+         FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
+```
+
+The opposite is true when a highly available HDFS cluster is reverted to a single NameNode configuration. In that case, any table definition that has specified \<HA-nameservice\> should use the \<host\>[:\<port\>] syntax. 
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/pxf/HawqExtensionFrameworkPXF.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/pxf/HawqExtensionFrameworkPXF.html.md.erb b/markdown/pxf/HawqExtensionFrameworkPXF.html.md.erb
new file mode 100644
index 0000000..578d13f
--- /dev/null
+++ b/markdown/pxf/HawqExtensionFrameworkPXF.html.md.erb
@@ -0,0 +1,45 @@
+---
+title: Using PXF with Unmanaged Data
+---
+
+HAWQ Extension Framework (PXF) is an extensible framework that allows HAWQ to query external system data. 
+
+PXF includes built-in connectors for accessing data inside HDFS files, Hive tables, and HBase tables. PXF also integrates with HCatalog to query Hive tables directly.
+
+PXF allows users to create custom connectors to access other parallel data stores or processing engines. To create these connectors using Java plug-ins, see the [PXF External Tables and API](PXFExternalTableandAPIReference.html).
+
+-   **[Installing PXF Plug-ins](../pxf/InstallPXFPlugins.html)**
+
+    This topic describes how to install the built-in PXF service plug-ins that are required to connect PXF to HDFS, Hive, and HBase. You should install the appropriate RPMs on each node in your cluster.
+
+-   **[Configuring PXF](../pxf/ConfigurePXF.html)**
+
+    This topic describes how to configure the PXF service.
+
+-   **[Accessing HDFS File Data](../pxf/HDFSFileDataPXF.html)**
+
+    This topic describes how to access HDFS file data using PXF.
+
+-   **[Accessing Hive Data](../pxf/HivePXF.html)**
+
+    This topic describes how to access Hive data using PXF. You have several options for querying data stored in Hive. You can create external tables in PXF and then query those tables, or you can easily query Hive tables by using HAWQ and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored in HCatalog.
+
+-   **[Accessing HBase Data](../pxf/HBasePXF.html)**
+
+    This topic describes how to access HBase data using PXF.
+
+-   **[Accessing JSON Data](../pxf/JsonPXF.html)**
+
+    This topic describes how to access JSON data using PXF.
+
+-   **[Using Profiles to Read and Write Data](../pxf/ReadWritePXF.html)**
+
+    PXF profiles are collections of common metadata attributes that can be used to simplify the reading and writing of data. You can use any of the built-in profiles that come with PXF or you can create your own.
+
+-   **[PXF External Tables and API](../pxf/PXFExternalTableandAPIReference.html)**
+
+    You can use the PXF API to create your own connectors to access any other type of parallel data store or processing engine.
+
+-   **[Troubleshooting PXF](../pxf/TroubleshootingPXF.html)**
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/pxf/HivePXF.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/pxf/HivePXF.html.md.erb b/markdown/pxf/HivePXF.html.md.erb
new file mode 100644
index 0000000..199c7a1
--- /dev/null
+++ b/markdown/pxf/HivePXF.html.md.erb
@@ -0,0 +1,700 @@
+---
+title: Accessing Hive Data
+---
+
+Apache Hive is a distributed data warehousing infrastructure.  Hive facilitates managing large data sets supporting multiple data formats, including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive plug-in reads data stored in Hive, as well as HDFS or HBase.
+
+This section describes how to use PXF to access Hive data. Options for querying data stored in Hive include:
+
+-  Creating an external table in PXF and querying that table
+-  Querying Hive tables via PXF's integration with HCatalog
+
+## <a id="installingthepxfhiveplugin"></a>Prerequisites
+
+Before accessing Hive data with HAWQ and PXF, ensure that:
+
+-   The PXF HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+-   The PXF Hive plug-in is installed on all cluster nodes.
+-   The Hive JAR files and conf directory are installed on all cluster nodes.
+-   You have tested PXF on HDFS.
+-   You are running the Hive Metastore service on a machine in your cluster. 
+-   You have set the `hive.metastore.uris` property in the `hive-site.xml` on the NameNode.
+
+## <a id="topic_p2s_lvl_25"></a>Hive File Formats
+
+The PXF Hive plug-in supports several file formats and profiles for accessing these formats:
+
+| File Format  | Description | Profile |
+|-------|---------------------------|-------|
+| TextFile | Flat file with data in comma-, tab-, or space-separated value format or JSON notation. | Hive, HiveText |
+| SequenceFile | Flat file consisting of binary key/value pairs. | Hive |
+| RCFile | Record columnar data consisting of binary key/value pairs; high row compression rate. | Hive, HiveRC |
+| ORCFile | Optimized row columnar data with stripe, footer, and postscript sections; reduces data size. | Hive |
+| Parquet | Compressed columnar data representation. | Hive |
+| Avro | JSON-defined, schema-based data serialization format. | Hive |
+
+Refer to [File Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for detailed information about the file formats supported by Hive.
+
+## <a id="topic_p2s_lvl_29"></a>Data Type Mapping
+
+### <a id="hive_primdatatypes"></a>Primitive Data Types
+
+To represent Hive data in HAWQ, map data values that use a primitive data type to HAWQ columns of the same type.
+
+The following table summarizes external mapping rules for Hive primitive types.
+
+| Hive Data Type  | Hawq Data Type |
+|-------|---------------------------|
+| boolean    | bool |
+| int   | int4 |
+| smallint   | int2 |
+| tinyint   | int2 |
+| bigint   | int8 |
+| float   | float4 |
+| double   | float8 |
+| string   | text |
+| binary   | bytea |
+| timestamp   | timestamp |
+
+
+### <a id="topic_b4v_g3n_25"></a>Complex Data Types
+
+Hive supports complex data types including array, struct, map, and union. PXF maps each of these complex types to `text`.  While HAWQ does not natively support these types, you can create HAWQ functions or application code to extract subcomponents of these complex data types.
+
+An example using complex data types is provided later in this topic.
+
+
+## <a id="hive_sampledataset"></a>Sample Data Set
+
+Examples used in this topic will operate on a common data set. This simple data set models a retail sales operation and includes fields with the following names and data types:
+
+| Field Name  | Data Type |
+|-------|---------------------------|
+| location | text |
+| month | text |
+| number\_of\_orders | integer |
+| total\_sales | double |
+
+Prepare the sample data set for use:
+
+1. First, create a text file:
+
+    ```
+    $ vi /tmp/pxf_hive_datafile.txt
+    ```
+
+2. Add the following data to `pxf_hive_datafile.txt`; notice the use of the comma `,` to separate the four field values:
+
+    ```
+    Prague,Jan,101,4875.33
+    Rome,Mar,87,1557.39
+    Bangalore,May,317,8936.99
+    Beijing,Jul,411,11600.67
+    San Francisco,Sept,156,6846.34
+    Paris,Nov,159,7134.56
+    San Francisco,Jan,113,5397.89
+    Prague,Dec,333,9894.77
+    Bangalore,Jul,271,8320.55
+    Beijing,Dec,100,4248.41
+    ```
+
+Make note of the path to `pxf_hive_datafile.txt`; you will use it in later exercises.
+
+
+## <a id="hivecommandline"></a>Hive Command Line
+
+The Hive command line is a subsystem similar to that of `psql`. To start the Hive command line:
+
+``` shell
+$ HADOOP_USER_NAME=hdfs hive
+```
+
+The default Hive database is named `default`. 
+
+### <a id="hivecommandline_createdb"></a>Example: Create a Hive Database
+
+Create a Hive table to expose our sample data set.
+
+1. Create a Hive table named `sales_info` in the `default` database:
+
+    ``` sql
+    hive> CREATE TABLE sales_info (location string, month string,
+            number_of_orders int, total_sales double)
+            ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+            STORED AS textfile;
+    ```
+
+    Notice that:
+    - The `STORED AS textfile` subclause instructs Hive to create the table in Textfile (the default) format.  Hive Textfile format supports comma-, tab-, and space-separated values, as well as data specified in JSON notation.
+    - The `DELIMITED FIELDS TERMINATED BY` subclause identifies the field delimiter within a data record (line). The `sales_info` table field delimiter is a comma (`,`).
+
+2. Load the `pxf_hive_datafile.txt` sample data file into the `sales_info` table you just created:
+
+    ``` sql
+    hive> LOAD DATA LOCAL INPATH '/tmp/pxf_hive_datafile.txt'
+            INTO TABLE sales_info;
+    ```
+
+3. Perform a query on `sales_info` to verify that the data was loaded successfully:
+
+    ``` sql
+    hive> SELECT * FROM sales_info;
+    ```
+
+In examples later in this section, you will access the `sales_info` Hive table directly via PXF. You will also insert `sales_info` data into tables of other Hive file format types, and use PXF to access those directly as well.
+
+## <a id="topic_p2s_lvl_28"></a>Querying External Hive Data
+
+The PXF Hive plug-in supports several Hive-related profiles. These include `Hive`, `HiveText`, and `HiveRC`.
+
+Use the following syntax to create a HAWQ external table representing Hive data:
+
+``` sql
+CREATE EXTERNAL TABLE <table_name>
+    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
+LOCATION ('pxf://<host>[:<port>]/<hive-db-name>.<hive-table-name>
+    ?PROFILE=Hive|HiveText|HiveRC[&DELIMITER=<delim>'])
+FORMAT 'CUSTOM|TEXT' (formatter='pxfwritable_import' | delimiter='<delim>')
+```
+
+Hive-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described below.
+
+| Keyword  | Value |
+|-------|-------------------------------------|
+| \<host\>[:<port\>]    | The HDFS NameNode and port. |
+| \<hive-db-name\>    | The name of the Hive database. If omitted, defaults to the Hive database named `default`. |
+| \<hive-table-name\>    | The name of the Hive table. |
+| PROFILE    | The `PROFILE` keyword must specify one of the values `Hive`, `HiveText`, or `HiveRC`. |
+| DELIMITER    | The `DELIMITER` clause is required for both the `HiveText` and `HiveRC` profiles and identifies the field delimiter used in the Hive data set.  \<delim\> must be a single ascii character or specified in hexadecimal representation. |
+| FORMAT (`Hive` profile)   | The `FORMAT` clause must specify `CUSTOM`. The `CUSTOM` format supports only the built-in `pxfwritable_import` `formatter`.   |
+| FORMAT (`HiveText` and `HiveRC` profiles) | The `FORMAT` clause must specify `TEXT`. The `delimiter` must be specified a second time in '\<delim\>'. |
+
+
+## <a id="profile_hive"></a>Hive Profile
+
+The `Hive` profile works with any Hive file format. It can access heterogenous format data in a single table where each partition may be stored as a different file format.
+
+While you can use the `Hive` profile to access any file format, the more specific profiles perform better for those single file format types.
+
+
+### <a id="profile_hive_using"></a>Example: Using the Hive Profile
+
+Use the `Hive` profile to create a queryable HAWQ external table from the Hive `sales_info` textfile format table created earlier.
+
+1. Create a queryable HAWQ external table from the Hive `sales_info` textfile format table created earlier:
+
+    ``` sql
+    postgres=# CREATE EXTERNAL TABLE salesinfo_hiveprofile(location text, month text, num_orders int, total_sales float8)
+                LOCATION ('pxf://namenode:51200/default.sales_info?PROFILE=Hive')
+              FORMAT 'custom' (formatter='pxfwritable_import');
+    ```
+
+2. Query the table:
+
+    ``` sql
+    postgres=# SELECT * FROM salesinfo_hiveprofile;
+    ```
+
+    ``` shell
+       location    | month | num_orders | total_sales
+    ---------------+-------+------------+-------------
+     Prague        | Jan   |        101 |     4875.33
+     Rome          | Mar   |         87 |     1557.39
+     Bangalore     | May   |        317 |     8936.99
+     ...
+
+    ```
+
+## <a id="profile_hivetext"></a>HiveText Profile
+
+Use the `HiveText` profile to query text format files. The `HiveText` profile is more performant than the `Hive` profile.
+
+**Note**: When using the `HiveText` profile, you *must* specify a delimiter option in *both* the `LOCATION` and `FORMAT` clauses.
+
+### <a id="profile_hivetext_using"></a>Example: Using the HiveText Profile
+
+Use the PXF `HiveText` profile to create a queryable HAWQ external table from the Hive `sales_info` textfile format table created earlier.
+
+1. Create the external table:
+
+    ``` sql
+    postgres=# CREATE EXTERNAL TABLE salesinfo_hivetextprofile(location text, month text, num_orders int, total_sales float8)
+                 LOCATION ('pxf://namenode:51200/default.sales_info?PROFILE=HiveText&DELIMITER=\x2c')
+               FORMAT 'TEXT' (delimiter=E',');
+    ```
+
+    (You can safely ignore the "nonstandard use of escape in a string literal" warning and related messages.)
+
+    Notice that:
+    - The `LOCATION` subclause `DELIMITER` value is specified in hexadecimal format. `\x` is a prefix that instructs PXF to interpret the following characters as hexadecimal. `2c` is the hex value for the comma character.
+    - The `FORMAT` subclause `delimiter` value is specified as the single ascii comma character `','`. `E` escapes the character.
+
+2. Query the external table:
+
+    ``` sql
+    postgres=# SELECT * FROM salesinfo_hivetextprofile WHERE location="Beijing";
+    ```
+
+    ``` shell
+     location | month | num_orders | total_sales
+    ----------+-------+------------+-------------
+     Beijing  | Jul   |        411 |    11600.67
+     Beijing  | Dec   |        100 |     4248.41
+    (2 rows)
+    ```
+
+## <a id="profile_hiverc"></a>HiveRC Profile
+
+The RCFile Hive format is used for row columnar formatted data. The `HiveRC` profile provides access to RCFile data.
+
+### <a id="profile_hiverc_rcfiletbl_using"></a>Example: Using the HiveRC Profile
+
+Use the `HiveRC` profile to query RCFile-formatted data in Hive tables. The `HiveRC` profile is more performant than the `Hive` profile for this file format type.
+
+1. Create a Hive table with RCFile format:
+
+    ``` shell
+    $ HADOOP_USER_NAME=hdfs hive
+    ```
+
+    ``` sql
+    hive> CREATE TABLE sales_info_rcfile (location string, month string,
+            number_of_orders int, total_sales double)
+          ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+          STORED AS rcfile;
+    ```
+
+2. Insert the data from the `sales_info` table into `sales_info_rcfile`:
+
+    ``` sql
+    hive> INSERT INTO TABLE sales_info_rcfile SELECT * FROM sales_info;
+    ```
+
+    A copy of the sample data set is now stored in RCFile format in `sales_info_rcfile`. 
+    
+3. Perform a Hive query on `sales_info_rcfile` to verify that the data was loaded successfully:
+
+    ``` sql
+    hive> SELECT * FROM sales_info_rcfile;
+    ```
+
+4. Use the PXF `HiveRC` profile to create a queryable HAWQ external table from the Hive `sales_info_rcfile` table created in the previous step. When using the `HiveRC` profile, you **must** specify a delimiter option in *both* the `LOCATION` and `FORMAT` clauses.:
+
+    ``` sql
+    postgres=# CREATE EXTERNAL TABLE salesinfo_hivercprofile(location text, month text, num_orders int, total_sales float8)
+                 LOCATION ('pxf://namenode:51200/default.sales_info_rcfile?PROFILE=HiveRC&DELIMITER=\x2c')
+               FORMAT 'TEXT' (delimiter=E',');
+    ```
+
+    (Again, you can safely ignore the "nonstandard use of escape in a string literal" warning and related messages.)
+
+5. Query the external table:
+
+    ``` sql
+    postgres=# SELECT location, total_sales FROM salesinfo_hivercprofile;
+    ```
+
+    ``` shell
+       location    | total_sales
+    ---------------+-------------
+     Prague        |     4875.33
+     Rome          |     1557.39
+     Bangalore     |     8936.99
+     Beijing       |    11600.67
+     ...
+    ```
+
+## <a id="topic_dbb_nz3_ts"></a>Accessing Parquet-Format Hive Tables
+
+The PXF `Hive` profile supports both non-partitioned and partitioned Hive tables that use the Parquet storage format in HDFS. Simply map the table columns using equivalent HAWQ data types. For example, if a Hive table is created using:
+
+``` sql
+hive> CREATE TABLE hive_parquet_table (fname string, lname string, custid int, acctbalance double)
+        STORED AS parquet;
+```
+
+Define the HAWQ external table using:
+
+``` sql
+postgres=# CREATE EXTERNAL TABLE pxf_parquet_table (fname text, lname text, custid int, acctbalance double precision)
+    LOCATION ('pxf://namenode:51200/hive-db-name.hive_parquet_table?profile=Hive')
+    FORMAT 'CUSTOM' (formatter='pxfwritable_import');
+```
+
+And query the HAWQ external table using:
+
+``` sql
+postgres=# SELECT fname,lname FROM pxf_parquet_table;
+```
+
+
+## <a id="profileperf"></a>Profile Performance Considerations
+
+The `HiveRC` and `HiveText` profiles are faster than the generic `Hive` profile.
+
+
+## <a id="complex_dt_example"></a>Complex Data Type Example
+
+This example will employ the array and map complex types, specifically an array of integers and a string key/value pair map.
+
+The data schema for this example includes fields with the following names and data types:
+
+| Field Name  | Data Type |
+|-------|---------------------------|
+| index | int |
+| name | string
+| intarray | array of integers |
+| propmap | map of string key and value pairs |
+
+When specifying an array field in a Hive table, you must identify the terminator for each item in the collection. Similarly, the map key termination character must also be specified.
+
+1. Create a text file from which you will load the data set:
+
+    ```
+    $ vi /tmp/pxf_hive_complex.txt
+    ```
+
+2. Add the following data to `pxf_hive_complex.txt`.  The data uses a comma `,` to separate field values, the percent symbol `%` to separate collection items, and a `:` to terminate map key values:
+
+    ```
+    3,Prague,1%2%3,zone:euro%status:up
+    89,Rome,4%5%6,zone:euro
+    400,Bangalore,7%8%9,zone:apac%status:pending
+    183,Beijing,0%1%2,zone:apac
+    94,Sacramento,3%4%5,zone:noam%status:down
+    101,Paris,6%7%8,zone:euro%status:up
+    56,Frankfurt,9%0%1,zone:euro
+    202,Jakarta,2%3%4,zone:apac%status:up
+    313,Sydney,5%6%7,zone:apac%status:pending
+    76,Atlanta,8%9%0,zone:noam%status:down
+    ```
+
+3. Create a Hive table to represent this data:
+
+    ``` shell
+    $ HADOOP_USER_NAME=hdfs hive
+    ```
+
+    ``` sql
+    hive> CREATE TABLE table_complextypes( index int, name string, intarray ARRAY<int>, propmap MAP<string, string>)
+             ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+             COLLECTION ITEMS TERMINATED BY '%'
+             MAP KEYS TERMINATED BY ':'
+             STORED AS TEXTFILE;
+    ```
+
+    Notice that:
+    - `FIELDS TERMINATED BY` identifies a comma as the field terminator.
+    - The `COLLECTION ITEMS TERMINATED BY` subclause specifies the percent sign as the collection items (array item, map key/value pair) terminator.
+    - `MAP KEYS TERMINATED BY` identifies a colon as the terminator for map keys.
+
+4. Load the `pxf_hive_complex.txt` sample data file into the `table_complextypes` table you just created:
+
+    ``` sql
+    hive> LOAD DATA LOCAL INPATH '/tmp/pxf_hive_complex.txt' INTO TABLE table_complextypes;
+    ```
+
+5. Perform a query on Hive table `table_complextypes` to verify that the data was loaded successfully:
+
+    ``` sql
+    hive> SELECT * FROM table_complextypes;
+    ```
+
+    ``` shell
+    3	Prague	[1,2,3]	{"zone":"euro","status":"up"}
+    89	Rome	[4,5,6]	{"zone":"euro"}
+    400	Bangalore	[7,8,9]	{"zone":"apac","status":"pending"}
+    ...
+    ```
+
+6. Use the PXF `Hive` profile to create a queryable HAWQ external table representing the Hive `table_complextypes`:
+
+    ``` sql
+    postgres=# CREATE EXTERNAL TABLE complextypes_hiveprofile(index int, name text, intarray text, propmap text)
+                 LOCATION ('pxf://namenode:51200/table_complextypes?PROFILE=Hive')
+               FORMAT 'CUSTOM' (formatter='pxfwritable_import');
+    ```
+
+    Notice that the integer array and map complex types are mapped to type text.
+
+7. Query the external table:
+
+    ``` sql
+    postgres=# SELECT * FROM complextypes_hiveprofile;
+    ```
+
+    ``` shell     
+     index |    name    | intarray |              propmap
+    -------+------------+----------+------------------------------------
+         3 | Prague     | [1,2,3]  | {"zone":"euro","status":"up"}
+        89 | Rome       | [4,5,6]  | {"zone":"euro"}
+       400 | Bangalore  | [7,8,9]  | {"zone":"apac","status":"pending"}
+       183 | Beijing    | [0,1,2]  | {"zone":"apac"}
+        94 | Sacramento | [3,4,5]  | {"zone":"noam","status":"down"}
+       101 | Paris      | [6,7,8]  | {"zone":"euro","status":"up"}
+        56 | Frankfurt  | [9,0,1]  | {"zone":"euro"}
+       202 | Jakarta    | [2,3,4]  | {"zone":"apac","status":"up"}
+       313 | Sydney     | [5,6,7]  | {"zone":"apac","status":"pending"}
+        76 | Atlanta    | [8,9,0]  | {"zone":"noam","status":"down"}
+    (10 rows)
+    ```
+
+    `intarray` and `propmap` are each text strings.
+
+## <a id="hcatalog"></a>Using PXF and HCatalog to Query Hive
+
+Hive tables can be queried directly through HCatalog integration with HAWQ and PXF, regardless of the underlying file storage format.
+
+In previous sections, you created an external table in PXF that described the target table's Hive metadata. Another option for querying Hive tables is to take advantage of HAWQ's integration with HCatalog. This integration allows HAWQ to directly use table metadata stored in HCatalog.
+
+HCatalog is built on top of the Hive metastore and incorporates Hive's DDL. This provides several advantages:
+
+-   You do not need to know the table schema of your Hive tables
+-   You do not need to manually enter information about Hive table location or format
+-   If Hive table metadata changes, HCatalog provides updated metadata. This is in contrast to the use of static external PXF tables to define Hive table metadata for HAWQ.
+
+The following diagram depicts how HAWQ integrates with HCatalog to query Hive tables:
+
+<img src="../images/hawq_hcatalog.png" id="hcatalog__image_ukw_h2v_c5" class="image" width="672" />
+
+1.  HAWQ retrieves table metadata from HCatalog using PXF.
+2.  HAWQ creates in-memory catalog tables from the retrieved metadata. If a table is referenced multiple times in a transaction, HAWQ uses its in-memory metadata to reduce external calls to HCatalog.
+3.  PXF queries Hive using table metadata that is stored in the HAWQ in-memory catalog tables. Table metadata is dropped at the end of the transaction.
+
+
+### <a id="topic_j1l_enabling"></a>Enabling HCatalog Integration
+
+To enable HCatalog query integration in HAWQ, perform the following steps:
+
+1.  Make sure your deployment meets the requirements listed in [Prerequisites](#installingthepxfhiveplugin).
+2.  If necessary, set the `pxf_service_address` global configuration property to the hostname or IP address and port where you have installed the PXF Hive plug-in. By default, the value is set to `localhost:51200`.
+
+    ``` sql
+    postgres=# SET pxf_service_address TO <hivenode>:51200
+    ```
+
+3.  HCatalog internally uses the `pxf` protocol to query.  Grant this protocol privilege to all roles requiring access:
+
+    ``` sql
+    postgres=# GRANT ALL ON PROTOCOL pxf TO <role>;
+    ```
+
+4. It is not recommended to create a HAWQ table using the `WITH (OIDS)` clause. If any user tables were created using the `WITH (OIDS)` clause, additional operations are required to enable HCatalog integration. To access a Hive table via HCatalog when user tables were created using `WITH (OIDS)`, HAWQ users must have `SELECT` permission to query every user table within the same schema that was created using the `WITH (OIDS)` clause. 
+
+    1. Determine which user tables were created using the `WITH (OIDS)` clause:
+
+        ``` sql
+        postgres=# SELECT oid, relname FROM pg_class 
+                     WHERE relhasoids = true 
+                       AND relnamespace <> (SELECT oid FROM pg_namespace WHERE nspname = 'pg_catalog');
+        ```
+
+    2. Grant `SELECT` privilege on all returned tables to all roles to which you chose to provide HCatalog query access. For example:
+
+        ``` sql
+        postgres=# GRANT SELECT ON <table-created-WITH-OIDS> TO <role>
+        ``` 
+
+### <a id="topic_j1l_y55_c5"></a>Usage    
+
+To query a Hive table with HCatalog integration, query HCatalog directly from HAWQ. The query syntax is:
+
+``` sql
+postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name;
+```
+
+For example:
+
+``` sql
+postgres=# SELECT * FROM hcatalog.default.sales_info;
+```
+
+To obtain a description of a Hive table with HCatalog integration, you can use the `psql` client interface.
+
+-   Within HAWQ, use either the `\d                                         hcatalog.hive-db-name.hive-table-name` or `\d+                                         hcatalog.hive-db-name.hive-table-name` commands to describe a single table.  `\d` displays only HAWQ's interpretation of the underlying source (Hive in this case) data type, while `\d+` displays both the HAWQ interpreted and Hive source data types. For example, from the `psql` client interface:
+
+    ``` shell
+    $ psql -d postgres
+    ```
+
+    ``` sql
+    postgres=# \d+ hcatalog.default.sales_info_rcfile;
+    ```
+
+    ``` shell
+    PXF Hive Table "default.sales_info_rcfile"
+          Column      |  Type  | Source type 
+    ------------------+--------+-------------
+     location         | text   | string
+     month            | text   | string
+     number_of_orders | int4   | int
+     total_sales      | float8 | double
+    ```
+-   Use `\d hcatalog.hive-db-name.*` to describe the whole database schema, i.e. all tables in `hive-db-name`.
+-   Use `\d hcatalog.*.*` to describe the whole schema, i.e. all databases and tables.
+
+When using `\d` or `\d+` commands in the `psql` HAWQ client, `hcatalog` will not be listed as a database. If you use other `psql` compatible clients, `hcatalog` will be listed as a database with a size value of `-1` since `hcatalog` is not a real database in HAWQ.
+
+Alternatively, you can use the `pxf_get_item_fields` user-defined function (UDF) to obtain Hive table descriptions from other client interfaces or third-party applications. The UDF takes a PXF profile and a table pattern string as its input parameters.  **Note:** The only supported input profile at this time is `'Hive'`.
+
+- The following statement returns a description of a specific table. The description includes path, itemname (table), fieldname, and fieldtype.
+
+    ``` sql
+    postgres=# SELECT * FROM pxf_get_item_fields('Hive','default.sales_info_rcfile');
+    ```
+
+    ``` pre
+      path   |     itemname      |    fieldname     | fieldtype
+    ---------+-------------------+------------------+-----------
+     default | sales_info_rcfile | location         | text
+     default | sales_info_rcfile | month            | text
+     default | sales_info_rcfile | number_of_orders | int4
+     default | sales_info_rcfile | total_sales      | float8
+    ```
+
+- The following statement returns table descriptions from the default database.
+
+    ``` sql
+    postgres=# SELECT * FROM pxf_get_item_fields('Hive','default.*');
+    ```
+
+- The following statement returns a description of the entire schema.
+
+    ``` sql
+    postgres=# SELECT * FROM pxf_get_item_fields('Hive', '*.*');
+    ```
+
+### <a id="topic_r5k_pst_25"></a>Limitations
+
+HCatalog integration has the following limitations:
+
+-   HCatalog integration queries and describe commands do not support complex types; only primitive types are supported. Use PXF external tables to query complex types in Hive. (See [Complex Types Example](#complex_dt_example).)
+-   Even for primitive types, HCatalog metadata descriptions produced by `\d` are HAWQ's interpretation of the underlying Hive data types. For example, the Hive type `tinyint` is converted to HAWQ type `int2`. (See [Data Type Mapping](#hive_primdatatypes).)
+-   HAWQ reserves the database name `hcatalog` for system use. You cannot connect to or alter the system `hcatalog` database.
+
+## <a id="partitionfiltering"></a>Partition Filtering
+
+The PXF Hive plug-in supports the Hive partitioning feature and directory structure. This enables partition exclusion on selected HDFS files comprising the Hive table. To use the partition filtering feature to reduce network traffic and I/O, run a PXF query using a `WHERE` clause that refers to a specific partition in the partitioned Hive table.
+
+To take advantage of PXF partition filtering push-down, the Hive and PXF partition field names should be the same. Otherwise, PXF ignores partition filtering and the filtering is performed on the HAWQ side, impacting performance.
+
+**Note:** The Hive plug-in filters only on partition columns, not on other table attributes.
+
+### <a id="partitionfiltering_pushdowncfg"></a>Configure Partition Filtering Push-Down
+
+PXF partition filtering push-down is enabled by default. To disable PXF partition filtering push-down, set the `pxf_enable_filter_pushdown` HAWQ server configuration parameter to `off`:
+
+``` sql
+postgres=# SHOW pxf_enable_filter_pushdown;
+ pxf_enable_filter_pushdown
+-----------------------------
+ on
+(1 row)
+postgres=# SET pxf_enable_filter_pushdown=off;
+```
+
+### <a id="example2"></a>Create Partitioned Hive Table
+
+Create a Hive table `sales_part` with two partition columns, `delivery_state` and `delivery_city:`
+
+``` sql
+hive> CREATE TABLE sales_part (name string, type string, supplier_key int, price double)
+        PARTITIONED BY (delivery_state string, delivery_city string)
+        ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
+```
+
+Load data into this Hive table and add some partitions:
+
+``` sql
+hive> INSERT INTO TABLE sales_part 
+        PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Fresno') 
+        VALUES ('block', 'widget', 33, 15.17);
+hive> INSERT INTO TABLE sales_part 
+        PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Sacramento') 
+        VALUES ('cube', 'widget', 11, 1.17);
+hive> INSERT INTO TABLE sales_part 
+        PARTITION(delivery_state = 'NEVADA', delivery_city = 'Reno') 
+        VALUES ('dowel', 'widget', 51, 31.82);
+hive> INSERT INTO TABLE sales_part 
+        PARTITION(delivery_state = 'NEVADA', delivery_city = 'Las Vegas') 
+        VALUES ('px49', 'pipe', 52, 99.82);
+```
+
+The Hive storage directory structure for the `sales_part` table appears as follows:
+
+``` pre
+$ sudo -u hdfs hdfs dfs -ls -R /apps/hive/warehouse/sales_part
+/apps/hive/warehouse/sales_part/delivery_state=CALIFORNIA/delivery_city=’Fresno’/
+/apps/hive/warehouse/sales_part/delivery_state=CALIFORNIA/delivery_city=Sacramento/
+/apps/hive/warehouse/sales_part/delivery_state=NEVADA/delivery_city=Reno/
+/apps/hive/warehouse/sales_part/delivery_state=NEVADA/delivery_city=’Las Vegas’/
+```
+
+To define a HAWQ PXF table that will read this Hive table and take advantage of partition filter push-down, define the fields corresponding to the Hive partition fields at the end of the `CREATE EXTERNAL TABLE` attribute list. In HiveQL, a `SELECT *` statement on a partitioned table shows the partition fields at the end of the record.
+
+``` sql
+postgres=# CREATE EXTERNAL TABLE pxf_sales_part(
+  item_name TEXT, 
+  item_type TEXT, 
+  supplier_key INTEGER, 
+  item_price DOUBLE PRECISION, 
+  delivery_state TEXT, 
+  delivery_city TEXT
+)
+LOCATION ('pxf://namenode:51200/sales_part?Profile=Hive')
+FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
+
+postgres=# SELECT * FROM pxf_sales_part;
+```
+
+### <a id="example3"></a>Query Without Pushdown
+
+In the following example, the HAWQ query filters the `delivery_city` partition `Sacramento`. The filter on  `item_name` is not pushed down, since it is not a partition column. It is performed on the HAWQ side after all the data on `Sacramento` is transferred for processing.
+
+``` sql
+postgres=# SELECT * FROM pxf_sales_part WHERE delivery_city = 'Sacramento' AND item_name = 'cube';
+```
+
+### <a id="example4"></a>Query With Pushdown
+
+The following HAWQ query reads all the data under `delivery_state` partition `CALIFORNIA`, regardless of the city.
+
+``` sql
+postgres=# SET pxf_enable_filter_pushdown=on;
+postgres=# SELECT * FROM pxf_sales_part WHERE delivery_state = 'CALIFORNIA';
+```
+
+## <a id="topic_fdm_zrh_1s"></a>Using PXF with Hive Default Partitions
+
+This topic describes a difference in query results between Hive and PXF queries when Hive tables use a default partition. When dynamic partitioning is enabled in Hive, a partitioned table may store data in a default partition. Hive creates a default partition when the value of a partitioning column does not match the defined type of the column (for example, when a NULL value is used for any partitioning column). In Hive, any query that includes a filter on a partition column *excludes* any data that is stored in the table's default partition.
+
+Similar to Hive, PXF represents a table's partitioning columns as columns that are appended to the end of the table. However, PXF translates any column value in a default partition to a NULL value. This means that a HAWQ query that includes an IS NULL filter on a partitioning column can return different results than the same Hive query.
+
+Consider a Hive partitioned table that is created with the statement:
+
+``` sql
+hive> CREATE TABLE sales (order_id bigint, order_amount float) PARTITIONED BY (xdate date);
+```
+
+The table is loaded with five rows that contain the following data:
+
+``` pre
+1.0    1900-01-01
+2.2    1994-04-14
+3.3    2011-03-31
+4.5    NULL
+5.0    2013-12-06
+```
+
+The insertion of row 4 creates a Hive default partition, because the partition column `xdate` contains a null value.
+
+In Hive, any query that filters on the partition column omits data in the default partition. For example, the following query returns no rows:
+
+``` sql
+hive> SELECT * FROM sales WHERE xdate is null;
+```
+
+However, if you map this table as a PXF external table in HAWQ, all default partition values are translated into actual NULL values. In HAWQ, executing the same query against the PXF table returns row 4 as the result, because the filter matches the NULL value.
+
+Keep this behavior in mind when executing IS NULL queries on Hive partitioned tables.
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/pxf/InstallPXFPlugins.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/pxf/InstallPXFPlugins.html.md.erb b/markdown/pxf/InstallPXFPlugins.html.md.erb
new file mode 100644
index 0000000..4ae4101
--- /dev/null
+++ b/markdown/pxf/InstallPXFPlugins.html.md.erb
@@ -0,0 +1,81 @@
+---
+title: Installing PXF Plug-ins
+---
+
+This topic describes how to install the built-in PXF service plug-ins that are required to connect PXF to HDFS, Hive, HBase, and JSON. 
+
+**Note:** PXF requires that you run Tomcat on the host machine. Tomcat reserves ports 8005, 8080, and 8009. If you have configured Oozie JXM reporting on a host that will run PXF, make sure that the reporting service uses a port other than 8005. This helps to prevent port conflict errors from occurring when you start the PXF service.
+
+## <a id="directories_and_logs"></a>PXF Installation and Log File Directories
+
+Installing PXF plug-ins, regardless of method, creates directories and log files on each node receiving the plug-in installation:
+
+| Directory                      | Description                                                                                                                                                                                                                                |
+|--------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `/usr/lib/pxf`                 | PXF library location                                                                                                                                                                                                                       |
+| `/etc/pxf/conf`                | PXF configuration directory. This directory contains the `pxf-public.classpath` and `pxf-private.classpath` configuration files. See [Setting up the Java Classpath](ConfigurePXF.html#settingupthejavaclasspath). |
+| `/var/pxf/pxf-service`         | PXF service instance location                                                                                                                                                                                                              |
+| `/var/log/pxf` | This directory includes `pxf-service.log` and all Tomcat-related logs including `catalina.out`. Logs are owned by user:group `pxf`:`pxf`. Other users have read access.                                                                          |
+| `/var/run/pxf/catalina.pid`    | PXF Tomcat container PID location                                                                                                                                                                                                          |
+
+
+## <a id="install_pxf_plug_ambari"></a>Installing PXF Using Ambari
+
+If you are using Ambari to install and manage your HAWQ cluster, you do *not* need to follow the manual installation steps in this topic. Installing using the Ambari web interface installs all of the necessary PXF plug-in components.
+
+## <a id="install_pxf_plug_cmdline"></a>Installing PXF from the Command Line
+
+Each PXF service plug-in resides in its own RPM.  You may have built these RPMs in the Apache HAWQ open source project repository (see [PXF Build Instructions](https://github.com/apache/incubator-hawq/blob/master/pxf/README.md)), or these RPMs may have been included in a commercial product download package.
+
+Perform the following steps on **_each_** node in your cluster to install PXF:
+
+1. Install the PXF software, including Apache, the PXF service, and all PXF plug-ins: HDFS, HBase, Hive, JSON:
+
+    ```shell
+    $ sudo yum install -y pxf
+    ```
+
+    Installing PXF in this manner:
+    * installs the required version of `apache-tomcat`
+    * creates a `/etc/pxf/pxf-n.n.n` directory, adding a softlink from `/etc/pxf` to this directory
+    * sets up the PXF service configuration files in `/etc/pxf`
+    * creates a `/usr/lib/pxf-n.n.n` directory, adding a softlink from `/usr/lib/pxf` to this directory
+    * copies the PXF service JAR file `pxf-service-n.n.n.jar` to `/usr/lib/pxf-n.n.n/`
+    * copies JAR files for each of the PXF plugs-ins to `/usr/lib/pxf-n.n.n/`
+    * creates softlinks from `pxf-xxx.jar` in `/usr/lib/pxf-n.n.n/`
+
+2. Initialize the PXF service:
+
+    ```shell
+    $ sudo service pxf-service init
+    ```
+
+2. Start the PXF service:
+
+    ```shell
+    $ sudo service pxf-service start
+    ```
+    
+    Additional `pxf-service` command options include `stop`, `restart`, and `status`.
+
+2. If you choose to use the HBase plug-in, perform the following configuration:
+
+    1. Add the PXF HBase plug-in JAR file to the HBase `CLASSPATH` by updating the `HBASE_CLASSPATH` environment variable setting in the HBase environment file `/etc/hbase/conf/hbase-env.sh`:
+
+        ``` shell
+        export HBASE_CLASSPATH=${HBASE_CLASSPATH}:/usr/lib/pxf/pxf-hbase.jar
+        ```
+
+    3. Restart the HBase service after making this update to HBase configuration.
+
+        On the HBase Master node:
+
+        ``` shell
+        $ su -l hbase -c "/usr/hdp/current/hbase-master/bin/hbase-daemon.sh restart master; sleep 25"
+       ```
+
+        On an HBase Region Server node:
+
+        ```shell
+        $ su -l hbase -c "/usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh restart regionserver"
+        ```


Mime
View raw message