hawq-commits mailing list archives

From yo...@apache.org
Subject [04/57] [abbrv] [partial] incubator-hawq-docs git commit: HAWQ-1254 Fix/remove book branching on incubator-hawq-docs
Date Tue, 10 Jan 2017 23:53:55 GMT
http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/pxf/JsonPXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/JsonPXF.html.md.erb b/pxf/JsonPXF.html.md.erb
deleted file mode 100644
index 97195ad..0000000
--- a/pxf/JsonPXF.html.md.erb
+++ /dev/null
@@ -1,197 +0,0 @@
----
-title: Accessing JSON File Data
----
-
-The PXF JSON plug-in reads native JSON stored in HDFS.  The plug-in supports common data types, as well as basic (N-level) projection and arrays.
-
-To access JSON file data with HAWQ, the data must be stored in HDFS and an external table created from the HDFS data store.
-
-## Prerequisites<a id="jsonplugprereq"></a>
-
-Before working with JSON file data using HAWQ and PXF, ensure that:
-
--   The PXF HDFS plug-in is installed on all cluster nodes.
--   The PXF JSON plug-in is installed on all cluster nodes.
--   You have tested PXF on HDFS.
-
-
-## Working with JSON Files<a id="topic_workwjson"></a>
-
-JSON is a text-based data-interchange format.  JSON data is typically stored in a file with a `.json` suffix. A `.json` file will contain a collection of objects.  A JSON object is a collection of unordered name/value pairs.  A value can be a string, a number, true, false, null, or an object or array. Objects and arrays can be nested.
-
-Refer to [Introducing JSON](http://www.json.org/) for specific information on JSON syntax.
-
-Sample JSON data file content:
-
-``` json
-  {
-    "created_at":"MonSep3004:04:53+00002013",
-    "id_str":"384529256681725952",
-    "user": {
-      "id":31424214,
-       "location":"COLUMBUS"
-    },
-    "coordinates":null
-  }
-```
-
-### JSON to HAWQ Data Type Mapping<a id="topic_workwjson"></a>
-
-To represent JSON data in HAWQ, map data values that use a primitive data type to HAWQ columns of the same type. JSON supports complex data types including objects and arrays. Use N-level projection to map members of nested objects and arrays to primitive data types.
-
-The following table summarizes external mapping rules for JSON data.
-
-<caption><span class="tablecap">Table 1. JSON Mapping</span></caption>
-
-<a id="topic_table_jsondatamap"></a>
-
-| JSON Data Type                                                    | HAWQ Data Type                                                                                                                                                                                            |
-|-------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Primitive type (integer, float, string, boolean, null) | Use the corresponding HAWQ built-in data type; see [Data Types](../reference/HAWQDataTypes.html). |
-| Array                         | Use `[]` brackets to identify a specific array index to a member of primitive type.                                                                                            |
-| Object                | Use dot `.` notation to specify each level of projection (nesting) to a member of a primitive type.                                                                                         |
-
-
-### JSON File Read Modes<a id="topic_jsonreadmodes"></a>
-
-
-The PXF JSON plug-in reads data in one of two modes. The default mode expects one full JSON record per line.  The JSON plug-in also supports a read mode operating on multi-line JSON records.
-
-In the following discussion, a data set defined by a sample schema will be represented using each read mode of the PXF JSON plug-in.  The sample schema contains data fields with the following names and data types:
-
-   - "created_at" - text
-   - "id_str" - text
-   - "user" - object
-      - "id" - integer
-      - "location" - text
-   - "coordinates" - object (optional)
-      - "type" - text
-      - "values" - array
-         - [0] - integer
-         - [1] - integer
-
-
-Example 1 - Data Set for Single-JSON-Record-Per-Line Read Mode:
-
-``` pre
-{"created_at":"FriJun0722:45:03+00002013","id_str":"343136551322136576","user":{
-"id":395504494,"location":"NearCornwall"},"coordinates":{"type":"Point","values"
-: [ 6, 50 ]}},
-{"created_at":"FriJun0722:45:02+00002013","id_str":"343136547115253761","user":{
-"id":26643566,"location":"Austin,Texas"}, "coordinates": null},
-{"created_at":"FriJun0722:45:02+00002013","id_str":"343136547136233472","user":{
-"id":287819058,"location":""}, "coordinates": null}
-```  
-
-Example 2 - Data Set for Multi-Line JSON Record Read Mode:
-
-``` json
-{
-  "root":[
-    {
-      "record_obj":{
-        "created_at":"MonSep3004:04:53+00002013",
-        "id_str":"384529256681725952",
-        "user":{
-          "id":31424214,
-          "location":"COLUMBUS"
-        },
-        "coordinates":null
-      },
-      "record_obj":{
-        "created_at":"MonSep3004:04:54+00002013",
-        "id_str":"384529260872228864",
-        "user":{
-          "id":67600981,
-          "location":"KryberWorld"
-        },
-        "coordinates":{
-          "type":"Point",
-          "values":[
-             8,
-             52
-          ]
-        }
-      }
-    }
-  ]
-}
-```
-
-## Loading JSON Data to HDFS<a id="jsontohdfs"></a>
-
-The PXF JSON plug-in reads native JSON stored in HDFS. Before JSON data can be queried via HAWQ, it must first be loaded to an HDFS data store.
-
-Copy and paste the single line JSON record data set to a file named `singleline.json`.  Similarly, copy and paste the multi-line JSON record data set to `multiline.json`.
-
-**Note**:  Ensure there are **no** blank lines in your JSON files.
-
-Add the data set files to the HDFS data store:
-
-``` shell
-$ hdfs dfs -mkdir /user/data
-$ hdfs dfs -put singleline.json /user/data
-$ hdfs dfs -put multiline.json /user/data
-```
-
-Once loaded to HDFS, JSON data may be queried and analyzed via HAWQ.
-
-## Querying External JSON Data<a id="jsoncetsyntax1"></a>
-
-Use the following syntax to create an external table representing JSON data: 
-
-``` sql
-CREATE EXTERNAL TABLE <table_name> 
-    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
-LOCATION ( 'pxf://<host>[:<port>]/<path-to-data>?PROFILE=Json[&IDENTIFIER=<value>]' )
-      FORMAT 'CUSTOM' ( FORMATTER='pxfwritable_import' );
-```
-JSON-plug-in-specific keywords and values used in the `CREATE EXTERNAL TABLE` call are described below.
-
-| Keyword  | Value |
-|-------|-------------------------------------|
-| \<host\>    | Specify the HDFS NameNode in the \<host\> field. |
-| PROFILE    | The `PROFILE` keyword must specify the value `Json`. |
-| IDENTIFIER  | Include the `IDENTIFIER` keyword and \<value\> in the `LOCATION` string only when accessing a JSON file with multi-line records. \<value\> should identify the member name used to determine the encapsulating JSON object to return.  (If the JSON file is the multi-line record Example 2 above, `&IDENTIFIER=created_at` would be specified.) |  
-| FORMAT    | The `FORMAT` clause must specify `CUSTOM`. |
-| FORMATTER    | The JSON `CUSTOM` format supports only the built-in `pxfwritable_import` `FORMATTER`. |
-
-
-### Example 1 <a id="jsonexample1"></a>
-
-The following [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) SQL call creates a queryable external table based on the data in the single-line-per-record JSON example.
-
-``` sql 
-CREATE EXTERNAL TABLE sample_json_singleline_tbl(
-  created_at TEXT,
-  id_str TEXT,
-  text TEXT,
-  "user.id" INTEGER,
-  "user.location" TEXT,
-  "coordinates.values[0]" INTEGER,
-  "coordinates.values[1]" INTEGER
-)
-LOCATION('pxf://namenode:51200/user/data/singleline.json?PROFILE=Json')
-FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
-SELECT * FROM sample_json_singleline_tbl;
-```
-
-Notice the use of `.` projection to access the nested fields in the `user` and `coordinates` objects.  Also notice the use of `[]` to access the specific elements of the `coordinates.values` array.
-
-### Example 2 <a id="jsonexample2"></a>
-
-A `CREATE EXTERNAL TABLE` SQL call to create a queryable external table based on the multi-line-per-record JSON data set would be very similar to that of the single-line data set above. You might specify a different table name, `sample_json_multiline_tbl` for example.
-
-The `LOCATION` clause would differ.  The `IDENTIFIER` keyword and an associated value must be specified when reading from multi-line JSON records:
-
-``` sql
-LOCATION('pxf://namenode:51200/user/data/multiline.json?PROFILE=Json&IDENTIFIER=created_at')
-```
-
-`created_at` identifies the member name used to determine the encapsulating JSON object, `record_obj` in this case.
-
-To query this external table populated with JSON data:
-
-``` sql
-SELECT * FROM sample_json_multiline_tbl;
-```
\ No newline at end of file
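
The `.` and `[]` column-name notation used in Example 1 of the JSON page above can be illustrated outside of PXF. The following is a minimal sketch, not the plug-in's actual implementation: it resolves a column name such as `coordinates.values[0]` against a record that has already been parsed into nested `Map` and `List` objects. The class `JsonProjectionSketch` and its methods are hypothetical names introduced only for this illustration.

``` java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not the PXF JSON plug-in code.
class JsonProjectionSketch {

    // Resolves a column name such as "user.id" or "coordinates.values[0]"
    // against a record already parsed into nested Maps and Lists.
    // Returns null when any level of the path is missing or null.
    static Object project(Map<String, Object> record, String columnName) {
        Object current = record;
        for (String part : columnName.split("\\.")) {
            String member = part;
            int index = -1;
            int bracket = part.indexOf('[');
            if (bracket >= 0 && part.endsWith("]")) {
                // "values[1]" -> member "values", array index 1
                member = part.substring(0, bracket);
                index = Integer.parseInt(part.substring(bracket + 1, part.length() - 1));
            }
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(member);
            if (index >= 0) {
                if (!(current instanceof List) || index >= ((List<?>) current).size()) {
                    return null;
                }
                current = ((List<?>) current).get(index);
            }
        }
        return current;
    }

    // Builds the first record of the single-line sample data set.
    static Map<String, Object> sampleRecord() {
        Map<String, Object> user = new HashMap<>();
        user.put("id", 395504494);
        user.put("location", "NearCornwall");

        Map<String, Object> coordinates = new HashMap<>();
        coordinates.put("type", "Point");
        coordinates.put("values", Arrays.asList(6, 50));

        Map<String, Object> record = new HashMap<>();
        record.put("created_at", "FriJun0722:45:03+00002013");
        record.put("id_str", "343136551322136576");
        record.put("user", user);
        record.put("coordinates", coordinates);
        return record;
    }

    public static void main(String[] args) {
        Map<String, Object> record = sampleRecord();
        System.out.println(project(record, "user.id"));
        System.out.println(project(record, "coordinates.values[1]"));
        System.out.println(project(record, "text"));
    }
}
```

In this sketch a column with no matching member, such as `text` in the sample data, resolves to `null`, as does any path that passes through a `null` object such as `"coordinates": null`.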

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/pxf/PXFExternalTableandAPIReference.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/PXFExternalTableandAPIReference.html.md.erb b/pxf/PXFExternalTableandAPIReference.html.md.erb
deleted file mode 100644
index 292616b..0000000
--- a/pxf/PXFExternalTableandAPIReference.html.md.erb
+++ /dev/null
@@ -1,1311 +0,0 @@
----
-title: PXF External Tables and API
----
-
-You can use the PXF API to create your own connectors to access any other type of parallel data store or processing engine.
-
-The PXF Java API lets you extend PXF functionality and add new services and formats without changing HAWQ. The API includes three classes that are extended to allow HAWQ to access an external data source: Fragmenter, Accessor, and Resolver.
-
-The Fragmenter produces a list of data fragments that can be read in parallel from the data source. The Accessor produces a list of records from a single fragment, and the Resolver both deserializes and serializes records.
-
-Together, the Fragmenter, Accessor, and Resolver classes implement a connector. PXF includes plug-ins for tables in HDFS, HBase, and Hive.
-
-## <a id="creatinganexternaltable"></a>Creating an External Table
-
-The syntax for an `EXTERNAL TABLE` that uses the PXF protocol is as follows:
-
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name
-        ( column_name data_type [, ...] | LIKE other_table )
-LOCATION('pxf://host[:port]/path-to-data<pxf parameters>[&custom-option=value...]')
-FORMAT 'custom' (formatter='pxfwritable_import|pxfwritable_export');
-```
-
- where *&lt;pxf parameters&gt;* is:
-
-``` pre
-   ?FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class
- | ?PROFILE=profile-name
-```
-<caption><span class="tablecap">Table 1. Parameter values and description</span></caption>
-
-<a id="creatinganexternaltable__table_pfy_htz_4p"></a>
-
-| Parameter               | Value and description                                                                                                                                                                                                                                                          |
-|-------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| host                    | The current host of the PXF service.                                                                                                                                                                                                                                           |
-| port                    | Connection port for the PXF service. If the port is omitted, PXF assumes that High Availability (HA) is enabled and connects to the HA name service port, 51200 by default. The HA name service port can be changed by setting the `pxf_service_port` configuration parameter. |
-| *path\_to\_data*        | A directory, file name, wildcard pattern, table name, etc.                                                                                                                                                                                                                     |
-| FRAGMENTER              | The plug-in (Java class) to use for fragmenting data. Used for READABLE external tables only.                                                                                                                                                                                   |
-| ACCESSOR                | The plug-in (Java class) to use for accessing the data. Used for READABLE and WRITABLE tables.                                                                                                                                                                                  |
-| RESOLVER                | The plug-in (Java class) to use for serializing and deserializing the data. Used for READABLE and WRITABLE tables.                                                                                                                                                              |
-| *custom-option*=*value* | Additional values to pass to the plug-in class. The parameters are passed at runtime to the plug-ins indicated above. The plug-ins can look up custom options with `org.apache.hawq.pxf.api.utilities.InputData`.                                                                 |
-
-**Note:** When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
-
-For more information about these parameters, see [About the Java Class Services and Formats](#aboutthejavaclassservicesandformats).
-
-## <a id="aboutthejavaclassservicesandformats"></a>About the Java Class Services and Formats
-
-The `LOCATION` string in a PXF `CREATE EXTERNAL TABLE` statement is a URI that specifies the host and port of an external data source and the path to the data in the external data source. The query portion of the URI, introduced by the question mark (?), must include the required parameters `FRAGMENTER` (readable tables only), `ACCESSOR`, and `RESOLVER`, which specify Java class names that extend the base PXF API plug-in classes. Alternatively, the required parameters can be replaced with a `PROFILE` parameter that names a profile in the `/etc/conf/pxf-profiles.xml` file; the profile defines the required classes.
-
-The parameters in the PXF URI are passed from HAWQ as headers to the PXF Java service. You can pass custom information to user-implemented PXF plug-ins by adding optional parameters to the LOCATION string.
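To make the structure of the `LOCATION` URI concrete, here is a minimal sketch that uses the standard `java.net.URI` class to split a location string into host, port, path, and query parameters, and then checks the required-parameter rule described above. It is not how HAWQ or PXF parse the URI internally; the class `PxfLocationSketch` and its methods are hypothetical names used only for illustration.

``` java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not the parsing code used by HAWQ or PXF.
class PxfLocationSketch {

    // Splits the query portion of a LOCATION URI into name/value pairs,
    // preserving the order in which they appear.
    static Map<String, String> queryParameters(String location) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = URI.create(location).getQuery();
        if (query == null) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    // Applies the rule from the text: either PROFILE is present, or ACCESSOR
    // and RESOLVER are present (plus FRAGMENTER for readable tables).
    static boolean hasRequiredParameters(Map<String, String> params, boolean readable) {
        if (params.containsKey("PROFILE")) {
            return true;
        }
        boolean base = params.containsKey("ACCESSOR") && params.containsKey("RESOLVER");
        return readable ? base && params.containsKey("FRAGMENTER") : base;
    }

    public static void main(String[] args) {
        String location = "pxf://namenode:51200/user/data/multiline.json"
                + "?PROFILE=Json&IDENTIFIER=created_at";
        URI uri = URI.create(location);
        System.out.println(uri.getHost() + " " + uri.getPort() + " " + uri.getPath());
        System.out.println(queryParameters(location));
    }
}
```

Any parameter other than the required ones, such as `IDENTIFIER` here, corresponds to a custom option that a plug-in could look up at runtime.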
-
-The Java PXF service retrieves the source data from the external data source and converts it to a HAWQ-readable table format.
-
-The Accessor, Resolver, and Fragmenter Java classes extend the `org.apache.hawq.pxf.api.utilities.Plugin` class:
-
-``` java
-package org.apache.hawq.pxf.api.utilities;
-/**
- * Base class for all plug-in types (Accessor, Resolver, Fragmenter, ...).
- * Manages the meta data.
- */
-public class Plugin {
-    protected InputData inputData;
-    /**
-     * Constructs a plug-in.
-     *
-     * @param input the input data
-     */
-    public Plugin(InputData input) {
-        this.inputData = input;
-    }
-    /**
-     * Checks if the plug-in is thread safe or not, based on inputData.
-     *
-     * @return true if plug-in is thread safe
-     */
-    public boolean isThreadSafe() {
-        return true;
-    }
-}
-```
-
-The parameters in the `LOCATION` string are available to the plug-ins through methods in the `org.apache.hawq.pxf.api.utilities.InputData` class. Custom parameters added to the location string can be looked up with the `getUserProperty()` method.
-
-``` java
-/**
- * Common configuration available to all PXF plug-ins. Represents input data
- * coming from client applications, such as HAWQ.
- */
-public class InputData {
-
-    /**
-     * Constructs an InputData from a copy.
-     * Used to create from an extending class.
-     *
-     * @param copy the input data to copy
-     */
-    public InputData(InputData copy);
-
-    /**
-     * Returns value of a user defined property.
-     *
-     * @param userProp the lookup user property
-     * @return property value as a String
-     */
-    public String getUserProperty(String userProp);
-
-    /**
-     * Sets the byte serialization of a fragment meta data
-     * @param location start, len, and location of the fragment
-     */
-    public void setFragmentMetadata(byte[] location);
-
-    /** Returns the byte serialization of a data fragment */
-    public byte[] getFragmentMetadata();
-
-    /**
-     * Gets any custom user data that may have been passed from the
-     * fragmenter. Will mostly be used by the accessor or resolver.
-     */
-    public byte[] getFragmentUserData();
-
-    /**
-     * Sets any custom user data that needs to be shared across plug-ins.
-     * Will mostly be set by the fragmenter.
-     */
-    public void setFragmentUserData(byte[] userData);
-
-    /** Returns the number of segments in GP. */
-    public int getTotalSegments();
-
-    /** Returns the current segment ID. */
-    public int getSegmentId();
-
-    /** Returns true if there is a filter string to parse. */
-    public boolean hasFilter();
-
-    /** Returns the filter string, <tt>null</tt> if #hasFilter is <tt>false</tt> */
-    public String getFilterString();
-
-    /** Returns tuple description. */
-    public ArrayList<ColumnDescriptor> getTupleDescription();
-
-    /** Returns the number of columns in tuple description. */
-    public int getColumns();
-
-    /** Returns column index from tuple description. */
-    public ColumnDescriptor getColumn(int index);
-
-    /**
-     * Returns the column descriptor of the recordkey column. If the recordkey
-     * column was not specified by the user in the create table statement,
-     * returns null.
-     */
-    public ColumnDescriptor getRecordkeyColumn();
-
-    /** Returns the data source of the required resource (e.g., a file path or a table name). */
-    public String getDataSource();
-
-    /** Sets the data source for the required resource */
-    public void setDataSource(String dataSource);
-
-    /** Returns the ClassName for the java class that was defined as Accessor */
-    public String getAccessor();
-
-    /** Returns the ClassName for the java class that was defined as Resolver */
-    public String getResolver();
-
-    /**
-     * Returns the ClassName for the java class that was defined as Fragmenter
-     * or null if no fragmenter was defined
-     */
-    public String getFragmenter();
-
-    /**
-     * Returns the contents of pxf_remote_service_login set in Hawq.
-     * Should the user set it to an empty string this function will return null.
-     *
-     * @return remote login details if set, null otherwise
-     */
-    public String getLogin();
-
-    /**
-     * Returns the contents of pxf_remote_service_secret set in Hawq.
-     * Should the user set it to an empty string this function will return null.
-     *
-     * @return remote password if set, null otherwise
-     */
-    public String getSecret();
-
-    /**
-     * Returns true if the request is thread safe. Default true. Should be set
-     * by a user to false if the request contains non thread-safe plug-ins or
-     * components, such as BZip2 codec.
-     */
-    public boolean isThreadSafe();
-
-    /**
-     * Returns a data fragment index. plan to deprecate it in favor of using
-     * getFragmentMetadata().
-     */
-    public int getDataFragment();
-}
-```
-
--   **[Fragmenter](../pxf/PXFExternalTableandAPIReference.html#fragmenter)**
-
--   **[Accessor](../pxf/PXFExternalTableandAPIReference.html#accessor)**
-
--   **[Resolver](../pxf/PXFExternalTableandAPIReference.html#resolver)**
-
-### <a id="fragmenter"></a>Fragmenter
-
-**Note:** The Fragmenter Plugin reads data into HAWQ readable external tables. The Fragmenter Plugin cannot write data out of HAWQ into writable external tables.
-
-The Fragmenter is responsible for passing datasource metadata back to HAWQ. It also returns a list of data fragments to the Accessor or Resolver. Each data fragment describes some part of the requested data set. It contains the datasource name, such as the file or table name, including the hostname where it is located. For example, if the source is an HDFS file, the Fragmenter returns a list of data fragments, each containing an HDFS file block. Each fragment includes the location of the block. If the source data is an HBase table, the Fragmenter returns information about table regions, including their locations.
-
-The `ANALYZE` command now retrieves advanced statistics for PXF readable tables by estimating the number of tuples in a table, creating a sample table from the external table, and running advanced statistics queries on the sample table in the same way statistics are collected for native HAWQ tables.
-
-The configuration parameter `pxf_enable_stat_collection` controls collection of advanced statistics. If `pxf_enable_stat_collection` is set to false, no analysis is performed on PXF tables. An additional parameter, `pxf_stat_max_fragments`, controls the number of fragments sampled to build a sample table. By default `pxf_stat_max_fragments` is set to 100, which means that even if there are more than 100 fragments, only this number of fragments will be used in `ANALYZE` to sample the data. Increasing this number will result in better sampling, but can also impact performance.
-
-When a PXF table is analyzed and `pxf_enable_stat_collection` is set to off, or an error occurs because the table is not defined correctly, the PXF service is down, or `getFragmentsStats` is not implemented, a warning message is shown and no statistics are gathered for that table. If `ANALYZE` is running over all tables in the database, the next table will be processed – a failure processing one table does not stop the command.
-
-For a detailed explanation about HAWQ statistical data gathering, see `ANALYZE` in the SQL Commands Reference.
-
-**Note:**
-
--   Depending on external table size, the time required to complete an ANALYZE operation can be lengthy. The boolean parameter `pxf_enable_stat_collection` enables statistics collection for PXF. The default value is `on`. Turning this parameter off (disabling PXF statistics collection) can help decrease the time needed for the ANALYZE operation.
--   You can also use *pxf\_stat\_max\_fragments* to limit the number of fragments to be sampled by decreasing it from the default (100). However, if the number is too low, the sample might not be uniform and the statistics might be skewed.
--   You can also implement getFragmentsStats to return an error. This will cause ANALYZE on a table with this Fragmenter to fail immediately, and default statistics values will be used for that table.
-
-The following table lists the Fragmenter plug-in implementations included with the PXF API.
-
-<a id="fragmenter__table_cgs_svp_3s"></a>
-
-<table>
-<caption><span class="tablecap">Table 2. Fragmenter base classes </span></caption>
-<colgroup>
-<col width="50%" />
-<col width="50%" />
-</colgroup>
-<thead>
-<tr class="header">
-<th><p><code class="ph codeph">Fragmenter class</code></p></th>
-<th><p><code class="ph codeph">Description</code></p></th>
-</tr>
-</thead>
-<tbody>
-<tr class="odd">
-<td>org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter</td>
-<td>Fragmenter for HDFS files</td>
-</tr>
-<tr class="even">
-<td>org.apache.hawq.pxf.plugins.hbase.HBaseDataFragmenter</td>
-<td>Fragmenter for HBase tables</td>
-</tr>
-<tr class="odd">
-<td>org.apache.hawq.pxf.plugins.hive.HiveDataFragmenter</td>
-<td>Fragmenter for Hive tables </td>
-</tr>
-<tr class="even">
-<td>org.apache.hawq.pxf.plugins.hive.HiveInputFormatFragmenter</td>
-<td>Fragmenter for Hive tables with RC or text files </td>
-</tr>
-</tbody>
-</table>
-
-A Fragmenter class extends `org.apache.hawq.pxf.api.Fragmenter`:
-
-#### <a id="com.pivotal.pxf.api.fragmenter"></a>org.apache.hawq.pxf.api.Fragmenter
-
-``` java
-package org.apache.hawq.pxf.api;
-/**
- * Abstract class that defines the splitting of a data resource into fragments
- * that can be processed in parallel.
- */
-public abstract class Fragmenter extends Plugin {
-    protected List<Fragment> fragments;
-
-    public Fragmenter(InputData metaData) {
-        super(metaData);
-        fragments = new LinkedList<Fragment>();
-    }
-
-    /**
-     * Gets the fragments of a given path (source name and location of each
-     * fragment). Used to get fragments of data that could be read in parallel
-     * from the different segments.
-     */
-    public abstract List<Fragment> getFragments() throws Exception;
-
-    /**
-     * Default implementation of statistics for fragments. The default is:
-     * <ul>
-     * <li>number of fragments - as gathered by {@link #getFragments()}</li>
-     * <li>first fragment size - 64MB</li>
-     * <li>total size - number of fragments times first fragment size</li>
-     * </ul>
-     * Each fragmenter implementation can override this method to better match
-     * its fragments stats.
-     *
-     * @return default statistics
-     * @throws Exception if statistics cannot be gathered
-     */
-    public FragmentsStats getFragmentsStats() throws Exception {
-        List<Fragment> fragments = getFragments();
-        long fragmentsNumber = fragments.size();
-        return new FragmentsStats(fragmentsNumber,
-                FragmentsStats.DEFAULT_FRAGMENT_SIZE, fragmentsNumber
-                        * FragmentsStats.DEFAULT_FRAGMENT_SIZE);
-    }
-}
-  
-```
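The arithmetic behind the default `getFragmentsStats()` shown above can be worked through in isolation. The sketch below assumes that the default fragment size is 64 MB expressed in bytes, as the Javadoc states; the actual value and unit of `FragmentsStats.DEFAULT_FRAGMENT_SIZE` are not shown in this document, so the constant here is an assumption, and `DefaultFragmentsStatsSketch` is a hypothetical class name.

``` java
// Hypothetical illustration of the default statistics arithmetic.
class DefaultFragmentsStatsSketch {

    // Assumption: 64 MB in bytes, following the Javadoc of getFragmentsStats().
    static final long ASSUMED_DEFAULT_FRAGMENT_SIZE = 64L * 1024 * 1024;

    // Total size = number of fragments times the (assumed) first fragment size.
    static long defaultTotalSize(long fragmentsNumber) {
        return fragmentsNumber * ASSUMED_DEFAULT_FRAGMENT_SIZE;
    }

    public static void main(String[] args) {
        long fragmentsNumber = 3;
        System.out.println("fragments: " + fragmentsNumber);
        System.out.println("first fragment size (bytes): " + ASSUMED_DEFAULT_FRAGMENT_SIZE);
        System.out.println("total size (bytes): " + defaultTotalSize(fragmentsNumber));
    }
}
```

Because the default ignores the real size of each fragment, a Fragmenter whose fragments differ greatly from this size would presumably want to override `getFragmentsStats()`.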
-
-`getFragments()` returns the list of fragments retrieved for the data source, which PXF passes back to HAWQ in JSON format. For example, if the input path is an HDFS directory, the source name of each fragment should include the path and name of the file that contains the fragment.
-
-#### <a id="classdescription"></a>Class Description
-
-The `Fragmenter.getFragments()` method returns a `List<Fragment>`:
-
-``` java
-package org.apache.hawq.pxf.api;
-/*
- * Fragment holds a data fragment's information.
- * Fragmenter.getFragments() returns a list of fragments.
- */
-public class Fragment
-{
-    private String sourceName;    // File path+name, table name, etc.
-    private int index;            // Fragment index (incremented per sourceName)
-    private String[] replicas;    // Fragment replicas (1 or more)
-    private byte[]   metadata;    // Fragment metadata information (starting point + length, region location, etc.)
-    private byte[]   userData;    // ThirdParty data added to a fragment. Ignored if null
-    ...
-}
-```
-
-#### <a id="topic_fzd_tlv_c5"></a>org.apache.hawq.pxf.api.FragmentsStats
-
-The `Fragmenter.getFragmentsStats()` method returns a `FragmentsStats`:
-
-``` java
-package org.apache.hawq.pxf.api;
-/**
- * FragmentsStats holds statistics for a given path.
- */
-public class FragmentsStats {
-
-    // number of fragments
-    private long fragmentsNumber;
-    // first fragment size
-    private SizeAndUnit firstFragmentSize;
-    // total fragments size
-    private SizeAndUnit totalSize;
-
-   /**
-     * Enum to represent unit (Bytes/KB/MB/GB/TB)
-     */
-    public enum SizeUnit {
-        /**
-         * Byte
-         */
-        B,
-        /**
-         * KB
-         */
-        KB,
-        /**
-         * MB
-         */
-        MB,
-        /**
-         * GB
-         */
-        GB,
-        /**
-         * TB
-         */
-        TB;
-    };
-
-    /**
-     * Container for size and unit
-     */
-    public class SizeAndUnit {
-        long size;
-        SizeUnit unit;
-    ... 
-
-```
-
-`getFragmentsStats()` returns statistics for the data source, which PXF passes back in JSON format. For example, if the input path is an HDFS directory of 3 files, each consisting of 1 block, the output is the number of fragments (3), the size of the first file, and the total size of all files in that directory.
-
-### <a id="accessor"></a>Accessor
-
-The Accessor retrieves specific fragments and passes records back to the Resolver. For example, the HDFS plug-ins create an `org.apache.hadoop.mapred.FileInputFormat` and an `org.apache.hadoop.mapred.RecordReader` for an HDFS file and send these to the Resolver. In the case of HBase or Hive files, the Accessor returns single rows from an HBase or Hive table. PXF 1.x or higher contains the following Accessor implementations:
-
-<a id="accessor__table_ewm_ttz_4p"></a>
-
-<table>
-<caption><span class="tablecap">Table 3. Accessor base classes </span></caption>
-<colgroup>
-<col width="50%" />
-<col width="50%" />
-</colgroup>
-<thead>
-<tr class="header">
-<th><p><code class="ph codeph">Accessor class</code></p></th>
-<th><p><code class="ph codeph">Description</code></p></th>
-</tr>
-</thead>
-<tbody>
-<tr class="odd">
-<td>org.apache.hawq.pxf.plugins.hdfs.HdfsAtomicDataAccessor</td>
-<td>Base class for accessing datasources which cannot be split. These will be accessed by a single HAWQ segment</td>
-</tr>
-<tr class="even">
-<td>org.apache.hawq.pxf.plugins.hdfs.QuotedLineBreakAccessor</td>
-<td>Accessor for TEXT files that have records with embedded linebreaks</td>
-</tr>
-<tr class="odd">
-<td>org.apache.hawq.pxf.plugins.hdfs.HdfsSplittableDataAccessor</td>
-<td><p>Base class for accessing HDFS files using <code class="ph codeph">RecordReaders</code></p></td>
-</tr>
-<tr class="even">
-<td>org.apache.hawq.pxf.plugins.hdfs.LineBreakAccessor</td>
-<td>Accessor for TEXT files (replaced the deprecated <code class="ph codeph">TextFileAccessor</code>, <code class="ph codeph">LineReaderAccessor</code>)</td>
-</tr>
-<tr class="odd">
-<td>org.apache.hawq.pxf.plugins.hdfs.AvroFileAccessor</td>
-<td>Accessor for Avro files</td>
-</tr>
-<tr class="even">
-<td>org.apache.hawq.pxf.plugins.hdfs.SequenceFileAccessor</td>
-<td>Accessor for Sequence files</td>
-</tr>
-<tr class="odd">
-<td>org.apache.hawq.pxf.plugins.hbase.HBaseAccessor </td>
-<td>Accessor for HBase tables </td>
-</tr>
-<tr class="even">
-<td>org.apache.hawq.pxf.plugins.hive.HiveAccessor</td>
-<td>Accessor for Hive tables </td>
-</tr>
-<tr class="odd">
-<td>org.apache.hawq.pxf.plugins.hive.HiveLineBreakAccessor</td>
-<td>Accessor for Hive tables with text files</td>
-</tr>
-<tr class="even">
-<td>org.apache.hawq.pxf.plugins.hive.HiveRCFileAccessor</td>
-<td>Accessor for Hive tables with RC files</td>
-</tr>
-</tbody>
-</table>
-
-The class must extend the `org.apache.hawq.pxf.api.utilities.Plugin` class, and implement one or both interfaces:
-
--   `org.apache.hawq.pxf.api.ReadAccessor`
--   `org.apache.hawq.pxf.api.WriteAccessor`
-
-``` java
-package org.apache.hawq.pxf.api;
-/*
- * Internal interface that defines the access to data on the source
- * data store (e.g, a file on HDFS, a region of an HBase table, etc).
- * All classes that implement actual access to such data sources must
- * respect this interface
- */
-public interface ReadAccessor {
-    boolean openForRead() throws Exception;
-    OneRow readNextObject() throws Exception;
-    void closeForRead() throws Exception;
-}
-```
-
-``` java
-package org.apache.hawq.pxf.api;
-/*
- * An interface for writing data into a data store
- * (e.g, a sequence file on HDFS).
- * All classes that implement actual access to such data sources must
- * respect this interface
- */
-public interface WriteAccessor {
-    boolean openForWrite() throws Exception;
-    boolean writeNextObject(OneRow onerow) throws Exception;
-    void closeForWrite() throws Exception;
-}
-```
-
-The Accessor calls `openForRead()` to read existing data. After reading the data, it calls `closeForRead()`. `readNextObject()` returns one of the following:
-
--   a single record, encapsulated in a OneRow object
--   null if it reaches `EOF`
-
-The Accessor calls `openForWrite()` before writing data out. It then writes each `OneRow` object with `writeNextObject()`, and when done calls `closeForWrite()`. `OneRow` represents a key-value item.
-
-#### <a id="com.pivotal.pxf.api.onerow"></a>org.apache.hawq.pxf.api.OneRow
-
-``` java
-package org.apache.hawq.pxf.api;
-/*
- * Represents one row in the external system data store. Supports
- * the general case where one row contains both a record and a
- * separate key like in the HDFS key/value model for MapReduce
- * (Example: HDFS sequence file)
- */
-public class OneRow {
-    /*
-     * Default constructor
-     */
-    public OneRow();
-
-    /*
-     * Constructor sets key and data
-     */
-    public OneRow(Object inKey, Object inData);
-
-    /*
-     * Setter for key
-     */
-    public void setKey(Object inKey);
-    
-    /*
-     * Setter for data
-     */
-    public void setData(Object inData);
-
-    /*
-     * Accessor for key
-     */
-    public Object getKey();
-
-    /*
-     * Accessor for data
-     */
-    public Object getData();
-
-    /*
-     * Show content
-     */
-    public String toString();
-}
-```
-
-### <a id="resolver"></a>Resolver
-
-The Resolver deserializes records in the `OneRow` format and serializes them to a list of `OneField` objects. PXF converts a `OneField` object to a HAWQ-readable `GPDBWritable` format. PXF 1.x or higher contains the following implementations:
-
-<a id="resolver__table_nbd_d5z_4p"></a>
-
-<table>
-<caption><span class="tablecap">Table 4. Resolver base classes</span></caption>
-<colgroup>
-<col width="50%" />
-<col width="50%" />
-</colgroup>
-<thead>
-<tr class="header">
-<th><p><code class="ph codeph">Resolver class</code></p></th>
-<th><p><code class="ph codeph">Description</code></p></th>
-</tr>
-</thead>
-<tbody>
-<tr class="odd">
-<td><p><code class="ph codeph">org.apache.hawq.pxf.plugins.hdfs.StringPassResolver</code></p></td>
-<td><p><code class="ph codeph">StringPassResolver</code> replaced the deprecated <code class="ph codeph">TextResolver</code>. It passes whole records (composed of any data types) as strings without parsing them</p></td>
-</tr>
-<tr class="even">
-<td><p><code class="ph codeph">org.apache.hawq.pxf.plugins.hdfs.WritableResolver</code></p></td>
-<td><p>Resolver for custom Hadoop Writable implementations. Custom class can be specified with the schema in DATA-SCHEMA. Supports the following types:</p>
-<pre class="pre codeblock"><code>DataType.BOOLEAN
-DataType.INTEGER
-DataType.BIGINT
-DataType.REAL
-DataType.FLOAT8
-DataType.VARCHAR
-DataType.BYTEA</code></pre></td>
-</tr>
-<tr class="odd">
-<td><p><code class="ph codeph">org.apache.hawq.pxf.plugins.hdfs.AvroResolver</code></p></td>
-<td><p>Supports the same field objects as <code class="ph codeph">WritableResolver</code>. </p></td>
-</tr>
-<tr class="even">
-<td><p><code class="ph codeph">org.apache.hawq.pxf.plugins.hbase.HBaseResolver</code></p></td>
-<td><p>Supports the same field objects as <code class="ph codeph">WritableResolver</code> and also supports the following:</p>
-<pre class="pre codeblock"><code>DataType.SMALLINT
-DataType.NUMERIC
-DataType.TEXT
-DataType.BPCHAR
-DataType.TIMESTAMP</code></pre></td>
-</tr>
-<tr class="odd">
-<td><p><code class="ph codeph">org.apache.hawq.pxf.plugins.hive.HiveResolver</code></p></td>
-<td><p>Supports the same field objects as <code class="ph codeph">WritableResolver</code> and also supports the following:</p>
-<pre class="pre codeblock"><code>DataType.SMALLINT
-DataType.TEXT
-DataType.TIMESTAMP</code></pre></td>
-</tr>
-<tr class="even">
-<td><p><code class="ph codeph">org.apache.hawq.pxf.plugins.hive.HiveStringPassResolver</code></p></td>
-<td>Specialized <code class="ph codeph">HiveResolver</code> for a Hive table stored as Text files. Should be used together with <code class="ph codeph">HiveInputFormatFragmenter</code>/<code class="ph codeph">HiveLineBreakAccessor</code>.</td>
-</tr>
-<tr class="odd">
-<td><code class="ph codeph">org.apache.hawq.pxf.plugins.hive.HiveColumnarSerdeResolver</code></td>
-<td>Specialized <code class="ph codeph">HiveResolver</code> for a Hive table stored as RC file. Should be used together with <code class="ph codeph">HiveInputFormatFragmenter</code>/<code class="ph codeph">HiveRCFileAccessor</code>.</td>
-</tr>
-</tbody>
-</table>
-
-The class needs to extend the `org.apache.hawq.pxf.api.utilities.Plugin` class, and implement one or both interfaces:
-
--   `org.apache.hawq.pxf.api.ReadResolver`
--   `org.apache.hawq.pxf.api.WriteResolver`
-
-``` java
-package org.apache.hawq.pxf.api;
-/*
- * Interface that defines the deserialization of one record brought from
- * the data Accessor. Every implementation of a deserialization method
- * (e.g, Writable, Avro, ...) must implement this interface.
- */
-public interface ReadResolver {
-    public List<OneField> getFields(OneRow row) throws Exception;
-}
-```
-
-``` java
-package org.apache.hawq.pxf.api;
-/*
-* Interface that defines the serialization of data read from the DB
-* into a OneRow object.
-* Every implementation of a serialization method
-* (e.g, Writable, Avro, ...) must implement this interface.
-*/
-public interface WriteResolver {
-    public OneRow setFields(List<OneField> record) throws Exception;
-}
-```
-
-**Note:**
-
--   `getFields` should return a `List<OneField>`, each `OneField` representing a single field.
--   `setFields` should return a single `OneRow` object, given a `List<OneField>`.
-
-#### <a id="com.pivotal.pxf.api.onefield"></a>org.apache.hawq.pxf.api.OneField
-
-``` java
-package org.apache.hawq.pxf.api;
-/*
- * Defines one field on a deserialized record.
- * 'type' is in OID values recognized by GPDBWritable
- * 'val' is the actual field value
- */
-public class OneField {
-    public OneField() {}
-    public OneField(int type, Object val) {
-        this.type = type;
-        this.val = val;
-    }
-
-    public int type;
-    public Object val;
-}
-```
-
-The value of `type` should be the OID of one of the `org.apache.hawq.pxf.api.io.DataType` enum values. `val` is an instance of the corresponding Java class. Supported types are as follows:
-
-<a id="com.pivotal.pxf.api.onefield__table_f4x_35z_4p"></a>
-
-<table>
-<caption><span class="tablecap">Table 5. Resolver supported types</span></caption>
-<colgroup>
-<col width="50%" />
-<col width="50%" />
-</colgroup>
-<thead>
-<tr class="header">
-<th><p>DataType recognized OID</p></th>
-<th><p>Field value</p></th>
-</tr>
-</thead>
-<tbody>
-<tr class="odd">
-<td><p><code class="ph codeph">DataType.SMALLINT</code></p></td>
-<td><p><code class="ph codeph">Short</code></p></td>
-</tr>
-<tr class="even">
-<td><p><code class="ph codeph">DataType.INTEGER</code></p></td>
-<td><p><code class="ph codeph">Integer</code></p></td>
-</tr>
-<tr class="odd">
-<td><p><code class="ph codeph">DataType.BIGINT</code></p></td>
-<td><p><code class="ph codeph">Long</code></p></td>
-</tr>
-<tr class="even">
-<td><p><code class="ph codeph">DataType.REAL</code></p></td>
-<td><p><code class="ph codeph">Float</code></p></td>
-</tr>
-<tr class="odd">
-<td><p><code class="ph codeph">DataType.FLOAT8</code></p></td>
-<td><p><code class="ph codeph">Double</code></p></td>
-</tr>
-<tr class="even">
-<td><p><code class="ph codeph">DataType.NUMERIC</code></p></td>
-<td><p><code class="ph codeph">String (&quot;651687465135468432168421&quot;)</code></p></td>
-</tr>
-<tr class="odd">
-<td><p><code class="ph codeph">DataType.BOOLEAN</code></p></td>
-<td><p><code class="ph codeph">Boolean</code></p></td>
-</tr>
-<tr class="even">
-<td><p><code class="ph codeph">DataType.VARCHAR</code></p></td>
-<td><p><code class="ph codeph">String</code></p></td>
-</tr>
-<tr class="odd">
-<td><p><code class="ph codeph">DataType.BPCHAR</code></p></td>
-<td><p><code class="ph codeph">String</code></p></td>
-</tr>
-<tr class="even">
-<td><p><code class="ph codeph">DataType.TEXT</code></p></td>
-<td><p><code class="ph codeph">String</code></p></td>
-</tr>
-<tr class="odd">
-<td><p><code class="ph codeph">DataType.BYTEA</code></p></td>
-<td><p><code class="ph codeph">byte[]</code></p></td>
-</tr>
-<tr class="even">
-<td><p><code class="ph codeph">DataType.TIMESTAMP</code></p></td>
-<td><p><code class="ph codeph">Timestamp</code></p></td>
-</tr>
-<tr class="odd">
-<td><p><code class="ph codeph">DataType.DATE</code></p></td>
-<td><p><code class="ph codeph">Date</code></p></td>
-</tr>
-</tbody>
-</table>
-
-### <a id="analyzer"></a>Analyzer
-
-The Analyzer has been deprecated. A new function in the Fragmenter API (Fragmenter.getFragmentsStats) is used to gather initial statistics for the data source, and provides PXF statistical data for the HAWQ query optimizer. For a detailed explanation about HAWQ statistical data gathering, see `ANALYZE` in the SQL Command Reference.
-
-Using the Analyzer API will result in an error message. Use the Fragmenter and getFragmentsStats to gather advanced statistics.
-
-## <a id="aboutcustomprofiles"></a>About Custom Profiles
-
-Administrators can add new profiles or edit the built-in profiles in the `/etc/conf/pxf-profiles.xml` file. See [Using Profiles to Read and Write Data](ReadWritePXF.html#readingandwritingdatawithpxf) for information on how to add custom profiles.
-
-## <a id="aboutqueryfilterpush-down"></a>About Query Filter Push-Down
-
-If a query includes a number of WHERE clause filters, HAWQ may push all or some of the filters to PXF. If filters are pushed to PXF, the Accessor can use the filtering information when accessing the data source to fetch tuples, so that only records that pass the filter conditions are returned. This reduces data processing in the SQL engine and reduces the network traffic sent to it.
-
-This topic includes the following information:
-
--   Filter Availability and Ordering 
--   Creating a Filter Builder class
--   Filter Operations
--   Sample Implementation
--   Using Filters
-
-### <a id="filteravailabilityandordering"></a>Filter Availability and Ordering
-
-PXF allows push-down filtering if the following rules are met:
-
--   Uses only single expressions or a group of AND'ed expressions - no OR'ed expressions.
--   Uses only expressions of supported data types and operators.
-
-FilterParser scans the pushed down filter list and uses the user's build() implementation to build the filter.
-
--   For simple expressions (e.g., a &gt;= 5), FilterParser places column objects on the left of the expression and constants on the right.
--   For compound expressions (e.g., &lt;expression&gt; AND &lt;expression&gt;), the build() function must handle three cases:
-    1.  Simple Expression: &lt;Column Index&gt; &lt;Operation&gt; &lt;Constant&gt;
-    2.  Compound Expression: &lt;Filter Object&gt; AND &lt;Filter Object&gt;
-    3.  Compound Expression: &lt;List of Filter Objects&gt; AND &lt;Filter Object&gt;
-
-### <a id="creatingafilterbuilderclass"></a>Creating a Filter Builder Class
-
-To check whether a filter was pushed down to PXF, call the `InputData.hasFilter()` function:
-
-``` java
-/*
- * Returns true if there is a filter string to parse
- */
-public boolean hasFilter()
-{
-   return filterStringValid;
-}
-```
-
-If `hasFilter()` returns `false`, there is no filter information. If it returns `true`, PXF parses the serialized filter string into a meaningful filter object to use later. To do so, create a filter builder class that implements the `FilterParser.FilterBuilder` interface:
-
-``` java
-package org.apache.hawq.pxf.api;
-/*
- * Interface a user of FilterParser should implement
- * This is used to let the user build filter expressions in the manner she 
- * sees fit
- *
- * When an operator is parsed, this function is called to let the user decide
- * what to do with its operands.
- */
-interface FilterBuilder {
-   public Object build(Operation operation, Object left, Object right) throws Exception;
-}
-```
-
-While PXF parses the serialized filter string from the incoming HAWQ query, it calls the interface's `build()` function once for each condition or filter pushed down to PXF. Your implementation of this function returns a filter object or representation that the Fragmenter, Accessor, or Resolver uses at runtime to filter out records. The `build()` function accepts an Operation, a left operand, and a right operand as input.
-
-### <a id="filteroperations"></a>Filter Operations
-
-``` java
-/*
- * Operations supported by the parser
- */
-public enum Operation
-{
-    HDOP_LT, //less than
-    HDOP_GT, //greater than
-    HDOP_LE, //less than or equal
-    HDOP_GE, //greater than or equal
-    HDOP_EQ, //equal
-    HDOP_NE, //not equal
-    HDOP_AND //AND'ed conditions
-};
-```
-
-#### <a id="filteroperands"></a>Filter Operands
-
-There are three types of operands:
-
--   Column Index
--   Constant
--   Filter Object
-
-#### <a id="columnindex"></a>Column Index
-
-``` java
-/*
- * Represents a column index
- */
-public class ColumnIndex
-{
-   public ColumnIndex(int idx);
-
-   public int index();
-}
-```
-
-#### <a id="constant"></a>Constant
-
-``` java
-/*
- * The class represents a constant object (String, Long, ...)
- */
-public class Constant
-{
-    public Constant(Object obj);
-
-    public Object constant();
-}
-```
-
-#### <a id="filterobject"></a>Filter Object
-
-Filter objects can be internal, such as those you define, or external, such as those that the remote system uses. For example, for HBase, you use the HBase `Filter` class (`org.apache.hadoop.hbase.filter.Filter`), while for Hive, you use an internal default representation created by the PXF framework, called `BasicFilter`. You can choose which filter object to use, including writing a new one. `BasicFilter` is the most common:
-
-``` java
-/*
- * Basic filter provided for cases where the target storage system does not provide its own filter
- * For example: Hbase storage provides its own filter but for a Writable based record in a SequenceFile
- * there is no filter provided and so we need to have a default
- */
-static public class BasicFilter
-{
-   /*
-    * C'tor
-    */
-   public BasicFilter(Operation inOper, ColumnIndex inColumn, Constant inConstant);
-
-   /*
-    * Returns oper field
-    */
-   public Operation getOperation();
-
-   /*
-    * Returns column field
-    */
-   public ColumnIndex getColumn();
-
-   /*
-    * Returns constant field
-    */
-   public Constant getConstant();
-}
-```
-
-### <a id="sampleimplementation"></a>Sample Implementation
-
-Let's look at the following sample implementation of the filter builder class and its `build()` function that handles all 3 cases. Let's assume that BasicFilter was used to hold our filter operations.
-
-``` java
-import java.util.LinkedList;
-import java.util.List;
-
-import org.apache.hawq.pxf.api.FilterParser;
-import org.apache.hawq.pxf.api.utilities.InputData;
-
-public class MyDemoFilterBuilder implements FilterParser.FilterBuilder
-{
-    private InputData inputData;
-
-    public MyDemoFilterBuilder(InputData input)
-    {
-        inputData = input;
-    }
-
-    /*
-     * Translates a filterString into a FilterParser.BasicFilter or a list of such filters
-     */
-    public Object getFilterObject(String filterString) throws Exception
-    {
-        FilterParser parser = new FilterParser(this);
-        Object result = parser.parse(filterString);
-
-        if (!(result instanceof FilterParser.BasicFilter) && !(result instanceof List))
-            throw new Exception("String " + filterString + " resolved to no filter");
-
-        return result;
-    }
- 
-    public Object build(FilterParser.Operation opId,
-                        Object leftOperand,
-                        Object rightOperand) throws Exception
-    {
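-        //compound expression: the left operand is a filter or a list of filters
-        if (leftOperand instanceof FilterParser.BasicFilter || leftOperand instanceof List)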
-        {
-            //sanity check
-            if (opId != FilterParser.Operation.HDOP_AND || !(rightOperand instanceof FilterParser.BasicFilter))
-                throw new Exception("Only AND is allowed between compound expressions");
-
-            //case 3
-            if (leftOperand instanceof List)
-                return handleCompoundOperations((List<FilterParser.BasicFilter>)leftOperand, (FilterParser.BasicFilter)rightOperand);
-            //case 2
-            else
-                return handleCompoundOperations((FilterParser.BasicFilter)leftOperand, (FilterParser.BasicFilter)rightOperand);
-        }
-
-        //sanity check
-        if (!(rightOperand instanceof FilterParser.Constant))
-            throw new Exception("expressions of column-op-column are not supported");
-
-        //case 1 (assume column is on the left)
-        return handleSimpleOperations(opId, (FilterParser.ColumnIndex)leftOperand, (FilterParser.Constant)rightOperand);
-    }
-
-    private FilterParser.BasicFilter handleSimpleOperations(FilterParser.Operation opId,
-                                                            FilterParser.ColumnIndex column,
-                                                            FilterParser.Constant constant)
-    {
-        return new FilterParser.BasicFilter(opId, column, constant);
-    }
-
-    private  List handleCompoundOperations(List<FilterParser.BasicFilter> left,
-                                       FilterParser.BasicFilter right)
-    {
-        left.add(right);
-        return left;
-    }
-
-    private List handleCompoundOperations(FilterParser.BasicFilter left,
-                                          FilterParser.BasicFilter right)
-    {
-        List<FilterParser.BasicFilter> result = new LinkedList<FilterParser.BasicFilter>();
-
-        result.add(left);
-        result.add(right);
-        return result;
-    }
-}
-```
-
-The sample above creates a filter builder class that implements the `FilterParser.FilterBuilder` interface and its `build()` function, and generates the filter object. To obtain the filter object, call the `getFilterObject()` function from the Accessor, the Resolver, or both:
-
-``` java
-if (inputData.hasFilter())
-{
-    String filterStr = inputData.filterString();
-    MyDemoFilterBuilder demobuilder = new MyDemoFilterBuilder(inputData);
-    Object filter = demobuilder.getFilterObject(filterStr);
-    ...
-}
-```
-
-### <a id="usingfilters"></a>Using Filters
-
-Once you have built the Filter object(s), you can use them to read data and filter out records that do not meet the filter conditions:
-
-1.  Check whether you have a single or multiple filters.
-2.  Evaluate the single filter, or iterate over the list and evaluate each filter in it. Disqualify the record if any filter condition fails.
-
-``` java
-if (filter instanceof List)
-{
-    for (Object f : (List)filter)
-        <evaluate f>; //may want to break if evaluation results in negative answer for any filter.
-}
-else
-{
-    <evaluate filter>;
-}
-```
-
-Example of evaluating a single filter:
-
-``` java
-//Get our BasicFilter object
-FilterParser.BasicFilter bFilter = (FilterParser.BasicFilter)filter;
-
-//Get operation and operand values
-FilterParser.Operation op = bFilter.getOperation();
-int colIdx = bFilter.getColumn().index();
-String val = bFilter.getConstant().constant().toString();
-
-//Get more info about the column if desired
-ColumnDescriptor col = inputData.getColumn(colIdx);
-String colName = col.columnName();
-
-//Now evaluate it against the actual column value in the record...
-```
-
-## <a id="reference"></a>Examples
-
-This section contains the following information:
-
--   [External Table Examples](#externaltableexamples)
--   [Plug-in Examples](#pluginexamples)
-
-### <a id="externaltableexamples"></a>External Table Examples
-
-#### <a id="example1"></a>Example 1
-
-Example 1 shows an external table that can analyze all sequence files that are populated with `Writable` serialized records and that exist inside the HDFS directory `sales/2012/01`. `SaleItem.class` is a Java class that implements the `Writable` interface and describes a Java record that includes three class members.
-
-**Note:** In this example, the class member names do not necessarily match the database attribute names, but the types match. `SaleItem.class` must exist in the classpath of every DataNode and NameNode.
-
-``` sql
-CREATE EXTERNAL TABLE jan_2012_sales (id int, total int, comments varchar)
-LOCATION ('pxf://10.76.72.26:51200/sales/2012/01/*.seq'
-          '?FRAGMENTER=org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter'
-          '&ACCESSOR=org.apache.hawq.pxf.plugins.hdfs.SequenceFileAccessor'
-          '&RESOLVER=org.apache.hawq.pxf.plugins.hdfs.WritableResolver'
-          '&DATA-SCHEMA=SaleItem')
-FORMAT 'custom' (formatter='pxfwritable_import');
-```
-
-#### <a id="example2"></a>Example 2
-
-Example 2 shows an external table that can analyze an HBase table called `sales`. It has 10 column families `(cf1 – cf10)` and many qualifier names in each family. This example focuses on the `rowkey`, the qualifier `saleid` inside column family `cf1`, and the qualifier `comments` inside column family `cf8` and uses direct mapping:
-
-``` sql
-CREATE EXTERNAL TABLE hbase_sales
-  (hbaserowkey text, "cf1:saleid" int, "cf8:comments" varchar)
-LOCATION ('pxf://10.76.72.26:51200/sales?PROFILE=HBase')
-FORMAT 'custom' (formatter='pxfwritable_import');
-```
-
-#### <a id="example3"></a>Example 3
-
-This example uses indirect mapping. Note how the attribute names change and how they correspond to the HBase lookup table. When you execute a `SELECT` from `my_hbase_sales`, the attribute names are automatically converted to their HBase counterparts.
-
-``` sql
-CREATE EXTERNAL TABLE my_hbase_sales (hbaserowkey text, id int, cmts varchar)
-LOCATION
-('pxf://10.76.72.26:51200/sales?PROFILE=HBase')
-FORMAT 'custom' (formatter='pxfwritable_import');
-```
-
-#### <a id="example4"></a>Example 4
-
-Shows an example for a writable table of compressed data. 
-
-``` sql
-CREATE WRITABLE EXTERNAL TABLE sales_aggregated_2012
-    (id int, total int, comments varchar)
-LOCATION ('pxf://10.76.72.26:51200/sales/2012/aggregated'
-          '?PROFILE=HdfsTextSimple'
-          '&COMPRESSION_CODEC=org.apache.hadoop.io.compress.BZip2Codec')
-FORMAT 'TEXT';
-```
-
-#### <a id="example5"></a>Example 5
-
-Shows an example for a writable table into a sequence file, using a schema file. For writable tables, the formatter is `pxfwritable_export`.
-
-``` sql
-CREATE WRITABLE EXTERNAL TABLE sales_max_2012
-    (id int, total int, comments varchar)
-LOCATION ('pxf://10.76.72.26:51200/sales/2012/max'
-          '?FRAGMENTER=org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter'
-          '&ACCESSOR=org.apache.hawq.pxf.plugins.hdfs.SequenceFileAccessor'
-          '&RESOLVER=org.apache.hawq.pxf.plugins.hdfs.WritableResolver'
-          '&DATA-SCHEMA=SaleItem')
-FORMAT 'custom' (formatter='pxfwritable_export');
-```
-
-### <a id="pluginexamples"></a>Plug-in Examples
-
-This section contains sample dummy implementations of all three plug-ins. It also contains a usage example.
-
-#### <a id="dummyfragmenter"></a>Dummy Fragmenter
-
-``` java
-import org.apache.hawq.pxf.api.Fragmenter;
-import org.apache.hawq.pxf.api.Fragment;
-import org.apache.hawq.pxf.api.utilities.InputData;
-import java.util.List;
-
-/*
- * Class that defines the splitting of a data resource into fragments that can
- * be processed in parallel
- * getFragments() returns the fragments information of a given path (source name and location of each fragment).
- * Used to get fragments of data that could be read in parallel from the different segments.
- * Dummy implementation, for documentation
- */
-public class DummyFragmenter extends Fragmenter {
-    public DummyFragmenter(InputData metaData) {
-        super(metaData);
-    }
-    /*
-     * path is a data source URI that can appear as a file name, a directory name or a wildcard
-     * returns the data fragments - identifiers of data and a list of available hosts
-     */
-    @Override
-    public List<Fragment> getFragments() throws Exception {
-        String localhostname = java.net.InetAddress.getLocalHost().getHostName();
-        String[] localHosts = new String[]{localhostname, localhostname};
-        fragments.add(new Fragment(inputData.getDataSource() + ".1" /* source name */,
-                localHosts /* available hosts list */,
-                "fragment1".getBytes()));
-        fragments.add(new Fragment(inputData.getDataSource() + ".2" /* source name */,
-                localHosts /* available hosts list */,
-                "fragment2".getBytes()));
-        fragments.add(new Fragment(inputData.getDataSource() + ".3" /* source name */,
-                localHosts /* available hosts list */,
-                "fragment3".getBytes()));
-        return fragments;
-    }
-}
-```
-
-#### <a id="dummyaccessor"></a>Dummy Accessor
-
-``` java
-import org.apache.hawq.pxf.api.ReadAccessor;
-import org.apache.hawq.pxf.api.WriteAccessor;
-import org.apache.hawq.pxf.api.OneRow;
-import org.apache.hawq.pxf.api.utilities.InputData;
-import org.apache.hawq.pxf.api.utilities.Plugin;
-import org.apache.commons.logging.Log;
-import org.apache.commons.logging.LogFactory;
-
-/*
- * Internal interface that defines the access to a file on HDFS.  All classes
- * that implement actual access to an HDFS file (sequence file, avro file,...)
- * must respect this interface
- * Dummy implementation, for documentation
- */
-public class DummyAccessor extends Plugin implements ReadAccessor, WriteAccessor {
-    private static final Log LOG = LogFactory.getLog(DummyAccessor.class);
-    private int rowNumber;
-    private int fragmentNumber;
-    public DummyAccessor(InputData metaData) {
-        super(metaData);
-    }
-    @Override
-    public boolean openForRead() throws Exception {
-        /* fopen or similar */
-        return true;
-    }
-    @Override
-    public OneRow readNextObject() throws Exception {
-        /* return next row , <key=fragmentNo.rowNo, val=rowNo,text,fragmentNo>*/
-        /* check for EOF */
-        if (fragmentNumber > 0)
-            return null; /* signal EOF, close will be called */
-        int fragment = inputData.getDataFragment();
-        String fragmentMetadata = new String(inputData.getFragmentMetadata());
-        /* generate row */
-        OneRow row = new OneRow(fragment + "." + rowNumber, /* key */
-                rowNumber + "," + fragmentMetadata + "," + fragment /* value */);
-        /* advance */
-        rowNumber += 1;
-        if (rowNumber == 2) {
-            rowNumber = 0;
-            fragmentNumber += 1;
-        }
-        /* return data */
-        return row;
-    }
-    @Override
-    public void closeForRead() throws Exception {
-        /* fclose or similar */
-    }
-    @Override
-    public boolean openForWrite() throws Exception {
-        /* fopen or similar */
-        return true;
-    }
-    @Override
-    public boolean writeNextObject(OneRow onerow) throws Exception {
-        LOG.info(onerow.getData());
-        return true;
-    }
-    @Override
-    public void closeForWrite() throws Exception {
-        /* fclose or similar */
-    }
-}
-```
-
-#### <a id="dummyresolver"></a>Dummy Resolver
-
-``` java
-import org.apache.hawq.pxf.api.OneField;
-import org.apache.hawq.pxf.api.OneRow;
-import org.apache.hawq.pxf.api.ReadResolver;
-import org.apache.hawq.pxf.api.WriteResolver;
-import org.apache.hawq.pxf.api.utilities.InputData;
-import org.apache.hawq.pxf.api.utilities.Plugin;
-import java.util.LinkedList;
-import java.util.List;
-import static org.apache.hawq.pxf.api.io.DataType.INTEGER;
-import static org.apache.hawq.pxf.api.io.DataType.VARCHAR;
-
-/*
- * Class that defines the deserialization of one record brought from the external input data.
- * Every implementation of a deserialization method (Writable, Avro, BP, Thrift, ...)
- * must inherit this abstract class
- * Dummy implementation, for documentation
- */
-public class DummyResolver extends Plugin implements ReadResolver, WriteResolver {
-    private int rowNumber;
-    public DummyResolver(InputData metaData) {
-        super(metaData);
-        rowNumber = 0;
-    }
-    @Override
-    public List<OneField> getFields(OneRow row) throws Exception {
-        /* break up the row into fields */
-        List<OneField> output = new LinkedList<OneField>();
-        String[] fields = ((String) row.getData()).split(",");
-        output.add(new OneField(INTEGER.getOID() /* type */, Integer.parseInt(fields[0]) /* value */));
-        output.add(new OneField(VARCHAR.getOID(), fields[1]));
-        output.add(new OneField(INTEGER.getOID(), Integer.parseInt(fields[2])));
-        return output;
-    }
-    @Override
-    public OneRow setFields(List<OneField> record) throws Exception {
-        /* should read inputStream row by row */
-        return rowNumber > 5
-                ? null
-                : new OneRow(null, "row number " + rowNumber++);
-    }
-}
-```
-
-#### <a id="usageexample"></a>Usage Example
-
-``` sql
-psql=# CREATE EXTERNAL TABLE dummy_tbl
-    (int1 integer, word text, int2 integer)
-LOCATION ('pxf://localhost:51200/dummy_location'
-          '?FRAGMENTER=DummyFragmenter'
-          '&ACCESSOR=DummyAccessor'
-          '&RESOLVER=DummyResolver')
-FORMAT 'custom' (formatter = 'pxfwritable_import');
- 
-CREATE EXTERNAL TABLE
-psql=# SELECT * FROM dummy_tbl;
-int1 | word | int2
-------+------+------
-0 | fragment1 | 0
-1 | fragment1 | 0
-0 | fragment2 | 0
-1 | fragment2 | 0
-0 | fragment3 | 0
-1 | fragment3 | 0
-(6 rows)
-
-psql=# CREATE WRITABLE EXTERNAL TABLE dummy_tbl_write
-    (int1 integer, word text, int2 integer)
-LOCATION ('pxf://localhost:51200/dummy_location'
-          '?ACCESSOR=DummyAccessor'
-          '&RESOLVER=DummyResolver')
-FORMAT 'custom' (formatter = 'pxfwritable_export');
- 
-CREATE EXTERNAL TABLE
-psql=# INSERT INTO dummy_tbl_write VALUES (1, 'a', 11), (2, 'b', 22);
-INSERT 0 2
-```
-
-
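
The filter evaluation described under "Using Filters" above can be sketched in self-contained Java. This is only an illustration under stated assumptions: `FilterEvalSketch`, `SimpleFilter`, `matches`, and `matchesAll` are hypothetical stand-ins rather than PXF classes, the `Operation` names mirror the enum shown earlier (without `HDOP_AND`, since AND'ed conditions are represented here as a list), and treating a null column value as non-matching is an assumption, not documented PXF behavior.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in types; these are NOT the PXF FilterParser classes.
class FilterEvalSketch {
    enum Operation { HDOP_LT, HDOP_GT, HDOP_LE, HDOP_GE, HDOP_EQ, HDOP_NE }

    // Minimal analogue of BasicFilter: <column index> <operation> <constant>.
    static final class SimpleFilter {
        final Operation op;
        final int columnIndex;
        final Object constant;

        SimpleFilter(Operation op, int columnIndex, Object constant) {
            this.op = op;
            this.columnIndex = columnIndex;
            this.constant = constant;
        }
    }

    // Evaluates one filter against one record (a list of column values).
    // Assumes the column value and the constant are mutually comparable,
    // otherwise compareTo throws ClassCastException.
    static boolean matches(SimpleFilter f, List<Object> record) {
        Object value = record.get(f.columnIndex);
        if (value == null) {
            return false; // assumption: a null value never satisfies a comparison
        }
        @SuppressWarnings("unchecked")
        int cmp = ((Comparable<Object>) value).compareTo(f.constant);
        switch (f.op) {
            case HDOP_LT: return cmp < 0;
            case HDOP_GT: return cmp > 0;
            case HDOP_LE: return cmp <= 0;
            case HDOP_GE: return cmp >= 0;
            case HDOP_EQ: return cmp == 0;
            case HDOP_NE: return cmp != 0;
            default: throw new IllegalArgumentException("Unsupported operation: " + f.op);
        }
    }

    // AND semantics: the record qualifies only if every filter passes.
    // Stops at the first failing filter.
    static boolean matchesAll(List<SimpleFilter> filters, List<Object> record) {
        for (SimpleFilter f : filters) {
            if (!matches(f, record)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Object> record = Arrays.<Object>asList(7L, "widget", 120L);
        List<SimpleFilter> filters = Arrays.asList(
                new SimpleFilter(Operation.HDOP_GE, 0, 5L),    // column 0 >= 5
                new SimpleFilter(Operation.HDOP_LT, 2, 100L)); // column 2 < 100
        // The second condition fails (120 is not < 100), so the record is disqualified.
        System.out.println(matchesAll(filters, record));
    }
}
```

In a real Accessor or Resolver, the column value would come from the record being read, and the constant would typically need converting to the column's type before comparison.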

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/pxf/ReadWritePXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/ReadWritePXF.html.md.erb b/pxf/ReadWritePXF.html.md.erb
deleted file mode 100644
index 18f655d..0000000
--- a/pxf/ReadWritePXF.html.md.erb
+++ /dev/null
@@ -1,123 +0,0 @@
----
-title: Using Profiles to Read and Write Data
----
-
-PXF profiles are collections of common metadata attributes that can be used to simplify the reading and writing of data. You can use any of the built-in profiles that come with PXF or you can create your own.
-
-For example, if you are writing single line records to text files on HDFS, you could use the built-in HdfsTextSimple profile. You specify this profile when you create the PXF external table used to write the data to HDFS.
-
-## <a id="built-inprofiles"></a>Built-In Profiles
-
-PXF comes with a number of built-in profiles that group together a collection of metadata attributes. PXF built-in profiles simplify access to the following types of data storage systems:
-
--   HDFS File Data (Read + Write)
--   Hive (Read only)
--   HBase (Read only)
--   JSON (Read only)
-
-You can specify a built-in profile when you want to read data that exists inside HDFS files, Hive tables, HBase tables, or JSON files, or when you want to write data into HDFS files.
-
-<table>
-<colgroup>
-<col width="33%" />
-<col width="33%" />
-<col width="33%" />
-</colgroup>
-<thead>
-<tr class="header">
-<th>Profile</th>
-<th>Description</th>
-<th>Fragmenter/Accessor/Resolver</th>
-</tr>
-</thead>
-<tbody>
-<tr class="odd">
-<td>HdfsTextSimple</td>
-<td>Read or write delimited single line records from or to plain text files on HDFS.</td>
-<td><ul>
-<li>org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter</li>
-<li>org.apache.hawq.pxf.plugins.hdfs.LineBreakAccessor</li>
-<li>org.apache.hawq.pxf.plugins.hdfs.StringPassResolver</li>
-</ul></td>
-</tr>
-<tr class="even">
-<td>HdfsTextMulti</td>
-<td>Read delimited single or multi-line records (with quoted linefeeds) from plain text files on HDFS. This profile is not splittable (non parallel); reading is slower than reading with HdfsTextSimple.</td>
-<td><ul>
-<li>org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter</li>
-<li>org.apache.hawq.pxf.plugins.hdfs.QuotedLineBreakAccessor</li>
-<li>org.apache.hawq.pxf.plugins.hdfs.StringPassResolver</li>
-</ul></td>
-</tr>
-<tr class="odd">
-<td>Hive</td>
-<td>Read a Hive table with any of the available storage formats: text, RC, ORC, Sequence, or Parquet.</td>
-<td><ul>
-<li>org.apache.hawq.pxf.plugins.hive.HiveDataFragmenter</li>
-<li>org.apache.hawq.pxf.plugins.hive.HiveAccessor</li>
-<li>org.apache.hawq.pxf.plugins.hive.HiveResolver</li>
-</ul></td>
-</tr>
-<tr class="even">
-<td>HiveRC</td>
-<td>Optimized read of a Hive table where each partition is stored as an RCFile. 
-<div class="note note">
-Note: The <code class="ph codeph">DELIMITER</code> parameter is mandatory.
-</div></td>
-<td><ul>
-<li>org.apache.hawq.pxf.plugins.hive.HiveInputFormatFragmenter</li>
-<li>org.apache.hawq.pxf.plugins.hive.HiveRCFileAccessor</li>
-<li>org.apache.hawq.pxf.plugins.hive.HiveColumnarSerdeResolver</li>
-</ul></td>
-</tr>
-<tr class="odd">
-<td>HiveText</td>
-<td>Optimized read of a Hive table where each partition is stored as a text file.
-<div class="note note">
-Note: The <code class="ph codeph">DELIMITER</code> parameter is mandatory.
-</div></td>
-<td><ul>
-<li>org.apache.hawq.pxf.plugins.hive.HiveInputFormatFragmenter</li>
-<li>org.apache.hawq.pxf.plugins.hive.HiveLineBreakAccessor</li>
-<li>org.apache.hawq.pxf.plugins.hive.HiveStringPassResolver</li>
-</ul></td>
-</tr>
-<tr class="even">
-<td>HBase</td>
-<td>Read data from an HBase data store.</td>
-<td><ul>
-<li>org.apache.hawq.pxf.plugins.hbase.HBaseDataFragmenter</li>
-<li>org.apache.hawq.pxf.plugins.hbase.HBaseAccessor</li>
-<li>org.apache.hawq.pxf.plugins.hbase.HBaseResolver</li>
-</ul></td>
-</tr>
-<tr class="odd">
-<td>Avro</td>
-<td>Read Avro files (fileName.avro).</td>
-<td><ul>
-<li>org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter</li>
-<li>org.apache.hawq.pxf.plugins.hdfs.AvroFileAccessor</li>
-<li>org.apache.hawq.pxf.plugins.hdfs.AvroResolver</li>
-</ul></td>
-</tr>
-<tr class="even">
-<td>JSON</td>
-<td>Read JSON files (fileName.json) from HDFS.</td>
-<td><ul>
-<li>org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter</li>
-<li>org.apache.hawq.pxf.plugins.json.JsonAccessor</li>
-<li>org.apache.hawq.pxf.plugins.json.JsonResolver</li>
-</ul></td>
-</tr>
-</tbody>
-</table>
-
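-Because a profile is a named group of plug-in classes, a `LOCATION` clause that names a profile should be interchangeable with one that lists the classes from the table above explicitly. The sketch below illustrates this for `HdfsTextSimple`. The table names, columns, host, port, and file path are hypothetical, and the `FRAGMENTER`, `ACCESSOR`, and `RESOLVER` option names are assumed to be the explicit counterparts of `PROFILE`.
-
-``` sql
-/* Short form: the profile supplies the three plug-in classes. */
-CREATE EXTERNAL TABLE pxf_text_profile (id int, amount float8)
-    LOCATION ('pxf://namenode:51200/data/pxf_examples/sales.txt?PROFILE=HdfsTextSimple')
-    FORMAT 'TEXT' (delimiter=E',');
-
-/* Long form: the classes from the HdfsTextSimple row, listed explicitly. */
-CREATE EXTERNAL TABLE pxf_text_explicit (id int, amount float8)
-    LOCATION ('pxf://namenode:51200/data/pxf_examples/sales.txt'
-              '?FRAGMENTER=org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter'
-              '&ACCESSOR=org.apache.hawq.pxf.plugins.hdfs.LineBreakAccessor'
-              '&RESOLVER=org.apache.hawq.pxf.plugins.hdfs.StringPassResolver')
-    FORMAT 'TEXT' (delimiter=E',');
-```
-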
-## <a id="addingandupdatingprofiles"></a>Adding and Updating Profiles
-
-Each profile has a mandatory unique name and an optional description. In addition, each profile specifies a set of plug-ins, which form an extensible set of metadata attributes. Administrators can add new profiles or edit the built-in profiles defined in `/etc/pxf/conf/pxf-profiles.xml`.
-
-**Note:** Add the JAR files associated with custom PXF plug-ins to the `/etc/pxf/conf/pxf-public.classpath` configuration file.
-
-After you make changes in `pxf-profiles.xml` (or any other PXF configuration file), propagate the changes to all nodes with PXF installed, and then restart the PXF service on all nodes.
-
-

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/pxf/TroubleshootingPXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/TroubleshootingPXF.html.md.erb b/pxf/TroubleshootingPXF.html.md.erb
deleted file mode 100644
index 9febe09..0000000
--- a/pxf/TroubleshootingPXF.html.md.erb
+++ /dev/null
@@ -1,273 +0,0 @@
----
-title: Troubleshooting PXF
----
-
-## <a id="pxerrortbl"></a>PXF Errors
-
-The following table lists some common errors encountered while using PXF:
-
-<table>
-<caption><span class="tablecap">Table 1. PXF Errors and Explanation</span></caption>
-<colgroup>
-<col width="50%" />
-<col width="50%" />
-</colgroup>
-<thead>
-<tr class="header">
-<th>Error</th>
-<th>Common Explanation</th>
-</tr>
-</thead>
-<tbody>
-<tr class="odd">
-<td>ERROR:  invalid URI pxf://localhost:51200/demo/file1: missing options section</td>
-<td><code class="ph codeph">LOCATION</code> does not include options after the file name: <code class="ph codeph">&lt;path&gt;?&lt;key&gt;=&lt;value&gt;&amp;&lt;key&gt;=&lt;value&gt;...</code></td>
-</tr>
-<tr class="even">
-<td>ERROR:  protocol &quot;pxf&quot; does not exist</td>
-<td>HAWQ is not compiled with the PXF protocol. It requires the GPSQL version of HAWQ.</td>
-</tr>
-<tr class="odd">
-<td>ERROR:  remote component error (0) from '&lt;x&gt;': There is no pxf servlet listening on the host and port specified in the external table url.</td>
-<td>Wrong server or port, or the service is not started</td>
-</tr>
-<tr class="even">
-<td>ERROR:  Missing FRAGMENTER option in the pxf uri: pxf://localhost:51200/demo/file1?a=a</td>
-<td>No <code class="ph codeph">FRAGMENTER</code> option was specified in <code class="ph codeph">LOCATION</code>.</td>
-</tr>
-<tr class="odd">
-<td>ERROR:  remote component error (500) from '&lt;x&gt;':   type  Exception report   message   org.apache.hadoop.mapred.InvalidInputException:
-<p>Input path does not exist: hdfs://0.0.0.0:8020/demo/file1  </p></td>
-<td>File or pattern given in <code class="ph codeph">LOCATION</code> doesn't exist on specified path.</td>
-</tr>
-<tr class="even">
-<td>ERROR: remote component error (500) from '&lt;x&gt;':   type  Exception report   message   org.apache.hadoop.mapred.InvalidInputException : Input Pattern hdfs://0.0.0.0:8020/demo/file* matches 0 files </td>
-<td>File or pattern given in <code class="ph codeph">LOCATION</code> doesn't exist on specified path.</td>
-</tr>
-<tr class="odd">
-<td>ERROR:  remote component error (500) from '&lt;x&gt;': PXF not correctly installed in CLASSPATH</td>
-<td>Cannot find PXF Jar</td>
-</tr>
-<tr class="even">
-<td>ERROR:  PXF API encountered a HTTP 404 error. Either the PXF service (tomcat) on the DataNode was not started or the PXF webapp was not started.</td>
-<td>Either the required DataNode does not exist or PXF service (tcServer) on the DataNode is not started or PXF webapp was not started</td>
-</tr>
-<tr class="odd">
-<td>ERROR:  remote component error (500) from '&lt;x&gt;':  type  Exception report   message   java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/client/HTableInterface</td>
-<td>One of the classes required for running PXF or one of its plug-ins is missing. Check that all resources in the PXF classpath files exist on the cluster nodes</td>
-</tr>
-<tr class="even">
-<td>ERROR: remote component error (500) from '&lt;x&gt;':   type  Exception report   message   java.io.IOException: Can't get Master Kerberos principal for use as renewer</td>
-<td>Secure PXF: YARN isn't properly configured for secure (Kerberized) HDFS installs</td>
-</tr>
-<tr class="odd">
-<td>ERROR: fail to get filesystem credential for uri hdfs://&lt;namenode&gt;:8020/</td>
-<td>Secure PXF: Wrong HDFS host or port is not 8020 (this is a limitation that will be removed in the next release)</td>
-</tr>
-<tr class="even">
-<td>ERROR: remote component error (413) from '&lt;x&gt;': HTTP status code is 413 but HTTP response string is empty</td>
-<td>The number of attributes in the PXF table and the sizes of their names are too large for tcServer to accommodate in its request buffer. The solution is to increase the values of the maxHeaderCount and maxHttpHeaderSize parameters in server.xml on the tcServer instance on all nodes, and then restart PXF:
-<p>&lt;Connector acceptCount=&quot;100&quot; connectionTimeout=&quot;20000&quot; executor=&quot;tomcatThreadPool&quot; maxKeepAliveRequests=&quot;15&quot; maxHeaderCount=&quot;&lt;some larger value&gt;&quot; maxHttpHeaderSize=&quot;&lt;some larger value in bytes&gt;&quot; port=&quot;${bio.http.port}&quot; protocol=&quot;org.apache.coyote.http11.Http11Protocol&quot; redirectPort=&quot;${bio.https.port}&quot;/&gt;</p></td>
-</tr>
-<tr class="odd">
-<td>ERROR: remote component error (500) from '&lt;x&gt;': type Exception report message java.lang.Exception: Class com.pivotal.pxf.&lt;plugin name&gt; does not appear in classpath. Plugins provided by PXF must start with &quot;org.apache.hawq.pxf&quot;</td>
-<td>Querying a PXF table that still uses the old package name (&quot;com.pivotal.pxf.*&quot;) results in an error message that recommends moving to the new package name (&quot;org.apache.hawq.pxf&quot;). </td>
-</tr>
-<tr class="even">
-<td><strong>HBase-Specific Errors</strong></td>
-<td> </td>
-</tr>
-<tr class="odd">
-<td>ERROR:  remote component error (500) from '&lt;x&gt;':   type  Exception report   message    org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for t1,,99999999999999 after 10 tries.</td>
-<td>HBase service is down, probably HRegionServer</td>
-</tr>
-<tr class="even">
-<td>ERROR:  remote component error (500) from '&lt;x&gt;':  type  Exception report   message   org.apache.hadoop.hbase.TableNotFoundException: nosuch</td>
-<td>HBase cannot find the requested table</td>
-</tr>
-<tr class="odd">
-<td>ERROR:  remote component error (500) from '&lt;x&gt;':  type  Exception report   message   java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/client/HTableInterface</td>
-<td>PXF cannot find a required JAR file, probably HBase's</td>
-</tr>
-<tr class="even">
-<td>ERROR:  remote component error (500) from '&lt;x&gt;':   type  Exception report   message   java.lang.NoClassDefFoundError: org/apache/zookeeper/KeeperException</td>
-<td>PXF cannot find ZooKeeper's JAR</td>
-</tr>
-<tr class="odd">
-<td>ERROR:  remote component error (500) from '&lt;x&gt;':  type  Exception report   message   java.lang.Exception: java.lang.IllegalArgumentException: Illegal HBase column name a, missing :</td>
-<td>PXF table has an illegal field name. Each field name must correspond to an HBase column in the syntax &lt;column family&gt;:&lt;field name&gt;</td>
-</tr>
-<tr class="even">
-<td>ERROR: remote component error (500) from '&lt;x&gt;': type Exception report message org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family a does not exist in region t1,,1405517248353.85f4977bfa88f4d54211cb8ac0f4e644. in table 't1', {NAME =&gt; 'cf', DATA_BLOCK_ENCODING =&gt; 'NONE', BLOOMFILTER =&gt; 'ROW', REPLICATION_SCOPE =&gt; '0', COMPRESSION =&gt; 'NONE', VERSIONS =&gt; '1', TTL =&gt; '2147483647', MIN_VERSIONS =&gt; '0', KEEP_DELETED_CELLS =&gt; 'false', BLOCKSIZE =&gt; '65536', ENCODE_ON_DISK =&gt; 'true', IN_MEMORY =&gt; 'false', BLOCKCACHE =&gt; 'true'}</td>
-<td>Required HBase table does not contain the requested column</td>
-</tr>
-<tr class="odd">
-<td><strong>Hive-Specific Errors</strong></td>
-<td> </td>
-</tr>
-<tr class="even">
-<td>ERROR:  remote component error (500) from '&lt;x&gt;':  type  Exception report   message   java.lang.RuntimeException: Failed to connect to Hive metastore: java.net.ConnectException: Connection refused</td>
-<td>Hive Metastore service is down</td>
-</tr>
-<tr class="odd">
-<td>ERROR:  remote component error (500) from '&lt;x&gt;': type  Exception report   message
-<p>NoSuchObjectException(message:default.players table not found)</p></td>
-<td>Table doesn't exist in Hive</td>
-</tr>
-<tr class="even">
-<td><strong>JSON-Specific Errors</strong></td>
-<td> </td>
-</tr>
-<tr class="odd">
-<td>ERROR: No fields in record (seg0 slice1 host:&lt;n&gt; pid=&lt;n&gt;)
-<p>DETAIL: External table &lt;tablename&gt;</p></td>
-<td>Check your JSON file for empty lines; remove them and try again</td>
-</tr>
-<tr class="even">
-<td>ERROR:  remote component error (500) from host:51200:  type  Exception report   message   &lt;text&gt;[0] is not an array node    description   The server encountered an internal error that prevented it from fulfilling this request.    exception   java.io.IOException: &lt;text&gt;[0] is not an array node (libchurl.c:878)  (seg4 host:40000 pid=&lt;n&gt;)  
-<p>DETAIL:  External table &lt;tablename&gt;</p></td>
-<td>JSON field assumed to be an array, but it is a scalar field.
-</td>
-</tr>
-
-</tbody>
-</table>
-
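-To illustrate the first entry in the table, the sketch below contrasts a `LOCATION` value that has no options section with one that includes options. The table names and the single `text` column are hypothetical; the URI is the one shown in the error message, and `PROFILE=HdfsTextSimple` is only one possible option to supply.
-
-``` sql
-/* No "?<key>=<value>" section after the path: expected to be rejected as an invalid URI. */
-CREATE EXTERNAL TABLE demo_no_options (line text)
-    LOCATION ('pxf://localhost:51200/demo/file1')
-    FORMAT 'TEXT' (delimiter=E',');
-
-/* Options section present after the path. */
-CREATE EXTERNAL TABLE demo_with_options (line text)
-    LOCATION ('pxf://localhost:51200/demo/file1?PROFILE=HdfsTextSimple')
-    FORMAT 'TEXT' (delimiter=E',');
-```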
-
-## <a id="pxflogging"></a>PXF Logging
-Enabling more verbose logging may aid PXF troubleshooting efforts.
-
-PXF provides two categories of message logging - service-level and database-level.
-
-### <a id="pxfsvclogmsg"></a>Service-Level Logging
-
-PXF utilizes `log4j` for service-level logging. PXF-service-related log messages are captured in a log file specified by PXF's `log4j` properties file, `/etc/pxf/conf/pxf-log4j.properties`. The default PXF logging configuration will write `INFO` and more severe level logs to `/var/log/pxf/pxf-service.log`.
-
-PXF provides more detailed logging when the `DEBUG` level is enabled.  To configure PXF `DEBUG` logging, uncomment the following line in `pxf-log4j.properties`:
-
-``` shell
-#log4j.logger.org.apache.hawq.pxf=DEBUG
-```
-
-and restart the PXF service:
-
-``` shell
-$ sudo service pxf-service restart
-```
-
-With `DEBUG` level logging now enabled, perform your PXF operations; for example, creating and querying an external table. (Make note of the time; this will direct you to the relevant log messages in `/var/log/pxf/pxf-service.log`.)
-
-``` shell
-$ psql
-```
-
-``` sql
-gpadmin=# CREATE EXTERNAL TABLE hivetest(id int, newid int)
-    LOCATION ('pxf://namenode:51200/pxf_hive1?PROFILE=Hive')
-    FORMAT 'CUSTOM' (formatter='pxfwritable_import');
-gpadmin=# select * from hivetest;
-<select output>
-```
-
-Examine/collect the log messages from `pxf-service.log`.
-
-**Note**: `DEBUG` logging is verbose and has a performance impact.  Remember to turn off PXF service `DEBUG` logging after you have collected the desired information.
- 
-
-### <a id="pxfdblogmsg"></a>Database-Level Logging
-
-Enable HAWQ and PXF debug message logging during operations on PXF external tables by setting the `client_min_messages` server configuration parameter to `DEBUG2` in your `psql` session.
-
-``` shell
-$ psql
-```
-
-``` sql
-gpadmin=# SET client_min_messages=DEBUG2;
-gpadmin=# SELECT * FROM hivetest;
-...
-DEBUG2:  churl http header: cell #19: X-GP-URL-HOST: localhost
-DEBUG2:  churl http header: cell #20: X-GP-URL-PORT: 51200
-DEBUG2:  churl http header: cell #21: X-GP-DATA-DIR: pxf_hive1
-DEBUG2:  churl http header: cell #22: X-GP-profile: Hive
-DEBUG2:  churl http header: cell #23: X-GP-URI: pxf://namenode:51200/pxf_hive1?profile=Hive
-...
-```
-
-Examine/collect the log messages from `stdout`.
-
-**Note**: `DEBUG2` database session logging has a performance impact.  Remember to turn off `DEBUG2` logging after you have collected the desired information.
-
-``` sql
-gpadmin=# SET client_min_messages=NOTICE;
-```
-
-
-## <a id="pxf-memcfg"></a>Addressing PXF Memory Issues
-
-The Java heap size can be a limiting factor in PXF’s ability to serve many concurrent requests or to run queries against large tables.
-
-You may run into situations where a query will hang or fail with an Out of Memory exception (OOM). This typically occurs when many threads are reading different data fragments from an external table and insufficient heap space exists to open all fragments at the same time. To avert or remedy this situation, Pivotal recommends first increasing the Java maximum heap size or decreasing the Tomcat maximum number of threads, depending upon what works best for your system configuration.
-
-**Note**: The configuration changes described in this topic require modifying config files on *each* PXF node in your HAWQ cluster. After performing the updates, be sure to verify that the configuration on all PXF nodes is the same.
-
-You will need to re-apply these configuration changes after any PXF version upgrades.
-
-### <a id="pxf-heapcfg"></a>Increasing the Maximum Heap Size
-
-Each PXF node is configured with a default Java heap size of 512MB. If the nodes in your cluster have an ample amount of memory, increasing the amount allocated to the PXF agents is the best approach. Pivotal recommends a heap size value between 1GB and 2GB.
-
-Perform the following steps to increase the PXF agent heap size in your HAWQ deployment. **You must perform the configuration changes on each PXF node in your HAWQ cluster.**
-
-1. Open `/var/pxf/pxf-service/bin/setenv.sh` in a text editor.
-
-    ``` shell
-    root@pxf-node$ vi /var/pxf/pxf-service/bin/setenv.sh
-    ```
-
-2. Update the `-Xmx` option to the desired value in the `JVM_OPTS` setting:
-
-    ``` shell
-    JVM_OPTS="-Xmx1024M -Xss256K"
-    ```
-
-3. Restart PXF:
-
-    1. If you use Ambari to manage your cluster, restart the PXF service via the Ambari console.
-    2. If you do not use Ambari, restart the PXF service from the command line on each node:
-
-        ``` shell
-        root@pxf-node$ service pxf-service restart
-        ```
-
-### <a id="pxf-heapcfg"></a>Decreasing the Maximum Number of Threads
-
-If increasing the maximum heap size is not suitable for your HAWQ cluster, try decreasing the number of concurrent working threads configured for the underlying Tomcat web application. A decrease in the number of running threads will prevent any PXF node from exhausting its memory, while ensuring that current queries run to completion (albeit a bit slower). As Tomcat's default behavior is to queue requests until a thread is free, decreasing this value will not result in denied requests.
-
-The Tomcat default maximum number of threads is 300. Pivotal recommends decreasing the maximum number of threads to under 6. (If you plan to run large workloads on a large number of files using a Hive profile, Pivotal recommends you pick an even lower value.)
-
-Perform the following steps to decrease the maximum number of Tomcat threads in your HAWQ PXF deployment. **You must perform the configuration changes on each PXF node in your HAWQ cluster.**
-
-1. Open the `/var/pxf/pxf-service/conf/server.xml` file in a text editor.
-
-    ``` shell
-    root@pxf-node$ vi /var/pxf/pxf-service/conf/server.xml
-    ```
-
-2. Update the `Catalina` `Executor` block to identify the desired `maxThreads` value:
-
-    ``` xml
-    <Executor maxThreads="2"
-              minSpareThreads="50"
-              name="tomcatThreadPool"
-              namePrefix="tomcat-http--"/>
-    ```
-
-3. Restart PXF:
-
-    1. If you use Ambari to manage your cluster, restart the PXF service via the Ambari console.
-    2. If you do not use Ambari, restart the PXF service from the command line on each node:
-
-        ``` shell
-        root@pxf-node$ service pxf-service restart
-        ```

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/query/HAWQQueryProcessing.html.md.erb
----------------------------------------------------------------------
diff --git a/query/HAWQQueryProcessing.html.md.erb b/query/HAWQQueryProcessing.html.md.erb
deleted file mode 100644
index 1d221f4..0000000
--- a/query/HAWQQueryProcessing.html.md.erb
+++ /dev/null
@@ -1,60 +0,0 @@
----
-title: About HAWQ Query Processing
----
-
-This topic provides an overview of how HAWQ processes queries. Understanding this process can be useful when writing and tuning queries.
-
-Users issue queries to HAWQ as they would to any database management system. They connect to the database instance on the HAWQ master host using a client application such as `psql` and submit SQL statements.
-
-## <a id="topic2"></a>Understanding Query Planning and Dispatch
-
-After a query is accepted on the master, the master parses and analyzes the query. After completing its analysis, the master generates a query tree and provides the query tree to the query optimizer.
-
-The query optimizer generates a query plan. Given the cost information of the query plan, resources are requested from the HAWQ resource manager. After the resources are obtained, the dispatcher starts virtual segments and dispatches the query plan to virtual segments for execution.
-
-This diagram depicts basic query flow in HAWQ.
-
-<img src="../images/basic_query_flow.png" id="topic2__image_ezs_wbh_sv" class="image" width="672" />
-
-## <a id="topic3"></a>Understanding HAWQ Query Plans
-
-A query plan is the set of operations HAWQ will perform to produce the answer to a query. Each *node* or step in the plan represents a database operation such as a table scan, join, aggregation, or sort. Plans are read and executed from bottom to top.
-
-In addition to common database operations such as table scans, joins, and so on, HAWQ has an additional operation type called *motion*. A motion operation involves moving tuples between the segments during query processing. Note that not every query requires a motion. For example, a targeted query plan does not require data to move across the interconnect.
-
-To achieve maximum parallelism during query execution, HAWQ divides the work of the query plan into *slices*. A slice is a portion of the plan that segments can work on independently. A query plan is sliced wherever a *motion* operation occurs in the plan, with one slice on each side of the motion.
-
-For example, consider the following simple query involving a join between two tables:
-
-``` sql
-SELECT customer, amount
-FROM sales JOIN customer USING (cust_id)
-WHERE dateCol = '04-30-2008';
-```
-
-[Query Slice Plan](#topic3__iy140224) shows the query plan. Each segment receives a copy of the query plan and works on it in parallel.
-
-The query plan for this example has a *redistribute motion* that moves tuples between the segments to complete the join. The redistribute motion is necessary because the `customer` table is distributed across the segments by `cust_id`, but the `sales` table is distributed across the segments by `sale_id`. To perform the join, the `sales` tuples must be redistributed by `cust_id`. The plan is sliced on either side of the redistribute motion, creating *slice 1* and *slice 2*.
-
-This query plan has another type of motion operation called a *gather motion*. A gather motion is when the segments send results back up to the master for presentation to the client. Because a query plan is always sliced wherever a motion occurs, this plan also has an implicit slice at the very top of the plan (*slice 3*). Not all query plans involve a gather motion. For example, a `CREATE TABLE x AS SELECT...` statement would not have a gather motion because tuples are sent to the newly created table, not to the master.
-
-<a id="topic3__iy140224"></a>
-<span class="figtitleprefix">Figure: </span>Query Slice Plan
-
-<img src="../images/slice_plan.jpg" class="image" width="462" height="382" />
-
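-One way to inspect the motions and slices for the example query is to prefix it with `EXPLAIN`, which prints the plan without running the query. This is a sketch that assumes the `sales` and `customer` tables from the example exist; the exact plan text depends on your data and HAWQ version, so no output is shown here.
-
-``` sql
-/* In the output, look for the Redistribute Motion and Gather Motion nodes and the slice label on each. */
-EXPLAIN
-SELECT customer, amount
-FROM sales JOIN customer USING (cust_id)
-WHERE dateCol = '04-30-2008';
-```
-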
-## <a id="topic4"></a>Understanding Parallel Query Execution
-
-HAWQ creates a number of database processes to handle the work of a query. On the master, the query worker process is called the *query dispatcher* (QD). The QD is responsible for creating and dispatching the query plan. It also accumulates and presents the final results. On virtual segments, a query worker process is called a *query executor* (QE). A QE is responsible for completing its portion of work and communicating its intermediate results to the other worker processes.
-
-There is at least one worker process assigned to each *slice* of the query plan. A worker process works on its assigned portion of the query plan independently. During query execution, each virtual segment will have a number of processes working on the query in parallel.
-
-Related processes that are working on the same slice of the query plan but on different virtual segments are called *gangs*. As a portion of work is completed, tuples flow up the query plan from one gang of processes to the next. This inter-process communication between virtual segments is referred to as the *interconnect* component of HAWQ.
-
-[Query Worker Processes](#topic4__iy141495) shows the query worker processes on the master and two virtual segment instances for the query plan illustrated in [Query Slice Plan](#topic3__iy140224).
-
-<a id="topic4__iy141495"></a>
-<span class="figtitleprefix">Figure: </span>Query Worker Processes
-
-<img src="../images/gangs.jpg" class="image" width="318" height="288" />
-


