hawq-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From yo...@apache.org
Subject [32/36] incubator-hawq-docs git commit: moving book configuration to new 'book' branch, for HAWQ-1027
Date Mon, 29 Aug 2016 16:47:07 GMT
http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/about_statistics.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/about_statistics.html.md.erb b/datamgmt/about_statistics.html.md.erb
new file mode 100644
index 0000000..d4b7665
--- /dev/null
+++ b/datamgmt/about_statistics.html.md.erb
@@ -0,0 +1,187 @@
+---
+title: About Database Statistics
+---
+
+## <a id="overview"></a>Overview
+
+Statistics are metadata that describe the data stored in the database. The query optimizer needs up-to-date statistics to choose the best execution plan for a query. For example, if a query joins two tables and one of them must be broadcast to all segments, the optimizer can choose the smaller of the two tables to minimize network traffic.
+
+The statistics used by the optimizer are calculated and saved in the system catalog by the `ANALYZE` command. There are three ways to initiate an analyze operation:
+
+-   You can run the `ANALYZE` command directly.
+-   You can run the `analyzedb` management utility outside of the database, at the command line.
+-   An automatic analyze operation can be triggered when DML operations are performed on tables that have no statistics or when a DML operation modifies a number of rows greater than a specified threshold.
+
+These methods are described in the following sections.
+
+Calculating statistics consumes time and resources, so HAWQ produces estimates by calculating statistics on samples of large tables. In most cases, the default settings provide the information needed to generate correct execution plans for queries. If the statistics produced are not producing optimal query execution plans, the administrator can tune configuration parameters to produce more accurate stastistics by increasing the sample size or the granularity of statistics saved in the system catalog. Producing more accurate statistics has CPU and storage costs and may not produce better plans, so it is important to view explain plans and test query performance to ensure that the additional statistics-related costs result in better query performance.
+
+## <a id="topic_oq3_qxj_3s"></a>System Statistics
+
+### <a id="tablesize"></a>Table Size
+
+The query planner seeks to minimize the disk I/O and network traffic required to execute a query, using estimates of the number of rows that must be processed and the number of disk pages the query must access. The data from which these estimates are derived are the `pg_class` system table columns `reltuples` and `relpages`, which contain the number of rows and pages at the time a `VACUUM` or `ANALYZE` command was last run. As rows are added or deleted, the numbers become less accurate. However, an accurate count of disk pages is always available from the operating system, so as long as the ratio of `reltuples` to `relpages` does not change significantly, the optimizer can produce an estimate of the number of rows that is sufficiently accurate to choose the correct query execution plan.
+
+In append-optimized tables, the number of tuples is kept up-to-date in the system catalogs, so the `reltuples` statistic is not an estimate. Non-visible tuples in the table are subtracted from the total. The `relpages` value is estimated from the append-optimized block sizes.
+
+When the `reltuples` column differs significantly from the row count returned by `SELECT COUNT(*)`, an analyze should be performed to update the statistics.
+
+### <a id="views"></a>The pg\_statistic System Table and pg\_stats View
+
+The `pg_statistic` system table holds the results of the last `ANALYZE` operation on each database table. There is a row for each column of every table. It has the following columns:
+
+starelid  
+The object ID of the table or index the column belongs to.
+
+staatnum  
+The number of the described column, beginning with 1.
+
+stanullfrac  
+The fraction of the column's entries that are null.
+
+stawidth  
+The average stored width, in bytes, of non-null entries.
+
+stadistinct  
+The number of distinct nonnull data values in the column.
+
+stakind*N*  
+A code number indicating the kind of statistics stored in the *N*th slot of the `pg_statistic` row.
+
+staop*N*  
+An operator used to derive the statistics stored in the *N*th slot.
+
+stanumbers*N*  
+Numerical statistics of the appropriate kind for the *N*th slot, or NULL if the slot kind does not involve numerical values.
+
+stavalues*N*  
+Column data values of the appropriate kind for the *N*th slot, or NULL if the slot kind does not store any data values.
+
+The statistics collected for a column vary for different data types, so the `pg_statistic` table stores statistics that are appropriate for the data type in four *slots*, consisting of four columns per slot. For example, the first slot, which normally contains the most common values for a column, consists of the columns `stakind1`, `staop1`, `stanumbers1`, and `stavalues1`. Also see [pg\_statistic](../reference/catalog/pg_statistic.html#topic1).
+
+The `stakindN` columns each contain a numeric code to describe the type of statistics stored in their slot. The `stakind` code numbers from 1 to 99 are reserved for core PostgreSQL data types. HAWQ uses code numbers 1, 2, and 3. A value of 0 means the slot is unused. The following table describes the kinds of statistics stored for the three codes.
+
+<a id="topic_oq3_qxj_3s__table_upf_1yc_nt"></a>
+
+<table>
+<caption><span class="tablecap">Table 1. Contents of pg_statistic &quot;slots&quot;</span></caption>
+<colgroup>
+<col width="50%" />
+<col width="50%" />
+</colgroup>
+<thead>
+<tr class="header">
+<th>stakind Code</th>
+<th>Description</th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td>1</td>
+<td><em>Most CommonValues (MCV) Slot</em>
+<ul>
+<li><code class="ph codeph">staop</code> contains the object ID of the &quot;=&quot; operator, used to decide whether values are the same or not.</li>
+<li><code class="ph codeph">stavalues</code> contains an array of the <em>K</em> most common non-null values appearing in the column.</li>
+<li><code class="ph codeph">stanumbers</code> contains the frequencies (fractions of total row count) of the values in the <code class="ph codeph">stavalues</code> array.</li>
+</ul>
+The values are ordered in decreasing frequency. Since the arrays are variable-size, <em>K</em> can be chosen by the statistics collector. Values must occur more than once to be added to the <code class="ph codeph">stavalues</code> array; a unique column has no MCV slot.</td>
+</tr>
+<tr class="even">
+<td>2</td>
+<td><em>Histogram Slot</em> – describes the distribution of scalar data.
+<ul>
+<li><code class="ph codeph">staop</code> is the object ID of the &quot;&lt;&quot; operator, which describes the sort ordering.</li>
+<li><code class="ph codeph">stavalues</code> contains <em>M</em> (where <em>M</em>&gt;=2) non-null values that divide the non-null column data values into <em>M</em>-1 bins of approximately equal population. The first <code class="ph codeph">stavalues</code> item is the minimum value and the last is the maximum value.</li>
+<li><code class="ph codeph">stanumbers</code> is not used and should be null.</li>
+</ul>
+<p>If a Most Common Values slot is also provided, then the histogram describes the data distribution after removing the values listed in the MCV array. (It is a <em>compressed histogram</em> in the technical parlance). This allows a more accurate representation of the distribution of a column with some very common values. In a column with only a few distinct values, it is possible that the MCV list describes the entire data population; in this case the histogram reduces to empty and should be omitted.</p></td>
+</tr>
+<tr class="odd">
+<td>3</td>
+<td><em>Correlation Slot</em> – describes the correlation between the physical order of table tuples and the ordering of data values of this column.
+<ul>
+<li><code class="ph codeph">staop</code> is the object ID of the &quot;&lt;&quot; operator. As with the histogram, more than one entry could theoretically appear.</li>
+<li><code class="ph codeph">stavalues</code> is not used and should be NULL.</li>
+<li><code class="ph codeph">stanumbers</code> contains a single entry, the correlation coefficient between the sequence of data values and the sequence of their actual tuple positions. The coefficient ranges from +1 to -1.</li>
+</ul></td>
+</tr>
+</tbody>
+</table>
+
+The `pg_stats` view presents the contents of `pg_statistic` in a friendlier format. For more information, see [pg\_stats](../reference/catalog/pg_stats.html#topic1).
+
+Newly created tables and indexes have no statistics.
+
+### <a id="topic_oq3_qxj_3s__section_wsy_1rv_mt"></a>Sampling
+
+When calculating statistics for large tables, HAWQ creates a smaller table by sampling the base table. If the table is partitioned, samples are taken from all partitions.
+
+If a sample table is created, the number of rows in the sample is calculated to provide a maximum acceptable relative error. The amount of acceptable error is specified with the `gp_analyze_relative_error` system configuration parameter, which is set to .25 (25%) by default. This is usually sufficiently accurate to generate correct query plans. If `ANALYZE` is not producing good estimates for a table column, you can increase the sample size by setting the `gp_analyze_relative_error` configuration parameter to a lower value. Beware that setting this parameter to a low value can lead to a very large sample size and dramatically increase analyze time.
+
+### <a id="topic_oq3_qxj_3s__section_u5p_brv_mt"></a>Updating Statistics
+
+Running `ANALYZE` with no arguments updates statistics for all tables in the database. This could take a very long time, so it is better to analyze tables selectively after data has changed. You can also analyze a subset of the columns in a table, for example columns used in joins, `WHERE` clauses, `SORT` clauses, `GROUP BY` clauses, or `HAVING` clauses.
+
+Analyzing a severely bloated table can generate poor statistics if the sample contains empty pages, so it is good practice to vacuum a bloated table before analyzing it.
+
+See the SQL Command Reference for details of running the `ANALYZE` command.
+
+Refer to the Management Utility Reference for details of running the `analyzedb` command.
+
+### <a id="topic_oq3_qxj_3s__section_cv2_crv_mt"></a>Analyzing Partitioned and Append-Optimized Tables
+
+When the `ANALYZE` command is run on a partitioned table, it analyzes each leaf-level subpartition, one at a time. You can run `ANALYZE` on just new or changed partition files to avoid analyzing partitions that have not changed. If a table is partitioned, you can analyze just new or changed partitions.
+
+The `analyzedb` command-line utility skips unchanged partitions automatically. It also runs concurrent sessions so it can analyze several partitions concurrently. It runs five sessions by default, but the number of sessions can be set from 1 to 10 with the `-p` command-line option. Each time `analyzedb` runs, it saves state information for append-optimized tables and partitions in the `db_analyze` directory in the master data directory. The next time it runs, `analyzedb` compares the current state of each table with the saved state and skips analyzing a table or partition if it is unchanged. Heap tables are always analyzed.
+
+If the Pivotal Query Optimizer is enabled, you also need to run `ANALYZE             ROOTPARTITION` to refresh the root partition statistics. The Pivotal Query Optimizer requires statistics at the root level for partitioned tables. The legacy optimizer does not use these statistics. Enable the Pivotal Query Optimizer by setting both the `optimizer` and `optimizer_analyze_root_partition` system configuration parameters to on. The root level statistics are then updated when you run `ANALYZE` or `ANALYZE ROOTPARTITION`. The time to run `ANALYZE ROOTPARTITION` is similar to the time to analyze a single partition since `ANALYZE ROOTPARTITION`. The `analyzedb` utility updates root partition statistics by default .
+
+## <a id="topic_gyb_qrd_2t"></a>Configuring Statistics
+
+There are several options for configuring HAWQ statistics collection.
+
+### <a id="statstarget"></a>Statistics Target
+
+The statistics target is the size of the `most_common_vals`, `most_common_freqs`, and `histogram_bounds` arrays for an individual column. By default, the target is 25. The default target can be changed by setting a server configuration parameter and the target can be set for any column using the `ALTER TABLE` command. Larger values increase the time needed to do `ANALYZE`, but may improve the quality of the legacy query optimizer (planner) estimates.
+
+Set the system default statistics target to a different value by setting the `default_statistics_target` server configuration parameter. The default value is usually sufficient, and you should only raise or lower it if your tests demonstrate that query plans improve with the new target. For example, to raise the default statistics target from 25 to 50 you can use the `hawq config` utility:
+
+``` shell
+$ hawq config -c default_statistics_target -v 50
+```
+
+The statististics target for individual columns can be set with the `ALTER             TABLE` command. For example, some queries can be improved by increasing the target for certain columns, especially columns that have irregular distributions. You can set the target to zero for columns that never contribute to query otpimization. When the target is 0, `ANALYZE` ignores the column. For example, the following `ALTER TABLE` command sets the statistics target for the `notes` column in the `emp` table to zero:
+
+``` sql
+ALTER TABLE emp ALTER COLUMN notes SET STATISTICS 0;
+```
+
+The statistics target can be set in the range 0 to 1000, or set it to -1 to revert to using the system default statistics target.
+
+Setting the statistics target on a parent partition table affects the child partitions. If you set statistics to 0 on some columns on the parent table, the statistics for the same columns are set to 0 for all children partitions. However, if you later add or exchange another child partition, the new child partition will use either the default statistics target or, in the case of an exchange, the previous statistics target. Therefore, if you add or exchange child partitions, you should set the statistics targets on the new child table.
+
+### <a id="topic_gyb_qrd_2t__section_j3p_drv_mt"></a>Automatic Statistics Collection
+
+HAWQ can be set to automatically run `ANALYZE` on a table that either has no statistics or has changed significantly when certain operations are performed on the table. For partitioned tables, automatic statistics collection is only triggered when the operation is run directly on a leaf table, and then only the leaf table is analyzed.
+
+Automatic statistics collection has three modes:
+
+-   `none` disables automatic statistics collection.
+-   `on_no_stats` triggers an analyze operation for a table with no existing statistics when any of the commands `CREATE TABLE AS SELECT`, `INSERT`, or `COPY` are executed on the table.
+-   `on_change` triggers an analyze operation when any of the commands `CREATE TABLE AS SELECT`, `UPDATE`, `DELETE`, `INSERT`, or `COPY` are executed on the table and the number of rows affected exceeds the threshold defined by the `gp_autostats_on_change_threshold` configuration parameter.
+
+The automatic statistics collection mode is set separately for commands that occur within a procedural language function and commands that execute outside of a function:
+
+-   The `gp_autostats_mode` configuration parameter controls automatic statistics collection behavior outside of functions and is set to `on_no_stats` by default.
+
+With the `on_change` mode, `ANALYZE` is triggered only if the number of rows affected exceeds the threshold defined by the `gp_autostats_on_change_threshold` configuration parameter. The default value for this parameter is a very high value, 2147483647, which effectively disables automatic statistics collection; you must set the threshold to a lower number to enable it. The `on_change` mode could trigger large, unexpected analyze operations that could disrupt the system, so it is not recommended to set it globally. It could be useful in a session, for example to automatically analyze a table following a load.
+
+To disable automatic statistics collection outside of functions, set the `gp_autostats_mode` parameter to `none`:
+
+``` shell
+$ hawq configure -c gp_autostats_mode -v none
+```
+
+Set the `log_autostats` system configuration parameter to on if you want to log automatic statistics collection operations.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/dml.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/dml.html.md.erb b/datamgmt/dml.html.md.erb
new file mode 100644
index 0000000..8951db2
--- /dev/null
+++ b/datamgmt/dml.html.md.erb
@@ -0,0 +1,35 @@
+---
+title: Managing Data
+---
+
+This chapter provides information about manipulating data and concurrent access in HAWQ.
+
+-   **[Basic Data Operations](../datamgmt/BasicDataOperations.html)**
+
+    This topic describes basic data operations that you perform in HAWQ.
+
+-   **[About Database Statistics](../datamgmt/about_statistics.html)**
+
+    An overview of statistics gathered by the `ANALYZE` command in HAWQ.
+
+-   **[Concurrency Control](../datamgmt/ConcurrencyControl.html)**
+
+    This topic discusses the mechanisms used in HAWQ to provide concurrency control.
+
+-   **[Working with Transactions](../datamgmt/Transactions.html)**
+
+    This topic describes transaction support in HAWQ.
+
+-   **[Loading and Unloading Data](../datamgmt/load/g-loading-and-unloading-data.html)**
+
+    The topics in this section describe methods for loading and writing data into and out of HAWQ, and how to format data files.
+
+-   **[Working with PXF and External Data](../pxf/HawqExtensionFrameworkPXF.html)**
+
+    HAWQ Extension Framework (PXF) is an extensible framework that allows HAWQ to query external system data. 
+
+-   **[HAWQ InputFormat for MapReduce](../datamgmt/HAWQInputFormatforMapReduce.html)**
+
+    MapReduce is a programming model developed by Google for processing and generating large data sets on an array of commodity servers. You can use the HAWQ InputFormat option to enable MapReduce jobs to access HAWQ data stored in HDFS.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/client-loadtools.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/client-loadtools.html.md.erb b/datamgmt/load/client-loadtools.html.md.erb
new file mode 100644
index 0000000..6b8deff
--- /dev/null
+++ b/datamgmt/load/client-loadtools.html.md.erb
@@ -0,0 +1,88 @@
+---
+title: Client-Based HAWQ Load Tools
+---
+HAWQ supports data loading from Red Hat Enterprise Linux 5, 6, and 7 and Windows XP client systems. HAWQ Load Tools include both a loader program and a parallel file distribution program.
+
+This topic presents the instructions to install the HAWQ Load Tools on your client machine. It also includes the information necessary to configure HAWQ databases to accept remote client connections.
+
+## <a id="installloadrunrhel"></a>RHEL Load Tools
+### <a id="installloadrunux"></a>Running the RHEL Installer
+
+1. Download the `greenplum-loaders-4.3.x.x-build-n-RHEL5.zip` installer package from [Pivotal Network](https://network.pivotal.io/products/pivotal-gpdb). Make note of the directory to which the file was downloaded.
+ 
+2. Unzip the run the installer. `sudo` privileges are required if you plan to accept the default install location of `/usr/local/greenplum-loaders-4.3.x.x-build-n`.
+
+    ``` shell
+    $ unzip greenplum-loaders-4.3.x.x-build-n-RHEL5-x86_64.zip
+    $ /bin/bash greenplum-loaders-4.3.x.x-build-n-RHEL5-x86_64.bin
+    ```
+    
+    The installer will prompt you to accept the license agreement and to provide an installation path. Enter an absolute path if you choose not to accept the default install location.
+
+
+## <a id="installloadrunwin"></a>Windows Load Tools
+
+### <a id="installpythonwin"></a>Installing Python 2.5
+The HAWQ Load Tools for Windows requires that the 32-bit version of Python 2.5 be installed on your machine. 
+
+**Note**: The 64-bit version of Python is not compatible with the HAWQ Load Tools for Windows.
+
+1. Download the [Python 2.5 installer for Windows](https://www.python.org/downloads/).  Make note of the directory to which it was downloaded.
+
+2. Double-click on the `python Load Tools for Windows-2.5.x.msi` package to launch the installer.
+3. Select **Install for all users** and click **Next**.
+4. The default Python install location is `C:\Pythonxx`. Click **Up** or **New** to choose another location. Click **Next**.
+5. Click **Next** to install the selected Python components.
+6. Click **Finish** to complete the Python installation.
+
+
+### <a id="installloadrunwin"></a>Running the Windows Installer
+
+1. Download the `greenplum-loaders-4.3.x.x-build-n-WinXP-x86_32.msi` installer package from [Pivotal Network](https://network.pivotal.io/products/pivotal-gpdb). Make note of the directory to which it was downloaded.
+ 
+2. Double-click the `greenplum-loaders-4.3.x.x-build-n-WinXP-x86_32.msi` file to launch the installer.
+3. Click **Next** on the **Welcome** screen.
+4. Click **I Agree** on the **License Agreement** screen.
+5. The default install location for HAWQ Loader Tools for Windows is `C:\"Program Files (x86)"\Greenplum\greenplum-loaders-4.3.8.1-build-1`. Click **Browse** to choose another location.
+6. Click **Next**.
+7. Click **Install** to begin the installation.
+8. Click **Finish** to exit the installer.
+
+    
+## <a id="installloadabout"></a>About the Loader Installation
+Your HAWQ Load Tools installation includes the following files and directories:
+
+`bin` — data loading command-line tools (`gpfdist` and `gpload`)  
+`ext` — external dependent components (python)  
+`lib` - data loading library files  
+`greenplum_loaders_path.[sh | bat]` — environment set up file
+
+
+
+## <a id="installloadcfgenv"></a>Configuring the Load Environment
+
+?? NEED TO VERIFY RHEL/WIN HAVE SAME env varb NAMES ??
+
+A `greenplum_loaders_path.[sh | bat]` file is provided in your load tools base install directory following installation. This file sets the following environment variables:
+
+- `GPHOME_LOADERS` - base directory of loader installation
+- `PATH` - adds the loader and component program directories
+- *`LIBRARY_PATH`* (OS-specific name) - adds the loader and component library directories
+
+Source `greenplum_loaders.sh` or execute `greenplum_loaders.bat` to set up your HAWQ environment before running the HAWQ Load Tools.
+ 
+
+## <a id="installloadenableclientconn"></a>Enabling Remote Client Connections
+The HAWQ master database must be configured to accept remote client connections.  Specifically, you need to identify the client hosts and database users that will be connecting to the HAWQ database.
+
+1. Ensure that the HAWQ database master `pg_hba.conf` file is correctly configured to allow connections from the desired users operating on the desired database from the desired hosts, using the authentication method you choose. For details, see [Configuring Client Access](../../clientaccess/client_auth.html#topic2).
+
+    Make sure the authentication method you choose is supported by the client tool you are using.
+    
+2. If you edited the `pg_hba.conf` file, reload the server configuration:
+
+    ``` shell
+    $ hawq stop -u
+    ```
+
+3. Verify and/or configure the databases and roles you are using to connect, and that the roles have the correct privileges to the database objects.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/creating-external-tables-examples.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/creating-external-tables-examples.html.md.erb b/datamgmt/load/creating-external-tables-examples.html.md.erb
new file mode 100644
index 0000000..8cdbff1
--- /dev/null
+++ b/datamgmt/load/creating-external-tables-examples.html.md.erb
@@ -0,0 +1,117 @@
+---
+title: Creating External Tables - Examples
+---
+
+The following examples show how to define external data with different protocols. Each `CREATE EXTERNAL TABLE` command can contain only one protocol.
+
+**Note:** When using IPv6, always enclose the numeric IP addresses in square brackets.
+
+Start `gpfdist` before you create external tables with the `gpfdist` protocol. The following code starts the `gpfdist` file server program in the background on port *8081* serving files from directory `/var/data/staging`. The logs are saved in `/home/gpadmin/log`.
+
+``` shell
+$ gpfdist -p 8081 -d /var/data/staging -l /home/gpadmin/log &
+```
+
+## <a id="ex1"></a>Example 1 - Single gpfdist instance on single-NIC machine
+
+Creates a readable external table, `ext_expenses`, using the `gpfdist` protocol. The files are formatted with a pipe (|) as the column delimiter.
+
+``` sql
+=# CREATE EXTERNAL TABLE ext_expenses
+        ( name text, date date, amount float4, category text, desc1 text )
+    LOCATION ('gpfdist://etlhost-1:8081/*', 'gpfdist://etlhost-1:8082/*')
+    FORMAT 'TEXT' (DELIMITER '|');
+```
+
+## <a id="ex2"></a>Example 2 - Multiple gpfdist instances
+
+Creates a readable external table, *ext\_expenses*, using the `gpfdist` protocol from all files with the *txt* extension. The column delimiter is a pipe ( | ) and NULL is a space (' ').
+
+``` sql
+=# CREATE EXTERNAL TABLE ext_expenses
+        ( name text, date date, amount float4, category text, desc1 text )
+    LOCATION ('gpfdist://etlhost-1:8081/*.txt', 'gpfdist://etlhost-2:8081/*.txt')
+    FORMAT 'TEXT' ( DELIMITER '|' NULL ' ') ;
+    
+```
+
+## <a id="ex3"></a>Example 3 - Multiple gpfdists instances
+
+Creates a readable external table, *ext\_expenses,* from all files with the *txt* extension using the `gpfdists` protocol. The column delimiter is a pipe ( | ) and NULL is a space (' '). For information about the location of security certificates, see [gpfdists Protocol](g-gpfdists-protocol.html).
+
+1.  Run `gpfdist` with the `--ssl` option.
+2.  Run the following command.
+
+    ``` sql
+    =# CREATE EXTERNAL TABLE ext_expenses
+             ( name text, date date, amount float4, category text, desc1 text )
+        LOCATION ('gpfdists://etlhost-1:8081/*.txt', 'gpfdists://etlhost-2:8082/*.txt')
+        FORMAT 'TEXT' ( DELIMITER '|' NULL ' ') ;
+        
+    ```
+
+## <a id="ex4"></a>Example 4 - Single gpfdist instance with error logging
+
+Uses the gpfdist protocol to create a readable external table, `ext_expenses,` from all files with the *txt* extension. The column delimiter is a pipe ( | ) and NULL (' ') is a space.
+
+Access to the external table is single row error isolation mode. Input data formatting errors can be captured so that you can view the errors, fix the issues, and then reload the rejected data. If the error count on a segment is greater than five (the `SEGMENT REJECT LIMIT` value), the entire external table operation fails and no rows are processed.
+
+``` sql
+=# CREATE EXTERNAL TABLE ext_expenses
+         ( name text, date date, amount float4, category text, desc1 text )
+    LOCATION ('gpfdist://etlhost-1:8081/*.txt', 'gpfdist://etlhost-2:8082/*.txt')
+    FORMAT 'TEXT' ( DELIMITER '|' NULL ' ')
+    LOG ERRORS INTO expenses_errs SEGMENT REJECT LIMIT 5;
+    
+```
+
+To create the readable `ext_expenses` table from CSV-formatted text files:
+
+``` sql
+=# CREATE EXTERNAL TABLE ext_expenses
+         ( name text, date date, amount float4, category text, desc1 text )
+    LOCATION ('gpfdist://etlhost-1:8081/*.txt', 'gpfdist://etlhost-2:8082/*.txt')
+    FORMAT 'CSV' ( DELIMITER ',' )
+    LOG ERRORS INTO expenses_errs SEGMENT REJECT LIMIT 5;
+    
+```
+
+## <a id="ex5"></a>Example 5 - Readable Web External Table with Script
+
+Creates a readable web external table that executes a script once on five virtual segments:
+
+``` sql
+=# CREATE EXTERNAL WEB TABLE log_output (linenum int, message text)
+    EXECUTE '/var/load_scripts/get_log_data.sh' ON 5
+    FORMAT 'TEXT' (DELIMITER '|');
+    
+```
+
+## <a id="ex6"></a>Example 6 - Writable External Table with gpfdist
+
+Creates a writable external table, *sales\_out*, that uses `gpfdist` to write output data to the file *sales.out*. The column delimiter is a pipe ( | ) and NULL is a space (' '). The file will be created in the directory specified when you started the gpfdist file server.
+
+``` sql
+=# CREATE WRITABLE EXTERNAL TABLE sales_out (LIKE sales)
+    LOCATION ('gpfdist://etl1:8081/sales.out')
+    FORMAT 'TEXT' ( DELIMITER '|' NULL ' ')
+    DISTRIBUTED BY (txn_id);
+    
+```
+
+## <a id="ex7"></a>Example 7 - Writable External Web Table with Script
+
+Creates a writable external web table, `campaign_out`, that pipes output data recieved by the segments to an executable script, `to_adreport_etl.sh`:
+
+``` sql
+=# CREATE WRITABLE EXTERNAL WEB TABLE campaign_out
+        (LIKE campaign)
+        EXECUTE '/var/unload_scripts/to_adreport_etl.sh' ON 6
+        FORMAT 'TEXT' (DELIMITER '|');
+```
+
+## <a id="ex8"></a>Example 8 - Readable and Writable External Tables with XML Transformations
+
+HAWQ can read and write XML data to and from external tables with gpfdist. For information about setting up an XML transform, see [Transforming XML Data](g-transforming-xml-data.html#topic75).
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-about-gpfdist-setup-and-performance.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-about-gpfdist-setup-and-performance.html.md.erb b/datamgmt/load/g-about-gpfdist-setup-and-performance.html.md.erb
new file mode 100644
index 0000000..28a0bfe
--- /dev/null
+++ b/datamgmt/load/g-about-gpfdist-setup-and-performance.html.md.erb
@@ -0,0 +1,22 @@
+---
+title: About gpfdist Setup and Performance
+---
+
+Consider the following scenarios for optimizing your ETL network performance.
+
+-   Allow network traffic to use all ETL host Network Interface Cards (NICs) simultaneously. Run one instance of `gpfdist` on the ETL host, then declare the host name of each NIC in the `LOCATION` clause of your external table definition (see [Creating External Tables - Examples](creating-external-tables-examples.html#topic44)).
+
+<a id="topic14__du165872"></a>
+<span class="figtitleprefix">Figure: </span>External Table Using Single gpfdist Instance with Multiple NICs
+
+<img src="../../images/ext_tables_multinic.jpg" class="image" width="472" height="271" />
+
+-   Divide external table data equally among multiple `gpfdist` instances on the ETL host. For example, on an ETL system with two NICs, run two `gpfdist` instances (one on each NIC) to optimize data load performance and divide the external table data files evenly between the two `gpfdists`.
+
+<a id="topic14__du165882"></a>
+
+<span class="figtitleprefix">Figure: </span>External Tables Using Multiple gpfdist Instances with Multiple NICs
+
+<img src="../../images/ext_tables.jpg" class="image" width="467" height="282" />
+
+**Note:** Use pipes (|) to separate formatted text when you submit files to `gpfdist`. HAWQ encloses comma-separated text strings in single or double quotes. `gpfdist` has to remove the quotes to parse the strings. Using pipes to separate formatted text avoids the extra step and improves performance.

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-character-encoding.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-character-encoding.html.md.erb b/datamgmt/load/g-character-encoding.html.md.erb
new file mode 100644
index 0000000..9f3756d
--- /dev/null
+++ b/datamgmt/load/g-character-encoding.html.md.erb
@@ -0,0 +1,11 @@
+---
+title: Character Encoding
+---
+
+Character encoding systems consist of a code that pairs each character from a character set with something else, such as a sequence of numbers or octets, to facilitate data stransmission and storage. HAWQ supports a variety of character sets, including single-byte character sets such as the ISO 8859 series and multiple-byte character sets such as EUC (Extended UNIX Code), UTF-8, and Mule internal code. Clients can use all supported character sets transparently, but a few are not supported for use within the server as a server-side encoding.
+
+Data files must be in a character encoding recognized by HAWQ. Data files that contain invalid or unsupported encoding sequences encounter errors when loading into HAWQ.
+
+**Note:** On data files generated on a Microsoft Windows operating system, run the `dos2unix` system command to remove any Windows-only characters before loading into HAWQ.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-command-based-web-external-tables.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-command-based-web-external-tables.html.md.erb b/datamgmt/load/g-command-based-web-external-tables.html.md.erb
new file mode 100644
index 0000000..7830cc3
--- /dev/null
+++ b/datamgmt/load/g-command-based-web-external-tables.html.md.erb
@@ -0,0 +1,26 @@
+---
+title: Command-based Web External Tables
+---
+
+The output of a shell command or script defines command-based web table data. Specify the command in the `EXECUTE` clause of `CREATE EXTERNAL WEB                 TABLE`. The data is current as of the time the command runs. The `EXECUTE` clause runs the shell command or script on the specified master or virtual segments. The virtual segments run the command in parallel. Scripts must be executable by the gpadmin user and reside in the same location on the master or the hosts of virtual segments.
+
+The command that you specify in the external table definition executes from the database and cannot access environment variables from `.bashrc` or `.profile`. Set environment variables in the `EXECUTE` clause. The following external web table, for example, runs a command on the HAWQ master host:
+
+``` sql
+CREATE EXTERNAL WEB TABLE output (output text)
+EXECUTE 'PATH=/home/gpadmin/programs; export PATH; myprogram.sh'
+    ON MASTER 
+FORMAT 'TEXT';
+```
+
+The following command defines a web table that runs a script on five virtual segments.
+
+``` sql
+CREATE EXTERNAL WEB TABLE log_output (linenum int, message text) 
+EXECUTE '/var/load_scripts/get_log_data.sh' ON 5 
+FORMAT 'TEXT' (DELIMITER '|');
+```
+
+The virtual segments are selected by the resource manager at runtime.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-configuration-file-format.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-configuration-file-format.html.md.erb b/datamgmt/load/g-configuration-file-format.html.md.erb
new file mode 100644
index 0000000..73f51a9
--- /dev/null
+++ b/datamgmt/load/g-configuration-file-format.html.md.erb
@@ -0,0 +1,66 @@
+---
+title: Configuration File Format
+---
+
+The `gpfdist` configuration file uses the YAML 1.1 document format and implements a schema for defining the transformation parameters. The configuration file must be a valid YAML document.
+
+The `gpfdist` program processes the document in order and uses indentation (spaces) to determine the document hierarchy and relationships of the sections to one another. The use of white space is significant. Do not use white space for formatting and do not use tabs.
+
+The following is the basic structure of a configuration file.
+
+``` pre
+---
+VERSION:   1.0.0.1
+TRANSFORMATIONS: 
+transformation_name1:
+TYPE:      input | output
+COMMAND:   command
+CONTENT:   data | paths
+SAFE:      posix-regex
+STDERR:    server | console
+transformation_name2:
+TYPE:      input | output
+COMMAND:   command 
+...
+```
+
+VERSION  
+Required. The version of the `gpfdist` configuration file schema. The current version is 1.0.0.1.
+
+TRANSFORMATIONS  
+Required. Begins the transformation specification section. A configuration file must have at least one transformation. When `gpfdist` receives a transformation request, it looks in this section for an entry with the matching transformation name.
+
+TYPE  
+Required. Specifies the direction of transformation. Values are `input` or `output`.
+
+-   `input`: `gpfdist` treats the standard output of the transformation process as a stream of records to load into HAWQ.
+-   `output` <span class="ph">: </span> `gpfdist` treats the standard input of the transformation process as a stream of records from HAWQ to transform and write to the appropriate output.
+
+COMMAND  
+Required. Specifies the command `gpfdist` will execute to perform the transformation.
+
+For input transformations, `gpfdist` invokes the command specified in the `CONTENT` setting. The command is expected to open the underlying file(s) as appropriate and produce one line of `TEXT` for each row to load into HAWQ /&gt;. The input transform determines whether the entire content should be converted to one row or to multiple rows.
+
+For output transformations, `gpfdist` invokes this command as specified in the `CONTENT` setting. The output command is expected to open and write to the underlying file(s) as appropriate. The output transformation determines the final placement of the converted output.
+
+CONTENT  
+Optional. The values are `data` and `paths`. The default value is `data`.
+
+-   When `CONTENT` specifies `data`, the text `%filename%` in the `COMMAND` section is replaced by the path to the file to read or write.
+-   When `CONTENT` specifies `paths`, the text `%filename%` in the `COMMAND` section is replaced by the path to the temporary file that contains the list of files to read or write.
+
+The following is an example of a `COMMAND` section showing the text `%filename%` that is replaced.
+
+``` pre
+COMMAND: /bin/bash input_transform.sh %filename%
+```
+
+SAFE  
+Optional. A `POSIX `regular expression that the paths must match to be passed to the transformation. Specify `SAFE` when there is a concern about injection or improper interpretation of paths passed to the command. The default is no restriction on paths.
+
+STDERR  
+Optional.The values are `server` and `console`.
+
+This setting specifies how to handle standard error output from the transformation. The default, `server`, specifies that `gpfdist` will capture the standard error output from the transformation in a temporary file and send the first 8k of that file to HAWQ as an error message. The error message will appear as a SQL error. `Console` specifies that `gpfdist` does not redirect or transmit the standard error output from the transformation.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-controlling-segment-parallelism.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-controlling-segment-parallelism.html.md.erb b/datamgmt/load/g-controlling-segment-parallelism.html.md.erb
new file mode 100644
index 0000000..4e0096c
--- /dev/null
+++ b/datamgmt/load/g-controlling-segment-parallelism.html.md.erb
@@ -0,0 +1,11 @@
+---
+title: Controlling Segment Parallelism
+---
+
+The `gp_external_max_segs` server configuration parameter controls the number of virtual segments that can simultaneously access a single `gpfdist` instance. The default is 64. You can set the number of segments such that some segments process external data files and some perform other database processing. Set this parameter in the `hawq-site.xml` file of your master instance.
+
+The number of segments in the `gpfdist` location list specify the minimum number of virtual segments required to serve data to a `gpfdist` external table.
+
+The `hawq_rm_nvseg_perquery_perseg_limit` and `hawq_rm_nvseg_perquery_limit` parameters also control segment parallelism by specifying the maximum number of segments used in running queries on a `gpfdist` external table on the cluster.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-create-an-error-table-and-declare-a-reject-limit.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-create-an-error-table-and-declare-a-reject-limit.html.md.erb b/datamgmt/load/g-create-an-error-table-and-declare-a-reject-limit.html.md.erb
new file mode 100644
index 0000000..ade14ea
--- /dev/null
+++ b/datamgmt/load/g-create-an-error-table-and-declare-a-reject-limit.html.md.erb
@@ -0,0 +1,11 @@
+---
+title: Capture Row Formatting Errors and Declare a Reject Limit
+---
+
+The following SQL fragment captures formatting errors internally in HAWQ and declares a reject limit of 10 rows.
+
+``` sql
+LOG ERRORS INTO errortable SEGMENT REJECT LIMIT 10 ROWS
+```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-creating-and-using-web-external-tables.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-creating-and-using-web-external-tables.html.md.erb b/datamgmt/load/g-creating-and-using-web-external-tables.html.md.erb
new file mode 100644
index 0000000..4ef6cab
--- /dev/null
+++ b/datamgmt/load/g-creating-and-using-web-external-tables.html.md.erb
@@ -0,0 +1,13 @@
+---
+title: Creating and Using Web External Tables
+---
+
+`CREATE EXTERNAL WEB TABLE` creates a web table definition. Web external tables allow HAWQ to treat dynamic data sources like regular database tables. Because web table data can change as a query runs, the data is not rescannable.
+
+You can define command-based or URL-based web external tables. The definition forms are distinct: you cannot mix command-based and URL-based definitions.
+
+-   **[Command-based Web External Tables](../../datamgmt/load/g-command-based-web-external-tables.html)**
+
+-   **[URL-based Web External Tables](../../datamgmt/load/g-url-based-web-external-tables.html)**
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-define-an-external-table-with-single-row-error-isolation.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-define-an-external-table-with-single-row-error-isolation.html.md.erb b/datamgmt/load/g-define-an-external-table-with-single-row-error-isolation.html.md.erb
new file mode 100644
index 0000000..e0c3c17
--- /dev/null
+++ b/datamgmt/load/g-define-an-external-table-with-single-row-error-isolation.html.md.erb
@@ -0,0 +1,24 @@
+---
+title: Define an External Table with Single Row Error Isolation
+---
+
+The following example logs errors internally in HAWQ and sets an error threshold of 10 errors.
+
+``` sql
+=# CREATE EXTERNAL TABLE ext_expenses ( name text, date date, amount float4, category text, desc1 text )
+   LOCATION ('gpfdist://etlhost-1:8081/*', 'gpfdist://etlhost-2:8082/*')
+   FORMAT 'TEXT' (DELIMITER '|')
+   LOG ERRORS INTO errortable SEGMENT REJECT LIMIT 10 ROWS;
+```
+
+The following example creates an external table, *ext\_expenses*, sets an error threshold of 10 errors, and writes error rows to the table *err\_expenses*.
+
+``` sql
+=# CREATE EXTERNAL TABLE ext_expenses
+     ( name text, date date, amount float4, category text, desc1 text )
+   LOCATION ('gpfdist://etlhost-1:8081/*', 'gpfdist://etlhost-2:8082/*')
+   FORMAT 'TEXT' (DELIMITER '|')
+   LOG ERRORS INTO err_expenses SEGMENT REJECT LIMIT 10 ROWS;
+```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-defining-a-command-based-writable-external-web-table.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-defining-a-command-based-writable-external-web-table.html.md.erb b/datamgmt/load/g-defining-a-command-based-writable-external-web-table.html.md.erb
new file mode 100644
index 0000000..8a24474
--- /dev/null
+++ b/datamgmt/load/g-defining-a-command-based-writable-external-web-table.html.md.erb
@@ -0,0 +1,43 @@
+---
+title: Defining a Command-Based Writable External Web Table
+---
+
+You can define writable external web tables to send output rows to an application or script. The application must accept an input stream, reside in the same location on all of the HAWQ segment hosts, and be executable by the `gpadmin` user. All segments in the HAWQ system run the application or script, whether or not a segment has output rows to process.
+
+Use `CREATE WRITABLE EXTERNAL WEB TABLE` to define the external table and specify the application or script to run on the segment hosts. Commands execute from within the database and cannot access environment variables (such as `$PATH`). Set environment variables in the `EXECUTE` clause of your writable external table definition. For example:
+
+``` sql
+=# CREATE WRITABLE EXTERNAL WEB TABLE output (output text) 
+    EXECUTE 'export PATH=$PATH:/home/gpadmin/programs; myprogram.sh' 
+    ON 6
+    FORMAT 'TEXT'
+    DISTRIBUTED RANDOMLY;
+```
+
+The following HAWQ variables are available for use in OS commands executed by a web or writable external table. Set these variables as environment variables in the shell that executes the command(s). They can be used to identify a set of requests made by an external table statement across the HAWQ array of hosts and segment instances.
+
+<caption><span class="tablecap">Table 1. External Table EXECUTE Variables</span></caption>
+
+<a id="topic71__du224024"></a>
+
+| Variable            | Description                                                                                                                |
+|---------------------|----------------------------------------------------------------------------------------------------------------------------|
+| $GP\_CID            | Command count of the transaction executing the external table statement.                                                   |
+| $GP\_DATABASE       | The database in which the external table definition resides.                                                               |
+| $GP\_DATE           | The date on which the external table command ran.                                                                          |
+| $GP\_MASTER\_HOST   | The host name of the HAWQ master host from which the external table statement was dispatched.                              |
+| $GP\_MASTER\_PORT   | The port number of the HAWQ master instance from which the external table statement was dispatched.                        |
+| $GP\_SEG\_DATADIR   | The location of the data directory of the segment instance executing the external table command.                           |
+| $GP\_SEG\_PG\_CONF  | The location of the `hawq-site.xml` file of the segment instance executing the external table command.                     |
+| $GP\_SEG\_PORT      | The port number of the segment instance executing the external table command.                                              |
+| $GP\_SEGMENT\_COUNT | The total number of segment instances in the HAWQ system.                                                                  |
+| $GP\_SEGMENT\_ID    | The ID number of the segment instance executing the external table command (same as `dbid` in `gp_segment_configuration`). |
+| $GP\_SESSION\_ID    | The database session identifier number associated with the external table statement.                                       |
+| $GP\_SN             | Serial number of the external table scan node in the query plan of the external table statement.                           |
+| $GP\_TIME           | The time the external table command was executed.                                                                          |
+| $GP\_USER           | The database user executing the external table statement.                                                                  |
+| $GP\_XID            | The transaction ID of the external table statement.                                                                        |
+
+-   **[Disabling EXECUTE for Web or Writable External Tables](../../datamgmt/load/g-disabling-execute-for-web-or-writable-external-tables.html)**
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-defining-a-file-based-writable-external-table.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-defining-a-file-based-writable-external-table.html.md.erb b/datamgmt/load/g-defining-a-file-based-writable-external-table.html.md.erb
new file mode 100644
index 0000000..a655b07
--- /dev/null
+++ b/datamgmt/load/g-defining-a-file-based-writable-external-table.html.md.erb
@@ -0,0 +1,16 @@
+---
+title: Defining a File-Based Writable External Table
+---
+
+Writable external tables that output data to files use the HAWQ parallel file server program, `gpfdist`, or HAWQ Extensions Framework (PXF).
+
+Use the `CREATE WRITABLE EXTERNAL TABLE` command to define the external table and specify the location and format of the output files.
+
+-   With a writable external table using the `gpfdist` protocol, the HAWQ segments send their data to `gpfdist`, which writes the data to the named file. `gpfdist` must run on a host that the HAWQ segments can access over the network. `gpfdist` points to a file location on the output host and writes data received from the HAWQ segments to the file. To divide the output data among multiple files, list multiple `gpfdist` URIs in your writable external table definition.
+-   A writable external web table sends data to an application as a stream of data. For example, unload data from HAWQ and send it to an application that connects to another database or ETL tool to load the data elsewhere. Writable external web tables use the `EXECUTE` clause to specify a shell command, script, or application to run on the segment hosts and accept an input stream of data. See [Defining a Command-Based Writable External Web Table](g-defining-a-command-based-writable-external-web-table.html#topic71) for more information about using `EXECUTE` commands in a writable external table definition.
+
+You can optionally declare a distribution policy for your writable external tables. By default, writable external tables use a random distribution policy. If the source table you are exporting data from has a hash distribution policy, defining the same distribution key column(s) for the writable external table improves unload performance by eliminating the requirement to move rows over the interconnect. If you unload data from a particular table, you can use the `LIKE` clause to copy the column definitions and distribution policy from the source table.
+
+-   **[Example - HAWQ file server (gpfdist)](../../datamgmt/load/g-example-greenplum-file-server-gpfdist.html)**
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-determine-the-transformation-schema.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-determine-the-transformation-schema.html.md.erb b/datamgmt/load/g-determine-the-transformation-schema.html.md.erb
new file mode 100644
index 0000000..1a4eb9b
--- /dev/null
+++ b/datamgmt/load/g-determine-the-transformation-schema.html.md.erb
@@ -0,0 +1,33 @@
+---
+title: Determine the Transformation Schema
+---
+
+To prepare for the transformation project:
+
+1.  <span class="ph">Determine the goal of the project, such as indexing data, analyzing data, combining data, and so on.</span>
+2.  <span class="ph">Examine the XML file and note the file structure and element names. </span>
+3.  <span class="ph">Choose the elements to import and decide if any other limits are appropriate. </span>
+
+For example, the following XML file, *prices.xml*, is a simple, short file that contains price records. Each price record contains two fields: an item number and a price.
+
+``` xml
+<?xml version="1.0" encoding="ISO-8859-1" ?>
+<prices>
+  <pricerecord>
+    <itemnumber>708421</itemnumber>
+    <price>19.99</price>
+  </pricerecord>
+  <pricerecord>
+    <itemnumber>708466</itemnumber>
+    <price>59.25</price>
+  </pricerecord>
+  <pricerecord>
+    <itemnumber>711121</itemnumber>
+    <price>24.99</price>
+  </pricerecord>
+</prices>
+```
+
+The goal is to import all the data into a HAWQ table with an integer `itemnumber` column and a decimal `price` column.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-disabling-execute-for-web-or-writable-external-tables.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-disabling-execute-for-web-or-writable-external-tables.html.md.erb b/datamgmt/load/g-disabling-execute-for-web-or-writable-external-tables.html.md.erb
new file mode 100644
index 0000000..f0332b5
--- /dev/null
+++ b/datamgmt/load/g-disabling-execute-for-web-or-writable-external-tables.html.md.erb
@@ -0,0 +1,11 @@
+---
+title: Disabling EXECUTE for Web or Writable External Tables
+---
+
+There is a security risk associated with allowing external tables to execute OS commands or scripts. To disable the use of `EXECUTE` in web and writable external table definitions, set the `gp_external_enable_exec server` configuration parameter to off in your master `hawq-site.xml` file:
+
+``` pre
+gp_external_enable_exec = off
+```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-escaping-in-csv-formatted-files.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-escaping-in-csv-formatted-files.html.md.erb b/datamgmt/load/g-escaping-in-csv-formatted-files.html.md.erb
new file mode 100644
index 0000000..d07b463
--- /dev/null
+++ b/datamgmt/load/g-escaping-in-csv-formatted-files.html.md.erb
@@ -0,0 +1,29 @@
+---
+title: Escaping in CSV Formatted Files
+---
+
+By default, the escape character is a `"` (double quote) for CSV-formatted files. If you want to use a different escape character, use the `ESCAPE` clause of `COPY`, `CREATE EXTERNAL TABLE` or the `hawq load` control file to declare a different escape character. In cases where your selected escape character is present in your data, you can use it to escape itself.
+
+For example, suppose you have a table with three columns and you want to load the following three fields:
+
+-   `Free trip to A,B`
+-   `5.89`
+-   `Special rate "1.79"`
+
+Your designated delimiter character is `,` (comma), and your designated escape character is `"` (double quote). The formatted row in your data file looks like this:
+
+``` pre
+         "Free trip to A,B","5.89","Special rate ""1.79"""
+
+      
+```
+
+The data value with a comma character that is part of the data is enclosed in double quotes. The double quotes that are part of the data are escaped with a double quote even though the field value is enclosed in double quotes.
+
+Embedding the entire field inside a set of double quotes guarantees preservation of leading and trailing whitespace characters:
+
+`"`Free trip to A,B `"`,`"`5.89 `"`,`"`Special rate `""`1.79`""             "`
+
+**Note:** In CSV mode, all characters are significant. A quoted value surrounded by white space, or any characters other than `DELIMITER`, includes those characters. This can cause errors if you import data from a system that pads CSV lines with white space to some fixed width. In this case, preprocess the CSV file to remove the trailing white space before importing the data into HAWQ.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-escaping-in-text-formatted-files.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-escaping-in-text-formatted-files.html.md.erb b/datamgmt/load/g-escaping-in-text-formatted-files.html.md.erb
new file mode 100644
index 0000000..e24a2b7
--- /dev/null
+++ b/datamgmt/load/g-escaping-in-text-formatted-files.html.md.erb
@@ -0,0 +1,31 @@
+---
+title: Escaping in Text Formatted Files
+---
+
+By default, the escape character is a \\ (backslash) for text-formatted files. You can declare a different escape character in the `ESCAPE` clause of `COPY`, `CREATE EXTERNAL TABLE`, or the `hawq             load` control file. If your escape character appears in your data, use it to escape itself.
+
+For example, suppose you have a table with three columns and you want to load the following three fields:
+
+-   `backslash = \`
+-   `vertical bar = |`
+-   `exclamation point = !`
+
+Your designated delimiter character is `|` (pipe character), and your designated escape character is `\` (backslash). The formatted row in your data file looks like this:
+
+``` pre
+backslash = \\ | vertical bar = \| | exclamation point = !
+```
+
+Notice how the backslash character that is part of the data is escaped with another backslash character, and the pipe character that is part of the data is escaped with a backslash character.
+
+You can use the escape character to escape octal and hexidecimal sequences. The escaped value is converted to the equivalent character when loaded into HAWQ. For example, to load the ampersand character (`&`), use the escape character to escape its equivalent hexidecimal (`\0x26`) or octal (`\046`) representation.
+
+You can disable escaping in `TEXT`-formatted files using the `ESCAPE` clause of `COPY`, `CREATE EXTERNAL TABLE` or the `hawq load` control file as follows:
+
+``` pre
+ESCAPE 'OFF'
+```
+
+This is useful for input data that contains many backslash characters, such as web log data.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-escaping.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-escaping.html.md.erb b/datamgmt/load/g-escaping.html.md.erb
new file mode 100644
index 0000000..0a1e62a
--- /dev/null
+++ b/datamgmt/load/g-escaping.html.md.erb
@@ -0,0 +1,16 @@
+---
+title: Escaping
+---
+
+There are two reserved characters that have special meaning to HAWQ:
+
+-   The designated delimiter character separates columns or fields in the data file.
+-   The newline character designates a new row in the data file.
+
+If your data contains either of these characters, you must escape the character so that HAWQ treats it as data and not as a field separator or new row. By default, the escape character is a \\ (backslash) for text-formatted files and a double quote (") for csv-formatted files.
+
+-   **[Escaping in Text Formatted Files](../../datamgmt/load/g-escaping-in-text-formatted-files.html)**
+
+-   **[Escaping in CSV Formatted Files](../../datamgmt/load/g-escaping-in-csv-formatted-files.html)**
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-example-1-dblp-database-publications-in-demo-directory.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-example-1-dblp-database-publications-in-demo-directory.html.md.erb b/datamgmt/load/g-example-1-dblp-database-publications-in-demo-directory.html.md.erb
new file mode 100644
index 0000000..4f61396
--- /dev/null
+++ b/datamgmt/load/g-example-1-dblp-database-publications-in-demo-directory.html.md.erb
@@ -0,0 +1,29 @@
+---
+title: Command-based Web External Tables
+---
+
+The output of a shell command or script defines command-based web table data. Specify the command in the `EXECUTE` clause of `CREATE EXTERNAL WEB                 TABLE`. The data is current as of the time the command runs. The `EXECUTE` clause runs the shell command or script on the specified master, and/or segment host or hosts. The command or script must reside on the hosts corresponding to the host(s) defined in the `EXECUTE` clause.
+
+By default, the command is run on segment hosts when active segments have output rows to process. For example, if each segment host runs four primary segment instances that have output rows to process, the command runs four times per segment host. You can optionally limit the number of segment instances that execute the web table command. All segments included in the web table definition in the `ON` clause run the command in parallel.
+
+The command that you specify in the external table definition executes from the database and cannot access environment variables from `.bashrc` or `.profile`. Set environment variables in the `EXECUTE` clause. For example:
+
+``` sql
+=# CREATE EXTERNAL WEB TABLE output (output text)
+EXECUTE 'PATH=/home/gpadmin/programs; export PATH; myprogram.sh'
+    ON MASTER
+FORMAT 'TEXT';
+```
+
+Scripts must be executable by the `gpadmin` user and reside in the same location on the master or segment hosts.
+
+The following command defines a web table that runs a script. The script runs on five virtual segments selected by the resource manager at runtime.
+
+``` sql
+=# CREATE EXTERNAL WEB TABLE log_output
+(linenum int, message text)
+EXECUTE '/var/load_scripts/get_log_data.sh' ON 5
+FORMAT 'TEXT' (DELIMITER '|');
+```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-example-greenplum-file-server-gpfdist.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-example-greenplum-file-server-gpfdist.html.md.erb b/datamgmt/load/g-example-greenplum-file-server-gpfdist.html.md.erb
new file mode 100644
index 0000000..a0bf669
--- /dev/null
+++ b/datamgmt/load/g-example-greenplum-file-server-gpfdist.html.md.erb
@@ -0,0 +1,13 @@
+---
+title: Example - HAWQ file server (gpfdist)
+---
+
+``` sql
+=# CREATE WRITABLE EXTERNAL TABLE unload_expenses
+( LIKE expenses )
+LOCATION ('gpfdist://etlhost-1:8081/expenses1.out',
+'gpfdist://etlhost-2:8081/expenses2.out')
+FORMAT 'TEXT' (DELIMITER ',');
+```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-example-irs-mef-xml-files-in-demo-directory.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-example-irs-mef-xml-files-in-demo-directory.html.md.erb b/datamgmt/load/g-example-irs-mef-xml-files-in-demo-directory.html.md.erb
new file mode 100644
index 0000000..6f5b9e3
--- /dev/null
+++ b/datamgmt/load/g-example-irs-mef-xml-files-in-demo-directory.html.md.erb
@@ -0,0 +1,54 @@
+---
+title: Example using IRS MeF XML Files (In demo Directory)
+---
+
+This example demonstrates loading a sample IRS Modernized eFile tax return using a Joost STX transformation. The data is in the form of a complex XML file.
+
+The U.S. Internal Revenue Service (IRS) made a significant commitment to XML and specifies its use in its Modernized e-File (MeF) system. In MeF, each tax return is an XML document with a deep hierarchical structure that closely reflects the particular form of the underlying tax code.
+
+XML, XML Schema and stylesheets play a role in their data representation and business workflow. The actual XML data is extracted from a ZIP file attached to a MIME "transmission file" message. For more information about MeF, see [Modernized e-File (Overview)](http://www.irs.gov/uac/Modernized-e-File-Overview) on the IRS web site.
+
+The sample XML document, *RET990EZ\_2006.xml*, is about 350KB in size with two elements:
+
+-   ReturnHeader
+-   ReturnData
+
+The &lt;ReturnHeader&gt; element contains general details about the tax return such as the taxpayer's name, the tax year of the return, and the preparer. The &lt;ReturnData&gt; element contains multiple sections with specific details about the tax return and associated schedules.
+
+The following is an abridged sample of the XML file.
+
+``` xml
+<?xml version="1.0" encoding="UTF-8"?> 
+<Return returnVersion="2006v2.0"
+   xmlns="http://www.irs.gov/efile" 
+   xmlns:efile="http://www.irs.gov/efile"
+   xsi:schemaLocation="http://www.irs.gov/efile"
+   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 
+   <ReturnHeader binaryAttachmentCount="1">
+     <ReturnId>AAAAAAAAAAAAAAAAAAAA</ReturnId>
+     <Timestamp>1999-05-30T12:01:01+05:01</Timestamp>
+     <ReturnType>990EZ</ReturnType>
+     <TaxPeriodBeginDate>2005-01-01</TaxPeriodBeginDate>
+     <TaxPeriodEndDate>2005-12-31</TaxPeriodEndDate>
+     <Filer>
+       <EIN>011248772</EIN>
+       ... more data ...
+     </Filer>
+     <Preparer>
+       <Name>Percy Polar</Name>
+       ... more data ...
+     </Preparer>
+     <TaxYear>2005</TaxYear>
+   </ReturnHeader>
+   ... more data ..
+```
+
+The goal is to import all the data into a HAWQ database. First, convert the XML document into text with newlines "escaped", with two columns: `ReturnId` and a single column on the end for the entire MeF tax return. For example:
+
+``` pre
+AAAAAAAAAAAAAAAAAAAA|<Return returnVersion="2006v2.0"... 
+```
+
+Load the data into HAWQ.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-example-witsml-files-in-demo-directory.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-example-witsml-files-in-demo-directory.html.md.erb b/datamgmt/load/g-example-witsml-files-in-demo-directory.html.md.erb
new file mode 100644
index 0000000..0484523
--- /dev/null
+++ b/datamgmt/load/g-example-witsml-files-in-demo-directory.html.md.erb
@@ -0,0 +1,54 @@
+---
+title: Example using WITSML™ Files (In demo Directory)
+---
+
+This example demonstrates loading sample data describing an oil rig using a Joost STX transformation. The data is in the form of a complex XML file downloaded from energistics.org.
+
+The Wellsite Information Transfer Standard Markup Language (WITSML™) is an oil industry initiative to provide open, non-proprietary, standard interfaces for technology and software to share information among oil companies, service companies, drilling contractors, application vendors, and regulatory agencies. For more information about WITSML™, see [http://www.witsml.org](http://www.witsml.org).
+
+The oil rig information consists of a top level `<rigs>` element with multiple child elements such as `<documentInfo>,                             <rig>`, and so on. The following excerpt from the file shows the type of information in the `<rig>` tag.
+
+``` xml
+<?xml version="1.0" encoding="UTF-8"?>
+<?xml-stylesheet href="../stylesheets/rig.xsl" type="text/xsl" media="screen"?>
+<rigs 
+ xmlns="http://www.witsml.org/schemas/131" 
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
+ xsi:schemaLocation="http://www.witsml.org/schemas/131 ../obj_rig.xsd" 
+ version="1.3.1.1">
+ <documentInfo>
+ ... misc data ...
+ </documentInfo>
+ <rig uidWell="W-12" uidWellbore="B-01" uid="xr31">
+     <nameWell>6507/7-A-42</nameWell>
+     <nameWellbore>A-42</nameWellbore>
+     <name>Deep Drill #5</name>
+     <owner>Deep Drilling Co.</owner>
+     <typeRig>floater</typeRig>
+     <manufacturer>Fitsui Engineering</manufacturer>
+     <yearEntService>1980</yearEntService>
+     <classRig>ABS Class A1 M CSDU AMS ACCU</classRig>
+     <approvals>DNV</approvals>
+ ... more data ...
+```
+
+The goal is to import the information for this rig into HAWQ.
+
+The sample document, *rig.xml*, is about 11KB in size. The input does not contain tabs so the relevant information can be converted into records delimited with a pipe (|).
+
+`W-12|6507/7-A-42|xr31|Deep Drill #5|Deep Drilling Co.|John                             Doe|John.Doe@example.com|`
+
+With the columns:
+
+-   `well_uid text`, -- e.g. W-12
+-   `well_name text`, -- e.g. 6507/7-A-42
+-   `rig_uid text`, -- e.g. xr31
+-   `rig_name text`, -- e.g. Deep Drill \#5
+-   `rig_owner text`, -- e.g. Deep Drilling Co.
+-   `rig_contact text`, -- e.g. John Doe
+-   `rig_email text`, -- e.g. John.Doe@example.com
+-   `doc xml`
+
+Then, load the data into HAWQ.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-examples-read-fixed-width-data.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-examples-read-fixed-width-data.html.md.erb b/datamgmt/load/g-examples-read-fixed-width-data.html.md.erb
new file mode 100644
index 0000000..174529a
--- /dev/null
+++ b/datamgmt/load/g-examples-read-fixed-width-data.html.md.erb
@@ -0,0 +1,37 @@
+---
+title: Examples - Read Fixed-Width Data
+---
+
+The following examples show how to read fixed-width data.
+
+## Example 1 – Loading a table with PRESERVED\_BLANKS on
+
+``` sql
+CREATE READABLE EXTERNAL TABLE students (
+  name varchar(20), address varchar(30), age int)
+LOCATION ('gpfdist://host:port/file/path/')
+FORMAT 'CUSTOM' (formatter=fixedwidth_in, name=20, address=30, age=4,
+        preserve_blanks='on',null='NULL');
+```
+
+## Example 2 – Loading data with no line delimiter
+
+``` sql
+CREATE READABLE EXTERNAL TABLE students (
+  name varchar(20), address varchar(30), age int)
+LOCATION ('gpfdist://host:port/file/path/')
+FORMAT 'CUSTOM' (formatter=fixedwidth_in, name='20', address='30', age='4', 
+        line_delim='?@');
+```
+
+## Example 3 – Create a writable external table with a \\r\\n line delimiter
+
+``` sql
+CREATE WRITABLE EXTERNAL TABLE students_out (
+  name varchar(20), address varchar(30), age int)
+LOCATION ('gpfdist://host:port/file/path/filename')     
+FORMAT 'CUSTOM' (formatter=fixedwidth_out, 
+   name=20, address=30, age=4, line_delim=E'\r\n');
+```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-external-tables.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-external-tables.html.md.erb b/datamgmt/load/g-external-tables.html.md.erb
new file mode 100644
index 0000000..4142a07
--- /dev/null
+++ b/datamgmt/load/g-external-tables.html.md.erb
@@ -0,0 +1,44 @@
+---
+title: Accessing File-Based External Tables
+---
+
+External tables enable accessing external files as if they are regular database tables. They are often used to move data into and out of a HAWQ database.
+
+To create an external table definition, you specify the format of your input files and the location of your external data sources. For information input file formats, see [Formatting Data Files](g-formatting-data-files.html#topic95).
+
+Use one of the following protocols to access external table data sources. You cannot mix protocols in `CREATE EXTERNAL TABLE` statements:
+
+-   `gpfdist://` points to a directory on the file host and serves external data files to all HAWQ segments in parallel. See [gpfdist Protocol](g-gpfdist-protocol.html#topic_sny_yph_kr).
+-   `gpfdists://` is the secure version of `gpfdist`. See [gpfdists Protocol](g-gpfdists-protocol.html#topic_sny_yph_kr).
+-   `pxf://` specifies data accessed through the HAWQ Extensions Framework (PXF). PXF is a service that uses plug-in Java classes to read and write data in external data sources. PXF includes plug-ins to access data in HDFS, HBase, and Hive. Custom plug-ins can be written to access other external data sources.
+
+External tables allow you to access external files from within the database as if they are regular database tables. Used with `gpfdist`, the HAWQ parallel file distribution program, or HAWQ Extensions Framework (PXF), external tables provide full parallelism by using the resources of all HAWQ segments to load or unload data.
+
+You can query external table data directly and in parallel using SQL commands such as `SELECT`, `JOIN`, or `SORT EXTERNAL TABLE             DATA`, and you can create views for external tables.
+
+The steps for using external tables are:
+
+1.  Define the external table.
+2.  Start the gpfdist file server(s) if you plan to use the `gpfdist` or `gpdists` protocols.
+3.  Place the data files in the correct locations.
+4.  Query the external table with SQL commands.
+
+HAWQ provides readable and writable external tables:
+
+-   Readable external tables for data loading. Readable external tables support basic extraction, transformation, and loading (ETL) tasks common in data warehousing. HAWQ segment instances read external table data in parallel to optimize large load operations. You cannot modify readable external tables.
+-   Writable external tables for data unloading. Writable external tables support:
+
+    -   Selecting data from database tables to insert into the writable external table.
+    -   Sending data to an application as a stream of data. For example, unload data from HAWQ and send it to an application that connects to another database or ETL tool to load the data elsewhere.
+    -   Receiving output from HAWQ parallel MapReduce calculations.
+
+    Writable external tables allow only `INSERT` operations.
+
+External tables can be file-based or web-based.
+
+-   Regular (file-based) external tables access static flat files. Regular external tables are rescannable: the data is static while the query runs.
+-   Web (web-based) external tables access dynamic data sources, either on a web server with the `http://` protocol or by executing OS commands or scripts. Web external tables are not rescannable: the data can change while the query runs.
+
+Dump and restore operate only on external and web external table *definitions*, not on the data sources.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-formatting-columns.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-formatting-columns.html.md.erb b/datamgmt/load/g-formatting-columns.html.md.erb
new file mode 100644
index 0000000..b828212
--- /dev/null
+++ b/datamgmt/load/g-formatting-columns.html.md.erb
@@ -0,0 +1,19 @@
+---
+title: Formatting Columns
+---
+
+The default column or field delimiter is the horizontal `TAB` character (`0x09`) for text files and the comma character (`0x2C`) for CSV files. You can declare a single character delimiter using the `DELIMITER` clause of `COPY`, `CREATE                 EXTERNAL TABLE` or the `hawq load` configuration table when you define your data format. The delimiter character must appear between any two data value fields. Do not place a delimiter at the beginning or end of a row. For example, if the pipe character ( | ) is your delimiter:
+
+``` pre
+data value 1|data value 2|data value 3
+```
+
+The following command shows the use of the pipe character as a column delimiter:
+
+``` sql
+=# CREATE EXTERNAL TABLE ext_table (name text, date date)
+LOCATION ('gpfdist://host:port/filename.txt)
+FORMAT 'TEXT' (DELIMITER '|');
+```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-formatting-data-files.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-formatting-data-files.html.md.erb b/datamgmt/load/g-formatting-data-files.html.md.erb
new file mode 100644
index 0000000..6c929ad
--- /dev/null
+++ b/datamgmt/load/g-formatting-data-files.html.md.erb
@@ -0,0 +1,17 @@
+---
+title: Formatting Data Files
+---
+
+When you use the HAWQ tools for loading and unloading data, you must specify how your data is formatted. `COPY`, `CREATE             EXTERNAL TABLE, `and `hawq load` have clauses that allow you to specify how your data is formatted. Data can be delimited text (`TEXT`) or comma separated values (`CSV`) format. External data must be formatted correctly to be read by HAWQ. This topic explains the format of data files expected by HAWQ.
+
+-   **[Formatting Rows](../../datamgmt/load/g-formatting-rows.html)**
+
+-   **[Formatting Columns](../../datamgmt/load/g-formatting-columns.html)**
+
+-   **[Representing NULL Values](../../datamgmt/load/g-representing-null-values.html)**
+
+-   **[Escaping](../../datamgmt/load/g-escaping.html)**
+
+-   **[Character Encoding](../../datamgmt/load/g-character-encoding.html)**
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-formatting-rows.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-formatting-rows.html.md.erb b/datamgmt/load/g-formatting-rows.html.md.erb
new file mode 100644
index 0000000..ea9b416
--- /dev/null
+++ b/datamgmt/load/g-formatting-rows.html.md.erb
@@ -0,0 +1,7 @@
+---
+title: Formatting Rows
+---
+
+HAWQ expects rows of data to be separated by the `LF` character (Line feed, `0x0A`), `CR` (Carriage return, `0x0D`), or `CR` followed by `LF` (`CR+LF`, `0x0D 0x0A`). `LF` is the standard newline representation on UNIX or UNIX-like operating systems. Operating systems such as Windows or Mac OS X use `CR` or `CR+LF`. All of these representations of a newline are supported by HAWQ as a row delimiter. For more information, see [Importing and Exporting Fixed Width Data](g-importing-and-exporting-fixed-width-data.html#topic37).
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-gpfdist-protocol.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-gpfdist-protocol.html.md.erb b/datamgmt/load/g-gpfdist-protocol.html.md.erb
new file mode 100644
index 0000000..f41c946
--- /dev/null
+++ b/datamgmt/load/g-gpfdist-protocol.html.md.erb
@@ -0,0 +1,15 @@
+---
+title: gpfdist Protocol
+---
+
+The `gpfdist://` protocol is used in a URI to reference a running `gpfdist` instance. The `gpfdist` utility serves external data files from a directory on a file host to all HAWQ segments in parallel.
+
+`gpfdist` is located in the `$GPHOME/bin` directory on your HAWQ master host and on each segment host.
+
+Run `gpfdist` on the host where the external data files reside. `gpfdist` uncompresses `gzip` (`.gz`) and `bzip2` (.`bz2`) files automatically. You can use the wildcard character (\*) or other C-style pattern matching to denote multiple files to read. The files specified are assumed to be relative to the directory that you specified when you started the `gpfdist` instance.
+
+All virtual segments access the external file(s) in parallel, subject to the number of segments set in the `gp_external_max_segments` parameter, the length of the `gpfdist` location list, and the limits specified by the `hawq_rm_nvseg_perquery_limit` and `hawq_rm_nvseg_perquery_perseg_limit` parameters. Use multiple `gpfdist` data sources in a `CREATE EXTERNAL TABLE` statement to scale the external table's scan performance. For more information about configuring `gpfdist`, see [Using the Greenplum Parallel File Server (gpfdist)](g-using-the-greenplum-parallel-file-server--gpfdist-.html#topic13).
+
+See the `gpfdist` reference documentation for more information about using `gpfdist` with external tables.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-gpfdists-protocol.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-gpfdists-protocol.html.md.erb b/datamgmt/load/g-gpfdists-protocol.html.md.erb
new file mode 100644
index 0000000..2f5641d
--- /dev/null
+++ b/datamgmt/load/g-gpfdists-protocol.html.md.erb
@@ -0,0 +1,37 @@
+---
+title: gpfdists Protocol
+---
+
+The `gpfdists://` protocol is a secure version of the `gpfdist://         protocol`. To use it, you run the `gpfdist` utility with the `--ssl` option. When specified in a URI, the `gpfdists://` protocol enables encrypted communication and secure identification of the file server and the HAWQ to protect against attacks such as eavesdropping and man-in-the-middle attacks.
+
+`gpfdists` implements SSL security in a client/server scheme with the following attributes and limitations:
+
+-   Client certificates are required.
+-   Multilingual certificates are not supported.
+-   A Certificate Revocation List (CRL) is not supported.
+-   The `TLSv1` protocol is used with the `TLS_RSA_WITH_AES_128_CBC_SHA` encryption algorithm.
+-   SSL parameters cannot be changed.
+-   SSL renegotiation is supported.
+-   The SSL ignore host mismatch parameter is set to `false`.
+-   Private keys containing a passphrase are not supported for the `gpfdist` file server (server.key) and for the HAWQ (client.key).
+-   Issuing certificates that are appropriate for the operating system in use is the user's responsibility. Generally, converting certificates as shown in [https://www.sslshopper.com/ssl-converter.html](https://www.sslshopper.com/ssl-converter.html) is supported.
+
+    **Note:** A server started with the `gpfdist --ssl` option can only communicate with the `gpfdists` protocol. A server that was started with `gpfdist` without the `--ssl` option can only communicate with the `gpfdist` protocol.
+
+-   The client certificate file, client.crt
+-   The client private key file, client.key
+
+Use one of the following methods to invoke the `gpfdists` protocol.
+
+-   Run `gpfdist` with the `--ssl` option and then use the `gpfdists` protocol in the `LOCATION` clause of a `CREATE EXTERNAL TABLE` statement.
+-   Use a `hawq load` YAML control file with the `SSL` option set to true.
+
+Using `gpfdists` requires that the following client certificates reside in the `$PGDATA/gpfdists` directory on each segment.
+
+-   The client certificate file, `client.crt`
+-   The client private key file, `client.key`
+-   The trusted certificate authorities, `root.crt`
+
+For an example of loading data into an external table security, see [Example 3 - Multiple gpfdists instances](creating-external-tables-examples.html#topic47).
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-handling-errors-ext-table-data.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-handling-errors-ext-table-data.html.md.erb b/datamgmt/load/g-handling-errors-ext-table-data.html.md.erb
new file mode 100644
index 0000000..2b8dc78
--- /dev/null
+++ b/datamgmt/load/g-handling-errors-ext-table-data.html.md.erb
@@ -0,0 +1,9 @@
+---
+title: Handling Errors in External Table Data
+---
+
+By default, if external table data contains an error, the command fails and no data loads into the target database table. Define the external table with single row error handling to enable loading correctly formatted rows and to isolate data errors in external table data. See [Handling Load Errors](g-handling-load-errors.html#topic55).
+
+The `gpfdist` file server uses the `HTTP` protocol. External table queries that use `LIMIT` end the connection after retrieving the rows, causing an HTTP socket error. If you use `LIMIT` in queries of external tables that use the `gpfdist://` or `http:// protocols`, ignore these errors – data is returned to the database as expected.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-handling-load-errors.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-handling-load-errors.html.md.erb b/datamgmt/load/g-handling-load-errors.html.md.erb
new file mode 100644
index 0000000..6faf7a5
--- /dev/null
+++ b/datamgmt/load/g-handling-load-errors.html.md.erb
@@ -0,0 +1,28 @@
+---
+title: Handling Load Errors
+---
+
+Readable external tables are most commonly used to select data to load into regular database tables. You use the `CREATE TABLE AS SELECT` or `INSERT                 INTO `commands to query the external table data. By default, if the data contains an error, the entire command fails and the data is not loaded into the target database table.
+
+The `SEGMENT REJECT LIMIT` clause allows you to isolate format errors in external table data and to continue loading correctly formatted rows. Use `SEGMENT REJECT LIMIT `to set an error threshold, specifying the reject limit `count` as number of `ROWS` (the default) or as a `PERCENT` of total rows (1-100).
+
+The entire external table operation is aborted, and no rows are processed, if the number of error rows reaches the `SEGMENT REJECT LIMIT`. The limit of error rows is per-segment, not per entire operation. The operation processes all good rows, and it discards and optionally logs formatting errors for erroneous rows, if the number of error rows does not reach the `SEGMENT REJECT                 LIMIT`.
+
+The `LOG ERRORS` clause allows you to keep error rows for further examination. For information about the `LOG ERRORS` clause, see the `CREATE EXTERNAL TABLE` command.
+
+When you set `SEGMENT REJECT LIMIT`, HAWQ scans the external data in single row error isolation mode. Single row error isolation mode applies to external data rows with format errors such as extra or missing attributes, attributes of a wrong data type, or invalid client encoding sequences. HAWQ does not check constraint errors, but you can filter constraint errors by limiting the `SELECT` from an external table at runtime. For example, to eliminate duplicate key errors:
+
+``` sql
+=# INSERT INTO table_with_pkeys 
+SELECT DISTINCT * FROM external_table;
+```
+
+-   **[Define an External Table with Single Row Error Isolation](../../datamgmt/load/g-define-an-external-table-with-single-row-error-isolation.html)**
+
+-   **[Capture Row Formatting Errors and Declare a Reject Limit](../../datamgmt/load/g-create-an-error-table-and-declare-a-reject-limit.html)**
+
+-   **[Identifying Invalid CSV Files in Error Table Data](../../datamgmt/load/g-identifying-invalid-csv-files-in-error-table-data.html)**
+
+-   **[Moving Data between Tables](../../datamgmt/load/g-moving-data-between-tables.html)**
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/datamgmt/load/g-identifying-invalid-csv-files-in-error-table-data.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-identifying-invalid-csv-files-in-error-table-data.html.md.erb b/datamgmt/load/g-identifying-invalid-csv-files-in-error-table-data.html.md.erb
new file mode 100644
index 0000000..534d530
--- /dev/null
+++ b/datamgmt/load/g-identifying-invalid-csv-files-in-error-table-data.html.md.erb
@@ -0,0 +1,7 @@
+---
+title: Identifying Invalid CSV Files in Error Table Data
+---
+
+If a CSV file contains invalid formatting, the *rawdata* field in the error table can contain several combined rows. For example, if a closing quote for a specific field is missing, all the following newlines are treated as embedded newlines. When this happens, HAWQ stops parsing a row when it reaches 64K, puts that 64K of data into the error table as a single row, resets the quote flag, and continues. If this happens three times during load processing, the load file is considered invalid and the entire load fails with the message "`rejected ` `N` ` or more rows`". See [Escaping in CSV Formatted Files](g-escaping-in-csv-formatted-files.html#topic101) for more information on the correct use of quotes in CSV files.
+
+


Mime
View raw message