hawq-commits mailing list archives

From yo...@apache.org
Subject [37/57] [abbrv] [partial] incubator-hawq-docs git commit: HAWQ-1254 Fix/remove book branching on incubator-hawq-docs
Date Tue, 10 Jan 2017 23:54:28 GMT
http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-register_files.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-register_files.html.md.erb b/markdown/datamgmt/load/g-register_files.html.md.erb
new file mode 100644
index 0000000..25c24ca
--- /dev/null
+++ b/markdown/datamgmt/load/g-register_files.html.md.erb
@@ -0,0 +1,217 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders into HAWQ internal tables. Files can be read directly, rather than having to be copied or loaded, resulting in higher performance and more efficient transaction processing.
+
+Data from the file or directory specified by \<hdfsfilepath\> is loaded into the appropriate HAWQ table directory in HDFS, and the utility updates the corresponding HAWQ metadata for the files. Either AO- or Parquet-formatted data in HDFS can be loaded into a corresponding table in HAWQ.
+
+You can use `hawq register` either to:
+
+-  Load and register external Parquet-formatted file data generated by an external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster for disaster recovery. 
+
+Requirements for running `hawq register` on the server are:
+
+-   All hosts in your HAWQ cluster (master and segments) must have network access between them and the hosts containing the data to be loaded.
+-   The Hadoop client must be configured and the hdfs filepath specified.
+-   The files to be registered and the HAWQ table must be located in the same HDFS cluster.
+-   The target table DDL is configured with the correct data type mapping.
+
+##<a id="topic1__section2"></a>Registering Externally Generated HDFS File Data to an Existing Table
+
+Files or folders in HDFS can be registered into an existing table, allowing them to be managed as a HAWQ internal table. When registering files, you can optionally specify the maximum amount of data to be loaded, in bytes, using the `--eof` option. If registering a folder, the actual file sizes are used. 
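+
+For example, a sketch of registering a single Hive-generated file into an existing table while capping the amount of data registered with `--eof` (the byte count here is illustrative; the path and table reuse the example later in this topic):
+
+``` pre
+$ hawq register -d postgres --eof 1000000 -f hdfs://localhost:8020/temp/hive.paq parquet_table
+```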
+
+Only HAWQ-generated or Hive-generated Parquet tables are supported; partitioned tables are not. Attempting to register an unsupported table results in an error. 
+
+Metadata for the Parquet file(s) and the destination table must be consistent. HAWQ tables and Parquet files use different data types, so the data must be mapped. Verify that the structure of the Parquet files and the HAWQ table are compatible before running `hawq register`. Not all Hive data types can be mapped to HAWQ equivalents. The currently supported Hive data types are: boolean, int, smallint, tinyint, bigint, float, double, string, binary, char, and varchar.
+
+As a best practice, create a copy of the Parquet file to be registered before running `hawq register`. You can then run `hawq register` on the copy, leaving the original file available for additional Hive queries or for recovery if a data mapping error is encountered.
+
+###Limitations for Registering Hive Tables to HAWQ 
+
+The following Hive data types cannot be converted to HAWQ equivalents: timestamp, decimal, array, struct, map, and union.
+
+###Example: Registering a Hive-Generated Parquet File
+
+This example shows how to register a Hive-generated Parquet file in HDFS into the table `parquet_table` in HAWQ, which is in the database named `postgres`. The file path of the Hive-generated file is `hdfs://localhost:8020/temp/hive.paq`.
+
+In this example, the location of the database is `hdfs://localhost:8020/hawq_default`, the tablespace id is 16385, the database id is 16387, the table filenode id is 77160, and the last file under the filenode is numbered 7.
+
+Run the `hawq register` command for the file location `hdfs://localhost:8020/temp/hive.paq`:
+
+``` pre
+$ hawq register -d postgres -f hdfs://localhost:8020/temp/hive.paq parquet_table
+```
+
+After running the `hawq register` command, the corresponding new location of the file in HDFS is:  `hdfs://localhost:8020/hawq_default/16385/16387/77160/8`. 
+
+The command updates the metadata of the table `parquet_table` in HAWQ, which is contained in the table `pg_aoseg.pg_paqseg_77160`. The pg\_aoseg schema is a fixed schema that holds the segment metadata tables for row-oriented and Parquet append-only (AO) tables. For row-oriented tables, the metadata table name prefix is pg\_aoseg; for Parquet tables, the prefix is pg\_paqseg. 77160 is the relation id of the table `parquet_table`.
+
+You can locate the metadata table either by relation ID or by table name.
+
+To find the relation ID, run the following command on the catalog table pg\_class: 
+
+```
+SELECT oid FROM pg_class WHERE relname=$relname
+```
+To find the metadata table name from the relation ID, first run: 
+
+```
+SELECT segrelid FROM pg_appendonly WHERE relid = $relid
+```
+then run: 
+
+```
+SELECT relname FROM pg_class WHERE oid = segrelid
+```
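+
+For instance, using the values from this example (the table name `parquet_table` and relation id 77160), the lookups might look like:
+
+``` sql
+-- Find the relation ID from the table name
+SELECT oid FROM pg_class WHERE relname = 'parquet_table';
+
+-- Find the metadata table name from the relation ID
+SELECT relname FROM pg_class
+WHERE oid = (SELECT segrelid FROM pg_appendonly WHERE relid = 77160);
+```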
+
+## <a id="topic1__section3"></a>Registering Data Using Information from a YAML Configuration File
+ 
+The `hawq register` command can register HDFS files using metadata loaded from a YAML configuration file, specified with the `--config \<yaml_config\>` option. Both AO and Parquet tables can be registered. Tables need not exist in HAWQ before being registered. For disaster recovery, a YAML file created by the `hawq extract` command can be used to re-create HAWQ tables from the metadata of a backup checkpoint.
+
+You can also use a YAML configuration file to append HDFS files to an existing HAWQ table, or to create and register a new table in HAWQ.
+
+For disaster recovery, tables can be re-registered using the HDFS files and a YAML file. This scenario assumes that data is periodically imported from one cluster (Cluster A) to a backup cluster (Cluster B). 
+
+Data is registered according to the following conditions: 
+
+-  For existing tables, the HDFS files are appended to the existing HAWQ table.
+-  If a table does not exist, it is created and registered into HAWQ. The catalog table will be updated with the file size specified by the YAML file.
+-  If the `--force` option is used, the data in existing catalog tables is erased and re-registered. All HDFS-related catalog contents in `pg_aoseg.pg_paqseg_$relid` are cleared. The original files on HDFS are retained.
+
+Tables using random distribution are preferred for registering into HAWQ.
+
+There are additional restrictions when registering hash-distributed tables. When registering hash-distributed tables using a YAML file, the distribution policy in the YAML file must match that of the table being registered into, and the order of the files in the YAML file should reflect the hash distribution. The size of the registered file should be identical to or a multiple of the hash table bucket number. 
+
+Only single-level partitioned tables can be registered into HAWQ.
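+
+For example, a disaster-recovery registration that re-creates the table `paq1` from a previously extracted `paq1.yml` file (see the example below), erasing any existing catalog entries with `--force`, might be sketched as:
+
+```
+hawq register --force --config paq1.yml paq1
+```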
+
+
+###Example: Registration using a YAML Configuration File
+
+This example shows how to use `hawq register` to register HDFS data using a YAML configuration file generated by `hawq extract`. 
+
+First, create a table in SQL and insert some data into it.  
+
+```
+=> CREATE TABLE paq1(a int, b varchar(10)) WITH (appendonly=true, orientation=parquet);
+=> INSERT INTO paq1 VALUES(generate_series(1,1000), 'abcde');
+```
+
+Extract the table metadata by using the `hawq extract` utility.
+
+```
+hawq extract -o paq1.yml paq1
+```
+
+Register the data into the new table paq2, using the `--config` option to identify the YAML file.
+
+```
+hawq register --config paq1.yml paq2
+```
+Query the new table to verify that the content has been registered.
+
+```
+=> SELECT count(*) FROM paq2;
+```
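+
+If the registration succeeded, the count should match the 1,000 rows originally inserted into `paq1`:
+
+``` pre
+ count
+-------
+  1000
+(1 row)
+```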
+
+
+## <a id="topic1__section4"></a>Data Type Mapping<a id="topic1__section4"></a>
+
+Hive and Parquet tables use different data types than HAWQ tables and must be mapped for metadata compatibility. You are responsible for making sure your implementation is mapped to the appropriate data type before running `hawq register`. The tables below show equivalent data types, where available.
+
+<span class="tablecap">Table 1. HAWQ to Parquet Mapping</span>
+
+|HAWQ Data Type   | Parquet Data Type  |
+| :------------| :---------------|
+| bool        | boolean       |
+| int2/int4/date        | int32       |
+| int8/money       | int64      |
+| time/timestamptz/timestamp       | int64      |
+| float4        | float       |
+|float8        | double       |
+|bit/varbit/bytea/numeric       | Byte array       |
+|char/bpchar/varchar/name| Byte array |
+| text/xml/interval/timetz  | Byte array  |
+| macaddr/inet/cidr  | Byte array  |
+
+**Additional HAWQ-to-Parquet Mapping**
+
+**point**:  
+
+``` 
+group {
+    required int x;
+    required int y;
+}
+```
+
+**circle:** 
+
+```
+group {
+    required int x;
+    required int y;
+    required int r;
+}
+```
+
+**box:**  
+
+```
+group {
+    required int x1;
+    required int y1;
+    required int x2;
+    required int y2;
+}
+```
+
+**lseg:** 
+
+
+```
+group {
+    required int x1;
+    required int y1;
+    required int x2;
+    required int y2;
+}
+``` 
+
+**path**:
+  
+```
+group {
+    repeated group {
+        required int x;
+        required int y;
+    }
+}
+```
+
+
+<span class="tablecap">Table 2. HIVE to HAWQ Mapping</span>
+
+|HIVE Data Type   | HAWQ Data Type  |
+| :------------| :---------------|
+| boolean        | bool       |
+| tinyint        | int2       |
+| smallint       | int2/smallint      |
+| int            | int4 / int |
+| bigint         | int8 / bigint      |
+| float        | float4       |
+| double	| float8 |
+| string        | varchar       |
+| binary      | bytea       |
+| char | char |
+| varchar  | varchar  |
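+
+As an illustration of the mapping in Table 2, a hypothetical Hive table with columns `(id int, name string, price double, active boolean)` could be matched by a HAWQ target table such as the following sketch:
+
+``` sql
+CREATE TABLE hive_sales_target (
+    id     int4,      -- Hive int
+    name   varchar,   -- Hive string
+    price  float8,    -- Hive double
+    active bool       -- Hive boolean
+) WITH (appendonly=true, orientation=parquet);
+```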
+
+
+### Extracting Metadata
+
+For more information on extracting metadata to a YAML file and the output content of the YAML file, refer to the reference page for [hawq extract](../../reference/cli/admin_utilities/hawqextract.html#topic1).
+
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-representing-null-values.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-representing-null-values.html.md.erb b/markdown/datamgmt/load/g-representing-null-values.html.md.erb
new file mode 100644
index 0000000..4d4ffdd
--- /dev/null
+++ b/markdown/datamgmt/load/g-representing-null-values.html.md.erb
@@ -0,0 +1,7 @@
+---
+title: Representing NULL Values
+---
+
+`NULL` represents an unknown piece of data in a column or field. Within your data files you can designate a string to represent null values. The default string is `\N` (backslash-N) in `TEXT` mode, or an empty value with no quotations in `CSV` mode. You can also declare a different string using the `NULL` clause of `COPY`, `CREATE EXTERNAL TABLE`, or the `hawq load` control file when defining your data format. For example, you can use an empty string if you do not want to distinguish nulls from empty strings. When using the HAWQ loading tools, any data item that matches the designated null string is considered a null value.
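+
+For example, a `COPY` command that treats an empty string as `NULL` might look like the following sketch (the table and file path are illustrative):
+
+``` sql
+=> COPY country FROM '/data/gpdb/country_data'
+   WITH DELIMITER '|' NULL '';
+```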
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-running-copy-in-single-row-error-isolation-mode.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-running-copy-in-single-row-error-isolation-mode.html.md.erb b/markdown/datamgmt/load/g-running-copy-in-single-row-error-isolation-mode.html.md.erb
new file mode 100644
index 0000000..ba0603c
--- /dev/null
+++ b/markdown/datamgmt/load/g-running-copy-in-single-row-error-isolation-mode.html.md.erb
@@ -0,0 +1,17 @@
+---
+title: Running COPY in Single Row Error Isolation Mode
+---
+
+By default, `COPY` stops an operation at the first error: if the data contains an error, the operation fails and no data loads. If you run `COPY FROM` in *single row error isolation mode*, HAWQ skips rows that contain format errors and loads properly formatted rows. Single row error isolation mode applies only to rows in the input file that contain format errors. If the data contains a constraint error such as violation of a `NOT NULL` or `CHECK` constraint, the operation fails and no data loads.
+
+Specifying `SEGMENT REJECT LIMIT` runs the `COPY` operation in single row error isolation mode. Specify the acceptable number of error rows on each segment, after which the entire `COPY FROM` operation fails and no rows load. The error row count is for each HAWQ segment, not for the entire load operation.
+
+If the `COPY` operation does not reach the error limit, HAWQ loads all correctly-formatted rows and discards the error rows. The `LOG ERRORS INTO` clause allows you to keep error rows for further examination. Use `LOG ERRORS` to capture data formatting errors internally in HAWQ. For example:
+
+``` sql
+=> COPY country FROM '/data/gpdb/country_data'
+   WITH DELIMITER '|' LOG ERRORS INTO errtable
+   SEGMENT REJECT LIMIT 10 ROWS;
+```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-starting-and-stopping-gpfdist.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-starting-and-stopping-gpfdist.html.md.erb b/markdown/datamgmt/load/g-starting-and-stopping-gpfdist.html.md.erb
new file mode 100644
index 0000000..7e2cca9
--- /dev/null
+++ b/markdown/datamgmt/load/g-starting-and-stopping-gpfdist.html.md.erb
@@ -0,0 +1,42 @@
+---
+title: Starting and Stopping gpfdist
+---
+
+You can start `gpfdist` in your current directory location or in any directory that you specify. The default port is `8080`.
+
+From your current directory, type:
+
+``` shell
+$ gpfdist &
+```
+
+From a different directory, specify the directory from which to serve files, and optionally, the HTTP port to run on.
+
+To start `gpfdist` in the background and log output messages and errors to a log file:
+
+``` shell
+$ gpfdist -d /var/load_files -p 8081 -l /home/gpadmin/log &
+```
+
+For multiple `gpfdist` instances on the same ETL host (see [External Tables Using Multiple gpfdist Instances with Multiple NICs](g-about-gpfdist-setup-and-performance.html#topic14__du165882)), use a different base directory and port for each instance. For example:
+
+``` shell
+$ gpfdist -d /var/load_files1 -p 8081 -l /home/gpadmin/log1 &
+$ gpfdist -d /var/load_files2 -p 8082 -l /home/gpadmin/log2 &
+```
+
+To stop `gpfdist` when it is running in the background:
+
+First find its process id:
+
+``` shell
+$ ps -ef | grep gpfdist
+```
+
+Then kill the process (where 3456 is the process ID in this example):
+
+``` shell
+$ kill 3456
+```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-transfer-and-store-the-data.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-transfer-and-store-the-data.html.md.erb b/markdown/datamgmt/load/g-transfer-and-store-the-data.html.md.erb
new file mode 100644
index 0000000..8a6d7ab
--- /dev/null
+++ b/markdown/datamgmt/load/g-transfer-and-store-the-data.html.md.erb
@@ -0,0 +1,16 @@
+---
+title: Transfer and Store the Data
+---
+
+Use one of the following approaches to transform the data with `gpfdist`.
+
+-   `GPLOAD` supports only input transformations, but is easier to implement in many cases.
+-   `INSERT INTO SELECT FROM` supports both input and output transformations, but exposes more details.
+
+-   **[Transforming with GPLOAD](../../datamgmt/load/g-transforming-with-gpload.html)**
+
+-   **[Transforming with INSERT INTO SELECT FROM](../../datamgmt/load/g-transforming-with-insert-into-select-from.html)**
+
+-   **[Configuration File Format](../../datamgmt/load/g-configuration-file-format.html)**
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-transforming-with-gpload.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-transforming-with-gpload.html.md.erb b/markdown/datamgmt/load/g-transforming-with-gpload.html.md.erb
new file mode 100644
index 0000000..438fedb
--- /dev/null
+++ b/markdown/datamgmt/load/g-transforming-with-gpload.html.md.erb
@@ -0,0 +1,30 @@
+---
+title: Transforming with GPLOAD
+---
+
+To transform data using the `GPLOAD` control file, you must specify both the `TRANSFORM_CONFIG` file name and the name of the `TRANSFORM` operation in the `INPUT` section of the `GPLOAD` control file.
+
+-   `TRANSFORM_CONFIG` specifies the name of the `gpfdist` configuration file.
+-   The `TRANSFORM` setting indicates the name of the transformation that is described in the file named in `TRANSFORM_CONFIG`.
+
+``` pre
+---
+VERSION: 1.0.0.1
+DATABASE: ops
+USER: gpadmin
+GPLOAD:
+   INPUT:
+    - TRANSFORM_CONFIG: config.yaml
+    - TRANSFORM: prices_input
+    - SOURCE:
+         FILE: prices.xml
+```
+
+The transformation operation name must appear in two places: in the `TRANSFORM` setting of the `GPLOAD` control file and in the `TRANSFORMATIONS` section of the `gpfdist` configuration file named by `TRANSFORM_CONFIG`.
+
+In the `GPLOAD` control file, the optional parameter `MAX_LINE_LENGTH` specifies the maximum length of a line in the XML transformation data that is passed to `hawq load`.
+
+The following diagram shows the relationships between the `GPLOAD` control file, the `gpfdist` configuration file, and the XML data file.
+
+<img src="../../images/03-gpload-files.jpg" class="image" width="415" height="258" />
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-transforming-with-insert-into-select-from.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-transforming-with-insert-into-select-from.html.md.erb b/markdown/datamgmt/load/g-transforming-with-insert-into-select-from.html.md.erb
new file mode 100644
index 0000000..d91cc93
--- /dev/null
+++ b/markdown/datamgmt/load/g-transforming-with-insert-into-select-from.html.md.erb
@@ -0,0 +1,22 @@
+---
+title: Transforming with INSERT INTO SELECT FROM
+---
+
+Specify the transformation in the `CREATE EXTERNAL TABLE` definition's `LOCATION` clause. In the following example, the transform `prices_input` is specified at the end of the `LOCATION` URL. (Run `gpfdist` first, using the command `gpfdist -c config.yaml`.)
+
+``` sql
+CREATE READABLE EXTERNAL TABLE prices_readable (LIKE prices)
+   LOCATION ('gpfdist://hostname:8081/prices.xml#transform=prices_input')
+   FORMAT 'TEXT' (DELIMITER '|')
+   LOG ERRORS INTO error_log SEGMENT REJECT LIMIT 10;
+```
+
+In the command above, change *hostname* to your hostname. `prices_input` comes from the configuration file.
+
+The following query loads data into the `prices` table.
+
+``` sql
+INSERT INTO prices SELECT * FROM prices_readable;
+```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-transforming-xml-data.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-transforming-xml-data.html.md.erb b/markdown/datamgmt/load/g-transforming-xml-data.html.md.erb
new file mode 100644
index 0000000..f9520bb
--- /dev/null
+++ b/markdown/datamgmt/load/g-transforming-xml-data.html.md.erb
@@ -0,0 +1,34 @@
+---
+title: Transforming XML Data
+---
+
+The HAWQ data loader *gpfdist* provides transformation features to load XML data into a table and to write data from HAWQ to XML files. The following diagram shows *gpfdist* performing an XML transform.
+
+<a id="topic75__du185408"></a>
+<span class="figtitleprefix">Figure: </span>External Tables using XML Transformations
+
+<img src="../../images/ext-tables-xml.png" class="image" />
+
+To load or extract XML data:
+
+-   [Determine the Transformation Schema](g-determine-the-transformation-schema.html#topic76)
+-   [Write a Transform](g-write-a-transform.html#topic77)
+-   [Write the gpfdist Configuration](g-write-the-gpfdist-configuration.html#topic78)
+-   [Load the Data](g-load-the-data.html#topic79)
+-   [Transfer and Store the Data](g-transfer-and-store-the-data.html#topic80)
+
+The first three steps comprise most of the development effort. The last two steps are straightforward and repeatable, suitable for production.
+
+-   **[Determine the Transformation Schema](../../datamgmt/load/g-determine-the-transformation-schema.html)**
+
+-   **[Write a Transform](../../datamgmt/load/g-write-a-transform.html)**
+
+-   **[Write the gpfdist Configuration](../../datamgmt/load/g-write-the-gpfdist-configuration.html)**
+
+-   **[Load the Data](../../datamgmt/load/g-load-the-data.html)**
+
+-   **[Transfer and Store the Data](../../datamgmt/load/g-transfer-and-store-the-data.html)**
+
+-   **[XML Transformation Examples](../../datamgmt/load/g-xml-transformation-examples.html)**
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-troubleshooting-gpfdist.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-troubleshooting-gpfdist.html.md.erb b/markdown/datamgmt/load/g-troubleshooting-gpfdist.html.md.erb
new file mode 100644
index 0000000..2e6a450
--- /dev/null
+++ b/markdown/datamgmt/load/g-troubleshooting-gpfdist.html.md.erb
@@ -0,0 +1,23 @@
+---
+title: Troubleshooting gpfdist
+---
+
+The segments access `gpfdist` at runtime. Ensure that the HAWQ segment hosts have network access to `gpfdist`. `gpfdist` is a web server: test connectivity by running the following command from each host in the HAWQ array (segments and master):
+
+``` shell
+$ wget http://gpfdist_hostname:port/filename      
+```
+
+The `CREATE EXTERNAL TABLE` definition must have the correct host name, port, and file names for `gpfdist`. Specify file names and paths relative to the directory from which `gpfdist` serves files (the directory path specified when `gpfdist` started). See [Creating External Tables - Examples](creating-external-tables-examples.html#topic44).
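+
+For example, if `gpfdist` was started with `-d /var/load_files -p 8081` on a host named `etlhost` (both illustrative), a file stored at `/var/load_files/expenses/sales.txt` would be referenced relative to that directory, as in this sketch:
+
+``` sql
+CREATE EXTERNAL TABLE ext_sales (name text, amount float4)
+LOCATION ('gpfdist://etlhost:8081/expenses/sales.txt')
+FORMAT 'TEXT' (DELIMITER '|');
+```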
+
+If you start `gpfdist` on your system and IPv6 networking is disabled, `gpfdist` displays this warning message when testing for an IPv6 port.
+
+``` pre
+[WRN gpfdist.c:2050] Creating the socket failed
+```
+
+If the corresponding IPv4 port is available, `gpfdist` uses that port and the warning for IPv6 port can be ignored. To see information about the ports that `gpfdist` tests, use the `-V` option.
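+
+For example, reusing the earlier directory and port, the following sketch starts `gpfdist` with the `-V` option so the tested ports are reported:
+
+``` shell
+$ gpfdist -d /var/load_files -p 8081 -V
+```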
+
+For information about IPv6 and IPv4 networking, see your operating system documentation.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-unloading-data-from-hawq-database.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-unloading-data-from-hawq-database.html.md.erb b/markdown/datamgmt/load/g-unloading-data-from-hawq-database.html.md.erb
new file mode 100644
index 0000000..e0690ad
--- /dev/null
+++ b/markdown/datamgmt/load/g-unloading-data-from-hawq-database.html.md.erb
@@ -0,0 +1,17 @@
+---
+title: Unloading Data from HAWQ
+---
+
+A writable external table allows you to select rows from other database tables and output the rows to files, named pipes, or applications, or use them as output targets for parallel MapReduce calculations. You can define file-based and web-based writable external tables.
+
+This topic describes how to unload data from HAWQ using parallel unload (writable external tables) and non-parallel unload (`COPY`).
+
+-   **[Defining a File-Based Writable External Table](../../datamgmt/load/g-defining-a-file-based-writable-external-table.html)**
+
+-   **[Defining a Command-Based Writable External Web Table](../../datamgmt/load/g-defining-a-command-based-writable-external-web-table.html)**
+
+-   **[Unloading Data Using a Writable External Table](../../datamgmt/load/g-unloading-data-using-a-writable-external-table.html)**
+
+-   **[Unloading Data Using COPY](../../datamgmt/load/g-unloading-data-using-copy.html)**
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-unloading-data-using-a-writable-external-table.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-unloading-data-using-a-writable-external-table.html.md.erb b/markdown/datamgmt/load/g-unloading-data-using-a-writable-external-table.html.md.erb
new file mode 100644
index 0000000..377f2d6
--- /dev/null
+++ b/markdown/datamgmt/load/g-unloading-data-using-a-writable-external-table.html.md.erb
@@ -0,0 +1,17 @@
+---
+title: Unloading Data Using a Writable External Table
+---
+
+Writable external tables allow only `INSERT` operations. You must grant `INSERT` permission on a table to enable access to users who are not the table owner or a superuser. For example:
+
+``` sql
+GRANT INSERT ON writable_ext_table TO admin;
+```
+
+To unload data using a writable external table, select the data from the source table(s) and insert it into the writable external table. The resulting rows are output to the writable external table. For example:
+
+``` sql
+INSERT INTO writable_ext_table SELECT * FROM regular_table;
+```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-unloading-data-using-copy.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-unloading-data-using-copy.html.md.erb b/markdown/datamgmt/load/g-unloading-data-using-copy.html.md.erb
new file mode 100644
index 0000000..816a2b5
--- /dev/null
+++ b/markdown/datamgmt/load/g-unloading-data-using-copy.html.md.erb
@@ -0,0 +1,12 @@
+---
+title: Unloading Data Using COPY
+---
+
+`COPY TO` copies data from a table to a file (or standard output) on the HAWQ master host using a single process on the HAWQ master instance. Use `COPY` to output a table's entire contents, or filter the output using a `SELECT` statement. For example:
+
+``` sql
+COPY (SELECT * FROM country WHERE country_name LIKE 'A%') 
+TO '/home/gpadmin/a_list_countries.out';
+```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-url-based-web-external-tables.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-url-based-web-external-tables.html.md.erb b/markdown/datamgmt/load/g-url-based-web-external-tables.html.md.erb
new file mode 100644
index 0000000..a115972
--- /dev/null
+++ b/markdown/datamgmt/load/g-url-based-web-external-tables.html.md.erb
@@ -0,0 +1,24 @@
+---
+title: URL-based Web External Tables
+---
+
+A URL-based web table accesses data from a web server using the HTTP protocol. Web table data is dynamic; the data is not rescannable.
+
+Specify the `LOCATION` of files on a web server using `http://`. The web data file(s) must reside on a web server that HAWQ segment hosts can access. The number of URLs specified corresponds to the minimum number of virtual segments that work in parallel to access the web table.
+
+The following sample command defines a web table that gets data from several URLs.
+
+``` sql
+=# CREATE EXTERNAL WEB TABLE ext_expenses (
+    name text, date date, amount float4, category text, description text) 
+LOCATION ('http://intranet.company.com/expenses/sales/file.csv',
+          'http://intranet.company.com/expenses/exec/file.csv',
+          'http://intranet.company.com/expenses/finance/file.csv',
+          'http://intranet.company.com/expenses/ops/file.csv',
+          'http://intranet.company.com/expenses/marketing/file.csv',
+          'http://intranet.company.com/expenses/eng/file.csv' 
+      )
+FORMAT 'CSV' ( HEADER );
+```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-using-a-custom-format.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-using-a-custom-format.html.md.erb b/markdown/datamgmt/load/g-using-a-custom-format.html.md.erb
new file mode 100644
index 0000000..e83744a
--- /dev/null
+++ b/markdown/datamgmt/load/g-using-a-custom-format.html.md.erb
@@ -0,0 +1,23 @@
+---
+title: Using a Custom Format
+---
+
+You specify a custom data format in the `FORMAT` clause of `CREATE EXTERNAL TABLE`.
+
+```
+FORMAT 'CUSTOM' (formatter=format_function, key1=val1,...keyn=valn)
+```
+
+The `'CUSTOM'` keyword indicates that the data has a custom format, and `formatter` specifies the function to use to format the data, followed by comma-separated parameters to the formatter function.
+
+HAWQ provides functions for formatting fixed-width data, but you must author the formatter functions for variable-width data. The steps are as follows.
+
+1.  Author and compile input and output functions as a shared library.
+2.  Specify the shared library function with `CREATE FUNCTION` in HAWQ.
+3.  Use the `formatter` parameter of the `CREATE EXTERNAL TABLE` `FORMAT` clause to call the function, as sketched below.
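+
+The following sketch illustrates steps 2 and 3 with hypothetical names (`my_formatter.so`, `my_formatter_in`, the `students` table, and the width parameters are all assumptions for illustration):
+
+``` sql
+-- Step 2: register the compiled formatter function from the shared library
+CREATE FUNCTION my_formatter_in() RETURNS record
+AS '$libdir/my_formatter.so', 'my_formatter_in'
+LANGUAGE C STABLE;
+
+-- Step 3: call the formatter in the FORMAT clause; the key=value pairs are
+-- whatever parameters the formatter function accepts
+CREATE READABLE EXTERNAL TABLE students (name varchar(20), address varchar(30), age int)
+LOCATION ('gpfdist://<host>:<port>/students.dat')
+FORMAT 'CUSTOM' (formatter=my_formatter_in, name=20, address=30, age=4);
+```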
+
+-   **[Importing and Exporting Fixed Width Data](../../datamgmt/load/g-importing-and-exporting-fixed-width-data.html)**
+
+-   **[Examples - Read Fixed-Width Data](../../datamgmt/load/g-examples-read-fixed-width-data.html)**
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-using-the-hawq-file-server--gpfdist-.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-using-the-hawq-file-server--gpfdist-.html.md.erb b/markdown/datamgmt/load/g-using-the-hawq-file-server--gpfdist-.html.md.erb
new file mode 100644
index 0000000..0c68b2c
--- /dev/null
+++ b/markdown/datamgmt/load/g-using-the-hawq-file-server--gpfdist-.html.md.erb
@@ -0,0 +1,19 @@
+---
+title: Using the HAWQ File Server (gpfdist)
+---
+
+The `gpfdist` protocol provides the best performance and is the easiest to set up. `gpfdist` ensures optimum use of all segments in your HAWQ system for external table reads.
+
+This topic describes the setup and management tasks for using `gpfdist` with external tables.
+
+-   **[About gpfdist Setup and Performance](../../datamgmt/load/g-about-gpfdist-setup-and-performance.html)**
+
+-   **[Controlling Segment Parallelism](../../datamgmt/load/g-controlling-segment-parallelism.html)**
+
+-   **[Installing gpfdist](../../datamgmt/load/g-installing-gpfdist.html)**
+
+-   **[Starting and Stopping gpfdist](../../datamgmt/load/g-starting-and-stopping-gpfdist.html)**
+
+-   **[Troubleshooting gpfdist](../../datamgmt/load/g-troubleshooting-gpfdist.html)**
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-working-with-file-based-ext-tables.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-working-with-file-based-ext-tables.html.md.erb b/markdown/datamgmt/load/g-working-with-file-based-ext-tables.html.md.erb
new file mode 100644
index 0000000..e024a7d
--- /dev/null
+++ b/markdown/datamgmt/load/g-working-with-file-based-ext-tables.html.md.erb
@@ -0,0 +1,21 @@
+---
+title: Working with File-Based External Tables
+---
+
+External tables provide access to data stored in data sources outside of HAWQ as if the data were stored in regular database tables. Data can be read from or written to external tables.
+
+An external table is a HAWQ database table backed with data that resides outside of the database. An external table is either readable or writable. It can be used like a regular database table in SQL commands such as `SELECT` and `INSERT` and joined with other tables. External tables are most often used to load and unload database data.
+
+Web-based external tables provide access to data served by an HTTP server or an operating system process. See [Creating and Using Web External Tables](g-creating-and-using-web-external-tables.html#topic31) for more about web-based tables.
+
+-   **[Accessing File-Based External Tables](../../datamgmt/load/g-external-tables.html)**
+
+    External tables enable accessing external files as if they are regular database tables. They are often used to move data into and out of a HAWQ database.
+
+-   **[gpfdist Protocol](../../datamgmt/load/g-gpfdist-protocol.html)**
+
+-   **[gpfdists Protocol](../../datamgmt/load/g-gpfdists-protocol.html)**
+
+-   **[Handling Errors in External Table Data](../../datamgmt/load/g-handling-errors-ext-table-data.html)**
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-write-a-transform.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-write-a-transform.html.md.erb b/markdown/datamgmt/load/g-write-a-transform.html.md.erb
new file mode 100644
index 0000000..6b35ab2
--- /dev/null
+++ b/markdown/datamgmt/load/g-write-a-transform.html.md.erb
@@ -0,0 +1,48 @@
+---
+title: Write a Transform
+---
+
+The transform specifies what to extract from the data. You can use any authoring environment and language appropriate for your project. For XML transformations, choose a technology such as XSLT, Joost (STX), Java, Python, or Perl, based on the goals and scope of the project.
+
+In the price example, the next step is to transform the XML data into a simple two-column delimited format.
+
+``` pre
+708421|19.99
+708466|59.25
+711121|24.99
+```
+
+The following STX transform, called *input\_transform.stx*, completes the data transformation.
+
+``` xml
+<?xml version="1.0"?>
+<stx:transform version="1.0"
+   xmlns:stx="http://stx.sourceforge.net/2002/ns"
+   pass-through="none">
+  <!-- declare variables -->
+  <stx:variable name="itemnumber"/>
+  <stx:variable name="price"/>
+  <!-- match and output prices as columns delimited by | -->
+  <stx:template match="/prices/pricerecord">
+    <stx:process-children/>
+    <stx:value-of select="$itemnumber"/>    
+<stx:text>|</stx:text>
+    <stx:value-of select="$price"/>      <stx:text>
+</stx:text>
+  </stx:template>
+  <stx:template match="itemnumber">
+    <stx:assign name="itemnumber" select="."/>
+  </stx:template>
+  <stx:template match="price">
+    <stx:assign name="price" select="."/>
+  </stx:template>
+</stx:transform>
+```
+
+This STX transform declares two temporary variables, `itemnumber` and `price`, and the following rules.
+
+1.  When an element that satisfies the XPath expression `/prices/pricerecord` is found, examine the child elements and generate output that contains the value of the `itemnumber` variable, a `|` character, the value of the `price` variable, and a newline.
+2.  When an `<itemnumber>` element is found, store the content of that element in the variable `itemnumber`.
+3.  When a `<price>` element is found, store the content of that element in the variable `price`.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-write-the-gpfdist-configuration.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-write-the-gpfdist-configuration.html.md.erb b/markdown/datamgmt/load/g-write-the-gpfdist-configuration.html.md.erb
new file mode 100644
index 0000000..89733cd
--- /dev/null
+++ b/markdown/datamgmt/load/g-write-the-gpfdist-configuration.html.md.erb
@@ -0,0 +1,61 @@
+---
+title: Write the gpfdist Configuration
+---
+
+The `gpfdist` configuration is specified as a YAML 1.1 document. It specifies rules that `gpfdist` uses to select a Transform to apply when loading or extracting data.
+
+This example `gpfdist` configuration contains the following items:
+
+-   the `config.yaml` file defining `TRANSFORMATIONS`
+-   the `input_transform.sh` wrapper script, referenced in the `config.yaml` file
+-   the `input_transform.stx` Joost transformation, called from `input_transform.sh`
+
+Aside from the ordinary YAML rules, such as starting the document with three dashes (`---`), a `gpfdist` configuration must conform to the following restrictions:
+
+1.  a `VERSION` setting must be present with the value `1.0.0.1`.
+2.  a `TRANSFORMATIONS` setting must be present and contain one or more mappings.
+3.  Each mapping in the `TRANSFORMATION` must contain:
+    -   a `TYPE` with the value 'input' or 'output'
+    -   a `COMMAND` indicating how the transform is run.
+
+4.  Each mapping in the `TRANSFORMATION` can contain optional `CONTENT`, `SAFE`, and `STDERR` settings.
+
+The following `gpfdist` configuration, `config.yaml`, applies to the prices example. The initial indentation on each line is significant and reflects the hierarchical nature of the specification. The name `prices_input` in the following example will be referenced later when creating the table in SQL.
+
+``` pre
+---
+VERSION: 1.0.0.1
+TRANSFORMATIONS:
+  prices_input:
+    TYPE:     input
+    COMMAND:  /bin/bash input_transform.sh %filename%
+```
+
+The `COMMAND` setting uses a wrapper script called `input_transform.sh` with a `%filename%` placeholder. When `gpfdist` runs the `prices_input` transform, it invokes `input_transform.sh` with `/bin/bash` and replaces the `%filename%` placeholder with the path to the input file to transform. The wrapper script called `input_transform.sh` contains the logic to invoke the STX transformation and return the output.
+
+If Joost is used, the Joost STX engine must be installed.
+
+``` bash
+#!/bin/bash
+# input_transform.sh - sample input transformation, 
+# demonstrating use of Java and Joost STX to convert XML into
+# text to load into HAWQ.
+# java arguments:
+#   -jar joost.jar         joost STX engine
+#   -nodecl                  don't generate a <?xml?> declaration
+#   $1                        filename to process
+#   input_transform.stx    the STX transformation
+#
+# the AWK step eliminates a blank line joost emits at the end
+java \
+    -jar joost.jar \
+    -nodecl \
+    $1 \
+    input_transform.stx \
+ | awk 'NF>0'
+```
+
+The `input_transform.sh` script runs the Joost STX engine and pipes its output through `awk`. The following diagram shows the process flow as `gpfdist` runs the transformation.
+
+<img src="../../images/02-pipeline.png" class="image" width="462" height="190" />
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-xml-transformation-examples.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-xml-transformation-examples.html.md.erb b/markdown/datamgmt/load/g-xml-transformation-examples.html.md.erb
new file mode 100644
index 0000000..12ad1d6
--- /dev/null
+++ b/markdown/datamgmt/load/g-xml-transformation-examples.html.md.erb
@@ -0,0 +1,13 @@
+---
+title: XML Transformation Examples
+---
+
+The following examples demonstrate the complete process for different types of XML data and STX transformations. Files and detailed instructions associated with these examples can be downloaded from the Apache site `gpfdist_transform` tools demo page. Read the README file before you run the examples.
+
+-   **[Example using DBLP Database Publications (In demo Directory)](../../datamgmt/load/g-example-1-dblp-database-publications-in-demo-directory.html)**
+
+-   **[Example using IRS MeF XML Files (In demo Directory)](../../datamgmt/load/g-example-irs-mef-xml-files-in-demo-directory.html)**
+
+-   **[Example using WITSML™ Files (In demo Directory)](../../datamgmt/load/g-example-witsml-files-in-demo-directory.html)**
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/ddl/ddl-database.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/ddl/ddl-database.html.md.erb b/markdown/ddl/ddl-database.html.md.erb
new file mode 100644
index 0000000..2ef9f9f
--- /dev/null
+++ b/markdown/ddl/ddl-database.html.md.erb
@@ -0,0 +1,78 @@
+---
+title: Creating and Managing Databases
+---
+
+A HAWQ system is a single instance of HAWQ. There can be several separate HAWQ systems installed, but usually just one is selected by environment variable settings. See your HAWQ administrator for details.
+
+There can be multiple databases in a HAWQ system. This is different from some database management systems \(such as Oracle\) where the database instance *is* the database. Although you can create many databases in a HAWQ system, client programs can connect to and access only one database at a time — you cannot cross-query between databases.
+
+## <a id="topic3"></a>About Template Databases 
+
+Each new database you create is based on a *template*. HAWQ provides a default database, *template1*. Use *template1* to connect to HAWQ for the first time. HAWQ uses *template1* to create databases unless you specify another template. Do not create any objects in *template1* unless you want those objects to be in every database you create.
+
+HAWQ uses two other database templates, *template0* and *postgres*, internally. Do not drop or modify *template0* or *postgres*. You can use *template0* to create a completely clean database containing only the standard objects predefined by HAWQ at initialization, especially if you modified *template1*.
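+
+For example, to create a completely clean database from *template0* (the database name `clean_db` is illustrative):
+
+``` sql
+=> CREATE DATABASE clean_db TEMPLATE template0;
+```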
+
+## <a id="topic4"></a>Creating a Database 
+
+The `CREATE DATABASE` command creates a new database. For example:
+
+``` sql
+=> CREATE DATABASE new_dbname;
+```
+
+To create a database, you must have privileges to create a database or be a HAWQ superuser. If you do not have the correct privileges, you cannot create a database. The HAWQ administrator must either give you the necessary privileges or create a database for you.
+
+You can also use the client program `createdb` to create a database. For example, running the following command in a command line terminal connects to HAWQ using the provided host name and port and creates a database named *mydatabase*:
+
+``` shell
+$ createdb -h masterhost -p 5432 mydatabase
+```
+
+The host name and port must match the host name and port of the installed HAWQ system.
+
+Some objects, such as roles, are shared by all the databases in a HAWQ system. Other objects, such as tables that you create, are known only in the database in which you create them.
+
+### <a id="topic5"></a>Cloning a Database 
+
+By default, a new database is created by cloning the standard system database template, *template1*. Any database can be used as a template when creating a new database, thereby providing the capability to 'clone' or copy an existing database and all objects and data within that database. For example:
+
+``` sql
+=> CREATE DATABASE new_dbname TEMPLATE old_dbname;
+```
+
+## <a id="topic6"></a>Viewing the List of Databases 
+
+If you are working in the `psql` client program, you can use the `\l` meta-command to show the list of databases and templates in your HAWQ system. If using another client program and you are a superuser, you can query the list of databases from the `pg_database` system catalog table. For example:
+
+``` sql
+=> SELECT datname FROM pg_database;
+```
+
+## <a id="topic7"></a>Altering a Database 
+
+The `ALTER DATABASE` command changes database attributes such as owner, name, or default configuration attributes. For example, the following command alters a database by setting its default schema search path \(the `search_path` configuration parameter\):
+
+``` sql
+=> ALTER DATABASE mydatabase SET search_path TO myschema, public, pg_catalog;
+```
+
+To alter a database, you must be the owner of the database or a superuser.
+
+## <a id="topic8"></a>Dropping a Database 
+
+The `DROP DATABASE` command drops \(or deletes\) a database. It removes the system catalog entries for the database and deletes the database directory on disk that contains the data. You must be the database owner or a superuser to drop a database, and you cannot drop a database while you or anyone else is connected to it. Connect to `template1` \(or another database\) before dropping a database. For example:
+
+``` shell
+=> \c template1
+```
+``` sql
+=> DROP DATABASE mydatabase;
+```
+
+You can also use the client program `dropdb` to drop a database. For example, the following command connects to HAWQ using the provided host name and port and drops the database *mydatabase*:
+
+``` shell
+$ dropdb -h masterhost -p 5432 mydatabase
+```
+
+**Warning:** Dropping a database cannot be undone.

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/ddl/ddl-partition.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/ddl/ddl-partition.html.md.erb b/markdown/ddl/ddl-partition.html.md.erb
new file mode 100644
index 0000000..f790161
--- /dev/null
+++ b/markdown/ddl/ddl-partition.html.md.erb
@@ -0,0 +1,483 @@
+---
+title: Partitioning Large Tables
+---
+
+Table partitioning enables supporting very large tables, such as fact tables, by logically dividing them into smaller, more manageable pieces. Partitioned tables can improve query performance by allowing the HAWQ query optimizer to scan only the data needed to satisfy a given query instead of scanning all the contents of a large table.
+
+Partitioning does not change the physical distribution of table data across the segments. Table distribution is physical: HAWQ physically divides partitioned tables and non-partitioned tables across segments to enable parallel query processing. Table *partitioning* is logical: HAWQ logically divides big tables to improve query performance and facilitate data warehouse maintenance tasks, such as rolling old data out of the data warehouse.
+
+HAWQ supports:
+
+-   *range partitioning*: division of data based on a numerical range, such as date or price.
+-   *list partitioning*: division of data based on a list of values, such as sales territory or product line.
+-   A combination of both types.
+<a id="im207241"></a>
+
+![](../mdimages/partitions.jpg "Example Multi-level Partition Design")
+
+## <a id="topic64"></a>Table Partitioning in HAWQ 
+
+HAWQ divides tables into parts \(also known as partitions\) to enable massively parallel processing. Tables are partitioned during `CREATE TABLE` using the `PARTITION BY` \(and optionally the `SUBPARTITION BY`\) clause. Partitioning creates a top-level \(or parent\) table with one or more levels of sub-tables \(or child tables\). Internally, HAWQ creates an inheritance relationship between the top-level table and its underlying partitions, similar to the functionality of the `INHERITS` clause of PostgreSQL.
+
+HAWQ uses the partition criteria defined during table creation to create each partition with a distinct `CHECK` constraint, which limits the data that table can contain. The query optimizer uses `CHECK` constraints to determine which table partitions to scan to satisfy a given query predicate.
+
+The HAWQ system catalog stores partition hierarchy information so that rows inserted into the top-level parent table propagate correctly to the child table partitions. To change the partition design or table structure, alter the parent table using `ALTER TABLE` with the `PARTITION` clause.
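+
+For example, a sketch of adding and dropping partitions on the daily-partitioned `sales` table created later in this topic, assuming the Greenplum-style partition DDL and illustrative date boundaries:
+
+``` sql
+-- Add a new daily partition after the existing range
+ALTER TABLE sales ADD PARTITION
+    START (date '2009-01-01') INCLUSIVE
+    END (date '2009-01-02') EXCLUSIVE;
+
+-- Drop the oldest partition, identified by its rank within the range
+ALTER TABLE sales DROP PARTITION FOR (RANK(1));
+```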
+
+To insert data into a partitioned table, you specify the root partitioned table, the table created with the `CREATE TABLE` command. You also can specify a leaf child table of the partitioned table in an `INSERT` command. An error is returned if the data is not valid for the specified leaf child table. Specifying a child table that is not a leaf child table in the `INSERT` command is not supported.
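+
+For example, given the date-range partitioned `sales` table defined later in this topic, a row inserted through the root table is routed to the matching child partition automatically:
+
+``` sql
+INSERT INTO sales VALUES (1, '2008-03-15', 199.50);
+```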
+
+## <a id="topic65"></a>Deciding on a Table Partitioning Strategy 
+
+Not all tables are good candidates for partitioning. If the answer is *yes* to all or most of the following questions, table partitioning is a viable database design strategy for improving query performance. If the answer is *no* to most of the following questions, table partitioning is not the right solution for that table. Test your design strategy to ensure that query performance improves as expected.
+
+-   **Is the table large enough?** Large fact tables are good candidates for table partitioning. If you have millions or billions of records in a table, you may see performance benefits from logically breaking that data up into smaller chunks. For smaller tables with only a few thousand rows or less, the administrative overhead of maintaining the partitions will outweigh any performance benefits you might see.
+-   **Are you experiencing unsatisfactory performance?** As with any performance tuning initiative, a table should be partitioned only if queries against that table are producing slower response times than desired.
+-   **Do your query predicates have identifiable access patterns?** Examine the `WHERE` clauses of your query workload and look for table columns that are consistently used to access data. For example, if most of your queries tend to look up records by date, then a monthly or weekly date-partitioning design might be beneficial. Or if you tend to access records by region, consider a list-partitioning design to divide the table by region.
+-   **Does your data warehouse maintain a window of historical data?** Another consideration for partition design is your organization's business requirements for maintaining historical data. For example, your data warehouse may require that you keep data for the past twelve months. If the data is partitioned by month, you can easily drop the oldest monthly partition from the warehouse and load current data into the most recent monthly partition.
+-   **Can the data be divided into somewhat equal parts based on some defining criteria?** Choose partitioning criteria that will divide your data as evenly as possible. If the partitions contain a relatively equal number of records, query performance improves based on the number of partitions created. For example, by dividing a large table into 10 partitions, a query that needs to scan only one partition can execute up to 10 times faster than it would against the unpartitioned table, provided that the partitions are designed to support the query's criteria.
+
+Do not create more partitions than are needed. Creating too many partitions can slow down management and maintenance jobs, such as vacuuming, recovering segments, expanding the cluster, checking disk usage, and others.
+
+Partitioning does not improve query performance unless the query optimizer can eliminate partitions based on the query predicates. Queries that scan every partition run slower than if the table were not partitioned, so avoid partitioning if few of your queries achieve partition elimination. Check the explain plan for queries to make sure that partitions are eliminated. See [Query Profiling](../query/query-profiling.html) for more about partition elimination.
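+
+For example, a quick check against the `sales` table defined below might look like the following sketch; the resulting plan should show that only the partition covering the predicate date is scanned:
+
+``` sql
+EXPLAIN SELECT * FROM sales WHERE date = '2008-03-15';
+```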
+
+Be very careful with multi-level partitioning because the number of partition files can grow very quickly. For example, if a table is partitioned by both day and city, and there are 1,000 days of data and 1,000 cities, the total number of partitions is one million. Column-oriented tables store each column in a physical table, so if this table has 100 columns, the system would be required to manage 100 million files for the table.
+
+Before settling on a multi-level partitioning strategy, consider a single level partition with bitmap indexes. Indexes slow down data loads, so consider performance testing with your data and schema to decide on the best strategy.
+
+## <a id="topic66"></a>Creating Partitioned Tables 
+
+You partition tables when you create them with `CREATE TABLE`. This topic provides examples of SQL syntax for creating a table with various partition designs.
+
+To partition a table:
+
+1.  Decide on the partition design: date range, numeric range, or list of values.
+2.  Choose the column\(s\) on which to partition the table.
+3.  Decide how many levels of partitions you want. For example, you can create a date range partition table by month and then subpartition the monthly partitions by sales region.
+
+-   [Defining Date Range Table Partitions](#topic67)
+-   [Defining Numeric Range Table Partitions](#topic68)
+-   [Defining List Table Partitions](#topic69)
+-   [Defining Multi-level Partitions](#topic70)
+-   [Partitioning an Existing Table](#topic71)
+
+### <a id="topic67"></a>Defining Date Range Table Partitions 
+
+A date range partitioned table uses a single `date` or `timestamp` column as the partition key column. You can use the same partition key column to create subpartitions if necessary, for example, to partition by month and then subpartition by day. Consider partitioning by the most granular level. For example, for a table partitioned by date, you can partition by day and have 365 daily partitions, rather than partition by year then subpartition by month then subpartition by day. A multi-level design can reduce query planning time, but a flat partition design runs faster.
+
+You can have HAWQ automatically generate partitions by giving a `START` value, an `END` value, and an `EVERY` clause that defines the partition increment value. By default, `START` values are always inclusive and `END` values are always exclusive. For example:
+
+``` sql
+CREATE TABLE sales (id int, date date, amt decimal(10,2))
+DISTRIBUTED BY (id)
+PARTITION BY RANGE (date)
+( START (date '2008-01-01') INCLUSIVE
+   END (date '2009-01-01') EXCLUSIVE
+   EVERY (INTERVAL '1 day') );
+```
+
+You can also declare and name each partition individually. For example:
+
+``` sql
+CREATE TABLE sales (id int, date date, amt decimal(10,2))
+DISTRIBUTED BY (id)
+PARTITION BY RANGE (date)
+( PARTITION Jan08 START (date '2008-01-01') INCLUSIVE ,
+  PARTITION Feb08 START (date '2008-02-01') INCLUSIVE ,
+  PARTITION Mar08 START (date '2008-03-01') INCLUSIVE ,
+  PARTITION Apr08 START (date '2008-04-01') INCLUSIVE ,
+  PARTITION May08 START (date '2008-05-01') INCLUSIVE ,
+  PARTITION Jun08 START (date '2008-06-01') INCLUSIVE ,
+  PARTITION Jul08 START (date '2008-07-01') INCLUSIVE ,
+  PARTITION Aug08 START (date '2008-08-01') INCLUSIVE ,
+  PARTITION Sep08 START (date '2008-09-01') INCLUSIVE ,
+  PARTITION Oct08 START (date '2008-10-01') INCLUSIVE ,
+  PARTITION Nov08 START (date '2008-11-01') INCLUSIVE ,
+  PARTITION Dec08 START (date '2008-12-01') INCLUSIVE
+                  END (date '2009-01-01') EXCLUSIVE );
+```
+
+You do not have to declare an `END` value for each partition, only the last one. In this example, `Jan08` ends where `Feb08` starts.
+
+### <a id="topic68"></a>Defining Numeric Range Table Partitions 
+
+A numeric range partitioned table uses a single numeric data type column as the partition key column. For example:
+
+``` sql
+CREATE TABLE rank (id int, rank int, year int, gender
+char(1), count int)
+DISTRIBUTED BY (id)
+PARTITION BY RANGE (year)
+( START (2001) END (2008) EVERY (1),
+  DEFAULT PARTITION extra );
+```
+
+For more information about default partitions, see [Adding a Default Partition](#topic80).
+
+### <a id="topic69"></a>Defining List Table Partitions 
+
+A list partitioned table can use any data type column that allows equality comparisons as its partition key column. A list partition can also have a multi-column \(composite\) partition key, whereas a range partition only allows a single column as the partition key. For list partitions, you must declare a partition specification for every partition \(list value\) you want to create. For example:
+
+``` sql
+CREATE TABLE rank (id int, rank int, year int, gender
+char(1), count int )
+DISTRIBUTED BY (id)
+PARTITION BY LIST (gender)
+( PARTITION girls VALUES ('F'),
+  PARTITION boys VALUES ('M'),
+  DEFAULT PARTITION other );
+```
+
+**Note:** The HAWQ legacy optimizer allows list partitions with multi-column \(composite\) partition keys. A range partition only allows a single column as the partition key. GPORCA does not support composite keys.
+
+For more information about default partitions, see [Adding a Default Partition](#topic80).
+
+### <a id="topic70"></a>Defining Multi-level Partitions 
+
+You can create a multi-level partition design with subpartitions of partitions. Using a *subpartition template* ensures that every partition has the same subpartition design, including partitions that you add later. For example, the following SQL creates the two-level partition design shown in [Figure 1](#im207241):
+
+``` sql
+CREATE TABLE sales (trans_id int, date date, amount
+decimal(9,2), region text)
+DISTRIBUTED BY (trans_id)
+PARTITION BY RANGE (date)
+SUBPARTITION BY LIST (region)
+SUBPARTITION TEMPLATE
+( SUBPARTITION usa VALUES ('usa'),
+  SUBPARTITION asia VALUES ('asia'),
+  SUBPARTITION europe VALUES ('europe'),
+  DEFAULT SUBPARTITION other_regions)
+  (START (date '2011-01-01') INCLUSIVE
+   END (date '2012-01-01') EXCLUSIVE
+   EVERY (INTERVAL '1 month'),
+   DEFAULT PARTITION outlying_dates );
+```
+
+The following example shows a three-level partition design where the `sales` table is partitioned by `year`, then `month`, then `region`. The `SUBPARTITION TEMPLATE` clauses ensure that each yearly partition has the same subpartition structure. The example declares a `DEFAULT` partition at each level of the hierarchy.
+
+``` sql
+CREATE TABLE p3_sales (id int, year int, month int, day int,
+region text)
+DISTRIBUTED BY (id)
+PARTITION BY RANGE (year)
+    SUBPARTITION BY RANGE (month)
+      SUBPARTITION TEMPLATE (
+        START (1) END (13) EVERY (1),
+        DEFAULT SUBPARTITION other_months )
+           SUBPARTITION BY LIST (region)
+             SUBPARTITION TEMPLATE (
+               SUBPARTITION usa VALUES ('usa'),
+               SUBPARTITION europe VALUES ('europe'),
+               SUBPARTITION asia VALUES ('asia'),
+               DEFAULT SUBPARTITION other_regions )
+( START (2002) END (2012) EVERY (1),
+  DEFAULT PARTITION outlying_years );
+```
+
+**CAUTION**:
+
+When you create multi-level partitions on ranges, it is easy to create a large number of subpartitions, some containing little or no data. This can add many entries to the system tables, which increases the time and memory required to optimize and execute queries. Increase the range interval or choose a different partitioning strategy to reduce the number of subpartitions created.
+
+### <a id="topic71"></a>Partitioning an Existing Table 
+
+Tables can be partitioned only at creation. If you have a table that you want to partition, you must create a partitioned table, load the data from the original table into the new table, drop the original table, and rename the partitioned table with the original table's name. You must also re-grant any table permissions. For example:
+
+``` sql
+CREATE TABLE sales2 (LIKE sales)
+PARTITION BY RANGE (date)
+( START (date '2008-01-01') INCLUSIVE
+   END (date '2009-01-01') EXCLUSIVE
+   EVERY (INTERVAL '1 month') );
+INSERT INTO sales2 SELECT * FROM sales;
+DROP TABLE sales;
+ALTER TABLE sales2 RENAME TO sales;
+GRANT ALL PRIVILEGES ON sales TO admin;
+GRANT SELECT ON sales TO guest;
+```
+
+## <a id="topic73"></a>Loading Partitioned Tables 
+
+After you create the partitioned table structure, top-level parent tables are empty. Data is routed to the bottom-level child table partitions. In a multi-level partition design, only the subpartitions at the bottom of the hierarchy can contain data.
+
+Rows that cannot be mapped to a child table partition are rejected and the load fails. To avoid unmapped rows being rejected at load time, define your partition hierarchy with a `DEFAULT` partition. Any rows that do not match a partition's `CHECK` constraints load into the `DEFAULT` partition. See [Adding a Default Partition](#topic80).
+
+At runtime, the query optimizer scans the entire table inheritance hierarchy and uses the `CHECK` table constraints to determine which of the child table partitions to scan to satisfy the query's conditions. The `DEFAULT` partition \(if your hierarchy has one\) is always scanned. `DEFAULT` partitions that contain data slow down the overall scan time.
+
+When you use `COPY` or `INSERT` to load data into a parent table, the data is automatically rerouted to the correct partition, just like a regular table.
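+
+The following minimal sketch, assuming the date-range `sales` table created earlier in this topic, shows rows inserted through the parent table being routed to the matching child partitions:
+
+``` sql
+-- Each row lands in the child partition whose date range contains it.
+INSERT INTO sales VALUES (1, date '2008-01-15', 20.00);
+INSERT INTO sales VALUES (2, date '2008-02-15', 35.50);
+```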
+
+Best practice for loading data into partitioned tables is to create an intermediate staging table, load it, and then exchange it into your partition design. See [Exchanging a Partition](#topic83).
+
+## <a id="topic74"></a>Verifying Your Partition Strategy 
+
+If a table is partitioned on a column used in the query predicate, you can use `EXPLAIN` to examine the query plan and verify that the query optimizer scans only the relevant data.
+
+For example, suppose a *sales* table is date-range partitioned by month and subpartitioned by region as shown in [Figure 1](#im207241). For the following query:
+
+``` sql
+EXPLAIN SELECT * FROM sales WHERE date='01-07-12' AND
+region='usa';
+```
+
+The query plan for this query should show a table scan of only the following tables:
+
+-   the default partition returning 0-1 rows \(if your partition design has one\)
+-   the January 2012 partition \(*sales\_1\_prt\_1*\) returning 0-1 rows
+-   the USA region subpartition \(*sales\_1\_2\_prt\_usa*\) returning *some number* of rows.
+
+The following example shows the relevant portion of the query plan.
+
+``` pre
+->  Seq Scan on sales_1_prt_1 sales (cost=0.00..0.00 rows=0 width=0)
+      Filter: "date" = '01-07-12'::date AND region = 'usa'::text
+->  Seq Scan on sales_1_2_prt_usa sales (cost=0.00..9.87 rows=20 width=40)
+```
+
+Ensure that the query optimizer does not scan unnecessary partitions or subpartitions \(for example, scans of months or regions not specified in the query predicate\), and that scans of the top-level tables return 0-1 rows.
+
+### <a id="topic75"></a>Troubleshooting Selective Partition Scanning 
+
+The following limitations can result in a query plan that shows a non-selective scan of your partition hierarchy.
+
+-   The query optimizer can selectively scan partitioned tables only when the query contains a direct and simple restriction of the table using immutable operators such as:
+
+    `=`, `<`, `<=`, `>`, `>=`, and `<>`
+
+-   Selective scanning recognizes `STABLE` and `IMMUTABLE` functions, but does not recognize `VOLATILE` functions within a query. For example, `WHERE` clauses such as `date > CURRENT_DATE` cause the query optimizer to selectively scan partitioned tables, but `time > TIMEOFDAY` does not, as shown in the example below.
+
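+The following sketch, assuming the date-range `sales` example above, contrasts predicates that do and do not enable selective scanning:
+
+``` sql
+-- Immutable comparison against a constant: partitions can be eliminated.
+EXPLAIN SELECT * FROM sales WHERE date = date '2008-01-07';
+
+-- CURRENT_DATE is STABLE, so selective scanning is still possible.
+EXPLAIN SELECT * FROM sales WHERE date > CURRENT_DATE;
+
+-- TIMEOFDAY() is VOLATILE, so the plan typically scans every partition.
+EXPLAIN SELECT * FROM sales WHERE date > TIMEOFDAY()::date;
+```
+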
+## <a id="topic76"></a>Viewing Your Partition Design 
+
+You can look up information about your partition design using the *pg\_partitions* view. For example, to see the partition design of the *sales* table:
+
+``` sql
+SELECT partitionboundary, partitiontablename, partitionname,
+partitionlevel, partitionrank
+FROM pg_partitions
+WHERE tablename='sales';
+```
+
+The following table and views show information about partitioned tables; an example query against one of them follows the list.
+
+-   *pg\_partition* - Tracks partitioned tables and their inheritance level relationships.
+-   *pg\_partition\_templates* - Shows the subpartitions created using a subpartition template.
+-   *pg\_partition\_columns* - Shows the partition key columns used in a partition design.
+
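+For example, the following query, assuming the *pg\_partition\_columns* view exposes `tablename`, `columnname`, and `partitionlevel` columns, lists the partition key columns of the *sales* table at each level:
+
+``` sql
+SELECT partitionlevel, columnname
+FROM pg_partition_columns
+WHERE tablename='sales';
+```
+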
+## <a id="topic77"></a>Maintaining Partitioned Tables 
+
+To maintain a partitioned table, use the `ALTER TABLE` command against the top-level parent table. The most common scenario is to drop old partitions and add new ones to maintain a rolling window of data in a range partition design. If you have a default partition in your partition design, you add a partition by *splitting* the default partition.
+
+-   [Adding a Partition](#topic78)
+-   [Renaming a Partition](#topic79)
+-   [Adding a Default Partition](#topic80)
+-   [Dropping a Partition](#topic81)
+-   [Truncating a Partition](#topic82)
+-   [Exchanging a Partition](#topic83)
+-   [Splitting a Partition](#topic84)
+-   [Modifying a Subpartition Template](#topic85)
+
+**Note:** When using multi-level partition designs, the following operations are not supported with ALTER TABLE:
+
+-   ADD DEFAULT PARTITION
+-   ADD PARTITION
+-   DROP DEFAULT PARTITION
+-   DROP PARTITION
+-   SPLIT PARTITION
+-   All operations that involve modifying subpartitions.
+
+**Important:** When defining and altering partition designs, use the given partition name, not the table object name. Although you can query and load any table \(including partitioned tables\) directly using SQL commands, you can only modify the structure of a partitioned table using the `ALTER TABLE...PARTITION` clauses.
+
+Partitions are not required to have names. If a partition does not have a name, use one of the following expressions to specify it: `PARTITION FOR (value)` or `PARTITION FOR (RANK(number))`.
+
+### <a id="topic78"></a>Adding a Partition 
+
+You can add a partition to a partition design with the `ALTER TABLE` command. If the original partition design included subpartitions defined by a *subpartition template*, the newly added partition is subpartitioned according to that template. For example:
+
+``` sql
+ALTER TABLE sales ADD PARTITION
+    START (date '2009-02-01') INCLUSIVE
+    END (date '2009-03-01') EXCLUSIVE;
+```
+
+If you did not use a subpartition template when you created the table, you define subpartitions when adding a partition:
+
+``` sql
+ALTER TABLE sales ADD PARTITION
+    START (date '2009-02-01') INCLUSIVE
+    END (date '2009-03-01') EXCLUSIVE
+     ( SUBPARTITION usa VALUES ('usa'),
+       SUBPARTITION asia VALUES ('asia'),
+       SUBPARTITION europe VALUES ('europe') );
+```
+
+When you add a subpartition to an existing partition, you can specify the partition to alter. For example:
+
+``` sql
+ALTER TABLE sales ALTER PARTITION FOR (RANK(12))
+      ADD PARTITION africa VALUES ('africa');
+```
+
+**Note:** You cannot add a partition to a partition design that has a default partition. You must split the default partition to add a partition. See [Splitting a Partition](#topic84).
+
+### <a id="topic79"></a>Renaming a Partition 
+
+Partitioned tables use the following naming convention. Partitioned subtable names are subject to uniqueness requirements and length limitations.
+
+<pre><code><i>&lt;parentname&gt;</i>_<i>&lt;level&gt;</i>_prt_<i>&lt;partition_name&gt;</i></code></pre>
+
+For example:
+
+```
+sales_1_prt_jan08
+```
+
+For auto-generated range partitions, a number is assigned when no name is given:
+
+```
+sales_1_prt_1
+```
+
+To rename a partitioned child table, rename the top-level parent table. The *&lt;parentname&gt;* changes in the table names of all associated child table partitions. For example, the following command:
+
+``` sql
+ALTER TABLE sales RENAME TO globalsales;
+```
+
+The names of the associated child table partitions change accordingly:
+
+```
+globalsales_1_prt_1
+```
+
+You can change the name of a partition to make it easier to identify. For example:
+
+``` sql
+ALTER TABLE sales RENAME PARTITION FOR ('2008-01-01') TO jan08;
+```
+
+The associated child table name changes as follows:
+
+```
+sales_1_prt_jan08
+```
+
+When altering partitioned tables with the `ALTER TABLE` command, always refer to the tables by their partition name \(*jan08*\) and not their full table name \(*sales\_1\_prt\_jan08*\).
+
+**Note:** You cannot use a partition's generated child table name in an `ALTER TABLE` statement. For example, `ALTER TABLE sales...` is correct; `ALTER TABLE sales_1_prt_jan08...` is not allowed.
+
+### <a id="topic80"></a>Adding a Default Partition 
+
+You can add a default partition to a partition design with the `ALTER TABLE` command.
+
+``` sql
+ALTER TABLE sales ADD DEFAULT PARTITION other;
+```
+
+If incoming data does not match a partition's `CHECK` constraint and there is no default partition, the data is rejected. Default partitions ensure that incoming data that does not match a partition is inserted into the default partition.
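+
+Rows that accumulate in a default partition often indicate that a regular partition is missing for their key values. A minimal sketch of checking for such rows, assuming the naming convention described in [Renaming a Partition](#topic79) and a default partition named `other`:
+
+``` sql
+-- Rows here may belong in a regular partition that has not been created yet.
+SELECT count(*) FROM sales_1_prt_other;
+```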
+
+### <a id="topic81"></a>Dropping a Partition 
+
+You can drop a partition from your partition design using the `ALTER TABLE` command. When you drop a partition that has subpartitions, the subpartitions \(and all data in them\) are automatically dropped as well. For range partitions, it is common to drop the older partitions from the range as old data is rolled out of the data warehouse. For example:
+
+``` sql
+ALTER TABLE sales DROP PARTITION FOR (RANK(1));
+```
+
+### <a id="topic_enm_vrk_kv"></a>Sorting AORO Partitioned Tables 
+
+HDFS read access for append-only, row-oriented \(AORO\) tables with large numbers of partitions can be tuned with the `optimizer_parts_to_force_sort_on_insert` parameter, which controls how the optimizer sorts tuples during `INSERT` operations and, in turn, how HDFS opens files.
+
+The user-tunable parameter `optimizer_parts_to_force_sort_on_insert` can force the GPORCA query optimizer to generate a plan that sorts tuples during insertion into an append-only, row-oriented \(AORO\) partitioned table. Sorting the insert tuples reduces the number of partition switches, improving overall `INSERT` performance. For a given AORO table, if its number of leaf partitions is greater than or equal to the number specified in `optimizer_parts_to_force_sort_on_insert`, the plan generated by GPORCA sorts inserts by their partition IDs before performing the `INSERT` operation. Otherwise, the inserts are not sorted. The default value for `optimizer_parts_to_force_sort_on_insert` is 160.
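+
+A minimal sketch of lowering the threshold, assuming the parameter can be set at the session level \(it can also be managed as a server configuration parameter\):
+
+``` sql
+-- Sort inserts by partition ID whenever the target AORO table has
+-- 100 or more leaf partitions, instead of the default 160.
+SET optimizer_parts_to_force_sort_on_insert = 100;
+```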
+
+### <a id="topic82"></a>Truncating a Partition 
+
+You can truncate a partition using the `ALTER TABLE` command. When you truncate a partition that has subpartitions, the subpartitions are automatically truncated as well.
+
+``` sql
+ALTER TABLE sales TRUNCATE PARTITION FOR (RANK(1));
+```
+
+### <a id="topic83"></a>Exchanging a Partition 
+
+You can exchange a partition using the `ALTER TABLE` command. Exchanging a partition swaps one table in place of an existing partition. You can exchange partitions only at the lowest level of your partition hierarchy \(only partitions that contain data can be exchanged\).
+
+Partition exchange can be useful for data loading. For example, load a staging table and swap the loaded table into your partition design. You can use partition exchange to change the storage type of older partitions to append-only tables. For example:
+
+``` sql
+CREATE TABLE jan12 (LIKE sales) WITH (appendonly=true);
+INSERT INTO jan12 SELECT * FROM sales_1_prt_1 ;
+ALTER TABLE sales EXCHANGE PARTITION FOR (DATE '2012-01-01')
+WITH TABLE jan12;
+```
+
+**Note:** This example refers to the single-level definition of the table `sales`, before partitions were added and altered in the previous examples.
+
+### <a id="topic84"></a>Splitting a Partition 
+
+Splitting a partition divides a partition into two partitions. You can split a partition using the `ALTER TABLE` command. You can split partitions only at the lowest level of your partition hierarchy: only partitions that contain data can be split. The split value you specify goes into the *latter* partition.
+
+For example, to split a monthly partition into two with the first partition containing dates January 1-15 and the second partition containing dates January 16-31:
+
+``` sql
+ALTER TABLE sales SPLIT PARTITION FOR ('2008-01-01')
+AT ('2008-01-16')
+INTO (PARTITION jan081to15, PARTITION jan0816to31);
+```
+
+If your partition design has a default partition, you must split the default partition to add a partition.
+
+When using the `INTO` clause, specify the current default partition as the second partition name. For example, to split a default range partition to add a new monthly partition for January 2009:
+
+``` sql
+ALTER TABLE sales SPLIT DEFAULT PARTITION
+START ('2009-01-01') INCLUSIVE
+END ('2009-02-01') EXCLUSIVE
+INTO (PARTITION jan09, default partition);
+```
+
+### <a id="topic85"></a>Modifying a Subpartition Template 
+
+Use `ALTER TABLE SET SUBPARTITION TEMPLATE` to modify the subpartition template of a partitioned table. Partitions added after you set a new subpartition template have the new partition design. Existing partitions are not modified.
+
+The following example alters the subpartition template of this partitioned table:
+
+``` sql
+CREATE TABLE sales (trans_id int, date date, amount decimal(9,2), region text)
+  DISTRIBUTED BY (trans_id)
+  PARTITION BY RANGE (date)
+  SUBPARTITION BY LIST (region)
+  SUBPARTITION TEMPLATE
+    ( SUBPARTITION usa VALUES ('usa'),
+      SUBPARTITION asia VALUES ('asia'),
+      SUBPARTITION europe VALUES ('europe'),
+      DEFAULT SUBPARTITION other_regions )
+  ( START (date '2014-01-01') INCLUSIVE
+    END (date '2014-04-01') EXCLUSIVE
+    EVERY (INTERVAL '1 month') );
+```
+
+This `ALTER TABLE` command modifies the subpartition template:
+
+``` sql
+ALTER TABLE sales SET SUBPARTITION TEMPLATE
+( SUBPARTITION usa VALUES ('usa'),
+  SUBPARTITION asia VALUES ('asia'),
+  SUBPARTITION europe VALUES ('europe'),
+  SUBPARTITION africa VALUES ('africa'),
+  DEFAULT SUBPARTITION regions );
+```
+
+When you add a date-range partition to the `sales` table, it includes the new regional list subpartition for Africa. For example, the following command creates the subpartitions `usa`, `asia`, `europe`, `africa`, and the default subpartition `regions`:
+
+``` sql
+ALTER TABLE sales ADD PARTITION "4"
+  START ('2014-04-01') INCLUSIVE
+  END ('2014-05-01') EXCLUSIVE ;
+```
+
+To view the tables created for the partitioned table `sales`, you can use the command `\dt sales*` from the psql command line.
+
+To remove a subpartition template, use `SET SUBPARTITION TEMPLATE` with empty parentheses. For example, to clear the sales table subpartition template:
+
+``` sql
+ALTER TABLE sales SET SUBPARTITION TEMPLATE ();
+```

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/ddl/ddl-schema.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/ddl/ddl-schema.html.md.erb b/markdown/ddl/ddl-schema.html.md.erb
new file mode 100644
index 0000000..7c361ba
--- /dev/null
+++ b/markdown/ddl/ddl-schema.html.md.erb
@@ -0,0 +1,88 @@
+---
+title: Creating and Managing Schemas
+---
+
+Schemas logically organize objects and data in a database. Schemas allow you to have more than one object \(such as tables\) with the same name in the database without conflict if the objects are in different schemas.
+
+## <a id="topic18"></a>The Default "Public" Schema 
+
+Every database has a default schema named *public*. If you do not create any schemas, objects are created in the *public* schema. All database roles \(users\) have `CREATE` and `USAGE` privileges in the *public* schema. When you create a schema, you grant privileges to your users to allow access to the schema.
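+
+For example, a minimal sketch of granting access to a schema, assuming a schema `myschema` and a role `analyst` already exist:
+
+``` sql
+=> GRANT USAGE, CREATE ON SCHEMA myschema TO analyst;
+```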
+
+## <a id="topic19"></a>Creating a Schema 
+
+Use the `CREATE SCHEMA` command to create a new schema. For example:
+
+``` sql
+=> CREATE SCHEMA myschema;
+```
+
+To create or access objects in a schema, write a qualified name consisting of the schema name and table name separated by a period. For example:
+
+```
+myschema.table
+```
+
+See [Schema Search Paths](#topic20) for information about accessing a schema.
+
+You can create a schema owned by someone else, for example, to restrict the activities of your users to well-defined namespaces. The syntax is:
+
+``` sql
+=> CREATE SCHEMA schemaname AUTHORIZATION username;
+```
+
+## <a id="topic20"></a>Schema Search Paths 
+
+To specify an object's location in a database, use the schema-qualified name. For example:
+
+``` sql
+=> SELECT * FROM myschema.mytable;
+```
+
+You can set the `search_path` configuration parameter to specify the order in which to search the available schemas for objects. The schema listed first in the search path becomes the *default* schema. If a schema is not specified, objects are created in the default schema.
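+
+A minimal sketch of setting the search path for the current session only \(the next section shows the database-level command\):
+
+``` sql
+=> SET search_path TO myschema, public;
+```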
+
+### <a id="topic21"></a>Setting the Schema Search Path 
+
+The `search_path` configuration parameter sets the schema search order. The `ALTER DATABASE` command sets the search path. For example:
+
+``` sql
+=> ALTER DATABASE mydatabase SET search_path TO myschema,
+public, pg_catalog;
+```
+
+### <a id="topic22"></a>Viewing the Current Schema 
+
+Use the `current_schema()` function to view the current schema. For example:
+
+``` sql
+=> SELECT current_schema();
+```
+
+Use the `SHOW` command to view the current search path. For example:
+
+``` sql
+=> SHOW search_path;
+```
+
+## <a id="topic23"></a>Dropping a Schema 
+
+Use the `DROP SCHEMA` command to drop \(delete\) a schema. For example:
+
+``` sql
+=> DROP SCHEMA myschema;
+```
+
+By default, the schema must be empty before you can drop it. To drop a schema and all of its objects \(tables, data, functions, and so on\) use:
+
+``` sql
+=> DROP SCHEMA myschema CASCADE;
+```
+
+## <a id="topic24"></a>System Schemas 
+
+The following system-level schemas exist in every database:
+
+-   `pg_catalog` contains the system catalog tables, built-in data types, functions, and operators. It is always part of the schema search path, even if it is not explicitly named in the search path.
+-   `information_schema` consists of a standardized set of views that contain information about the objects in the database. These views get system information from the system catalog tables in a standardized way.
+-   `pg_toast` stores large objects such as records that exceed the page size. This schema is used internally by the HAWQ system.
+-   `pg_bitmapindex` stores bitmap index objects such as lists of values. This schema is used internally by the HAWQ system.
+-   `hawq_toolkit` is an administrative schema that contains external tables, views, and functions that you can access with SQL commands. All database users can access `hawq_toolkit` to view and query the system log files and other system metrics.

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/ddl/ddl-storage.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/ddl/ddl-storage.html.md.erb b/markdown/ddl/ddl-storage.html.md.erb
new file mode 100644
index 0000000..264e552
--- /dev/null
+++ b/markdown/ddl/ddl-storage.html.md.erb
@@ -0,0 +1,71 @@
+---
+title: Table Storage Model and Distribution Policy
+---
+
+HAWQ supports several storage models and a mix of storage models. When you create a table, you choose how to store its data. This topic explains the options for table storage and how to choose the best storage model for your workload.
+
+**Note:** To simplify the creation of database tables, you can specify the default values for some table storage options with the HAWQ server configuration parameter `gp_default_storage_options`.
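+
+For example, a minimal sketch of inspecting the current defaults, assuming the parameter is readable with `SHOW`:
+
+``` sql
+SHOW gp_default_storage_options;
+```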
+
+## <a id="topic39"></a>Row-Oriented Storage 
+
+HAWQ provides two storage orientation models: row-oriented and Parquet tables. Evaluate performance using your own data and query workloads to determine the best alternative.
+
+-   Row-oriented storage: good for OLTP-style workloads with many iterative transactions that need many columns of a single row at once, making retrieval efficient.
+
+    **Note:** Column-oriented storage is no longer available. Parquet storage should be used instead.
+
+Row-oriented storage provides the best option for the following situations \(a creation example follows the list\):
+
+-   **Frequent INSERTs.** Where rows are frequently inserted into the table
+-   **Number of columns requested in queries.** Where you typically request all or the majority of columns in the `SELECT` list or `WHERE` clause of your queries, choose a row-oriented model. 
+-   **Number of columns in the table.** Row-oriented storage is most efficient when many columns are required at the same time, or when the row-size of a table is relatively small. 
+
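+A minimal sketch of creating an append-only, row-oriented table, assuming zlib compression is wanted:
+
+``` sql
+CREATE TABLE sales_row (id int, date date, amt decimal(10,2))
+WITH (appendonly=true, orientation=row, compresstype=zlib)
+DISTRIBUTED BY (id);
+```
+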
+## <a id="topic55"></a>Altering a Table 
+
+The `ALTER TABLE` command changes the definition of a table. Use `ALTER TABLE` to change table attributes such as column definitions, distribution policy, storage model, and partition structure \(see also [Maintaining Partitioned Tables](ddl-partition.html)\). For example, to add a not-null constraint to a table column:
+
+``` sql
+=> ALTER TABLE address ALTER COLUMN street SET NOT NULL;
+```
+
+### <a id="topic56"></a>Altering Table Distribution 
+
+`ALTER TABLE` provides options to change a table's distribution policy. When the table distribution options change, the table data is redistributed on disk, which can be resource intensive. You can also redistribute table data using the existing distribution policy.
+
+### <a id="topic57"></a>Changing the Distribution Policy 
+
+For partitioned tables, changes to the distribution policy apply recursively to the child partitions. This operation preserves the ownership and all other attributes of the table. For example, the following command redistributes the table sales across all segments using the customer\_id column as the distribution key:
+
+``` sql
+ALTER TABLE sales SET DISTRIBUTED BY (customer_id);
+```
+
+When you change the hash distribution of a table, table data is automatically redistributed. Changing the distribution policy to a random distribution does not cause the data to be redistributed. For example:
+
+``` sql
+ALTER TABLE sales SET DISTRIBUTED RANDOMLY;
+```
+
+### <a id="topic58"></a>Redistributing Table Data 
+
+To redistribute table data for tables with a random distribution policy \(or when the hash distribution policy has not changed\), use `REORGANIZE=TRUE`. Reorganizing data may be necessary to correct a data skew problem, or when segment resources are added to the system. For example, the following command redistributes table data across all segments using the current distribution policy, including random distribution.
+
+``` sql
+ALTER TABLE sales SET WITH (REORGANIZE=TRUE);
+```
+
+## <a id="topic62"></a>Dropping a Table 
+
+The `DROP TABLE` command removes tables from the database. For example:
+
+``` sql
+DROP TABLE mytable;
+```
+
+`DROP TABLE` always removes any indexes, rules, triggers, and constraints that exist for the target table. Specify `CASCADE` to drop a table that is referenced by a view. `CASCADE` removes dependent views.
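+
+For example, a minimal sketch of removing a table that a view depends on:
+
+``` sql
+DROP TABLE mytable CASCADE;
+```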
+
+To empty a table of rows without removing the table definition, use `TRUNCATE`. For example:
+
+``` sql
+TRUNCATE mytable;
+```

