hawq-commits mailing list archives

From yo...@apache.org
Subject [43/51] [partial] incubator-hawq-docs git commit: HAWQ-1254 Fix/remove book branching on incubator-hawq-docs
Date Fri, 06 Jan 2017 17:32:58 GMT
http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/ddl/ddl-partition.html.md.erb
----------------------------------------------------------------------
diff --git a/ddl/ddl-partition.html.md.erb b/ddl/ddl-partition.html.md.erb
deleted file mode 100644
index f790161..0000000
--- a/ddl/ddl-partition.html.md.erb
+++ /dev/null
@@ -1,483 +0,0 @@
----
-title: Partitioning Large Tables
----
-
-Table partitioning makes very large tables, such as fact tables, practical to manage by logically dividing them into smaller pieces. Partitioned tables can improve query performance by allowing the HAWQ query optimizer to scan only the data needed to satisfy a given query instead of scanning the entire contents of a large table.
-
-Partitioning does not change the physical distribution of table data across the segments. Table distribution is physical: HAWQ physically divides partitioned tables and non-partitioned tables across segments to enable parallel query processing. Table *partitioning* is logical: HAWQ logically divides big tables to improve query performance and facilitate data warehouse maintenance tasks, such as rolling old data out of the data warehouse.
-
-HAWQ supports:
-
--   *range partitioning*: division of data based on a numerical range, such as date or price.
--   *list partitioning*: division of data based on a list of values, such as sales territory or product line.
--   A combination of both types.
-<a id="im207241"></a>
-
-![](../mdimages/partitions.jpg "Example Multi-level Partition Design")
-
-## <a id="topic64"></a>Table Partitioning in HAWQ 
-
-HAWQ divides tables into parts \(also known as partitions\) to enable massively parallel processing. Tables are partitioned during `CREATE TABLE` using the `PARTITION BY` \(and optionally the `SUBPARTITION BY`\) clause. Partitioning creates a top-level \(or parent\) table with one or more levels of sub-tables \(or child tables\). Internally, HAWQ creates an inheritance relationship between the top-level table and its underlying partitions, similar to the functionality of the `INHERITS` clause of PostgreSQL.
-
-HAWQ uses the partition criteria defined during table creation to create each partition with a distinct `CHECK` constraint, which limits the data that table can contain. The query optimizer uses `CHECK` constraints to determine which table partitions to scan to satisfy a given query predicate.
-
-The HAWQ system catalog stores partition hierarchy information so that rows inserted into the top-level parent table propagate correctly to the child table partitions. To change the partition design or table structure, alter the parent table using `ALTER TABLE` with the `PARTITION` clause.
-
-To insert data into a partitioned table, you specify the root partitioned table, the table created with the `CREATE TABLE` command. You also can specify a leaf child table of the partitioned table in an `INSERT` command. An error is returned if the data is not valid for the specified leaf child table. Specifying a child table that is not a leaf child table in the `INSERT` command is not supported.
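-
-As a minimal sketch, assuming a `sales` table that is range-partitioned by month with named partitions such as `Mar08` (so that, under the naming convention described in [Renaming a Partition](#topic79), the leaf child table would be `sales_1_prt_mar08`), the two supported forms look like this. The row values are illustrative only.
-
-``` sql
-/* Insert through the root table; HAWQ routes the row to the matching partition. */
-INSERT INTO sales VALUES (1, date '2008-03-15', 19.99);
-
-/* Insert directly into a leaf child table; the row must be valid for that partition. */
-INSERT INTO sales_1_prt_mar08 VALUES (2, date '2008-03-20', 5.00);
-```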
-
-## <a id="topic65"></a>Deciding on a Table Partitioning Strategy 
-
-Not all tables are good candidates for partitioning. If the answer is *yes* to all or most of the following questions, table partitioning is a viable database design strategy for improving query performance. If the answer is *no* to most of the following questions, table partitioning is not the right solution for that table. Test your design strategy to ensure that query performance improves as expected.
-
--   **Is the table large enough?** Large fact tables are good candidates for table partitioning. If you have millions or billions of records in a table, you may see performance benefits from logically breaking that data up into smaller chunks. For smaller tables with only a few thousand rows or fewer, the administrative overhead of maintaining the partitions will outweigh any performance benefits you might see.
--   **Are you experiencing unsatisfactory performance?** As with any performance tuning initiative, a table should be partitioned only if queries against that table are producing slower response times than desired.
--   **Do your query predicates have identifiable access patterns?** Examine the `WHERE` clauses of your query workload and look for table columns that are consistently used to access data. For example, if most of your queries tend to look up records by date, then a monthly or weekly date-partitioning design might be beneficial. Or if you tend to access records by region, consider a list-partitioning design to divide the table by region.
--   **Does your data warehouse maintain a window of historical data?** Another consideration for partition design is your organization's business requirements for maintaining historical data. For example, your data warehouse may require that you keep data for the past twelve months. If the data is partitioned by month, you can easily drop the oldest monthly partition from the warehouse and load current data into the most recent monthly partition.
--   **Can the data be divided into somewhat equal parts based on some defining criteria?** Choose partitioning criteria that will divide your data as evenly as possible. If the partitions contain a relatively equal number of records, query performance can improve in proportion to the number of partitions created. For example, if a large table is divided into 10 partitions, a query that needs to scan only one of them can run up to 10 times faster than it would against the unpartitioned table, provided that the partitions are designed to support the query's criteria.
-
-Do not create more partitions than are needed. Creating too many partitions can slow down management and maintenance jobs, such as vacuuming, recovering segments, expanding the cluster, checking disk usage, and others.
-
-Partitioning does not improve query performance unless the query optimizer can eliminate partitions based on the query predicates. Queries that scan every partition run slower than if the table were not partitioned, so avoid partitioning if few of your queries achieve partition elimination. Check the explain plan for queries to make sure that partitions are eliminated. See [Query Profiling](../query/query-profiling.html) for more about partition elimination.
-
-Be very careful with multi-level partitioning because the number of partition files can grow very quickly. For example, if a table is partitioned by both day and city, and there are 1,000 days of data and 1,000 cities, the total number of partitions is one million. Column-oriented tables store each column in a physical table, so if this table has 100 columns, the system would be required to manage 100 million files for the table.
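-
-The arithmetic behind these figures can be checked with a plain query. The numbers are the ones from the example above, not measurements:
-
-``` sql
-/* 1,000 daily partitions x 1,000 city subpartitions, then x 100 columns for a column-oriented table. */
-SELECT 1000 * 1000       AS leaf_partitions,
-       1000 * 1000 * 100 AS column_files;
-```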
-
-Before settling on a multi-level partitioning strategy, consider a single level partition with bitmap indexes. Indexes slow down data loads, so consider performance testing with your data and schema to decide on the best strategy.
-
-## <a id="topic66"></a>Creating Partitioned Tables 
-
-You partition tables when you create them with `CREATE TABLE`. This topic provides examples of SQL syntax for creating a table with various partition designs.
-
-To partition a table:
-
-1.  Decide on the partition design: date range, numeric range, or list of values.
-2.  Choose the column\(s\) on which to partition the table.
-3.  Decide how many levels of partitions you want. For example, you can create a date range partition table by month and then subpartition the monthly partitions by sales region.
-
--   [Defining Date Range Table Partitions](#topic67)
--   [Defining Numeric Range Table Partitions](#topic68)
--   [Defining List Table Partitions](#topic69)
--   [Defining Multi-level Partitions](#topic70)
--   [Partitioning an Existing Table](#topic71)
-
-### <a id="topic67"></a>Defining Date Range Table Partitions 
-
-A date range partitioned table uses a single `date` or `timestamp` column as the partition key column. You can use the same partition key column to create subpartitions if necessary, for example, to partition by month and then subpartition by day. Consider partitioning by the most granular level. For example, for a table partitioned by date, you can partition by day and have 365 daily partitions, rather than partition by year then subpartition by month then subpartition by day. A multi-level design can reduce query planning time, but a flat partition design runs faster.
-
-You can have HAWQ automatically generate partitions by giving a `START` value, an `END` value, and an `EVERY` clause that defines the partition increment value. By default, `START` values are always inclusive and `END` values are always exclusive. For example:
-
-``` sql
-CREATE TABLE sales (id int, date date, amt decimal(10,2))
-DISTRIBUTED BY (id)
-PARTITION BY RANGE (date)
-( START (date '2008-01-01') INCLUSIVE
-   END (date '2009-01-01') EXCLUSIVE
-   EVERY (INTERVAL '1 day') );
-```
-
-You can also declare and name each partition individually. For example:
-
-``` sql
-CREATE TABLE sales (id int, date date, amt decimal(10,2))
-DISTRIBUTED BY (id)
-PARTITION BY RANGE (date)
-( PARTITION Jan08 START (date '2008-01-01') INCLUSIVE ,
-  PARTITION Feb08 START (date '2008-02-01') INCLUSIVE ,
-  PARTITION Mar08 START (date '2008-03-01') INCLUSIVE ,
-  PARTITION Apr08 START (date '2008-04-01') INCLUSIVE ,
-  PARTITION May08 START (date '2008-05-01') INCLUSIVE ,
-  PARTITION Jun08 START (date '2008-06-01') INCLUSIVE ,
-  PARTITION Jul08 START (date '2008-07-01') INCLUSIVE ,
-  PARTITION Aug08 START (date '2008-08-01') INCLUSIVE ,
-  PARTITION Sep08 START (date '2008-09-01') INCLUSIVE ,
-  PARTITION Oct08 START (date '2008-10-01') INCLUSIVE ,
-  PARTITION Nov08 START (date '2008-11-01') INCLUSIVE ,
-  PARTITION Dec08 START (date '2008-12-01') INCLUSIVE
-                  END (date '2009-01-01') EXCLUSIVE );
-```
-
-You do not have to declare an `END` value for each partition, only the last one. In this example, `Jan08` ends where `Feb08` starts.
-
-### <a id="topic68"></a>Defining Numeric Range Table Partitions 
-
-A numeric range partitioned table uses a single numeric data type column as the partition key column. For example:
-
-``` sql
-CREATE TABLE rank (id int, rank int, year int, gender
-char(1), count int)
-DISTRIBUTED BY (id)
-PARTITION BY RANGE (year)
-( START (2001) END (2008) EVERY (1),
-  DEFAULT PARTITION extra );
-```
-
-For more information about default partitions, see [Adding a Default Partition](#topic80).
-
-### <a id="topic69"></a>Defining List Table Partitions 
-
-A list partitioned table can use any data type column that allows equality comparisons as its partition key column. A list partition can also have a multi-column \(composite\) partition key, whereas a range partition only allows a single column as the partition key. For list partitions, you must declare a partition specification for every partition \(list value\) you want to create. For example:
-
-``` sql
-CREATE TABLE rank (id int, rank int, year int, gender
-char(1), count int )
-DISTRIBUTED BY (id)
-PARTITION BY LIST (gender)
-( PARTITION girls VALUES ('F'),
-  PARTITION boys VALUES ('M'),
-  DEFAULT PARTITION other );
-```
-
-**Note:** The HAWQ legacy optimizer allows list partitions with multi-column \(composite\) partition keys. A range partition only allows a single column as the partition key. GPORCA does not support composite keys.
-
-For more information about default partitions, see [Adding a Default Partition](#topic80).
-
-### <a id="topic70"></a>Defining Multi-level Partitions 
-
-You can create a multi-level partition design with subpartitions of partitions. Using a *subpartition template* ensures that every partition has the same subpartition design, including partitions that you add later. For example, the following SQL creates the two-level partition design shown in [Figure 1](#im207241):
-
-``` sql
-CREATE TABLE sales (trans_id int, date date, amount
-decimal(9,2), region text)
-DISTRIBUTED BY (trans_id)
-PARTITION BY RANGE (date)
-SUBPARTITION BY LIST (region)
-SUBPARTITION TEMPLATE
-( SUBPARTITION usa VALUES ('usa'),
-  SUBPARTITION asia VALUES ('asia'),
-  SUBPARTITION europe VALUES ('europe'),
-  DEFAULT SUBPARTITION other_regions)
-  (START (date '2011-01-01') INCLUSIVE
-   END (date '2012-01-01') EXCLUSIVE
-   EVERY (INTERVAL '1 month'),
-   DEFAULT PARTITION outlying_dates );
-```
-
-The following example shows a three-level partition design where the `sales` table is partitioned by `year`, then `month`, then `region`. The `SUBPARTITION TEMPLATE` clauses ensure that each yearly partition has the same subpartition structure. The example declares a `DEFAULT` partition at each level of the hierarchy.
-
-``` sql
-CREATE TABLE p3_sales (id int, year int, month int, day int,
-region text)
-DISTRIBUTED BY (id)
-PARTITION BY RANGE (year)
-    SUBPARTITION BY RANGE (month)
-      SUBPARTITION TEMPLATE (
-        START (1) END (13) EVERY (1),
-        DEFAULT SUBPARTITION other_months )
-           SUBPARTITION BY LIST (region)
-             SUBPARTITION TEMPLATE (
-               SUBPARTITION usa VALUES ('usa'),
-               SUBPARTITION europe VALUES ('europe'),
-               SUBPARTITION asia VALUES ('asia'),
-               DEFAULT SUBPARTITION other_regions )
-( START (2002) END (2012) EVERY (1),
-  DEFAULT PARTITION outlying_years );
-```
-
-**CAUTION**:
-
-When you create multi-level partitions on ranges, it is easy to create a large number of subpartitions, some containing little or no data. This can add many entries to the system tables, which increases the time and memory required to optimize and execute queries. Increase the range interval or choose a different partitioning strategy to reduce the number of subpartitions created.
-
-### <a id="topic71"></a>Partitioning an Existing Table 
-
-Tables can be partitioned only at creation. If you have a table that you want to partition, you must create a partitioned table, load the data from the original table into the new table, drop the original table, and rename the partitioned table with the original table's name. You must also re-grant any table permissions. For example:
-
-``` sql
-CREATE TABLE sales2 (LIKE sales)
-PARTITION BY RANGE (date)
-( START (date '2008-01-01') INCLUSIVE
-   END (date '2009-01-01') EXCLUSIVE
-   EVERY (INTERVAL '1 month') );
-INSERT INTO sales2 SELECT * FROM sales;
-DROP TABLE sales;
-ALTER TABLE sales2 RENAME TO sales;
-GRANT ALL PRIVILEGES ON sales TO admin;
-GRANT SELECT ON sales TO guest;
-```
-
-## <a id="topic73"></a>Loading Partitioned Tables 
-
-After you create the partitioned table structure, top-level parent tables are empty. Data is routed to the bottom-level child table partitions. In a multi-level partition design, only the subpartitions at the bottom of the hierarchy can contain data.
-
-Rows that cannot be mapped to a child table partition are rejected and the load fails. To avoid unmapped rows being rejected at load time, define your partition hierarchy with a `DEFAULT` partition. Any rows that do not match a partition's `CHECK` constraints load into the `DEFAULT` partition. See [Adding a Default Partition](#topic80).
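-
-A small sketch of this behavior, assuming the `rank` table from the numeric range example (yearly partitions covering 2001 through 2007 plus `DEFAULT PARTITION extra`). The inserted values are made up, and the child table name `rank_1_prt_extra` is inferred from the naming convention in [Renaming a Partition](#topic79):
-
-``` sql
-/* 1999 is outside every range partition, so the row lands in the default partition. */
-INSERT INTO rank VALUES (1, 1, 1999, 'F', 100);
-
-/* Count the rows that fell through to the default partition. */
-SELECT count(*) FROM rank_1_prt_extra;
-```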
-
-At runtime, the query optimizer scans the entire table inheritance hierarchy and uses the `CHECK` table constraints to determine which of the child table partitions to scan to satisfy the query's conditions. The `DEFAULT` partition \(if your hierarchy has one\) is always scanned. `DEFAULT` partitions that contain data slow down the overall scan time.
-
-When you use `COPY` or `INSERT` to load data into a parent table, HAWQ automatically routes each row to the correct partition, so you load a partitioned table just as you would a regular table.
-
-Best practice for loading data into partitioned tables is to create an intermediate staging table, load it, and then exchange it into your partition design. See [Exchanging a Partition](#topic83).
-
-## <a id="topic74"></a>Verifying Your Partition Strategy 
-
-When a table is partitioned based on the query predicate, you can use `EXPLAIN` to examine the query plan and verify that the query optimizer scans only the relevant data.
-
-For example, suppose a *sales* table is date-range partitioned by month and subpartitioned by region as shown in [Figure 1](#im207241). For the following query:
-
-``` sql
-EXPLAIN SELECT * FROM sales WHERE date='01-07-12' AND
-region='usa';
-```
-
-The query plan for this query should show a table scan of only the following tables:
-
--   the default partition returning 0-1 rows \(if your partition design has one\)
--   the January 2012 partition \(*sales\_1\_prt\_1*\) returning 0-1 rows
--   the USA region subpartition \(*sales\_1\_2\_prt\_usa*\) returning *some number* of rows.
-
-The following example shows the relevant portion of the query plan.
-
-``` pre
-->  Seq Scan on sales_1_prt_1 sales (cost=0.00..0.00 rows=0 width=0)
-      Filter: "date"=01-07-12::date AND region='usa'::text
-->  Seq Scan on sales_1_2_prt_usa sales (cost=0.00..9.87 rows=20 width=40)
-```
-
-Ensure that the query optimizer does not scan unnecessary partitions or subpartitions \(for example, scans of months or regions not specified in the query predicate\), and that scans of the top-level tables return 0-1 rows.
-
-### <a id="topic75"></a>Troubleshooting Selective Partition Scanning 
-
-The following limitations can result in a query plan that shows a non-selective scan of your partition hierarchy.
-
--   The query optimizer can selectively scan partitioned tables only when the query contains a direct and simple restriction of the table using immutable operators such as:
-
-    =, <, <=, \>, \>=, and <\>
-
--   Selective scanning recognizes `STABLE` and `IMMUTABLE` functions, but does not recognize `VOLATILE` functions within a query. For example, `WHERE` clauses such as `date > CURRENT_DATE` cause the query optimizer to selectively scan partitioned tables, but `time > TIMEOFDAY` does not.
-
-## <a id="topic76"></a>Viewing Your Partition Design 
-
-You can look up information about your partition design using the *pg\_partitions* view. For example, to see the partition design of the *sales* table:
-
-``` sql
-SELECT partitionboundary, partitiontablename, partitionname,
-partitionlevel, partitionrank
-FROM pg_partitions
-WHERE tablename='sales';
-```
-
-The following table and views show information about partitioned tables.
-
--   *pg\_partition* - Tracks partitioned tables and their inheritance level relationships.
--   *pg\_partition\_templates* - Shows the subpartitions created using a subpartition template.
--   *pg\_partition\_columns* - Shows the partition key columns used in a partition design.
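-
-For instance, a hedged sketch of listing the partition key columns of the *sales* table, assuming the *pg\_partition\_columns* view exposes a `tablename` column in the same way that *pg\_partitions* does:
-
-``` sql
-/* Show which columns define each partition level of the sales table. */
-SELECT * FROM pg_partition_columns WHERE tablename='sales';
-```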
-
-## <a id="topic77"></a>Maintaining Partitioned Tables 
-
-To maintain a partitioned table, use the `ALTER TABLE` command against the top-level parent table. The most common scenario is to drop old partitions and add new ones to maintain a rolling window of data in a range partition design. If you have a default partition in your partition design, you add a partition by *splitting* the default partition.
-
--   [Adding a Partition](#topic78)
--   [Renaming a Partition](#topic79)
--   [Adding a Default Partition](#topic80)
--   [Dropping a Partition](#topic81)
--   [Truncating a Partition](#topic82)
--   [Exchanging a Partition](#topic83)
--   [Splitting a Partition](#topic84)
--   [Modifying a Subpartition Template](#topic85)
-
-**Note:** When using multi-level partition designs, the following operations are not supported with ALTER TABLE:
-
--   ADD DEFAULT PARTITION
--   ADD PARTITION
--   DROP DEFAULT PARTITION
--   DROP PARTITION
--   SPLIT PARTITION
--   All operations that involve modifying subpartitions.
-
-**Important:** When defining and altering partition designs, use the given partition name, not the table object name. Although you can query and load any table \(including partitioned tables\) directly using SQL commands, you can only modify the structure of a partitioned table using the `ALTER TABLE...PARTITION` clauses.
-
-Partitions are not required to have names. If a partition does not have a name, use one of the following expressions to specify a partition: `PARTITION FOR (value)` or `PARTITION FOR(RANK(number))`.
-
-### <a id="topic78"></a>Adding a Partition 
-
-You can add a partition to a partition design with the `ALTER TABLE` command. If the original partition design included subpartitions defined by a *subpartition template*, the newly added partition is subpartitioned according to that template. For example:
-
-``` sql
-ALTER TABLE sales ADD PARTITION
-    START (date '2009-02-01') INCLUSIVE
-    END (date '2009-03-01') EXCLUSIVE;
-```
-
-If you did not use a subpartition template when you created the table, you define subpartitions when adding a partition:
-
-``` sql
-ALTER TABLE sales ADD PARTITION
-    START (date '2009-02-01') INCLUSIVE
-    END (date '2009-03-01') EXCLUSIVE
-     ( SUBPARTITION usa VALUES ('usa'),
-       SUBPARTITION asia VALUES ('asia'),
-       SUBPARTITION europe VALUES ('europe') );
-```
-
-When you add a subpartition to an existing partition, you can specify the partition to alter. For example:
-
-``` sql
-ALTER TABLE sales ALTER PARTITION FOR (RANK(12))
-      ADD PARTITION africa VALUES ('africa');
-```
-
-**Note:** You cannot add a partition to a partition design that has a default partition. You must split the default partition to add a partition. See [Splitting a Partition](#topic84).
-
-### <a id="topic79"></a>Renaming a Partition 
-
-Partitioned tables use the following naming convention. Partitioned subtable names are subject to uniqueness requirements and length limitations.
-
-<pre><code><i>&lt;parentname&gt;</i>_<i>&lt;level&gt;</i>_prt_<i>&lt;partition_name&gt;</i></code></pre>
-
-For example:
-
-```
-sales_1_prt_jan08
-```
-
-For auto-generated range partitions \(where a number is assigned when no name is given\):
-
-```
-sales_1_prt_1
-```
-
-To rename a partitioned child table, rename the top-level parent table. The *&lt;parentname&gt;* changes in the table names of all associated child table partitions. For example, the following command:
-
-``` sql
-ALTER TABLE sales RENAME TO globalsales;
-```
-
-Changes the associated table names:
-
-```
-globalsales_1_prt_1
-```
-
-You can change the name of a partition to make it easier to identify. For example:
-
-``` sql
-ALTER TABLE sales RENAME PARTITION FOR ('2008-01-01') TO jan08;
-```
-
-Changes the associated table name as follows:
-
-```
-sales_1_prt_jan08
-```
-
-When altering partitioned tables with the `ALTER TABLE` command, always refer to the tables by their partition name \(*jan08*\) and not their full table name \(*sales\_1\_prt\_jan08*\).
-
-**Note:** The table name cannot be a partition name in an `ALTER TABLE` statement. For example, `ALTER TABLE sales...` is correct, but `ALTER TABLE sales_1_prt_jan08...` is not allowed.
-
-### <a id="topic80"></a>Adding a Default Partition 
-
-You can add a default partition to a partition design with the `ALTER TABLE` command.
-
-``` sql
-ALTER TABLE sales ADD DEFAULT PARTITION other;
-```
-
-If incoming data does not match a partition's `CHECK` constraint and there is no default partition, the data is rejected. Default partitions ensure that incoming data that does not match a partition is inserted into the default partition.
-
-### <a id="topic81"></a>Dropping a Partition 
-
-You can drop a partition from your partition design using the `ALTER TABLE` command. When you drop a partition that has subpartitions, the subpartitions \(and all data in them\) are automatically dropped as well. For range partitions, it is common to drop the older partitions from the range as old data is rolled out of the data warehouse. For example:
-
-``` sql
-ALTER TABLE sales DROP PARTITION FOR (RANK(1));
-```
-
-### <a id="topic_enm_vrk_kv"></a>Sorting AORO Partitioned Tables 
-
-HDFS read access for large numbers of append-only, row-oriented \(AORO\) tables with large numbers of partitions can be tuned by using the `optimizer_parts_to_force_sort_on_insert` parameter to control how HDFS opens files. This parameter controls the way the optimizer sorts tuples during INSERT operations, to maximize HDFS performance.
-
-The user-tunable parameter `optimizer_parts_to_force_sort_on_insert` can force the GPORCA query optimizer to generate a plan that sorts tuples during insertion into append-only, row-oriented \(AORO\) partitioned tables. Sorting the insert tuples reduces the number of partition switches, which improves overall `INSERT` performance. For a given AORO table, if its number of leaf-partitioned tables is greater than or equal to the number specified in `optimizer_parts_to_force_sort_on_insert`, the plan generated by GPORCA sorts inserts by their partition IDs before performing the `INSERT` operation. Otherwise, the inserts are not sorted. The default value for `optimizer_parts_to_force_sort_on_insert` is 160.
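-
-As a sketch, assuming the parameter can be changed at the session level like other user-tunable parameters \(the value 200 is arbitrary\):
-
-``` sql
-/* Show the current threshold. */
-SHOW optimizer_parts_to_force_sort_on_insert;
-
-/* Sort inserted tuples only for AORO tables with 200 or more leaf partitions. */
-SET optimizer_parts_to_force_sort_on_insert = 200;
-```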
-
-### <a id="topic82"></a>Truncating a Partition 
-
-You can truncate a partition using the `ALTER TABLE` command. When you truncate a partition that has subpartitions, the subpartitions are automatically truncated as well.
-
-``` sql
-ALTER TABLE sales TRUNCATE PARTITION FOR (RANK(1));
-```
-
-### <a id="topic83"></a>Exchanging a Partition 
-
-You can exchange a partition using the `ALTER TABLE` command. Exchanging a partition swaps one table in place of an existing partition. You can exchange partitions only at the lowest level of your partition hierarchy \(only partitions that contain data can be exchanged\).
-
-Partition exchange can be useful for data loading. For example, load a staging table and swap the loaded table into your partition design. You can use partition exchange to change the storage type of older partitions to append-only tables. For example:
-
-``` sql
-CREATE TABLE jan12 (LIKE sales) WITH (appendonly=true);
-INSERT INTO jan12 SELECT * FROM sales_1_prt_1 ;
-ALTER TABLE sales EXCHANGE PARTITION FOR (DATE '2012-01-01')
-WITH TABLE jan12;
-```
-
-**Note:** This example refers to the single-level definition of the table `sales`, before partitions were added and altered in the previous examples.
-
-### <a id="topic84"></a>Splitting a Partition 
-
-Splitting a partition divides a partition into two partitions. You can split a partition using the `ALTER TABLE` command. You can split partitions only at the lowest level of your partition hierarchy: only partitions that contain data can be split. The split value you specify goes into the *latter* partition.
-
-For example, to split a monthly partition into two with the first partition containing dates January 1-15 and the second partition containing dates January 16-31:
-
-``` sql
-ALTER TABLE sales SPLIT PARTITION FOR ('2008-01-01')
-AT ('2008-01-16')
-INTO (PARTITION jan081to15, PARTITION jan0816to31);
-```
-
-If your partition design has a default partition, you must split the default partition to add a partition.
-
-When using the `INTO` clause, specify the current default partition as the second partition name. For example, to split a default range partition to add a new monthly partition for January 2009:
-
-``` sql
-ALTER TABLE sales SPLIT DEFAULT PARTITION
-START ('2009-01-01') INCLUSIVE
-END ('2009-02-01') EXCLUSIVE
-INTO (PARTITION jan09, default partition);
-```
-
-### <a id="topic85"></a>Modifying a Subpartition Template 
-
-Use `ALTER TABLE SET SUBPARTITION TEMPLATE` to modify the subpartition template of a partitioned table. Partitions added after you set a new subpartition template have the new partition design. Existing partitions are not modified.
-
-The following example alters the subpartition template of this partitioned table:
-
-``` sql
-CREATE TABLE sales (trans_id int, date date, amount decimal(9,2), region text)
-  DISTRIBUTED BY (trans_id)
-  PARTITION BY RANGE (date)
-  SUBPARTITION BY LIST (region)
-  SUBPARTITION TEMPLATE
-    ( SUBPARTITION usa VALUES ('usa'),
-      SUBPARTITION asia VALUES ('asia'),
-      SUBPARTITION europe VALUES ('europe'),
-      DEFAULT SUBPARTITION other_regions )
-  ( START (date '2014-01-01') INCLUSIVE
-    END (date '2014-04-01') EXCLUSIVE
-    EVERY (INTERVAL '1 month') );
-```
-
-This `ALTER TABLE` command modifies the subpartition template:
-
-``` sql
-ALTER TABLE sales SET SUBPARTITION TEMPLATE
-( SUBPARTITION usa VALUES ('usa'),
-  SUBPARTITION asia VALUES ('asia'),
-  SUBPARTITION europe VALUES ('europe'),
-  SUBPARTITION africa VALUES ('africa'),
-  DEFAULT SUBPARTITION regions );
-```
-
-When you add a date-range partition to the `sales` table, it includes the new regional list subpartition for Africa. For example, the following command creates the subpartitions `usa`, `asia`, `europe`, `africa`, and a default subpartition named `regions`:
-
-``` sql
-ALTER TABLE sales ADD PARTITION "4"
-  START ('2014-04-01') INCLUSIVE
-  END ('2014-05-01') EXCLUSIVE ;
-```
-
-To view the tables created for the partitioned table `sales`, you can use the command `\dt sales*` from the psql command line.
-
-To remove a subpartition template, use `SET SUBPARTITION TEMPLATE` with empty parentheses. For example, to clear the sales table subpartition template:
-
-``` sql
-ALTER TABLE sales SET SUBPARTITION TEMPLATE ();
-```

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/ddl/ddl-schema.html.md.erb
----------------------------------------------------------------------
diff --git a/ddl/ddl-schema.html.md.erb b/ddl/ddl-schema.html.md.erb
deleted file mode 100644
index 7c361ba..0000000
--- a/ddl/ddl-schema.html.md.erb
+++ /dev/null
@@ -1,88 +0,0 @@
----
-title: Creating and Managing Schemas
----
-
-Schemas logically organize objects and data in a database. Schemas allow you to have more than one object \(such as tables\) with the same name in the database without conflict if the objects are in different schemas.
-
-## <a id="topic18"></a>The Default "Public" Schema 
-
-Every database has a default schema named *public*. If you do not create any schemas, objects are created in the *public* schema. All database roles \(users\) have `CREATE` and `USAGE` privileges in the *public* schema. When you create a schema, you grant privileges to your users to allow access to the schema.
-
-## <a id="topic19"></a>Creating a Schema 
-
-Use the `CREATE SCHEMA` command to create a new schema. For example:
-
-``` sql
-=> CREATE SCHEMA myschema;
-```
-
-To create or access objects in a schema, write a qualified name consisting of the schema name and table name separated by a period. For example:
-
-```
-myschema.table
-```
-
-See [Schema Search Paths](#topic20) for information about accessing a schema.
-
-You can create a schema owned by someone else, for example, to restrict the activities of your users to well-defined namespaces. The syntax is:
-
-``` sql
-=> CREATE SCHEMA schemaname AUTHORIZATION username;
-```
-
-## <a id="topic20"></a>Schema Search Paths 
-
-To specify an object's location in a database, use the schema-qualified name. For example:
-
-``` sql
-=> SELECT * FROM myschema.mytable;
-```
-
-You can set the `search_path` configuration parameter to specify the order in which to search the available schemas for objects. The schema listed first in the search path becomes the *default* schema. If a schema is not specified, objects are created in the default schema.
-
-### <a id="topic21"></a>Setting the Schema Search Path 
-
-The `search_path` configuration parameter sets the schema search order. The `ALTER DATABASE` command sets the search path. For example:
-
-``` sql
-=> ALTER DATABASE mydatabase SET search_path TO myschema,
-public, pg_catalog;
-```
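-
-The search path can also be changed for the current session only with the standard `SET` command. A minimal sketch, reusing the `myschema` name from the earlier examples:
-
-``` sql
-=> SET search_path TO myschema, public;
-```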
-
-### <a id="topic22"></a>Viewing the Current Schema 
-
-Use the `current_schema()` function to view the current schema. For example:
-
-``` sql
-=> SELECT current_schema();
-```
-
-Use the `SHOW` command to view the current search path. For example:
-
-``` sql
-=> SHOW search_path;
-```
-
-## <a id="topic23"></a>Dropping a Schema 
-
-Use the `DROP SCHEMA` command to drop \(delete\) a schema. For example:
-
-``` sql
-=> DROP SCHEMA myschema;
-```
-
-By default, the schema must be empty before you can drop it. To drop a schema and all of its objects \(tables, data, functions, and so on\) use:
-
-``` sql
-=> DROP SCHEMA myschema CASCADE;
-```
-
-## <a id="topic24"></a>System Schemas 
-
-The following system-level schemas exist in every database:
-
--   `pg_catalog` contains the system catalog tables, built-in data types, functions, and operators. It is always part of the schema search path, even if it is not explicitly named in the search path.
--   `information_schema` consists of a standardized set of views that contain information about the objects in the database. These views get system information from the system catalog tables in a standardized way.
--   `pg_toast` stores large objects such as records that exceed the page size. This schema is used internally by the HAWQ system.
--   `pg_bitmapindex` stores bitmap index objects such as lists of values. This schema is used internally by the HAWQ system.
--   `hawq_toolkit` is an administrative schema that contains external tables, views, and functions that you can access with SQL commands. All database users can access `hawq_toolkit` to view and query the system log files and other system metrics.

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/ddl/ddl-storage.html.md.erb
----------------------------------------------------------------------
diff --git a/ddl/ddl-storage.html.md.erb b/ddl/ddl-storage.html.md.erb
deleted file mode 100644
index 264e552..0000000
--- a/ddl/ddl-storage.html.md.erb
+++ /dev/null
@@ -1,71 +0,0 @@
----
-title: Table Storage Model and Distribution Policy
----
-
-HAWQ supports several storage models and a mix of storage models. When you create a table, you choose how to store its data. This topic explains the options for table storage and how to choose the best storage model for your workload.
-
-**Note:** To simplify the creation of database tables, you can specify the default values for some table storage options with the HAWQ server configuration parameter `gp_default_storage_options`.
-
-## <a id="topic39"></a>Row-Oriented Storage 
-
-HAWQ provides storage orientation models of either row-oriented or Parquet tables. Evaluate performance using your own data and query workloads to determine the best alternatives.
-
--   Row-oriented storage: good for OLTP-type workloads with many iterative transactions, where many columns of a single row are needed at once and retrieving a whole row is efficient.
-
-    **Note:** Column-oriented storage is no longer available. Use Parquet storage instead.
-
-Row-oriented storage provides the best options for the following situations:
-
--   **Frequent INSERTs.** Where rows are frequently inserted into the table.
--   **Number of columns requested in queries.** Where you typically request all or the majority of columns in the `SELECT` list or `WHERE` clause of your queries, choose a row-oriented model. 
--   **Number of columns in the table.** Row-oriented storage is most efficient when many columns are required at the same time, or when the row-size of a table is relatively small. 
-
-## <a id="topic55"></a>Altering a Table 
-
-The `ALTER TABLE` command changes the definition of a table. Use `ALTER TABLE` to change table attributes such as column definitions, distribution policy, storage model, and partition structure \(see also [Maintaining Partitioned Tables](ddl-partition.html)\). For example, to add a not-null constraint to a table column:
-
-``` sql
-=> ALTER TABLE address ALTER COLUMN street SET NOT NULL;
-```
-
-### <a id="topic56"></a>Altering Table Distribution 
-
-`ALTER TABLE` provides options to change a table's distribution policy. When the table distribution options change, the table data is redistributed on disk, which can be resource intensive. You can also redistribute table data using the existing distribution policy.
-
-### <a id="topic57"></a>Changing the Distribution Policy 
-
-For partitioned tables, changes to the distribution policy apply recursively to the child partitions. This operation preserves the ownership and all other attributes of the table. For example, the following command redistributes the table sales across all segments using the customer\_id column as the distribution key:
-
-``` sql
-ALTER TABLE sales SET DISTRIBUTED BY (customer_id);
-```
-
-When you change the hash distribution of a table, table data is automatically redistributed. Changing the distribution policy to a random distribution does not cause the data to be redistributed. For example:
-
-``` sql
-ALTER TABLE sales SET DISTRIBUTED RANDOMLY;
-```
-
-### <a id="topic58"></a>Redistributing Table Data 
-
-To redistribute table data for tables with a random distribution policy \(or when the hash distribution policy has not changed\) use `REORGANIZE=TRUE`. Reorganizing data may be necessary to correct a data skew problem, or when segment resources are added to the system. For example, the following command redistributes table data across all segments using the current distribution policy, including random distribution.
-
-``` sql
-ALTER TABLE sales SET WITH (REORGANIZE=TRUE);
-```
-
-## <a id="topic62"></a>Dropping a Table 
-
-The `DROP TABLE` command removes tables from the database. For example:
-
-``` sql
-DROP TABLE mytable;
-```
-
-`DROP TABLE` always removes any indexes, rules, triggers, and constraints that exist for the target table. Specify `CASCADE` to drop a table that is referenced by a view. `CASCADE` removes dependent views.
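-
-For example, assuming one or more views reference `mytable`, the following sketch drops the table together with those dependent views:
-
-``` sql
--- CASCADE also removes any views that depend on mytable.
-DROP TABLE mytable CASCADE;
-```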
-
-To empty a table of rows without removing the table definition, use `TRUNCATE`. For example:
-
-``` sql
-TRUNCATE mytable;
-```

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/ddl/ddl-table.html.md.erb
----------------------------------------------------------------------
diff --git a/ddl/ddl-table.html.md.erb b/ddl/ddl-table.html.md.erb
deleted file mode 100644
index bc4f0c4..0000000
--- a/ddl/ddl-table.html.md.erb
+++ /dev/null
@@ -1,149 +0,0 @@
----
-title: Creating and Managing Tables
----
-
-HAWQ Tables are similar to tables in any relational database, except that table rows are distributed across the different segments in the system. When you create a table, you specify the table's distribution policy.
-
-## <a id="topic26"></a>Creating a Table 
-
-The `CREATE TABLE` command creates a table and defines its structure. When you create a table, you define:
-
--   The columns of the table and their associated data types. See [Choosing Column Data Types](#topic27).
--   Any table constraints to limit the data that a column or table can contain. See [Setting Table Constraints](#topic28).
--   The distribution policy of the table, which determines how HAWQ divides data across the segments. See [Choosing the Table Distribution Policy](#topic34).
--   The way the table is stored on disk.
--   The table partitioning strategy for large tables, which specifies how the data should be divided. See [Partitioning Large Tables](../ddl/ddl-partition.html).
-
-### <a id="topic27"></a>Choosing Column Data Types 
-
-The data type of a column determines the types of data values the column can contain. Choose the data type that uses the least possible space but can still accommodate your data and that best constrains the data. For example, use character data types for strings, date or timestamp data types for dates, and numeric data types for numbers.
-
-There are no performance differences among the character data types `CHAR`, `VARCHAR`, and `TEXT` apart from the increased storage size when you use the blank-padded type. In most situations, use `TEXT` or `VARCHAR` rather than `CHAR`.
-
-Use the smallest numeric data type that will accommodate your numeric data and allow for future expansion. For example, using `BIGINT` for data that fits in `INT` or `SMALLINT` wastes storage space. If you expect that your data values will expand over time, consider that changing from a smaller datatype to a larger datatype after loading large amounts of data is costly. For example, if your current data values fit in a `SMALLINT` but it is likely that the values will expand, `INT` is the better long-term choice.
-
-Use the same data types for columns that you plan to use in cross-table joins. When the data types are different, the database must convert one of them so that the data values can be compared correctly, which adds unnecessary overhead.
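-
-The guidance above can be sketched as follows. The `customers` and `orders` tables and their columns are hypothetical names chosen for illustration; the point is only that the join column `customer_id` has the same type in both tables and that each column uses a type suited to its data:
-
-``` sql
--- Hypothetical tables: customer_id is INT in both, so the join needs no type conversion.
-CREATE TABLE customers
-     ( customer_id INT,
-       name TEXT );
-
-CREATE TABLE orders
-     ( order_id BIGINT,
-       customer_id INT,
-       order_date DATE,
-       amount NUMERIC );
-
-SELECT c.name, o.order_date, o.amount
-FROM   orders o JOIN customers c ON o.customer_id = c.customer_id;
-```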
-
-HAWQ supports the parquet columnar storage format, which can increase performance on large queries. Use parquet tables for HAWQ internal tables.
-
-### <a id="topic28"></a>Setting Table Constraints 
-
-You can define constraints to restrict the data in your tables. HAWQ support for constraints is the same as PostgreSQL with some limitations, including:
-
--   `CHECK` constraints can refer only to the table on which they are defined.
--   `FOREIGN KEY` constraints are allowed, but not enforced.
--   Constraints that you define on partitioned tables apply to the partitioned table as a whole. You cannot define constraints on the individual parts of the table.
-
-#### <a id="topic29"></a>Check Constraints 
-
-Check constraints allow you to specify that the value in a certain column must satisfy a Boolean \(truth-value\) expression. For example, to require positive product prices:
-
-``` sql
-=> CREATE TABLE products
-     ( product_no integer,
-       name text,
-       price numeric CHECK (price > 0) );
-```
-
-#### <a id="topic30"></a>Not-Null Constraints 
-
-Not-null constraints specify that a column must not assume the null value. A not-null constraint is always written as a column constraint. For example:
-
-``` sql
-=> CREATE TABLE products
-     ( product_no integer NOT NULL,
-       name text NOT NULL,
-       price numeric );
-```
-
-#### <a id="topic33"></a>Foreign Keys 
-
-Foreign keys are not supported. You can declare them, but referential integrity is not enforced.
-
-Foreign key constraints specify that the values in a column or a group of columns must match the values appearing in some row of another table to maintain referential integrity between two related tables. Referential integrity checks cannot be enforced between the distributed table segments of a HAWQ database.
-
-### <a id="topic34"></a>Choosing the Table Distribution Policy 
-
-All HAWQ tables are distributed. The default policy is `DISTRIBUTED RANDOMLY` \(round-robin distribution\). However, when you create or alter a table, you can optionally specify `DISTRIBUTED BY` to distribute data according to a hash-based policy. In this case, the `bucketnum` attribute sets the number of hash buckets used by a hash-distributed table. Columns of geometric or user-defined data types are not eligible as HAWQ distribution key columns. 
-
-Randomly distributed tables have benefits over hash distributed tables. For example, after expansion, HAWQ's elasticity feature lets it automatically use more resources without needing to redistribute the data. For extremely large tables, redistribution is very expensive. Also, data locality for randomly distributed tables is better, especially after the underlying HDFS redistributes its data during rebalancing or because of DataNode failures. This is quite common when the cluster is large.
-
-However, hash distributed tables can be faster than randomly distributed tables. For example, for some TPC-H queries, hash distributed tables can have performance benefits. Choose a distribution policy that best suits your application scenario. When you `CREATE TABLE`, you can also specify the `bucketnum` option. The `bucketnum` determines the number of hash buckets used in creating a hash-distributed table or for PXF external table intermediate processing. The number of buckets also affects how many virtual segments will be created when processing this data. The bucket number of a gpfdist external table is the number of gpfdist locations, and the bucket number of a command external table is `ON #num`. PXF external tables use the `default_hash_table_bucket_number` parameter to control virtual segments. 
-
-HAWQ's elastic execution runtime is based on virtual segments, which are allocated on demand, based on the cost of the query. Each node uses one physical segment and a number of dynamically allocated virtual segments distributed to different hosts, thus simplifying performance tuning. Large queries use large numbers of virtual segments, while smaller queries use fewer virtual segments. Tables do not need to be redistributed when nodes are added or removed.
-
-In general, the more virtual segments are used, the faster the query will be executed. You can tune the parameters for `default_hash_table_bucket_number` and `hawq_rm_nvseg_perquery_limit` to adjust performance by controlling the number of virtual segments used for a query. However, be aware that if the value of `default_hash_table_bucket_number` is changed, data must be redistributed, which can be costly. Therefore, it is better to set the `default_hash_table_bucket_number` up front, if you expect to need a larger number of virtual segments. However, you might need to adjust the value in `default_hash_table_bucket_number` after cluster expansion, but should take care not to exceed the number of virtual segments per query set in `hawq_rm_nvseg_perquery_limit`. Refer to the recommended guidelines for setting the value of `default_hash_table_bucket_number`, later in this section.
-
-For random or gpfdist external tables, as well as user-defined functions, the value set in the `hawq_rm_nvseg_perquery_perseg_limit` parameter limits the number of virtual segments that are used for one segment for one query, to optimize query resources. Resetting this parameter is not recommended.
-
-Consider the following points when deciding on a table distribution policy.
-
--   **Even Data Distribution** — For the best possible performance, all segments should contain equal portions of data. If the data is unbalanced or skewed, the segments with more data must work harder to perform their portion of the query processing.
--   **Local and Distributed Operations** — Local operations are faster than distributed operations. Query processing is fastest if the work associated with join, sort, or aggregation operations is done locally, at the segment level. Work done at the system level requires distributing tuples across the segments, which is less efficient. When tables share a common distribution key, the work of joining or sorting on their shared distribution key columns is done locally. With a random distribution policy, local join operations are not an option.
--   **Even Query Processing** — For best performance, all segments should handle an equal share of the query workload. Query workload can be skewed if a table's data distribution policy and the query predicates are not well matched. For example, suppose that a sales transactions table is distributed based on a column that contains corporate names \(the distribution key\), and the hashing algorithm distributes the data based on those values. If a predicate in a query references a single value from the distribution key, query processing runs on only one segment. This works if your query predicates usually select data on a criteria other than corporation name. For queries that use corporation name in their predicates, it's possible that only one segment instance will handle the query workload.
-
-HAWQ utilizes dynamic parallelism, which can affect the performance of a query execution significantly. Performance depends on the following factors:
-
--   The size of a randomly distributed table.
--   The `bucketnum` of a hash distributed table.
--   Data locality.
--   The values of `default_hash_table_bucket_number`, and `hawq_rm_nvseg_perquery_limit` \(including defaults and user-defined values\).
-
-For any specific query, the first three factors are fixed values, while the configuration parameters in the last item can be used to tune performance of the query execution. In querying a random table, the query resource load is related to the data size of the table, usually one virtual segment for one HDFS block. As a result, querying a large table could use a large number of resources.
-
-The `bucketnum` for a hash table specifies the number of hash buckets to be used in creating virtual segments. A hash distributed table is created with `default_hash_table_bucket_number` buckets. The default bucket value can be changed at the session level or in the `CREATE TABLE` DDL by using the `bucketnum` storage parameter.
-
-In an Ambari-managed HAWQ cluster, the default bucket number \(`default_hash_table_bucket_number`\) is derived from the number of segment nodes. In command-line-managed HAWQ environments, you can use the `--bucket_number` option of `hawq init` to explicitly set `default_hash_table_bucket_number` during cluster initialization.
-
-**Note:** For best performance with large tables, the number of buckets should not exceed the value of the `default_hash_table_bucket_number` parameter. Small tables can use one segment node, `WITH bucketnum=1`. For larger tables, the `bucketnum` is set to a multiple of the number of segment nodes, for the best load balancing on different segment nodes. The elastic runtime will attempt to find the optimal number of buckets for the number of nodes being processed. Larger tables need more virtual segments, and hence use larger numbers of buckets.
-
-The following statement creates a table “sales” with 8 buckets, which would be similar to a hash-distributed table on 8 segments.
-
-``` sql
-=> CREATE TABLE sales(id int, profit float)  WITH (bucketnum=8) DISTRIBUTED BY (id);
-```
-
-There are four ways of creating a table from an origin table. The ways in which the new table is generated from the original table are listed below.
-
-<table>
-  <tr>
-    <th></th>
-    <th>Syntax</th>
-  </tr>
-  <tr><td>INHERITS</td><td><pre><code>CREATE TABLE new_table INHERITS (origintable) [WITH(bucketnum=x)] <br/>[DISTRIBUTED BY col]</code></pre></td></tr>
-  <tr><td>LIKE</td><td><pre><code>CREATE TABLE new_table (LIKE origintable) [WITH(bucketnum=x)] <br/>[DISTRIBUTED BY col]</code></pre></td></tr>
-  <tr><td>AS</td><td><pre><code>CREATE TABLE new_table [WITH(bucketnum=x)] AS SUBQUERY [DISTRIBUTED BY col]</code></pre></td></tr>
-  <tr><td>SELECT INTO</td><td><pre><code>CREATE TABLE origintable [WITH(bucketnum=x)] [DISTRIBUTED BY col]; SELECT * <br/>INTO new_table FROM origintable;</code></pre></td></tr>
-</table>
-
-The optional `INHERITS` clause specifies a list of tables from which the new table automatically inherits all columns. Hash tables inherit bucketnumbers from their origin table if not otherwise specified. If `WITH` specifies `bucketnum` in creating a hash-distributed table, it will be copied. If distribution is specified by column, the table will inherit it. Otherwise, the table will use default distribution from `default_hash_table_bucket_number`.
-
-The `LIKE` clause specifies a table from which the new table automatically copies all column names, data types, not-null constraints, and distribution policy. If a `bucketnum` is specified, it will be copied. Otherwise, the table will use default distribution.
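-
-As a sketch based on the `LIKE` syntax in the table above and the `sales` table created earlier, the following statement copies the column definitions of `sales` and sets the bucket number and distribution key explicitly. The name `sales_copy` is a hypothetical example, and the distribution column is written in parentheses, as in the `sales` example:
-
-``` sql
--- sales_copy is an illustrative name; sales is the table from the earlier example.
-CREATE TABLE sales_copy (LIKE sales) WITH (bucketnum=8) DISTRIBUTED BY (id);
-```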
-
-For hash tables, the `SELECT INTO` statement always uses random distribution.
-
-#### <a id="topic_kjg_tqm_gv"></a>Declaring Distribution Keys 
-
-`CREATE TABLE`'s optional clause `DISTRIBUTED BY` specifies the distribution policy for a table. The default is a random distribution policy. You can also choose to distribute data as a hash-based policy, where the `bucketnum` attribute sets the number of hash buckets used by a hash-distributed table. HASH distributed tables are created with the number of hash buckets specified by the `default_hash_table_bucket_number` parameter.
-
-Policies for different application scenarios can be specified to optimize performance. The number of virtual segments used for query execution can be tuned using the `hawq_rm_nvseg_perquery_limit` and `hawq_rm_nvseg_perquery_perseg_limit` parameters, in connection with the `default_hash_table_bucket_number` parameter, which sets the default `bucketnum`. For more information, see the guidelines for Virtual Segments in the next section and in [Query Performance](../query/query-performance.html#topic38).
-
-#### <a id="topic_wff_mqm_gv"></a>Performance Tuning 
-
-Adjusting the values of the configuration parameters `default_hash_table_bucket_number` and `hawq_rm_nvseg_perquery_limit` can tune performance by controlling the number of virtual segments being used. In most circumstances, HAWQ's elastic runtime will dynamically allocate virtual segments to optimize performance, so further tuning should not be needed.
-
-Hash tables are created using the value specified in `default_hash_table_bucket_number`. Queries for hash tables use a fixed number of buckets, regardless of the amount of data present. Explicitly setting `default_hash_table_bucket_number` can be useful in managing resources. If you desire a larger or smaller number of hash buckets, set this value before you create tables. Resources are dynamically allocated to a multiple of the number of nodes. If you use `hawq init --bucket_number` to set the value of `default_hash_table_bucket_number` during cluster initialization or expansion, the value should not exceed the value of `hawq_rm_nvseg_perquery_limit`. This server parameter defines the maximum number of virtual segments that can be used for a query \(default = 512, with a maximum of 65535\). Modifying the value to greater than 1000 segments is not recommended.
-
-The following per-node guidelines apply to values for `default_hash_table_bucket_number`.
-
-|Number of Nodes|default\_hash\_table\_bucket\_number value|
-|---------------|------------------------------------------|
-|<= 85|6 \* \#nodes|
-|\> 85 and <= 102|5 \* \#nodes|
-|\> 102 and <= 128|4 \* \#nodes|
-|\> 128 and <= 170|3 \* \#nodes|
-|\> 170 and <= 256|2 \* \#nodes|
-|\> 256 and <= 512|1 \* \#nodes|
-|\> 512|512|
-
-Reducing the value of `hawq_rm_nvseg_perquery_perseg_limit` can improve concurrency, and increasing the value of `hawq_rm_nvseg_perquery_perseg_limit` could possibly increase the degree of parallelism. However, for some queries, increasing the degree of parallelism will not improve performance if the query has reached the limits set by the hardware. Therefore, increasing the value of `hawq_rm_nvseg_perquery_perseg_limit` above the default value is not recommended. Also, changing the value of `default_hash_table_bucket_number` after initializing a cluster means the hash table data must be redistributed. If you are expanding a cluster, you might wish to change this value, but be aware that retuning could adversely affect performance.
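-
-As a worked example of the per-node guidelines in the table above, a hypothetical 100-node cluster falls in the `> 85 and <= 102` row, which gives 5 \* 100 = 500. The following query is only a sketch that restates the table as a `CASE` expression; the `nodes` value and the column aliases are illustrative assumptions, not HAWQ settings:
-
-``` sql
--- Illustrative only: replace 100 with the number of segment nodes in your cluster.
-SELECT nodes,
-       CASE WHEN nodes <= 85  THEN 6 * nodes
-            WHEN nodes <= 102 THEN 5 * nodes
-            WHEN nodes <= 128 THEN 4 * nodes
-            WHEN nodes <= 170 THEN 3 * nodes
-            WHEN nodes <= 256 THEN 2 * nodes
-            WHEN nodes <= 512 THEN nodes
-            ELSE 512
-       END AS suggested_bucket_number
-FROM   (SELECT 100 AS nodes) AS cluster;
-```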

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/ddl/ddl-tablespace.html.md.erb
----------------------------------------------------------------------
diff --git a/ddl/ddl-tablespace.html.md.erb b/ddl/ddl-tablespace.html.md.erb
deleted file mode 100644
index 8720665..0000000
--- a/ddl/ddl-tablespace.html.md.erb
+++ /dev/null
@@ -1,154 +0,0 @@
----
-title: Creating and Managing Tablespaces
----
-
-Tablespaces allow database administrators to have multiple file systems per machine and decide how to best use physical storage to store database objects. They are named locations within a filespace in which you can create objects. Tablespaces allow you to assign different storage for frequently and infrequently used database objects or to control the I/O performance on certain database objects. For example, place frequently-used tables on file systems that use high performance solid-state drives \(SSD\), and place other tables on standard hard drives.
-
-A tablespace requires a file system location to store its database files. In HAWQ, the master and each segment require a distinct storage location. The collection of file system locations for all components in a HAWQ system is a *filespace*. Filespaces can be used by one or more tablespaces.
-
-## <a id="topic10"></a>Creating a Filespace 
-
-A filespace sets aside storage for your HAWQ system. A filespace is a symbolic storage identifier that maps onto a set of locations in your HAWQ hosts' file systems. To create a filespace, prepare the logical file systems on all of your HAWQ hosts, then use the `hawq filespace` utility to define the filespace. You must be a database superuser to create a filespace.
-
-**Note:** HAWQ is not directly aware of the file system boundaries on your underlying systems. It stores files in the directories that you tell it to use. You cannot control the location on disk of individual files within a logical file system.
-
-### <a id="im178954"></a>To create a filespace using hawq filespace 
-
-1.  Log in to the HAWQ master as the `gpadmin` user.
-
-    ``` shell
-    $ su - gpadmin
-    ```
-
-2.  Create a filespace configuration file:
-
-    ``` shell
-    $ hawq filespace -o hawqfilespace_config
-    ```
-
-3.  At the prompt, enter a name for the filespace, a master file system location, and the primary segment file system locations. For example:
-
-    ``` shell
-    $ hawq filespace -o hawqfilespace_config
-    ```
-    ``` pre
-    Enter a name for this filespace
-    > testfs
-    Enter replica num for filespace. If 0, default replica num is used (default=3)
-    > 
-
-    Please specify the DFS location for the filespace (for example: localhost:9000/fs)
-    location> localhost:8020/fs        
-    20160409:16:53:25:028082 hawqfilespace:gpadmin:gpadmin-[INFO]:-[created]
-    20160409:16:53:25:028082 hawqfilespace:gpadmin:gpadmin-[INFO]:-
-    To add this filespace to the database please run the command:
-       hawqfilespace --config /Users/gpadmin/curwork/git/hawq/hawqfilespace_config
-    ```
-       
-    ``` shell
-    $ cat /Users/gpadmin/curwork/git/hawq/hawqfilespace_config
-    ```
-    ``` pre
-    filespace:testfs
-    fsreplica:3
-    dfs_url::localhost:8020/fs
-    ```
-    ``` shell
-    $ hawq filespace --config /Users/gpadmin/curwork/git/hawq/hawqfilespace_config
-    ```
-    ``` pre
-    Reading Configuration file: '/Users/gpadmin/curwork/git/hawq/hawqfilespace_config'
-
-    CREATE FILESPACE testfs ON hdfs 
-    ('localhost:8020/fs/testfs') WITH (NUMREPLICA = 3);
-    20160409:16:57:56:028104 hawqfilespace:gpadmin:gpadmin-[INFO]:-Connecting to database
-    20160409:16:57:56:028104 hawqfilespace:gpadmin:gpadmin-[INFO]:-Filespace "testfs" successfully created
-
-    ```
-
-
-4.  `hawq filespace` creates a configuration file. Examine the file to verify that the hawq filespace configuration is correct. The following is a sample configuration file:
-
-    ```
-    filespace:fastdisk
-    mdw:1:/hawq_master_filespc/gp-1
-    sdw1:2:/hawq_pri_filespc/gp0
-    sdw2:3:/hawq_pri_filespc/gp1
-    ```
-
-5.  Run `hawq filespace` again to create the filespace based on the configuration file:
-
-    ``` shell
-    $ hawq filespace -c hawqfilespace_config
-    ```
-
-
-## <a id="topic13"></a>Creating a Tablespace 
-
-After you create a filespace, use the `CREATE TABLESPACE` command to define a tablespace that uses that filespace. For example:
-
-``` sql
-=# CREATE TABLESPACE fastspace FILESPACE fastdisk;
-```
-
-Database superusers define tablespaces and grant access to database users with the `GRANT CREATE` command. For example:
-
-``` sql
-=# GRANT CREATE ON TABLESPACE fastspace TO admin;
-```
-
-## <a id="topic14"></a>Using a Tablespace to Store Database Objects 
-
-Users with the `CREATE` privilege on a tablespace can create database objects in that tablespace, such as tables, indexes, and databases. The command is:
-
-``` sql
-CREATE TABLE tablename(options) TABLESPACE spacename
-```
-
-For example, the following command creates a table in the tablespace *space1*:
-
-``` sql
-CREATE TABLE foo(i int) TABLESPACE space1;
-```
-
-You can also use the `default_tablespace` parameter to specify the default tablespace for `CREATE TABLE` and `CREATE INDEX` commands that do not specify a tablespace:
-
-``` sql
-SET default_tablespace = space1;
-CREATE TABLE foo(i int);
-```
-
-The tablespace associated with a database stores that database's system catalogs, temporary files created by server processes using that database, and is the default tablespace selected for tables and indexes created within the database, if no `TABLESPACE` is specified when the objects are created. If you do not specify a tablespace when you create a database, the database uses the same tablespace used by its template database.
-
-You can use a tablespace from any database if you have appropriate privileges.
-
-## <a id="topic15"></a>Viewing Existing Tablespaces and Filespaces 
-
-Every HAWQ system has the following default tablespaces.
-
--   `pg_global` for shared system catalogs.
--   `pg_default`, the default tablespace. Used by the *template1* and *template0* databases.
-
-These tablespaces use the system default filespace, `pg_system`, the data directory location created at system initialization.
-
-To see filespace information, look in the *pg\_filespace* and *pg\_filespace\_entry* catalog tables. You can join these tables with *pg\_tablespace* to see the full definition of a tablespace. For example:
-
-``` sql
-=# SELECT spcname AS tblspc, fsname AS filespc,
-          fsedbid AS seg_dbid, fselocation AS datadir
-   FROM   pg_tablespace pgts, pg_filespace pgfs,
-          pg_filespace_entry pgfse
-   WHERE  pgts.spcfsoid=pgfse.fsefsoid
-          AND pgfse.fsefsoid=pgfs.oid
-   ORDER BY tblspc, seg_dbid;
-```
-
-## <a id="topic16"></a>Dropping Tablespaces and Filespaces 
-
-To drop a tablespace, you must be the tablespace owner or a superuser. You cannot drop a tablespace until all objects in all databases using the tablespace are removed.
-
-Only a superuser can drop a filespace. A filespace cannot be dropped until all tablespaces using that filespace are removed.
-
-The `DROP TABLESPACE` command removes an empty tablespace.
-
-The `DROP FILESPACE` command removes an empty filespace.
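-
-As a sketch, using the `fastspace` tablespace and `fastdisk` filespace names from the earlier examples and assuming both are already empty, the tablespace is dropped first and then the filespace it used:
-
-``` sql
--- Drop the tablespace before the filespace that it depends on.
-DROP TABLESPACE fastspace;
-DROP FILESPACE fastdisk;
-```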

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/ddl/ddl-view.html.md.erb
----------------------------------------------------------------------
diff --git a/ddl/ddl-view.html.md.erb b/ddl/ddl-view.html.md.erb
deleted file mode 100644
index 35da41e..0000000
--- a/ddl/ddl-view.html.md.erb
+++ /dev/null
@@ -1,25 +0,0 @@
----
-title: Creating and Managing Views
----
-
-Views enable you to save frequently used or complex queries, then access them in a `SELECT` statement as if they were a table. A view is not physically materialized on disk: the query runs as a subquery when you access the view.
-
-If a subquery is associated with a single query, consider using the `WITH` clause of the `SELECT` command instead of creating a seldom-used view.
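-
-For example, assuming a `films` table with a `kind` column, the following sketch names the subquery inline with `WITH` instead of creating a view. The name `comedies` here exists only for the duration of the query:
-
-``` sql
--- The WITH clause defines a named subquery that is visible only to this statement.
-WITH comedies AS (
-    SELECT * FROM films WHERE kind = 'comedy'
-)
-SELECT count(*) FROM comedies;
-```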
-
-## <a id="topic101"></a>Creating Views 
-
-The `CREATE VIEW` command defines a view of a query. For example:
-
-``` sql
-CREATE VIEW comedies AS SELECT * FROM films WHERE kind = 'comedy';
-```
-
-Views ignore `ORDER BY` and `SORT` operations stored in the view.
-
-## <a id="topic102"></a>Dropping Views 
-
-The `DROP VIEW` command removes a view. For example:
-
-``` sql
-DROP VIEW topten;
-```

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/ddl/ddl.html.md.erb
----------------------------------------------------------------------
diff --git a/ddl/ddl.html.md.erb b/ddl/ddl.html.md.erb
deleted file mode 100644
index 7873fe7..0000000
--- a/ddl/ddl.html.md.erb
+++ /dev/null
@@ -1,19 +0,0 @@
----
-title: Defining Database Objects
----
-
-This section covers data definition language \(DDL\) in HAWQ and how to create and manage database objects.
-
-Creating objects in HAWQ includes making up-front choices about data distribution, storage options, data loading, and other HAWQ features that will affect the ongoing performance of your database system. Understanding the options that are available and how the database will be used will help you make the right decisions.
-
-Most of the advanced HAWQ features are enabled with extensions to the SQL `CREATE` DDL statements.
-
-This section contains the following topics:
-
-*  <a class="subnav" href="./ddl-database.html">Creating and Managing Databases</a>
-*  <a class="subnav" href="./ddl-tablespace.html">Creating and Managing Tablespaces</a>
-*  <a class="subnav" href="./ddl-schema.html">Creating and Managing Schemas</a>
-*  <a class="subnav" href="./ddl-table.html">Creating and Managing Tables</a>
-*  <a class="subnav" href="./ddl-storage.html">Table Storage Model and Distribution Policy</a>
-*  <a class="subnav" href="./ddl-partition.html">Partitioning Large Tables</a>
-*  <a class="subnav" href="./ddl-view.html">Creating and Managing Views</a>

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/images/02-pipeline.png
----------------------------------------------------------------------
diff --git a/images/02-pipeline.png b/images/02-pipeline.png
deleted file mode 100644
index 26fec1b..0000000
Binary files a/images/02-pipeline.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/images/03-gpload-files.jpg
----------------------------------------------------------------------
diff --git a/images/03-gpload-files.jpg b/images/03-gpload-files.jpg
deleted file mode 100644
index d50435f..0000000
Binary files a/images/03-gpload-files.jpg and /dev/null differ

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/images/basic_query_flow.png
----------------------------------------------------------------------
diff --git a/images/basic_query_flow.png b/images/basic_query_flow.png
deleted file mode 100644
index 59172a2..0000000
Binary files a/images/basic_query_flow.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/images/ext-tables-xml.png
----------------------------------------------------------------------
diff --git a/images/ext-tables-xml.png b/images/ext-tables-xml.png
deleted file mode 100644
index f208828..0000000
Binary files a/images/ext-tables-xml.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/images/ext_tables.jpg
----------------------------------------------------------------------
diff --git a/images/ext_tables.jpg b/images/ext_tables.jpg
deleted file mode 100644
index d5a0940..0000000
Binary files a/images/ext_tables.jpg and /dev/null differ

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/images/ext_tables_multinic.jpg
----------------------------------------------------------------------
diff --git a/images/ext_tables_multinic.jpg b/images/ext_tables_multinic.jpg
deleted file mode 100644
index fcf09c4..0000000
Binary files a/images/ext_tables_multinic.jpg and /dev/null differ

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/images/gangs.jpg
----------------------------------------------------------------------
diff --git a/images/gangs.jpg b/images/gangs.jpg
deleted file mode 100644
index 0d14585..0000000
Binary files a/images/gangs.jpg and /dev/null differ

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/images/gporca.png
----------------------------------------------------------------------
diff --git a/images/gporca.png b/images/gporca.png
deleted file mode 100644
index 2909443..0000000
Binary files a/images/gporca.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/images/hawq_hcatalog.png
----------------------------------------------------------------------
diff --git a/images/hawq_hcatalog.png b/images/hawq_hcatalog.png
deleted file mode 100644
index 35b74c3..0000000
Binary files a/images/hawq_hcatalog.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/images/slice_plan.jpg
----------------------------------------------------------------------
diff --git a/images/slice_plan.jpg b/images/slice_plan.jpg
deleted file mode 100644
index ad8da83..0000000
Binary files a/images/slice_plan.jpg and /dev/null differ

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/install/aws-config.html.md.erb
----------------------------------------------------------------------
diff --git a/install/aws-config.html.md.erb b/install/aws-config.html.md.erb
deleted file mode 100644
index 21cadf5..0000000
--- a/install/aws-config.html.md.erb
+++ /dev/null
@@ -1,123 +0,0 @@
----
-title: Amazon EC2 Configuration
----
-
-Amazon Elastic Compute Cloud (EC2) is a service provided by Amazon Web Services (AWS).  You can install and configure HAWQ on virtual servers provided by Amazon EC2. The following information describes some considerations when deploying a HAWQ cluster in an Amazon EC2 environment.
-
-## <a id="topic_wqv_yfx_y5"></a>About Amazon EC2 
-
-Amazon EC2 can be used to launch as many virtual servers as you need, configure security and networking, and manage storage. An EC2 *instance* is a virtual server in the AWS cloud virtual computing environment.
-
-EC2 instances are managed by AWS. AWS isolates your EC2 instances from other users in a virtual private cloud (VPC) and lets you control access to the instances. You can configure instance features such as operating system, network connectivity (network ports and protocols, IP addresses), access to the Internet, and size and type of disk storage. 
-
-For information about Amazon EC2, see the [EC2 User Guide](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts.html).
-
-## <a id="topic_nhk_df4_2v"></a>Create and Launch HAWQ Instances
-
-Use the *Amazon EC2 Console* to launch instances and configure, start, stop, and terminate (delete) virtual servers. When you launch a HAWQ instance, you select and configure key attributes via the EC2 Console.
-
-
-### <a id="topic_amitype"></a>Choose AMI Type
-
-An Amazon Machine Image (AMI) is a template that contains a software configuration including the operating system, application server, and applications that best suit your purpose. When configuring a HAWQ virtual instance, we recommend you use a *hardware virtualized* AMI running 64-bit Red Hat Enterprise Linux version 6.4 or 6.5 or 64-bit CentOS 6.4 or 6.5.  Obtain the licenses and instances directly from the OS provider.
-
-### <a id="topic_selcfgstorage"></a>Consider Storage
-EC2 instances can be launched as either Elastic Block Store (EBS)-backed or instance store-backed.  
-
-Instance store-backed storage is generally better performing than EBS and recommended for HAWQ's large data workloads. SSD (solid state) instance store is preferred over magnetic drives.
-
-**Note** EC2 *instance store* provides temporary block-level storage. This storage is located on disks that are physically attached to the host computer. While instance store provides high performance, powering off the instance causes data loss. Soft reboots preserve instance store data. 
-     
-Virtual devices for instance store volumes for HAWQ EC2 instance store instances are named `ephemeralN` (where *N* varies based on instance type). CentOS instance store block devices are named `/dev/xvdletter` (where *letter* is a lowercase letter of the alphabet).
-
-### <a id="topic_cfgplacegrp"></a>Configure Placement Group 
-
-A placement group is a logical grouping of instances within a single availability zone that together participate in a low-latency, 10 Gbps network.  Your HAWQ master and segment cluster instances should support enhanced networking and reside in a single placement group (and subnet) for optimal network performance.  
-
-If your Ambari node is not a DataNode, locating the Ambari node instance in a subnet separate from the HAWQ master/segment placement group enables you to manage multiple HAWQ clusters from the single Ambari instance.
-
-Amazon recommends that you use the same instance type for all instances in the placement group and that you launch all instances within the placement group at the same time.
-
-Membership in a placement group has some implications for your HAWQ cluster.  Specifically, growing the cluster over capacity may require shutting down all HAWQ instances in the current placement group and restarting the instances in a new placement group. Instance store volumes are lost in this scenario.
-
-### <a id="topic_selinsttype"></a>Select EC2 Instance Type
-
-An EC2 instance type is a specific combination of CPU, memory, default storage, and networking capacity.  
-
-Several instance store-backed EC2 instance types have shown acceptable performance for HAWQ nodes in development and production environments: 
-
-| Instance Type  | Env | vCPUs | Memory (GB) | Disk Capacity (GB) | Storage Type |
-|-------|-----|------|--------|----------|--------|
-| cc2.8xlarge  | Dev | 32 | 60.5 | 4 x 840 | HDD |
-| d2.2xlarge  | Dev | 8 | 60 | 6 x 2000 | HDD |
-| d2.4xlarge  | Dev/QA | 16 | 122 | 12 x 2000 | HDD |
-| i2.8xlarge  | Prod | 32 | 244 | 8 x 800 | SSD |
-| hs1.8xlarge  | Prod | 16 | 117 | 24 x 2000 | HDD |
-| d2.8xlarge  | Prod | 36 | 244 | 24 x 2000 | HDD |
- 
-For optimal network performance, the chosen HAWQ instance type should support EC2 enhanced networking. Enhanced networking results in higher performance, lower latency, and lower jitter. Refer to [Enhanced Networking on Linux Instances](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html) for detailed information on enabling enhanced networking in your instances.
-
-All instance types identified in the table above support enhanced networking.
-
-### <a id="topic_cfgnetw"></a>Configure Networking 
-
-Your HAWQ cluster instances should be in a single VPC and on the same subnet. Instances are always assigned a VPC internal IP address. This internal IP address should be used for HAWQ communication between hosts. You can also use the internal IP address to access an instance from another instance within the HAWQ VPC.
-
-You may choose to locate your Ambari node on a separate subnet in the VPC. Both a public IP address for the instance and an Internet gateway configured for the EC2 VPC are required to access the Ambari instance from an external source and for the instance to access the Internet. 
-
-Ensure your Ambari and HAWQ master instances are each assigned a public IP address for external and internet access. We recommend you also assign an Elastic IP Address to the HAWQ master instance.
-
-
-### <a id="topic_cfgsecgrp"></a>Configure Security Groups
-
-A security group is a set of rules that control network traffic to and from your HAWQ instance.  One or more rules may be associated with a security group, and one or more security groups may be associated with an instance.
-
-To configure HAWQ communication between nodes in the HAWQ cluster, include and open the following ports in the appropriate security group for the HAWQ master and segment nodes:
-
-| Port  | Application |
-|-------|-------------------------------------|
-| 22    | ssh - secure connect to other hosts |
-
-To allow access to/from a source external to the Ambari management node, include and open the following ports in an appropriate security group for your Ambari node:
-
-| Port  | Application |
-|-------|-------------------------------------|
-| 22    | ssh - secure connect to other hosts |
-| 8080  | Ambari - HAWQ admin/config web console |  
-
-
-### <a id="topic_cfgkeypair"></a>Generate Key Pair
-AWS uses public-key cryptography to secure the login information for your instance. You use the EC2 console to generate and name a key pair when you launch your instance.  
-
-A key pair for an EC2 instance consists of a *public key* that AWS stores, and a *private key file* that you maintain. Together, they allow you to connect to your instance securely. The private key file name typically has a `.pem` suffix.
-
-This example logs into an EC2 instance from an external location with the private key file `my-test.pem` as user `user1`.  In this example, the instance is configured with the public IP address `192.0.2.0` and the private key file resides in the current directory.
-
-```shell
-$ ssh -i my-test.pem user1@192.0.2.0
-```
-
-## <a id="topic_mj4_524_2v"></a>Additional HAWQ Considerations
-
-After launching your HAWQ instance, you will connect to and configure the instance. The  *Instances* page of the EC2 Console lists the running instances and their associated network access information.
-
-Before installing HAWQ, set up the EC2 instances as you would local host server machines. Configure the host operating system, configure host network information (for example, update the `/etc/hosts` file), set operating system parameters, and install operating system packages. For information about how to prepare your operating system environment for HAWQ, see [Apache HAWQ System Requirements](../requirements/system-requirements.html) and [Select HAWQ Host Machines](../install/select-hosts.html).
-
-### <a id="topic_pwdlessssh_cc"></a>Passwordless SSH Configuration
-
-HAWQ hosts will be configured during the installation process to use passwordless SSH for intra-cluster communications. Temporary password-based authentication must be enabled on each HAWQ host in preparation for this configuration. Password authentication is typically disabled by default in cloud images. Update the cloud configuration in `/etc/cloud/cloud.cfg` to enable password authentication in your AMI(s). Set `ssh_pwauth: True` in this file. If desired, disable password authentication after HAWQ installation by setting the property back to `False`.
-  
-## <a id="topic_hgz_zwy_bv"></a>References
-
-Links to related Amazon Web Services and EC2 features and information.
-
-- [Amazon Web Services](https://aws.amazon.com)
-- [Amazon Machine Image \(AMI\)](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html)
-- [EC2 Instance Store](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html)
-- [Elastic Block Store](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSOptimized.html)
-- [EC2 Key Pairs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html)
-- [Elastic IP Address](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html)
-- [Enhanced Networking on Linux Instances](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html)
-- [Internet Gateways](http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Internet_Gateway.html)
-- [Subnet Public IP Addressing](http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-ip-addressing.html#subnet-public-ip)
-- [Virtual Private Cloud](http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Introduction.html)
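
The `ssh_pwauth` toggle described under Passwordless SSH Configuration in `install/aws-config.html.md.erb` can be sketched as a small shell helper. This is a minimal sketch and not part of the commit: the function name `set_ssh_pwauth` is hypothetical, and it assumes the setting sits on its own top-level `ssh_pwauth:` line. How and when the image applies the change depends on its cloud-init setup.

```shell
#!/bin/sh
# Hypothetical helper (not from the HAWQ docs): set ssh_pwauth in a cloud-init config.
# Usage: set_ssh_pwauth <True|False> [path]   (path defaults to /etc/cloud/cloud.cfg)
set_ssh_pwauth() {
    value="$1"
    cfg="${2:-/etc/cloud/cloud.cfg}"

    # Accept only the two values the documentation mentions.
    case "$value" in
        True|False) ;;
        *) echo "value must be True or False" >&2; return 1 ;;
    esac

    if grep -q '^ssh_pwauth:' "$cfg"; then
        # Rewrite the existing setting in place; the previous file is kept as <cfg>.bak.
        sed -i.bak "s/^ssh_pwauth:.*/ssh_pwauth: $value/" "$cfg"
    else
        # Setting absent: append it (assumes the file ends with a newline).
        printf 'ssh_pwauth: %s\n' "$value" >> "$cfg"
    fi
}
```

For example, run `set_ssh_pwauth True` before the HAWQ installation and `set_ssh_pwauth False` afterwards. Editing `/etc/cloud/cloud.cfg` normally requires root privileges.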

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/install/select-hosts.html.md.erb
----------------------------------------------------------------------
diff --git a/install/select-hosts.html.md.erb b/install/select-hosts.html.md.erb
deleted file mode 100644
index ecbe0b5..0000000
--- a/install/select-hosts.html.md.erb
+++ /dev/null
@@ -1,19 +0,0 @@
----
-title: Select HAWQ Host Machines
----
-
-Before you begin to install HAWQ, follow these steps to select and prepare the host machines.
-
-Complete this procedure for all HAWQ deployments:
-
-1.  **Choose the host machines that will host a HAWQ segment.** Keep in mind these restrictions and requirements:
-    -   Each host must meet the system requirements for the version of HAWQ you are installing.
-    -   Each HAWQ segment must be co-located on a host that runs an HDFS DataNode.
-    -   The HAWQ master segment and standby master segment must be hosted on separate machines.
-2.  **Choose the host machines that will run PXF.** Keep in mind these restrictions and requirements:
-    -   PXF must be installed on the HDFS NameNode *and* on all HDFS DataNodes.
-    -   If you have configured Hadoop with high availability, PXF must also be installed on all HDFS nodes including all NameNode services.
-    -   If you want to use PXF with HBase or Hive, you must first install the HBase client \(hbase-client\) and/or Hive client \(hive-client\) on each machine where you intend to install PXF. See the [HDP installation documentation](https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/index.html) for more information.
-3.  **Verify that required ports on all machines are unused.** By default, a HAWQ master or standby master service configuration uses port 5432. Hosts that run other PostgreSQL instances cannot be used to run a default HAWQ master or standby service configuration because the default PostgreSQL port \(5432\) conflicts with the default HAWQ port. You must either change the default port configuration of the running PostgreSQL instance or change the HAWQ master port setting during the HAWQ service installation to avoid port conflicts.
-    
-    **Note:** The Ambari server node uses PostgreSQL as the default metadata database. The Hive Metastore uses MySQL as the default metadata database.
\ No newline at end of file
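
Step 3 of `install/select-hosts.html.md.erb` asks you to verify that the default master port 5432 is unused. One way to check this can be sketched as follows. This is a minimal sketch and not part of the commit: it assumes bash (it relies on bash's `/dev/tcp` redirection), the function name `port_in_use` is hypothetical, and it only detects a listener that accepts TCP connections on the given address (loopback by default), so a service bound solely to another interface would be missed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if something accepts TCP connections on host:port.
# Usage: port_in_use <port> [host]   (host defaults to 127.0.0.1)
port_in_use() {
    local port="$1" host="${2:-127.0.0.1}"
    # The subshell tries to open fd 3 to host:port; the descriptor is
    # released when the subshell exits, so nothing stays connected.
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# 5432 is the default HAWQ master port named in the documentation.
if port_in_use 5432; then
    echo "port 5432 is in use"
else
    echo "port 5432 appears free"
fi
```

If the port is reported in use, the documentation's two options apply: change the port of the running PostgreSQL instance, or change the HAWQ master port setting during installation.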

