hawq-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From yo...@apache.org
Subject [02/36] incubator-hawq-docs git commit: moving book configuration to new 'book' branch, for HAWQ-1027
Date Mon, 29 Aug 2016 16:46:37 GMT
http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/reference/sql/SELECT.html.md.erb
----------------------------------------------------------------------
diff --git a/reference/sql/SELECT.html.md.erb b/reference/sql/SELECT.html.md.erb
new file mode 100644
index 0000000..4649bad
--- /dev/null
+++ b/reference/sql/SELECT.html.md.erb
@@ -0,0 +1,507 @@
+---
+title: SELECT
+---
+
+Retrieves rows from a table or view.
+
+## <a id="topic1__section2"></a>Synopsis
+
+``` pre
+SELECT [ALL | DISTINCT [ON (<expression> [, ...])]]
+  * | <expression> [[AS] <output_name>] [, ...]
+  [FROM <from_item> [, ...]]
+  [WHERE <condition>]
+  [GROUP BY <grouping_element> [, ...]]
+  [HAVING <condition> [, ...]]
+  [WINDOW <window_name> AS (<window_specification>)]
+  [{UNION | INTERSECT | EXCEPT} [ALL] <select>]
+  [ORDER BY <expression> [ASC | DESC | USING <operator>] [, ...]]
+  [LIMIT {<count> | ALL}]
+  [OFFSET <start>]
+```
+
+where \<grouping\_element\> can be one of:
+
+``` pre
+  ()
+  <expression>
+  ROLLUP (<expression> [,...])
+  CUBE (<expression> [,...])
+  GROUPING SETS ((<grouping_element> [, ...]))
+```
+
+where \<window\_specification\> can be:
+
+``` pre
+  [<window_name>]
+  [PARTITION BY <expression> [, ...]]
+  [ORDER BY <expression> [ASC | DESC | USING <operator>] [, ...]
+     [{RANGE | ROWS}
+          { UNBOUNDED PRECEDING
+          | <expression> PRECEDING
+          | CURRENT ROW
+          | BETWEEN <window_frame_bound> AND <window_frame_bound> }]]
+                    where <window_frame_bound> can be one of:
+                        UNBOUNDED PRECEDING
+                        <expression> PRECEDING
+                        CURRENT ROW
+                        <expression> FOLLOWING
+                        UNBOUNDED FOLLOWING
+```
+
+where \<from\_item\> can be one of:
+
+``` pre
+[ONLY] <table_name> [[AS] <alias> [( <column_alias> [, ...] )]]
+(select) [AS] <alias> [( <column_alias> [, ...] )]
+<function_name> ( [<argument> [, ...]] ) [AS] <alias>
+             [( <column_alias> [, ...]
+                | <column_definition> [, ...] )]
+<function_name> ( [<argument> [, ...]] ) AS
+              ( <column_definition> [, ...] )
+<from_item> [NATURAL] <join_type>
+            <from_item>
+          [ON <join_condition> | USING ( <join_column> [, ...] )]
+```
+
+## <a id="topic1__section3"></a>Description
+
+`SELECT` retrieves rows from zero or more tables. The general processing of `SELECT` is as follows:
+
+1.  All elements in the `FROM` list are computed. (Each element in the `FROM` list is a real or virtual table.) If more than one element is specified in the `FROM` list, they are cross-joined together.
+2.  If the `WHERE` clause is specified, all rows that do not satisfy the condition are eliminated from the output.
+3.  If the `GROUP BY` clause is specified, the output is divided into groups of rows that match on one or more of the defined grouping elements. If the `HAVING` clause is present, it eliminates groups that do not satisfy the given condition.
+4.  If a window expression is specified (and optional `WINDOW` clause), the output is organized according to the positional (row) or value-based (range) window frame.
+5.  `DISTINCT` eliminates duplicate rows from the result. `DISTINCT ON` eliminates rows that match on all the specified expressions. `ALL` (the default) will return all candidate rows, including duplicates.
+6.  The actual output rows are computed using the `SELECT` output expressions for each selected row.
+7.  Using the operators `UNION`, `INTERSECT`, and `EXCEPT`, the output of more than one `SELECT` statement can be combined to form a single result set. The `UNION` operator returns all rows that are in one or both of the result sets. The `INTERSECT` operator returns all rows that are strictly in both result sets. The `EXCEPT` operator returns the rows that are in the first result set but not in the second. In all three cases, duplicate rows are eliminated unless `ALL` is specified.
+8.  If the `ORDER BY` clause is specified, the returned rows are sorted in the specified order. If `ORDER BY` is not given, the rows are returned in whatever order the system finds fastest to produce.
+9.  If the `LIMIT` or `OFFSET` clause is specified, the `SELECT` statement only returns a subset of the result rows.
+
+You must have `SELECT` privilege on a table to read its values.
+
+## <a id="topic1__section4"></a>Parameters
+
+**The SELECT List**
+
+The `SELECT` list (between the key words `SELECT` and `FROM`) specifies expressions that form the output rows of the `SELECT` statement. The expressions can (and usually do) refer to columns computed in the `FROM` clause.
+
+Using the clause `[AS] ` \<output\_name\>, another name can be specified for an output column. This name is primarily used to label the column for display. It can also be used to refer to the column's value in `ORDER BY` and `GROUP BY` clauses, but not in the `WHERE` or `HAVING` clauses; there you must write out the expression instead. The `AS` keyword is optional in most cases (such as when declaring an alias for column names, constants, function calls, and simple unary operator expressions). In cases where the declared alias is a reserved SQL keyword, the \<output\_name\> must be enclosed in double quotes to avoid ambiguity.
+
+An \<expression\> in the `SELECT` list can be a constant value, a column reference, an operator invocation, a function call, an aggregate expression, a window expression, a scalar subquery, and so on. There are a number of constructs that can be classified as an expression but do not follow any general syntax rules.
+
+Instead of an expression, `*` can be written in the output list as a shorthand for all the columns of the selected rows. Also, you can write `                   table_name.*` as a shorthand for the columns coming from just that table.
+
+**The FROM Clause**
+
+The `FROM` clause specifies one or more source tables for the `SELECT`. If multiple sources are specified, the result is the Cartesian product (cross join) of all the sources. But usually qualification conditions are added to restrict the returned rows to a small subset of the Cartesian product. The `FROM` clause can contain the following elements:
+
+<dt> \<table\_name\>  </dt>
+<dd>The name (optionally schema-qualified) of an existing table or view. If `ONLY` is specified, only that table is scanned. If `ONLY` is not specified, the table and all its descendant tables (if any) are scanned.</dd>
+
+<dt> \<alias\>  </dt>
+<dd>A substitute name for the `FROM` item containing the alias. An alias is used for brevity or to eliminate ambiguity for self-joins (where the same table is scanned multiple times). When an alias is provided, it completely hides the actual name of the table or function; for example given `FROM foo AS f`, the remainder of the `SELECT` must refer to this `FROM` item as `f` not `foo`. If an alias is written, a column alias list can also be written to provide substitute names for one or more columns of the table.</dd>
+
+<dt> \<select\>  </dt>
+<dd>A sub-`SELECT` can appear in the `FROM` clause. This acts as though its output were created as a temporary table for the duration of this single `SELECT` command. Note that the sub-`SELECT` must be surrounded by parentheses, and an alias must be provided for it. A `VALUES` command can also be used here. See "Non-standard Clauses" in the [Compatibility](#topic1__section19) section for limitations of using correlated sub-selects in HAWQ.</dd>
+
+<dt> \<function\_name\>  </dt>
+<dd>Function calls can appear in the `FROM` clause. (This is especially useful for functions that return result sets, but any function can be used.) This acts as though its output were created as a temporary table for the duration of this single `SELECT` command. An alias may also be used. If an alias is written, a column alias list can also be written to provide substitute names for one or more attributes of the function's composite return type. If the function has been defined as returning the record data type, then an alias or the key word `AS` must be present, followed by a column definition list in the form `(<column_name> <data_type> [, ... ] )`. The column definition list must match the actual number and types of columns returned by the function.</dd>
+
+<dt> \<join\_type\>  </dt>
+<dd>One of:
+
+-   **\[INNER\] JOIN**
+-   **LEFT \[OUTER\] JOIN**
+-   **RIGHT \[OUTER\] JOIN**
+-   **FULL \[OUTER\] JOIN**
+-   **CROSS JOIN**
+
+For the `INNER` and `OUTER` join types, a join condition must be specified, namely exactly one of `NATURAL`, `ON <join_condition>                      `, or `USING (<join_column> [, ...])`. See below for the meaning. For `CROSS JOIN`, none of these clauses may appear.
+
+A `JOIN` clause combines two `FROM` items. Use parentheses if necessary to determine the order of nesting. In the absence of parentheses, `JOIN`s nest left-to-right. In any case `JOIN` binds more tightly than the commas separating `FROM` items.
+
+`CROSS JOIN` and `INNER JOIN` produce a simple Cartesian product, the same result as you get from listing the two items at the top level of `FROM`, but restricted by the join condition (if any). `CROSS JOIN` is equivalent to `INNER JOIN                 ON` `(TRUE)`, that is, no rows are removed by qualification. These join types are just a notational convenience, since they do nothing you could not do with plain `FROM` and `WHERE`.
+
+`LEFT OUTER JOIN` returns all rows in the qualified Cartesian product (i.e., all combined rows that pass its join condition), plus one copy of each row in the left-hand table for which there was no right-hand row that passed the join condition. This left-hand row is extended to the full width of the joined table by inserting null values for the right-hand columns. Note that only the `JOIN` clause's own condition is considered while deciding which rows have matches. Outer conditions are applied afterwards.
+
+Conversely, `RIGHT OUTER JOIN` returns all the joined rows, plus one row for each unmatched right-hand row (extended with nulls on the left). This is just a notational convenience, since you could convert it to a `LEFT OUTER                 JOIN` by switching the left and right inputs.
+
+`FULL OUTER JOIN` returns all the joined rows, plus one row for each unmatched left-hand row (extended with nulls on the right), plus one row for each unmatched right-hand row (extended with nulls on the left).</dd>
+
+<dt>ON \<join\_condition\>  </dt>
+<dd>\<join\_condition\> is an expression resulting in a value of type `boolean` (similar to a `WHERE` clause) that specifies which rows in a join are considered to match.</dd>
+
+<dt>USING (\<join\_column\> \[, ...\])  </dt>
+<dd>A clause of the form `USING ( a, b, ... )` is shorthand for `ON left_table.a = right_table.a AND left_table.b = right_table.b ...               `. Also, `USING` implies that only one of each pair of equivalent columns will be included in the join output, not both.</dd>
+
+<dt>NATURAL  </dt>
+<dd>`NATURAL` is shorthand for a `USING` list that mentions all columns in the two tables that have the same names.</dd>
+
+**The WHERE Clause**
+
+The optional `WHERE` clause has the general form:
+
+``` pre
+WHERE <condition>
+```
+
+where \<condition\> is any expression that evaluates to a result of type `boolean`. Any row that does not satisfy this condition will be eliminated from the output. A row satisfies the condition if it returns true when the actual row values are substituted for any variable references.
+
+**The GROUP BY Clause**
+
+The optional `GROUP BY` clause has the general form:
+
+``` pre
+GROUP BY <grouping_element> [, ...]
+```
+
+where \<grouping\_element\> can be one of:
+
+``` pre
+()
+<expression>
+ROLLUP (<expression> [,...])
+CUBE (<expression> [,...])
+GROUPING SETS ((<grouping_element> [, ...]))
+```
+
+`GROUP             BY` will condense into a single row all selected rows that share the same values for the grouped expressions. \<expression\> can be an input column name, or the name or ordinal number of an output column (`SELECT` list item), or an arbitrary expression formed from input-column values. In case of ambiguity, a `GROUP BY` name will be interpreted as an input-column name rather than an output column name.
+
+Aggregate functions, if any are used, are computed across all rows making up each group, producing a separate value for each group (whereas without `GROUP BY`, an aggregate produces a single value computed across all the selected rows). When `GROUP BY` is present, it is not valid for the `SELECT` list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column.
+
+HAWQ has the following additional OLAP grouping extensions (often referred to as *supergroups*):
+
+<dt>ROLLUP  </dt>
+<dd>A `ROLLUP` grouping is an extension to the `GROUP BY` clause that creates aggregate subtotals that roll up from the most detailed level to a grand total, following a list of grouping columns (or expressions). `ROLLUP` takes an ordered list of grouping columns, calculates the standard aggregate values specified in the `GROUP BY` clause, then creates progressively higher-level subtotals, moving from right to left through the list. Finally, it creates a grand total. A `ROLLUP` grouping can be thought of as a series of grouping sets. For example:
+
+``` pre
+GROUP BY ROLLUP (a,b,c)
+```
+
+is equivalent to:
+
+``` pre
+GROUP BY GROUPING SETS( (a,b,c), (a,b), (a), () )
+```
+
+Notice that the *n* elements of a `ROLLUP` translate to *n*+1 grouping sets. Also, the order in which the grouping expressions are specified is significant in a `ROLLUP`.</dd>
+
+<dt>CUBE  </dt>
+<dd>A `CUBE` grouping is an extension to the `GROUP BY` clause that creates subtotals for all of the possible combinations of the given list of grouping columns (or expressions). In terms of multidimensional analysis, `CUBE` generates all the subtotals that could be calculated for a data cube with the specified dimensions. For example:
+
+``` pre
+GROUP BY CUBE (a,b,c)
+```
+
+is equivalent to:
+
+``` pre
+GROUP BY GROUPING SETS( (a,b,c), (a,b), (a,c), (b,c), (a),
+(b), (c), () )
+```
+
+Notice that *n* elements of a `CUBE` translate to 2n grouping sets. Consider using `CUBE` in any situation requiring cross-tabular reports. `CUBE` is typically most suitable in queries that use columns from multiple dimensions rather than columns representing different levels of a single dimension. For instance, a commonly requested cross-tabulation might need subtotals for all the combinations of month, state, and product.</dd>
+
+<dt>GROUPING SETS  </dt>
+<dd>You can selectively specify the set of groups that you want to create using a `GROUPING SETS` expression within a `GROUP BY` clause. This allows precise specification across multiple dimensions without computing a whole `ROLLUP` or `CUBE`. For example:
+
+``` pre
+GROUP BY GROUPING SETS( (a,c), (a,b) )
+```
+
+If using the grouping extension clauses `ROLLUP`, `CUBE`, or `GROUPING SETS`, two challenges arise. First, how do you determine which result rows are subtotals, and then the exact level of aggregation for a given subtotal. Or, how do you differentiate between result rows that contain both stored `NULL` values and "NULL" values created by the `ROLLUP` or `CUBE`. Secondly, when duplicate grouping sets are specified in the `GROUP BY` clause, how do you determine which result rows are duplicates? There are two additional grouping functions you can use in the `SELECT` list to help with this:
+
+-   **grouping(\<column\> \[, ...\])** — The `grouping` function can be applied to one or more grouping attributes to distinguish super-aggregated rows from regular grouped rows. This can be helpful in distinguishing a "NULL" representing the set of all values in a super-aggregated row from a `NULL` value in a regular row. Each argument in this function produces a bit — either `1` or `0`, where `1` means the result row is super-aggregated, and `0` means the result row is from a regular grouping. The `grouping` function returns an integer by treating these bits as a binary number and then converting it to a base-10 integer.
+-   **group\_id()** — For grouping extension queries that contain duplicate grouping sets, the `group_id` function is used to identify duplicate rows in the output. All *unique* grouping set output rows will have a group\_id value of 0. For each duplicate grouping set detected, the `group_id` function assigns a group\_id number greater than 0. All output rows in a particular duplicate grouping set are identified by the same group\_id number.</dd>
+
+**The WINDOW Clause**
+
+The `WINDOW` clause is used to define a window that can be used in the `OVER()` expression of a window function such as `rank` or `avg`. For example:
+
+``` pre
+SELECT vendor, rank() OVER (mywindow) FROM sale
+GROUP BY vendor
+WINDOW mywindow AS (ORDER BY sum(prc*qty));
+```
+
+A `WINDOW` clause has this general form:
+
+``` pre
+WINDOW <window_name> AS (<window_specification>)
+```
+
+where \<window\_specification\> can be:
+
+``` pre
+[<window_name>]
+[PARTITION BY <expression> [, ...]]
+[ORDER BY <expression> [ASC | DESC | USING <operator>] [, ...]
+    [{RANGE | ROWS}
+      { UNBOUNDED PRECEDING
+      | <expression> PRECEDING
+      | CURRENT ROW
+      | BETWEEN <window_frame_bound> AND <window_frame_bound> }]]
+             where window_frame_bound can be one of:
+               UNBOUNDED PRECEDING
+               <expression> PRECEDING
+               CURRENT ROW
+               <expression> FOLLOWING
+               UNBOUNDED FOLLOWING
+```
+
+<dt> \<window\_name\>  </dt>
+<dd>Gives a name to the window specification.</dd>
+
+<dt>PARTITION BY  </dt>
+<dd>The `PARTITION BY` clause organizes the result set into logical groups based on the unique values of the specified expression. When used with window functions, the functions are applied to each partition independently. For example, if you follow `PARTITION BY` with a column name, the result set is partitioned by the distinct values of that column. If omitted, the entire result set is considered one partition.
+
+<dt>ORDER BY  </dt>
+<dd>The `ORDER BY` clause defines how to sort the rows in each partition of the result set. If omitted, rows are returned in whatever order is most efficient and may vary.
+
+**Note:** Columns of data types that lack a coherent ordering, such as `time`, are not good candidates for use in the `ORDER                   BY` clause of a window specification. Time, with or without time zone, lacks a coherent ordering because addition and subtraction do not have the expected effects. For example, the following is not generally true: `x::time <                   x::time + '2 hour'::interval`</dd>
+
+<dt>ROWS | RANGE  </dt>
+<dd>Use either a `ROWS` or `RANGE` clause to express the bounds of the window. The window bound can be one, many, or all rows of a partition. You can express the bound of the window either in terms of a range of data values offset from the value in the current row (`RANGE`), or in terms of the number of rows offset from the currentrow (`ROWS`). When using the `RANGE` clause, you must also use an `ORDER BY` clause. This is because the calculation performed to produce the window requires that the values be sorted. Additionally, the `ORDER BY` clause cannot contain more than one expression, and the expression must result in either a date or a numeric value. When using the `ROWS` or `RANGE` clauses, if you specify only a starting row, the current row is used as the last row in the window.
+
+**PRECEDING** — The `PRECEDING` clause defines the first row of the window using the current row as a reference point. The starting row is expressed in terms of the number of rows preceding the current row. For example, in the case of `ROWS` framing, 5 `PRECEDING` sets the window to start with the fifth row preceding the current row. In the case of `RANGE` framing, it sets the window to start with the first row whose ordering column value precedes that of the current row by 5 in the given order. If the specified order is ascending by date, this will be the first row within 5 days before the current row. `UNBOUNDED PRECEDING` sets the first row in the window to be the first row in the partition.
+
+**BETWEEN** — The `BETWEEN` clause defines the first and last row of the window, using the current row as a reference point. First and last rows are expressed in terms of the number of rows preceding and following the current row, respectively. For example, `BETWEEN 3 PRECEDING AND 5 FOLLOWING` sets the window to start with the third row preceding the current row, and end with the fifth row following the current row. Use `BETWEEN UNBOUNDED PRECEDING AND                 UNBOUNDED FOLLOWING` to set the first and last rows in the window to be the first and last row in the partition, respectively. This is equivalent to the default behavior if no `ROW` or `RANGE` clause is specified.
+
+**FOLLOWING** — The `FOLLOWING` clause defines the last row of the window using the current row as a reference point. The last row is expressed in terms of the number of rows following the current row. For example, in the case of `ROWS` framing, `5 FOLLOWING` sets the window to end with the fifth row following the current row. In the case of `RANGE` framing, it sets the window to end with the last row whose ordering column value follows that of the current row by 5 in the given order. If the specified order is ascending by date, this will be the last row within 5 days after the current row. Use `UNBOUNDED FOLLOWING` to set the last row in the window to be the last row in the partition.
+
+If you do not specify a `ROW` or a `RANGE` clause, the window bound starts with the first row in the partition (`UNBOUNDED                 PRECEDING`) and ends with the current row (`CURRENT ROW`) if `ORDER BY` is used. If an `ORDER BY` is not specified, the window starts with the first row in the partition (`UNBOUNDED                 PRECEDING`) and ends with last row in the partition (`UNBOUNDED                 FOLLOWING`).</dd>
+
+**The HAVING Clause**
+
+The optional `HAVING` clause has the general form:
+
+``` pre
+HAVING <condition>
+```
+
+where \<condition\> is the same as specified for the `WHERE` clause. `HAVING` eliminates group rows that do not satisfy the condition. `HAVING` is different from `WHERE`: `WHERE` filters individual rows before the application of `GROUP BY`, while `HAVING` filters group rows created by `GROUP BY`. Each column referenced in \<condition\> must unambiguously reference a grouping column, unless the reference appears within an aggregate function.
+
+The presence of `HAVING` turns a query into a grouped query even if there is no `GROUP BY` clause. This is the same as what happens when the query contains aggregate functions but no `GROUP BY` clause. All the selected rows are considered to form a single group, and the `SELECT` list and `HAVING` clause can only reference table columns from within aggregate functions. Such a query will emit a single row if the `HAVING` condition is true, zero rows if it is not true.
+
+**The UNION Clause**
+
+The `UNION` clause has this general form:
+
+``` pre
+<select_statement> UNION [ALL] <select_statement>
+```
+
+where \<select\_statement\> is any `SELECT` statement without an `ORDER BY`, `LIMIT`, `FOR UPDATE`, or `FOR SHARE` clause. (`ORDER BY` and `LIMIT` can be attached to a subquery expression if it is enclosed in parentheses. Without parentheses, these clauses will be taken to apply to the result of the `UNION`, not to its right-hand input expression.)
+
+The `UNION` operator computes the set union of the rows returned by the involved `SELECT` statements. A row is in the set union of two result sets if it appears in at least one of the result sets. The two `SELECT` statements that represent the direct operands of the `UNION` must produce the same number of columns, and corresponding columns must be of compatible data types.
+
+The result of `UNION` does not contain any duplicate rows unless the `ALL` option is specified. `ALL` prevents elimination of duplicates. (Therefore, `UNION ALL` is usually significantly quicker than `UNION`; use `ALL` when you can.)
+
+Multiple `UNION` operators in the same `SELECT` statement are evaluated left to right, unless otherwise indicated by parentheses.
+
+Currently, `FOR UPDATE` and `FOR SHARE` may not be specified either for a `UNION` result or for any input of a `UNION`.
+
+**The INTERSECT Clause**
+
+The `INTERSECT` clause has this general form:
+
+``` pre
+<select_statement> INTERSECT [ALL] <select_statement>
+```
+
+where \<select\_statement\> is any SELECT statement without an `ORDER BY`, `LIMIT`, `FOR UPDATE`, or `FOR SHARE` clause.
+
+The `INTERSECT` operator computes the set intersection of the rows returned by the involved `SELECT` statements. A row is in the intersection of two result sets if it appears in both result sets.
+
+The result of `INTERSECT` does not contain any duplicate rows unless the `ALL` option is specified. With `ALL`, a row that has *m* duplicates in the left table and *n* duplicates in the right table will appear min(*m*, *n*) times in the result set.
+
+Multiple `INTERSECT` operators in the same `SELECT` statement are evaluated left to right, unless parentheses dictate otherwise. `INTERSECT` binds more tightly than `UNION`. That is, `A UNION B INTERSECT C` will be read as `A UNION (B INTERSECT C)`.
+
+Currently, `FOR UPDATE` and `FOR SHARE` may not be specified either for an `INTERSECT` result or for any input of an `INTERSECT`.
+
+**The EXCEPT Clause**
+
+The `EXCEPT` clause has this general form:
+
+``` pre
+<select_statement> EXCEPT [ALL] <select_statement>
+```
+
+where \<select\_statement\> is any `SELECT` statement without an `ORDER BY`, `LIMIT`, `FOR UPDATE`, or `FOR SHARE` clause.
+
+The `EXCEPT` operator computes the set of rows that are in the result of the left `SELECT` statement but not in the result of the right one.
+
+The result of `EXCEPT` does not contain any duplicate rows unless the `ALL` option is specified. With `ALL`, a row that has *m* duplicates in the left table and *n* duplicates in the right table will appear max(*m-n*,0) times in the result set.
+
+Multiple `EXCEPT` operators in the same `SELECT` statement are evaluated left to right unless parentheses dictate otherwise. `EXCEPT` binds at the same level as `UNION`.
+
+Currently, `FOR             UPDATE` and `FOR SHARE` may not be specified either for an `EXCEPT` result or for any input of an `EXCEPT`.
+
+**The ORDER BY Clause**
+
+The optional `ORDER BY` clause has this general form:
+
+``` pre
+ORDER BY <expression> [ASC | DESC | USING <operator>] [, ...]
+```
+
+where \<expression\> can be the name or ordinal number of an output column (`SELECT` list item), or it can be an arbitrary expression formed from input-column values.
+
+The `ORDER BY` clause causes the result rows to be sorted according to the specified expressions. If two rows are equal according to the left-most expression, they are compared according to the next expression and so on. If they are equal according to all specified expressions, they are returned in an implementation-dependent order.
+
+The ordinal number refers to the ordinal (left-to-right) position of the result column. This feature makes it possible to define an ordering on the basis of a column that does not have a unique name. This is never absolutely necessary because it is always possible to assign a name to a result column using the `AS` clause.
+
+It is also possible to use arbitrary expressions in the `ORDER BY` clause, including columns that do not appear in the `SELECT` result list. Thus the following statement is valid:
+
+``` pre
+SELECT name FROM distributors ORDER BY code;
+```
+
+A limitation of this feature is that an `ORDER BY` clause applying to the result of a `UNION`, `INTERSECT`, or `EXCEPT` clause may only specify an output column name or number, not an expression.
+
+If an `ORDER BY` expression is a simple name that matches both a result column name and an input column name, `ORDER BY` will interpret it as the result column name. This is the opposite of the choice that `GROUP BY` will make in the same situation. This inconsistency is made to be compatible with the SQL standard.
+
+Optionally one may add the key word `ASC` (ascending) or `DESC` (descending) after any expression in the `ORDER BY` clause. If not specified, `ASC` is assumed by default. Alternatively, a specific ordering operator name may be specified in the `USING` clause. `ASC` is usually equivalent to `USING <` and `DESC` is usually equivalent to `USING >`. (But the creator of a user-defined data type can define exactly what the default sort ordering is, and it might correspond to operators with other names.)
+
+The null value sorts higher than any other value. In other words, with ascending sort order, null values sort at the end, and with descending sort order, null values sort at the beginning.
+
+Character-string data is sorted according to the locale-specific collation order that was established when the HAWQ system was initialized.
+
+**The DISTINCT Clause**
+
+If `DISTINCT` is specified, all duplicate rows are removed from the result set (one row is kept from each group of duplicates). `ALL` specifies the opposite: all rows are kept. `ALL` is the default.
+
+`DISTINCT ON ( <expression> [, ...] )` keeps only the first row of each set of rows where the given expressions evaluate to equal. The `DISTINCT ON` expressions are interpreted using the same rules as for `ORDER BY`. Note that the 'first row' of each set is unpredictable unless `ORDER BY` is used to ensure that the desired row appears first. For example:
+
+``` pre
+SELECT DISTINCT ON (location) location, time, report FROM
+weather_reports ORDER BY location, time DESC;
+```
+
+retrieves the most recent weather report for each location. But if we had not used `ORDER             BY` to force descending order of time values for each location, we would have gotten a report from an unpredictable time for each location.
+
+The `DISTINCT ON` expression(s) must match the left-most `ORDER BY` expression(s). The `ORDER BY` clause will normally contain additional expression(s) that determine the desired precedence of rows within each `DISTINCT             ON` group.
+
+**The LIMIT Clause**
+
+The `LIMIT` clause consists of two independent sub-clauses:
+
+``` pre
+LIMIT {<count> | ALL}
+OFFSET <start>
+
+```
+
+where \<count\> specifies the maximum number of rows to return, while \<start\> specifies the number of rows to skip before starting to return rows. When both are specified, start rows are skipped before starting to count the count rows to be returned.
+
+When using `LIMIT`, it is a good idea to use an `ORDER BY` clause that constrains the result rows into a unique order. Otherwise you will get an unpredictable subset of the query's rows. You may be asking for the tenth through twentieth rows, but tenth through twentieth in what ordering? You don't know what ordering unless you specify `ORDER BY`.
+
+The query planner takes `LIMIT` into account when generating a query plan, so you are very likely to get different plans (yielding different row orders) depending on what you use for `LIMIT` and `OFFSET`. Thus, using different `LIMIT/OFFSET` values to select different subsets of a query result will give inconsistent results unless you enforce a predictable result ordering with `ORDER BY`. This is not a defect; it is an inherent consequence of the fact that SQL does not promise to deliver the results of a query in any particular order unless `ORDER BY` is used to constrain the order.
+
+## <a id="topic1__section18"></a>Examples
+
+To join the table `films` with the table `distributors`:
+
+``` sql
+SELECT f.title, f.did, d.name, f.date_prod, f.kind FROM
+distributors d, films f WHERE f.did = d.did
+```
+
+To sum the column `length` of all films and group the results by `kind`:
+
+``` sql
+SELECT kind, sum(length) AS total FROM films GROUP BY kind;
+```
+
+To sum the column `length` of all films, group the results by `kind` and show those group totals that are less than 5 hours:
+
+``` sql
+SELECT kind, sum(length) AS total FROM films GROUP BY kind
+HAVING sum(length) < interval '5 hours';
+```
+
+Calculate the subtotals and grand totals of all sales for movie `kind` and `distributor`.
+
+``` sql
+SELECT kind, distributor, sum(prc*qty) FROM sales
+GROUP BY ROLLUP(kind, distributor)
+ORDER BY 1,2,3;
+```
+
+Calculate the rank of movie distributors based on total sales:
+
+``` sql
+SELECT distributor, sum(prc*qty),
+       rank() OVER (ORDER BY sum(prc*qty) DESC)
+FROM sale
+GROUP BY distributor ORDER BY 2 DESC;
+```
+
+The following two examples are identical ways of sorting the individual results according to the contents of the second column (`name`):
+
+``` sql
+SELECT * FROM distributors ORDER BY name;
+SELECT * FROM distributors ORDER BY 2;
+```
+
+The next example shows how to obtain the union of the tables `distributors` and `actors`, restricting the results to those that begin with the letter `W` in each table. Only distinct rows are wanted, so the key word `ALL` is omitted:
+
+``` sql
+SELECT distributors.name FROM distributors WHERE
+distributors.name LIKE 'W%' UNION SELECT actors.name FROM
+actors WHERE actors.name LIKE 'W%';
+```
+
+This example shows how to use a function in the `FROM` clause, both with and without a column definition list:
+
+``` pre
+CREATE FUNCTION distributors(int) RETURNS SETOF distributors
+AS $$ SELECT * FROM distributors WHERE did = $1; $$ LANGUAGE
+SQL;
+SELECT * FROM distributors(111);
+
+CREATE FUNCTION distributors_2(int) RETURNS SETOF record AS
+$$ SELECT * FROM distributors WHERE did = $1; $$ LANGUAGE
+SQL;
+SELECT * FROM distributors_2(111) AS (dist_id int, dist_name
+text);
+```
+
+## <a id="topic1__section19"></a>Compatibility
+
+The `SELECT` statement is compatible with the SQL standard, but there are some extensions and some missing features.
+
+**Omitted FROM Clauses**
+
+HAWQ allows you to omit the `FROM` clause. It has a straightforward use to compute the results of simple expressions. For example:
+
+``` sql
+SELECT 2+2;
+```
+
+Some other SQL databases cannot do this except by introducing a dummy one-row table from which to do the `SELECT`.
+
+Note that if a `FROM` clause is not specified, the query cannot reference any database tables. For compatibility with applications that rely on this behavior the *add\_missing\_from* configuration parameter can be enabled.
+
+**The AS Key Word**
+
+In the SQL standard, the optional key word `AS` is just noise and can be omitted without affecting the meaning. The HAWQ parser requires this key word when renaming output columns because the type extensibility features lead to parsing ambiguities without it. `AS` is optional in `FROM` items, however.
+
+**Namespace Available to GROUP BY and ORDER BY**
+
+In the SQL-92 standard, an `ORDER BY` clause may only use result column names or numbers, while a `GROUP BY` clause may only use expressions based on input column names. HAWQ extends each of these clauses to allow the other choice as well (but it uses the standard's interpretation if there is ambiguity). HAWQ also allows both clauses to specify arbitrary expressions. Note that names appearing in an expression will always be taken as input-column names, not as result-column names.
+
+SQL:1999 and later use a slightly different definition which is not entirely upward compatible with SQL-92. In most cases, however, HAWQ will interpret an `ORDER BY` or `GROUP             BY` expression the same way SQL:1999 does.
+
+**Nonstandard Clauses**
+
+The clauses `DISTINCT ON`, `LIMIT`, and `OFFSET` are not defined in the SQL standard.
+
+**Limited Use of STABLE and VOLATILE Functions**
+
+To prevent data from becoming out-of-sync across the segments in HAWQ, any function classified as `STABLE` or `VOLATILE` cannot be executed at the segment database level if it contains SQL or modifies the database in any way.
+
+## <a id="topic1__section25"></a>See Also
+
+[EXPLAIN](EXPLAIN.html)

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/reference/sql/SET-ROLE.html.md.erb
----------------------------------------------------------------------
diff --git a/reference/sql/SET-ROLE.html.md.erb b/reference/sql/SET-ROLE.html.md.erb
new file mode 100644
index 0000000..63a03f6
--- /dev/null
+++ b/reference/sql/SET-ROLE.html.md.erb
@@ -0,0 +1,72 @@
+---
+title: SET ROLE
+---
+
+Sets the current role identifier of the current session.
+
+## <a id="topic1__section2"></a>Synopsis
+
+``` pre
+SET [SESSION | LOCAL] ROLE <rolename>
+SET [SESSION | LOCAL] ROLE NONE
+RESET ROLE
+```
+
+## <a id="topic1__section3"></a>Description
+
+This command sets the current role identifier of the current SQL-session context to be \<rolename\>. The role name may be written as either an identifier or a string literal. After `SET ROLE`, permissions checking for SQL commands is carried out as though the named role were the one that had logged in originally.
+
+The specified \<rolename\> must be a role that the current session user is a member of. If the session user is a superuser, any role can be selected.
+
+The `NONE` and `RESET` forms reset the current role identifier to be the current session role identifier. These forms may be executed by any user.
+
+## <a id="topic1__section4"></a>Parameters
+
+<dt>SESSION  </dt>
+<dd>Specifies that the command takes effect for the current session. This is the default.</dd>
+
+<dt>LOCAL  </dt>
+<dd>Specifies that the command takes effect for only the current transaction. After `COMMIT` or `ROLLBACK`, the session-level setting takes effect again. Note that `SET LOCAL` will appear to have no effect if it is executed outside of a transaction.</dd>
+
+<dt> \<rolename\>   </dt>
+<dd>The name of a role to use for permissions checking in this session.</dd>
+
+<dt>NONE  
+RESET  </dt>
+<dd>Reset the current role identifier to be the current session role identifier (that of the role used to log in).</dd>
+
+## <a id="topic1__section5"></a>Notes
+
+Using this command, it is possible to either add privileges or restrict privileges. If the session user role has the `INHERITS` attribute, then it automatically has all the privileges of every role that it could `SET ROLE` to; in this case `SET ROLE` effectively drops all the privileges assigned directly to the session user and to the other roles it is a member of, leaving only the privileges available to the named role. On the other hand, if the session user role has the `NOINHERITS` attribute, `SET ROLE` drops the privileges assigned directly to the session user and instead acquires the privileges available to the named role.
+
+In particular, when a superuser chooses to `SET ROLE` to a non-superuser role, she loses her superuser privileges.
+
+`SET ROLE` has effects comparable to `SET SESSION AUTHORIZATION`, but the privilege checks involved are quite different. Also, `SET SESSION AUTHORIZATION` determines which roles are allowable for later `SET ROLE` commands, whereas changing roles with `SET ROLE` does not change the set of roles allowed to a later `SET ROLE`.
+
+## <a id="topic1__section6"></a>Examples
+
+``` sql
+SELECT SESSION_USER, CURRENT_USER;
+```
+``` pre
+ session_user | current_user 
+--------------+--------------
+ peter        | peter
+```
+``` sql
+SET ROLE 'paul';
+SELECT SESSION_USER, CURRENT_USER;
+```
+``` pre
+ session_user | current_user 
+--------------+--------------
+ peter        | paul
+```
+
+## <a id="topic1__section7"></a>Compatibility
+
+HAWQ allows identifier syntax (\<rolename\>), while the SQL standard requires the role name to be written as a string literal. SQL does not allow this command during a transaction; HAWQ does not make this restriction. The `SESSION` and `LOCAL` modifiers are a HAWQ extension, as is the `RESET` syntax.
+
+## <a id="topic1__section8"></a>See Also
+
+[SET SESSION AUTHORIZATION](SET-SESSION-AUTHORIZATION.html)

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/reference/sql/SET-SESSION-AUTHORIZATION.html.md.erb
----------------------------------------------------------------------
diff --git a/reference/sql/SET-SESSION-AUTHORIZATION.html.md.erb b/reference/sql/SET-SESSION-AUTHORIZATION.html.md.erb
new file mode 100644
index 0000000..adea314
--- /dev/null
+++ b/reference/sql/SET-SESSION-AUTHORIZATION.html.md.erb
@@ -0,0 +1,66 @@
+---
+title: SET SESSION AUTHORIZATION
+---
+
+Sets the session role identifier and the current role identifier of the current session.
+
+## <a id="topic1__section2"></a>Synopsis
+
+``` pre
+SET [SESSION | LOCAL] SESSION AUTHORIZATION <rolename>
+SET [SESSION | LOCAL] SESSION AUTHORIZATION DEFAULT
+RESET SESSION AUTHORIZATION
+```
+
+## <a id="topic1__section3"></a>Description
+
+This command sets the session role identifier and the current role identifier of the current SQL-session context to \<rolename\> . The role name may be written as either an identifier or a string literal. Using this command, it is possible, for example, to temporarily become an unprivileged user and later switch back to being a superuser.
+
+The session role identifier is initially set to be the (possibly authenticated) role name provided by the client. The current role identifier is normally equal to the session user identifier, but may change temporarily in the context of setuid functions and similar mechanisms; it can also be changed by [SET ROLE](SET-ROLE.html). The current user identifier is relevant for permission checking.
+
+The session user identifier may be changed only if the initial session user (the authenticated user) had the superuser privilege. Otherwise, the command is accepted only if it specifies the authenticated user name.
+
+The `DEFAULT` and `RESET` forms reset the session and current user identifiers to be the originally authenticated user name. These forms may be executed by any user.
+
+## <a id="topic1__section4"></a>Parameters
+
+<dt>SESSION  </dt>
+<dd>Specifies that the command takes effect for the current session. This is the default.</dd>
+
+<dt>LOCAL  </dt>
+<dd>Specifies that the command takes effect for only the current transaction. After `COMMIT` or `ROLLBACK`, the session-level setting takes effect again. Note that `SET LOCAL` will appear to have no effect if it is executed outside of a transaction.</dd>
+
+<dt> \<rolename\>   </dt>
+<dd>The name of the role to assume.</dd>
+
+<dt>NONE  
+RESET  </dt>
+<dd>Reset the session and current role identifiers to be that of the role used to log in.</dd>
+
+## <a id="topic1__section5"></a>Examples
+
+``` sql
+SELECT SESSION_USER, CURRENT_USER;
+```
+``` pre
+ session_user | current_user 
+--------------+--------------
+ peter        | peter
+```
+``` sql
+SET SESSION AUTHORIZATION 'paul';
+SELECT SESSION_USER, CURRENT_USER;
+```
+``` pre
+ session_user | current_user 
+--------------+--------------
+ paul         | paul
+```
+
+## <a id="topic1__section6"></a>Compatibility
+
+The SQL standard allows some other expressions to appear in place of the literal \<rolename\>, but these options are not important in practice. HAWQ allows identifier syntax (\<rolename\>), while SQL does not. SQL does not allow this command during a transaction; HAWQ does not make this restriction. The `SESSION` and `LOCAL` modifiers are a HAWQ extension, as is the `RESET` syntax.
+
+## <a id="topic1__section7"></a>See Also
+
+[SET ROLE](SET-ROLE.html)

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/reference/sql/SET.html.md.erb
----------------------------------------------------------------------
diff --git a/reference/sql/SET.html.md.erb b/reference/sql/SET.html.md.erb
new file mode 100644
index 0000000..4f4ad24
--- /dev/null
+++ b/reference/sql/SET.html.md.erb
@@ -0,0 +1,87 @@
+---
+title: SET
+---
+
+Changes the value of a HAWQ configuration parameter.
+
+## <a id="topic1__section2"></a>Synopsis
+
+``` pre
+SET [SESSION | LOCAL] <configuration_parameter> {TO | =} {<value> | '<value>' | DEFAULT}
+SET [SESSION | LOCAL] TIME ZONE {<timezone> | LOCAL | DEFAULT}
+```
+
+## <a id="topic1__section3"></a>Description
+
+The `SET` command changes server configuration parameters. Any configuration parameter classified as a *session* parameter can be changed on-the-fly with `SET`. See [About Server Configuration Parameters](../guc/guc_config.html#topic1). `SET` only affects the value used by the current session.
+
+If `SET` or `SET SESSION` is issued within a transaction that is later aborted, the effects of the `SET` command disappear when the transaction is rolled back. Once the surrounding transaction is committed, the effects will persist until the end of the session, unless overridden by another `SET`.
+
+The effects of `SET LOCAL` only last till the end of the current transaction, whether committed or not. A special case is `SET` followed by `SET LOCAL` within a single transaction: the `SET                LOCAL` value will be seen until the end of the transaction, but afterwards (if the transaction is committed) the `SET` value will take effect.
+
+## <a id="topic1__section4"></a>Parameters
+
+<dt>SESSION  </dt>
+<dd>Specifies that the command takes effect for the current session. This is the default.</dd>
+
+<dt>LOCAL  </dt>
+<dd>Specifies that the command takes effect for only the current transaction. After `COMMIT` or `ROLLBACK`, the session-level setting takes effect again. Note that `SET LOCAL` will appear to have no effect if it is executed outside of a transaction.</dd>
+
+<dt> \<configuration\_parameter\>  </dt>
+<dd>The name of a HAWQ configuration parameter. Only parameters classified as *session* can be changed with `SET`. See [About Server Configuration Parameters](../guc/guc_config.html#topic1).</dd>
+
+<dt> \<value\>  </dt>
+<dd>New value of parameter. Values can be specified as string constants, identifiers, numbers, or comma-separated lists of these. `DEFAULT` can be used to specify resetting the parameter to its default value. If specifying memory sizing or time units, enclose the value in single quotes.</dd>
+
+<dt>TIME ZONE  </dt>
+<dd>`SET TIME ZONE` value is an alias for `SET timezone TO                         value`.
+
+<dt>LOCAL,  
+DEFAULT  </dt>
+<dd>Set the time zone to your local time zone (the one that the server's operating system defaults to).</dd>
+
+<dt> \<timezone\>  </dt>
+<dd>The \<timezone\> specification. Examples of syntactically valid values:
+
+`'PST8PDT'`
+
+`'Europe/Rome'`
+
+`-7` (time zone 7 hours west from UTC)
+
+`INTERVAL '-08:00' HOUR TO MINUTE` (time zone 8 hours west from UTC).</dd>
+</dd>
+
+## <a id="topic1__section5"></a>Examples
+
+Set the schema search path:
+
+``` sql
+SET search_path TO my_schema, public;
+```
+
+Set the style of date to traditional POSTGRES with "day before month" input convention:
+
+``` sql
+SET datestyle TO postgres, dmy;
+```
+
+Set the time zone for San Mateo, California (Pacific Time):
+
+``` sql
+SET TIME ZONE 'PST8PDT';
+```
+
+Set the time zone for Italy:
+
+``` sql
+SET TIME ZONE 'Europe/Rome';
+```
+
+## <a id="topic1__section6"></a>Compatibility
+
+`SET TIME ZONE` extends the syntax defined in the SQL standard. The standard allows only numeric time zone offsets while HAWQ allows more flexible time-zone specifications. All other `SET` features are HAWQ extensions.
+
+## <a id="topic1__section7"></a>See Also
+
+[RESET](RESET.html), [SHOW](SHOW.html)

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/reference/sql/SHOW.html.md.erb
----------------------------------------------------------------------
diff --git a/reference/sql/SHOW.html.md.erb b/reference/sql/SHOW.html.md.erb
new file mode 100644
index 0000000..802761b
--- /dev/null
+++ b/reference/sql/SHOW.html.md.erb
@@ -0,0 +1,47 @@
+---
+title: SHOW
+---
+
+Shows the value of a system configuration parameter.
+
+## <a id="topic1__section2"></a>Synopsis
+
+``` pre
+SHOW <configuration_parameter>
+
+SHOW ALL
+```
+
+## <a id="topic1__section3"></a>Description
+
+`SHOW` displays the current settings of HAWQ system configuration parameters. These parameters can be set using the `SET` statement, or by editing the `hawq-site.xml` configuration file of the HAWQ master. Note that some parameters viewable by `SHOW` are read-only — their values can be viewed but not set. See [About Server Configuration Parameters](../guc/guc_config.html#topic1).
+
+## <a id="topic1__section4"></a>Parameters
+
+<dt> \<configuration\_parameter\>   </dt>
+<dd>The name of a system configuration parameter.</dd>
+
+<dt>ALL  </dt>
+<dd>Shows the current value of all configuration parameters.</dd>
+
+## <a id="topic1__section5"></a>Examples
+
+Show the current setting of the parameter `search_path`:
+
+``` sql
+SHOW search_path;
+```
+
+Show the current setting of all parameters:
+
+``` sql
+SHOW ALL;
+```
+
+## <a id="topic1__section6"></a>Compatibility
+
+`SHOW` is a HAWQ extension.
+
+## <a id="topic1__section7"></a>See Also
+
+[SET](SET.html), [RESET](RESET.html)

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/reference/sql/TRUNCATE.html.md.erb
----------------------------------------------------------------------
diff --git a/reference/sql/TRUNCATE.html.md.erb b/reference/sql/TRUNCATE.html.md.erb
new file mode 100644
index 0000000..c91ae84
--- /dev/null
+++ b/reference/sql/TRUNCATE.html.md.erb
@@ -0,0 +1,52 @@
+---
+title: TRUNCATE
+---
+
+Empties a table of all rows.
+
+## <a id="topic1__section2"></a>Synopsis
+
+``` pre
+TRUNCATE [TABLE] <name> [, ...] [CASCADE | RESTRICT]
+```
+
+## <a id="topic1__section3"></a>Description
+
+`TRUNCATE` quickly removes all rows from a table or set of tables.This is most useful on large tables.
+
+## <a id="topic1__section4"></a>Parameters
+
+<dt> \<name\>   </dt>
+<dd>Required. The name (optionally schema-qualified) of a table to be truncated.</dd>
+
+<dt>CASCADE  </dt>
+<dd>Since this key word applies to foreign key references (which are not supported in HAWQ) it has no effect.</dd>
+
+<dt>RESTRICT  </dt>
+<dd>Since this key word applies to foreign key references (which are not supported in HAWQ) it has no effect.</dd>
+
+## <a id="topic1__section5"></a>Notes
+
+Only the owner of a table may `TRUNCATE` it. `TRUNCATE` will not perform the following:
+
+-   Run any user-defined `ON DELETE` triggers that might exist for the tables.
+
+    **Note:** HAWQ does not support user-defined triggers.
+
+-   Truncate any tables that inherit from the named table. Only the named table is truncated, not its child tables.
+
+## <a id="topic1__section6"></a>Examples
+
+Empty the table `films`:
+
+``` sql
+TRUNCATE films;
+```
+
+## <a id="topic1__section7"></a>Compatibility
+
+There is no `TRUNCATE` command in the SQL standard.
+
+## <a id="topic1__section8"></a>See Also
+
+[DROP TABLE](DROP-TABLE.html)

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/reference/sql/VACUUM.html.md.erb
----------------------------------------------------------------------
diff --git a/reference/sql/VACUUM.html.md.erb b/reference/sql/VACUUM.html.md.erb
new file mode 100644
index 0000000..2db5757
--- /dev/null
+++ b/reference/sql/VACUUM.html.md.erb
@@ -0,0 +1,92 @@
+---
+title: VACUUM
+---
+
+Garbage-collects and optionally analyzes a database.
+
+## <a id="topic1__section2"></a>Synopsis
+
+``` pre
+VACUUM [FULL] [FREEZE] [VERBOSE] <table>
+VACUUM [FULL] [FREEZE] [VERBOSE] ANALYZE
+              [<table> [(<column> [, ...] )]]
+```
+
+## <a id="topic1__section3"></a>Description
+
+`VACUUM` reclaims storage occupied by deleted tuples. In normal HAWQ operation, tuples that are deleted or obsoleted by an update are not physically removed from their table; they remain present on disk until a `VACUUM` is done. Therefore it is necessary to do `VACUUM` periodically, especially on frequently-updated catalog tables. `VACUUM` has no effect on a normal HAWQ table, since the delete or update operations are not supported on normal HAWQ table.
+
+With no parameter, `VACUUM` processes every table in the current database. With a parameter, `VACUUM` processes only that table. `VACUUM ANALYZE` performs a `VACUUM` and then an `ANALYZE` for each selected table. This is a handy combination form for routine maintenance scripts. See [ANALYZE](ANALYZE.html) for more details about its processing.
+
+Plain `VACUUM` (without `FULL`) simply reclaims space and makes it available for re-use. This form of the command can operate in parallel with normal reading and writing of the table, as an exclusive lock is not obtained. `VACUUM FULL` does more extensive processing, including moving of tuples across blocks to try to compact the table to the minimum number of disk blocks. This form is much slower and requires an exclusive lock on each table while it is being processed.
+
+**Outputs**
+
+When `VERBOSE` is specified, `VACUUM` emits progress messages to indicate which table is currently being processed. Various statistics about the tables are printed as well.
+
+## <a id="topic1__section5"></a>Parameters
+
+<dt>FULL  </dt>
+<dd>Selects a full vacuum, which may reclaim more space but takes much longer and exclusively locks the table.
+
+**Note:** A VACUUM FULL is not recommended in HAWQ. See [Notes](#topic1__section6).</dd>
+
+<dt>FREEZE  </dt>
+<dd>Specifying `FREEZE` is equivalent to performing `VACUUM` with the `vacuum_freeze_min_age` server configuration parameter set to zero. The `FREEZE` option is deprecated and will be removed in a future release. Set the parameter by using `hawq config` or by modifying your `hawq-site.xml` configuration file instead.</dd>
+
+<dt>VERBOSE  </dt>
+<dd>Prints a detailed vacuum activity report for each table.</dd>
+
+<dt>ANALYZE  </dt>
+<dd>Updates statistics used by the planner to determine the most efficient way to execute a query.</dd>
+
+<dt> \<table\>   </dt>
+<dd>The name (optionally schema-qualified) of a specific table to vacuum. Defaults to all tables in the current database.</dd>
+
+<dt> \<column\>   </dt>
+<dd>The name of a specific column to analyze. Defaults to all columns.</dd>
+
+## <a id="topic1__section6"></a>Notes
+
+`VACUUM` cannot be executed inside a transaction block.
+
+A recommended practice is to vacuum active production databases frequently (at least nightly), in order to remove expired rows. After adding or deleting a large number of rows, it may be a good idea to issue a `VACUUM ANALYZE` command for the affected table. This will update the system catalogs with the results of all recent changes, and allow the HAWQ query planner to make better choices in planning queries.
+
+`VACUUM` causes a substantial increase in I/O traffic, which can cause poor performance for other active sessions. Therefore, it is advisable to vacuum the database at low usage times. The `auto vacuum` daemon feature, that automates the execution of `VACUUM` and `ANALYZE` commands is currently disabled in HAWQ.
+
+Expired rows are held in what is called the *free space map*. The free space map must be sized large enough to cover the dead rows of all tables in your database. If not sized large enough, space occupied by dead rows that overflow the free space map cannot be reclaimed by a regular `VACUUM` command.
+
+`VACUUM FULL` will reclaim all expired row space, but is a very expensive operation and may take an unacceptably long time to finish on large, distributed HAWQ tables. If you do get into a situation where the free space map has overflowed, it may be more timely to recreate the table with a `CREATE TABLE AS` statement and drop the old table.
+
+`VACUUM FULL` is not recommended in HAWQ. It is best to size the free space map appropriately. The free space map is configured with the following server configuration parameters:
+
+-   `max_fsm_pages`
+-   `max_fsm_relations`
+
+## <a id="topic1__section7"></a>Examples
+
+Vacuum all tables in the current database:
+
+``` sql
+VACUUM;
+```
+
+Vacuum a specific table only:
+
+``` sql
+VACUUM mytable;
+```
+
+Vacuum all tables in the current database and collect statistics for the query planner:
+
+``` sql
+VACUUM ANALYZE;
+```
+
+## <a id="topic1__section8"></a>Compatibility
+
+There is no `VACUUM` statement in the SQL standard.
+
+## <a id="topic1__section9"></a>See Also
+
+[ANALYZE](ANALYZE.html)

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/reference/toolkit/hawq_toolkit.html.md.erb
----------------------------------------------------------------------
diff --git a/reference/toolkit/hawq_toolkit.html.md.erb b/reference/toolkit/hawq_toolkit.html.md.erb
new file mode 100644
index 0000000..f76963c
--- /dev/null
+++ b/reference/toolkit/hawq_toolkit.html.md.erb
@@ -0,0 +1,263 @@
+---
+title: The hawq_toolkit Administrative Schema
+---
+
+This section provides a reference on the `hawq_toolkit` administrative schema.
+
+HAWQ provides an administrative schema called `hawq_toolkit` that you can use to query the system catalogs, log files, and operating environment for system status information. The `hawq_toolkit` schema contains a number of views that you can access using SQL commands. The `hawq_toolkit` schema is accessible to all database users, although some objects may require superuser permissions.
+
+This documentation describes the most useful views in `hawq_toolkit`. You may notice other objects (views, functions, and external tables) within the `hawq_toolkit` schema that are not described in this documentation (these are supporting objects to the views described in this section).
+
+**Warning:** Do not change database objects in the `hawq_toolkit` schema. Do not create database objects in the schema. Changes to objects in the schema might affect the accuracy of administrative information returned by schema objects.
+
+## <a id="topic2"></a>Checking for Tables that Need Routine Maintenance
+
+The following views can help identify tables that need routine table maintenance (`VACUUM` and/or `ANALYZE`).
+
+-   [hawq\_stats\_missing](#topic4)
+
+The `VACUUM` command reclaims disk space occupied by deleted or obsolete rows. Because of the MVCC transaction concurrency model used in HAWQ, data rows that are deleted or updated still occupy physical space on disk even though they are not visible to any new transactions. Expired rows increase table size on disk and eventually slow down scans of the table.
+
+**Note:** VACUUM FULL is not recommended in HAWQ. See [VACUUM](../sql/VACUUM.html#topic1).
+
+The `ANALYZE` command collects column-level statistics needed by the query optimizer. HAWQ uses a cost-based query optimizer that relies on database statistics. Accurate statistics allow the query optimizer to better estimate selectivity and the number of rows retrieved by a query operation in order to choose the most efficient query plan.
+
+### <a id="topic4"></a>hawq\_stats\_missing
+
+This view shows tables that do not have statistics and therefore may require an `ANALYZE` be run on the table.
+
+<a id="topic4__ie194266"></a>
+
+<span class="tablecap">Table 1. hawq\_stats\_missing view</span>
+
+| Column    | Description                                                                                                                                                                                                                                                                                                                                                                |
+|-----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| smischema | Schema name.                                                                                                                                                                                                                                                                                                                                                               |
+| smitable  | Table name.                                                                                                                                                                                                                                                                                                                                                                |
+| smisize   | Does this table have statistics? False if the table does not have row count and row sizing statistics recorded in the system catalog, which may indicate that the table needs to be analyzed. This will also be false if the table does not contain any rows. For example, the parent tables of partitioned tables are always empty and will always return a false result. |
+| smicols   | Number of columns in the table.                                                                                                                                                                                                                                                                                                                                            |
+| smirecs   | Number of rows in the table.                                                                                                                                                                                                                                                                                                                                               |
+
+
+## <a id="topic16"></a>Viewing HAWQ Server Log Files
+
+Each component of a HAWQ system (master, standby master, and segments) keeps its own server log files. The `hawq_log_*` family of views allows you to issue SQL queries against the server log files to find particular entries of interest. The use of these views requires superuser permissions.
+
+-   [hawq\_log\_command\_timings](#topic17)
+-   [hawq\_log\_master\_concise](#topic19)
+
+### <a id="topic17"></a>hawq\_log\_command\_timings
+
+This view uses an external table to read the log files on the master and report the execution time of SQL commands executed in a database session. The use of this view requires superuser permissions.
+
+<a id="topic17__ie176169"></a>
+
+<span class="tablecap">Table 2. hawq\_log\_command\_timings view</span>
+
+| Column      | Description                                                |
+|-------------|------------------------------------------------------------|
+| logsession  | The session identifier (prefixed with "con").              |
+| logcmdcount | The command number within a session (prefixed with "cmd"). |
+| logdatabase | The name of the database.                                  |
+| loguser     | The name of the database user.                             |
+| logpid      | The process id (prefixed with "p").                        |
+| logtimemin  | The time of the first log message for this command.        |
+| logtimemax  | The time of the last log message for this command.         |
+| logduration | Statement duration from start to end time.                 |
+
+
+### <a id="topic19"></a>hawq\_log\_master\_concise
+
+This view uses an external table to read a subset of the log fields from the master log file. The use of this view requires superuser permissions.
+
+<a id="topic19__ie177543"></a>
+
+<span class="tablecap">Table 3. hawq\_log\_master\_concise view</span>
+
+| Column      | Description                                                |
+|-------------|------------------------------------------------------------|
+| logtime     | The timestamp of the log message.                          |
+| logdatabase | The name of the database.                                  |
+| logsession  | The session identifier (prefixed with "con").              |
+| logcmdcount | The command number within a session (prefixed with "cmd"). |
+| logseverity | The severity level for the record.                         |
+| logmessage  | Log or error message text.                                 |
+
+
+## <a id="topic38"></a>Checking Database Object Sizes and Disk Space
+
+The `hawq_size_*` family of views can be used to determine the disk space usage for a distributed HAWQ, schema, table, or index. The following views calculate the total size of an object across all segments.
+
+-   [hawq\_size\_of\_all\_table\_indexes](#topic39)
+-   [hawq\_size\_of\_database](#topic40)
+-   [hawq\_size\_of\_index](#topic41)
+-   [hawq\_size\_of\_partition\_and\_indexes\_disk](#topic42)
+-   [hawq\_size\_of\_schema\_disk](#topic43)
+-   [hawq\_size\_of\_table\_and\_indexes\_disk](#topic44)
+-   [hawq\_size\_of\_table\_and\_indexes\_licensing](#topic45)
+-   [hawq\_size\_of\_table\_disk](#topic46)
+-   [hawq\_size\_of\_table\_uncompressed](#topic47)
+
+The table and index sizing views list the relation by object ID (not by name). To check the size of a table or index by name, you must look up the relation name (`relname`) in the `pg_class` table. For example:
+
+``` pre
+SELECT relname as name, sotdsize as size, sotdtoastsize as 
+toast, sotdadditionalsize as other 
+FROM gp_size_of_table_disk as sotd, pg_class 
+WHERE sotd.sotdoid=pg_class.oid ORDER BY relname;
+```
+
+### <a id="topic39"></a>hawq\_size\_of\_all\_table\_indexes
+
+This view shows the total size of all indexes for a table. This view is accessible to all users, however non-superusers will only be able to see relations that they have permission to access.
+
+<a id="topic39__ie181657"></a>
+
+<span class="tablecap">Table 4. hawq\_size\_of\_all\_table\_indexes view</span>
+
+| Column          | Description                                  |
+|-----------------|----------------------------------------------|
+| soatioid        | The object ID of the table                   |
+| soatisize       | The total size of all table indexes in bytes |
+| soatischemaname | The schema name                              |
+| soatitablename  | The table name                               |
+
+
+### <a id="topic40"></a>hawq\_size\_of\_database
+
+This view shows the total size of a database. This view is accessible to all users, however non-superusers will only be able to see databases that they have permission to access.
+
+<a id="topic40__ie181758"></a>
+
+<span class="tablecap">Table 5. hawq\_size\_of\_database view</span>
+
+| Column      | Description                       |
+|-------------|-----------------------------------|
+| sodddatname | The name of the database          |
+| sodddatsize | The size of the database in bytes |
+
+
+### <a id="topic41"></a>hawq\_size\_of\_index
+
+This view shows the total size of an index. This view is accessible to all users, however non-superusers will only be able to see relations that they have permission to access.
+
+<a id="topic41__ie181709"></a>
+
+<span class="tablecap">Table 6. hawq\_size\_of\_index view</span>
+
+| Column             | Description                                           |
+|--------------------|-------------------------------------------------------|
+| soioid             | The object ID of the index                            |
+| soitableoid        | The object ID of the table to which the index belongs |
+| soisize            | The size of the index in bytes                        |
+| soiindexschemaname | The name of the index schema                          |
+| soiindexname       | The name of the index                                 |
+| soitableschemaname | The name of the table schema                          |
+| soitablename       | The name of the table                                 |
+
+
+### <a id="topic42"></a>hawq\_size\_of\_partition\_and\_indexes\_disk
+
+This view shows the size on disk of partitioned child tables and their indexes. This view is accessible to all users, however non-superusers will only be able to see relations that they have permission to access.
+
+<a id="topic42__ie181803"></a>
+
+<span class="tablecap">Table 7. hawq\_size\_of\_partition\_and\_indexes\_disk view</span>
+
+| Column                     | Description                                     |
+|----------------------------|-------------------------------------------------|
+| sopaidparentoid            | The object ID of the parent table               |
+| sopaidpartitionoid         | The object ID of the partition table            |
+| sopaidpartitiontablesize   | The partition table size in bytes               |
+| sopaidpartitionindexessize | The total size of all indexes on this partition |
+| Sopaidparentschemaname     | The name of the parent schema                   |
+| Sopaidparenttablename      | The name of the parent table                    |
+| Sopaidpartitionschemaname  | The name of the partition schema                |
+| sopaidpartitiontablename   | The name of the partition table                 |
+
+
+### <a id="topic43"></a>hawq\_size\_of\_schema\_disk
+
+This view shows schema sizes for the public schema and the user-created schemas in the current database. This view is accessible to all users, however non-superusers will be able to see only the schemas that they have permission to access.
+
+<a id="topic43__ie183105"></a>
+
+<span class="tablecap">Table 8. hawq\_size\_of\_schema\_disk view</span>
+
+| Column              | Description                                      |
+|---------------------|--------------------------------------------------|
+| sosdnsp             | The name of the schema                           |
+| sosdschematablesize | The total size of tables in the schema in bytes  |
+| sosdschemaidxsize   | The total size of indexes in the schema in bytes |
+
+
+### <a id="topic44"></a>hawq\_size\_of\_table\_and\_indexes\_disk
+
+This view shows the size on disk of tables and their indexes. This view is accessible to all users, however non-superusers will only be able to see relations that they have permission to access.
+
+<a id="topic44__ie183128"></a>
+
+<span class="tablecap">Table 9. hawq\_size\_of\_table\_and\_indexes\_disk view</span>
+
+| Column           | Description                                |
+|------------------|--------------------------------------------|
+| sotaidoid        | The object ID of the parent table          |
+| sotaidtablesize  | The disk size of the table                 |
+| sotaididxsize    | The total size of all indexes on the table |
+| sotaidschemaname | The name of the schema                     |
+| sotaidtablename  | The name of the table                      |
+
+
+### <a id="topic45"></a>hawq\_size\_of\_table\_and\_indexes\_licensing
+
+This view shows the total size of tables and their indexes for licensing purposes. The use of this view requires superuser permissions.
+
+<a id="topic45__ie181949"></a>
+
+<span class="tablecap">Table 10. hawq\_size\_of\_table\_and\_indexes\_licensing view</span>
+
+| Column                      | Description                                                                                 |
+|-----------------------------|---------------------------------------------------------------------------------------------|
+| sotailoid                   | The object ID of the table                                                                  |
+| sotailtablesizedisk         | The total disk size of the table                                                            |
+| sotailtablesizeuncompressed | If the table is a compressed append-only table, shows the uncompressed table size in bytes. |
+| sotailindexessize           | The total size of all indexes in the table                                                  |
+| sotailschemaname            | The schema name                                                                             |
+| sotailtablename             | The table name                                                                              |
+
+
+### <a id="topic46"></a>hawq\_size\_of\_table\_disk
+
+This view shows the size of a table on disk. This view is accessible to all users, however non-superusers will only be able to see tables that they have permission to access
+
+<a id="topic46__ie183408"></a>
+
+<span class="tablecap">Table 11. hawq\_size\_of\_table\_disk view</span>
+
+| Column             | Description                                                                                                                                                                                          |
+|--------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| sotdoid            | The object ID of the table                                                                                                                                                                           |
+| sotdsize           | The size of the table in bytes. The size is only the main table size. The size does not include auxiliary objects such as oversized (toast) attributes, or additional storage objects for AO tables. |
+| sotdtoastsize      | The size of the TOAST table (oversized attribute storage), if there is one.                                                                                                                          |
+| sotdadditionalsize | Reflects the segment and block directory table sizes for append-only (AO) tables.                                                                                                                    |
+| sotdschemaname     | The schema name                                                                                                                                                                                      |
+| sotdtablename      | The table name                                                                                                                                                                                       |
+
+
+### <a id="topic47"></a>hawq\_size\_of\_table\_uncompressed
+
+This view shows the uncompressed table size for append-only (AO) tables. Otherwise, the table size on disk is shown. The use of this view requires superuser permissions.
+
+<a id="topic47__ie183582"></a>
+
+<span class="tablecap">Table 12. hawq\_size\_of\_table\_uncompressed view</span>
+
+| Column         | Description                                                                                                   |
+|----------------|---------------------------------------------------------------------------------------------------------------|
+| sotuoid        | The object ID of the table                                                                                    |
+| sotusize       | The uncomressed size of the table in bytes if it is a compressed AO table. Otherwise, the table size on disk. |
+| sotuschemaname | The schema name                                                                                               |
+| sotutablename  | The table name                                                                                                |
+
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/requirements/system-requirements.html.md.erb
----------------------------------------------------------------------
diff --git a/requirements/system-requirements.html.md.erb b/requirements/system-requirements.html.md.erb
new file mode 100644
index 0000000..bc08ef0
--- /dev/null
+++ b/requirements/system-requirements.html.md.erb
@@ -0,0 +1,197 @@
+---
+title: Apache HAWQ System Requirements
+---
+
+Follow these guidelines to configure each host machine that will run an Apache HAWQ or PXF service.
+
+
+## <a id="topic_d3f_vlz_g5"></a>Host Memory Configuration
+
+In order to prevent data loss or corruption in an Apache HAWQ cluster, you must configure the memory on each host machine so that the Linux Out-of-Memory \(OOM\) killer process never kills a HAWQ process due to OOM conditions. \(HAWQ applies its own rules to enforce memory restrictions.\)
+
+**For mission critical deployments of HAWQ, perform these steps on each host machine to configure memory:**
+
+1.  Set the operating system `vm.overcommit_memory` parameter to 2. With this setting, the OOM killer process reports an error instead of killing running processes. To set this parameter:
+    1.  Open the `/etc/sysctl.conf` file with a text editor.
+    2.  Add or change the parameter definition so that the file includes these lines:
+
+        ```
+        kernel.threads-max=798720
+        vm.overcommit_memory=2
+        ```
+
+    3.  Save and close the file, then execute this command to apply your change:
+
+        ``` shell
+        $ sysctl -p
+        ```
+
+    4.  To view the current `vm.overcommit_memory` setting, execute the command:
+
+        ``` shell
+        $ sysctl -a | grep overcommit_memory
+        ```
+
+    5.  To view the runtime overcommit settings, execute the command:
+
+        ``` shell
+        $ cat /proc/meminfo | grep Commit
+        ```
+
+2.  Set the Linux swap space size and `vm.overcommit_ratio` parameter according to the available memory on each host. For hosts having 2GB-8GB of memory, set swap space = physical RAM and set `vm.overcommit_ratio=50`. For hosts having more than 8GB up to 64GB of memory, set swap space = 0.5 \* physical RAM and set `vm.overcommit_ratio=50`. For hosts having more than 64GB memory, set swap space = 4GB and set `vm.overcommit_ratio=100`
+
+    To set the `vm.overcommit_ratio` parameter:
+
+    1.  Open the `/etc/sysctl.conf` file with a text editor.
+    2.  Add or change the parameter definition so that the file includes the line:
+
+           ```
+       vm.overcommit_ratio=50
+       ```
+
+           \(Use `vm.overcommit_ratio=100` for hosts with more than 64GB RAM.\)
+    3.  Save and close the file, then execute this command to apply your change:
+
+        ``` shell
+        $ sysctl -p
+        ```
+
+    4.  To view the current `vm.overcommit_ratio` setting, execute the command:
+
+        ``` shell
+        $ sysctl -a | grep overcommit_ratio
+        ```
+        You can choose to use a dedicated swap partition, a swap file, or a combination of both. View the current swap settings using the command:
+
+        ``` shell
+        $ cat /proc/meminfo | grep Swap
+        ```
+3.  Ensure that all Java services that run on the machine use the `-Xmx` switch to allocate only their required heap.
+4.  Ensure that no other services \(such as Puppet\) or automated processes attempt to reset the overcommit settings on cluster hosts.
+5.  During the installation process, configure HAWQ memory by setting YARN or HAWQ configuration parameters, as described in [HAWQ Memory Configuration](#topic_uzf_flz_g5).
+
+## <a id="topic_uzf_flz_g5"></a>HAWQ Memory Configuration
+
+You must configure the memory used by HAWQ according to whether you plan to use YARN or HAWQ to manage system resources.
+
+After you configure the `vm.overcommit_ratio` and swap space according to [Host Memory Configuration](#topic_d3f_vlz_g5), the total memory available to a Linux host machine can be represented by the equation:
+
+```
+TOTAL_MEMORY = RAM * overcommit_ratio_percentage + SWAP
+```
+
+`TOTAL_MEMORY` comprises both HAWQ memory and `NON_HAWQ_MEMORY`, which is the memory used by components such as:
+
+-   Operating system
+-   DataNode
+-   NodeManager
+-   PXF
+-   All other software you run on the host machine.
+
+To configure the HAWQ memory for a given host, first determine the amount of `NON_HAWQ_MEMORY` that is used on the machine. Then configure HAWQ memory by setting the correct parameter according to whether you use the HAWQ default resource manager or YARN to manage resources:
+
+-   If you are using YARN for resource management, set `yarn.nodemanager.resource.memory-mb` to the smaller of `TOTAL_MEMORY - NON_HAWQ_MEMORY` or `RAM`.
+-   If you are using the HAWQ default resource manager, set `hawq_rm_memory_limit_perseg = RAM - NON_HAWQ_MEMORY`.
+
+You can set either parameter using Ambari when configuring YARN or when installing HAWQ with Ambari.
+
+### Example 1 - Large Host Machine
+
+An example large host machine uses the memory configuration:
+
+>RAM: 256GB
+>
+>SWAP: 4GB
+
+>NON\_HAWQ\_MEMORY:
+
+>> 2GB for Operating System
+
+>> 2GB for DataNode
+
+>> 2GB for NodeManager
+
+>> 1GB for PXF
+
+>overcommit\_ratio\_percentage:1 \(`vm.overcommit_ratio` = 100\)
+
+For this machine, `TOTAL_MEMORY = 256GB * 1 + 4GB = 260GB`.
+
+If this system uses YARN for resource management, you would set `yarn.nodemanager.resource.memory-mb` to `TOTAL_MEMORY - NON_HAWQ_MEMORY` = 260GB - 7GB = 253 \(because 253GB is smaller than the available amount of RAM\).
+
+If this system uses the default HAWQ resource manager, you would set `hawq_rm_memory_limit_perseg` = `RAM - NON_HAWQ_MEMORY` = 256 GB - 7GB = 249.
+
+### Example 2 - Medium Host Machine
+
+An example medium host machine uses the memory configuration:
+
+>RAM: 64GB
+
+>SWAP: 32GB
+
+>NON\_HAWQ\_MEMORY:
+
+>>2GB for Operating System
+
+>>2GB for DataNode
+
+>>2GB for NodeManager
+
+>>1GB for PXF
+
+>overcommit\_ratio\_percentage: .5 \(`vm.overcommit_ratio` = 50\)
+
+For this machine, `TOTAL_MEMORY = 64GB * .5 + 32GB = 64GB`.
+
+If this system uses YARN for resource management, you would set `yarn.nodemanager.resource.memory-mb` to `TOTAL_MEMORY - NON_HAWQ_MEMORY` = 64GB - 7GB = 57 \(because 57GB is smaller than the available amount of RAM\).
+
+If this system uses the default HAWQ resource manager, you would set `hawq_rm_memory_limit_perseg` = `RAM - NON_HAWQ_MEMORY` = 64 GB - 11GB = 57.
+
+### Example 3 - Small Host Machine \(Not recommended for production use\)
+
+An example small machine uses the memory configuration:
+
+>RAM: 8GB
+
+>SWAP: 8GB
+
+>NON\_HAWQ\_MEMORY:
+
+>>2GB for Operating System
+
+>>2GB for DataNode
+
+>>2GB for NodeManager
+
+>>1GB for PXF
+
+>overcommit\_ratio\_percentage:  .5 \(`vm.overcommit_ratio` = 50\)
+
+For this machine, `TOTAL_MEMORY = 8GB * .5 + 8GB = 12GB`.
+
+If this system uses YARN for resource management, you would set `yarn.nodemanager.resource.memory-mb` to `TOTAL_MEMORY - NON_HAWQ_MEMORY` = 12GB - 7GB = 5 \(because 5GB is smaller than the available amount of RAM\).
+
+If this system uses the default HAWQ resource manager, you would set `hawq_rm_memory_limit_perseg` = `RAM - NON_HAWQ_MEMORY` = 8 GB - 7GB = 1.
+
+## <a id="topic_bsm_hhv_2v"></a>Disk Requirements
+
+-   2GB per host for HAWQ installation. 
+-   Approximately 300MB per segment instance for metadata.
+-   Multiple large (2TB or greater) disks are recommended for HAWQ master and segment temporary directories. For a given query, HAWQ will use a separate temp directory (if available) for each virtual segment to store spill files. Multiple HAWQ sessions will also use separate temp directories where available to avoid disk contention. If you configure too few temp directories, or you place multiple temp directories on the same disk, you increase the risk of disk contention or running out of disk space when multiple virtual segments target the same disk. Each HAWQ segment node can have 6 virtual segments.  
+-   Appropriate free space for data: disks should have at least 30% free space \(no more than 70% capacity\).
+-   High-speed, local storage
+
+## <a id="topic_rdb_jhv_2v"></a>Network Requirements
+
+-   Gigabit Ethernet within the array. For a production cluster, 10 Gigabit Ethernet recommended.
+-   Dedicated, non-blocking switch.
+-   Systems with multiple NICs require NIC bonding to utilize all available network bandwidth.
+
+
+## <a id="port-req"></a>Port Requirements
+Individual PXF plug-ins, which you install after adding the HAWQ and PXF services, require that you Tomcat on the host machine. Tomcat reserves ports 8005, 8080, and 8009.
+
+If you have configured Oozie JXM reporting on a host that will run a PXF plug-in, make sure that the reporting service uses a port other than 8005. This helps to prevent port conflict errors from occurring when you start the PXF service.
+
+## <a id="umask"></a>Umask Requirement
+Set the OS file system umask to 022 on all cluster hosts. This ensure that users can read the HDFS block files.


Mime
View raw message