hawq-commits mailing list archives

From yo...@apache.org
Subject [28/51] [partial] incubator-hawq-docs git commit: HAWQ-1254 Fix/remove book branching on incubator-hawq-docs
Date Fri, 06 Jan 2017 17:32:43 GMT
http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/query/defining-queries.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/query/defining-queries.html.md.erb b/markdown/query/defining-queries.html.md.erb
new file mode 100644
index 0000000..b796511
--- /dev/null
+++ b/markdown/query/defining-queries.html.md.erb
@@ -0,0 +1,528 @@
+---
+title: Defining Queries
+---
+
+HAWQ is based on the PostgreSQL implementation of the SQL standard. SQL commands are typically entered using the standard PostgreSQL interactive terminal `psql`, but other programs that have similar functionality can be used as well.
+
+
+## <a id="topic3"></a>SQL Lexicon
+
+SQL is a standard language for accessing databases. The language consists of elements that enable data storage, retrieval, analysis, viewing, and so on. You use SQL commands to construct queries and commands that the HAWQ engine understands.
+
+SQL queries consist of a sequence of commands. Commands consist of a sequence of valid tokens in correct syntax order, terminated by a semicolon (`;`).
+
+HAWQ uses PostgreSQL's structure and syntax, with some exceptions. For more information about SQL rules and concepts in PostgreSQL, see "SQL Syntax" in the PostgreSQL documentation.
+
+## <a id="topic4"></a>SQL Value Expressions
+
+SQL value expressions consist of one or more values, symbols, operators, SQL functions, and data. The expressions compare data or perform calculations and return a value as the result. Calculations include logical, arithmetic, and set operations.
+
+The following are value expressions:
+
+-   Aggregate expressions
+-   Array constructors
+-   Column references
+-   Constant or literal values
+-   Correlated subqueries
+-   Field selection expressions
+-   Function calls
+-   New column values in an `INSERT`
+-   Operator invocation column references
+-   Positional parameter references, in the body of a function definition or prepared statement
+-   Row constructors
+-   Scalar subqueries
+-   Search conditions in a `WHERE` clause
+-   Target lists of a `SELECT` command
+-   Type casts
+-   Value expressions in parentheses, useful to group sub-expressions and override precedence
+-   Window expressions
+
+SQL constructs such as functions and operators are expressions but do not follow any general syntax rules. For more information about these constructs, see [Using Functions and Operators](functions-operators.html#topic26).
+
+### <a id="topic5"></a>Column References
+
+A column reference has the form:
+
+```
+correlation.columnname
+```
+
+Here, `correlation` is the name of a table (possibly qualified with a schema name) or an alias for a table defined with a `FROM` clause or one of the keywords `NEW` or `OLD`. `NEW` and `OLD` can appear only in rewrite rules, but you can use other correlation names in any SQL statement. If the column name is unique across all tables in the query, you can omit the "`correlation.`" part of the column reference.
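+
+For example, assuming a hypothetical `sales` table, the following two qualified references name the same column, first through the table name and then through an alias defined in the `FROM` clause:
+
+``` sql
+SELECT sales.region FROM sales;
+SELECT s.region FROM sales s;
+```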
+
+### <a id="topic6"></a>Positional Parameters
+
+Positional parameters are arguments to SQL statements or functions that you reference by their positions in a series of arguments. For example, `$1` refers to the first argument, `$2` to the second argument, and so on. The values of positional parameters are set from arguments external to the SQL statement or supplied when SQL functions are invoked. Some client libraries support specifying data values separately from the SQL command, in which case parameters refer to the out-of-line data values. A parameter reference has the form:
+
+```
+$number
+```
+
+For example:
+
+``` pre
+CREATE FUNCTION dept(text) RETURNS dept
+    AS $$ SELECT * FROM dept WHERE name = $1 $$
+    LANGUAGE SQL;
+```
+
+Here, the `$1` references the value of the first function argument whenever the function is invoked.
+
+### <a id="topic7"></a>Subscripts
+
+If an expression yields a value of an array type, you can extract a specific element of the array value as follows:
+
+``` pre
+expression[subscript]
+```
+
+You can extract multiple adjacent elements, called an array slice, as follows (including the brackets):
+
+``` pre
+expression[lower_subscript:upper_subscript]
+```
+
+Each subscript is an expression and yields an integer value.
+
+Array expressions usually must be in parentheses, but you can omit the parentheses when the expression to be subscripted is a column reference or positional parameter. You can concatenate multiple subscripts when the original array is multidimensional. For example (including the parentheses):
+
+``` pre
+mytable.arraycolumn[4]
+```
+
+``` pre
+mytable.two_d_column[17][34]
+```
+
+``` pre
+$1[10:42]
+```
+
+``` pre
+(arrayfunction(a,b))[42]
+```
+
+### <a id="topic8"></a>Field Selections
+
+If an expression yields a value of a composite type (row type), you can extract a specific field of the row as follows:
+
+```
+expression.fieldname
+```
+
+The row expression usually must be in parentheses, but you can omit these parentheses when the expression to be selected from is a table reference or positional parameter. For example:
+
+``` pre
+mytable.mycolumn
+```
+
+``` pre
+$1.somecolumn
+```
+
+``` pre
+(rowfunction(a,b)).col3
+```
+
+A qualified column reference is a special case of field selection syntax.
+
+### <a id="topic9"></a>Operator Invocations
+
+Operator invocations have the following possible syntaxes:
+
+``` pre
+expression operator expression   (binary infix operator)
+```
+
+``` pre
+operator expression   (unary prefix operator)
+```
+
+``` pre
+expression operator   (unary postfix operator)
+```
+
+Where *operator* is an operator token, one of the key words `AND`, `OR`, or `NOT`, or a qualified operator name in the form:
+
+``` pre
+OPERATOR(schema.operatorname)
+```
+
+Available operators and whether they are unary or binary depends on the operators that the system or user defines. For more information about built-in operators, see [Built-in Functions and Operators](functions-operators.html#topic29).
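+
+For example, the following two invocations are equivalent; the qualified form is useful when operators of the same name exist in more than one schema (here the qualification simply names the built-in integer addition operator in `pg_catalog`):
+
+``` sql
+SELECT 3 + 4;
+SELECT 3 OPERATOR(pg_catalog.+) 4;
+```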
+
+### <a id="topic10"></a>Function Calls
+
+The syntax for a function call is the name of a function (possibly qualified with a schema name), followed by its argument list enclosed in parentheses:
+
+``` pre
+function ([expression [, expression ... ]])
+```
+
+For example, the following function call computes the square root of 2:
+
+``` pre
+sqrt(2)
+```
+
+### <a id="topic11"></a>Aggregate Expressions
+
+An aggregate expression applies an aggregate function across the rows that a query selects. An aggregate function performs a calculation on a set of values and returns a single value, such as the sum or average of the set of values. The syntax of an aggregate expression is one of the following:
+
+-   `aggregate_name (expression [ , ... ] ) [FILTER (WHERE condition)]` — operates across all input rows for which the expected result value is non-null. `ALL` is the default.
+-   `aggregate_name (ALL expression [ , ... ] ) [FILTER (WHERE condition)]` — operates identically to the first form because `ALL` is the default.
+-   `aggregate_name (DISTINCT expression [ , ... ] ) [FILTER (WHERE condition)]` — operates across all distinct non-null values of input rows.
+-   `aggregate_name(*) [FILTER (WHERE condition)]` — operates on all rows, with both null and non-null values. Generally, this form is most useful for the `count(*)` aggregate function.
+
+Where *aggregate\_name* is a previously defined aggregate (possibly schema-qualified) and *expression* is any value expression that does not contain an aggregate expression.
+
+For example, `count(*)` yields the total number of input rows, `count(f1)` yields the number of input rows in which `f1` is non-null, and `count(distinct f1)` yields the number of distinct non-null values of `f1`.
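+
+As a sketch, assuming a hypothetical table `mytable` with a column `f1`, the three forms can be compared side by side in a single query:
+
+``` sql
+SELECT count(*), count(f1), count(DISTINCT f1) FROM mytable;
+```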
+
+You can specify a condition with the `FILTER` clause to limit the input rows to the aggregate function. For example:
+
+``` sql
+SELECT count(*) FILTER (WHERE gender='F') FROM employee;
+```
+
+The `WHERE condition` of the `FILTER` clause cannot contain a set-returning function, subquery, window function, or outer reference. If you use a user-defined aggregate function, declare the state transition function as `STRICT` (see `CREATE AGGREGATE`).
+
+For predefined aggregate functions, see [Built-in Functions and Operators](functions-operators.html#topic29). You can also add custom aggregate functions.
+
+HAWQ provides the `MEDIAN` aggregate function, which returns the fiftieth percentile of the `PERCENTILE_CONT` result, as well as special aggregate expressions for inverse distribution functions, as follows:
+
+``` sql
+PERCENTILE_CONT(percentage) WITHIN GROUP (ORDER BY expression)
+```
+
+``` sql
+PERCENTILE_DISC(percentage) WITHIN GROUP (ORDER BY expression)
+```
+
+Currently you can use only these two expressions with the keyword `WITHIN GROUP`.
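+
+For example, assuming a hypothetical `employees` table with a numeric `salary` column, the following two queries return the same value, the continuous fiftieth percentile of the salaries:
+
+``` sql
+SELECT MEDIAN(salary) FROM employees;
+SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) FROM employees;
+```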
+
+#### <a id="topic12"></a>Limitations of Aggregate Expressions
+
+The following are current limitations of the aggregate expressions:
+
+-   HAWQ does not support the following keywords: `ALL`, `DISTINCT`, `FILTER`, and `OVER`. See [Advanced Aggregate Functions](functions-operators.html#topic31__in2073121) for more details.
+-   An aggregate expression can appear only in the result list or `HAVING` clause of a `SELECT` command. It is forbidden in other clauses, such as `WHERE`, because those clauses are logically evaluated before the results of aggregates form. This restriction applies to the query level to which the aggregate belongs.
+-   When an aggregate expression appears in a subquery, the aggregate is normally evaluated over the rows of the subquery. If the aggregate's arguments contain only outer-level variables, the aggregate belongs to the nearest such outer level and evaluates over the rows of that query. The aggregate expression as a whole is then an outer reference for the subquery in which it appears, and the aggregate expression acts as a constant over any one evaluation of that subquery. See [Scalar Subqueries](#topic15) and [Built-in functions and operators](functions-operators.html#topic29__in204913).
+-   HAWQ does not support `DISTINCT` with multiple input expressions.
+
+### <a id="topic13"></a>Window Expressions
+
+Window expressions allow application developers to more easily compose complex online analytical processing (OLAP) queries using standard SQL commands. For example, with window expressions, users can calculate moving averages or sums over various intervals, reset aggregations and ranks as selected column values change, and express complex ratios in simple terms.
+
+A window expression represents the application of a *window function* to a *window frame*, which is defined in a special `OVER()` clause. A window partition is a set of rows that are grouped together to apply a window function. Unlike aggregate functions, which return a result value for each group of rows, window functions return a result value for every row, but that value is calculated with respect to the rows in a particular window partition. If no partition is specified, the window function is computed over the complete intermediate result set.
+
+The syntax of a window expression is:
+
+``` pre
+window_function ( [expression [, ...]] ) OVER ( window_specification )
+```
+
+Where *`window_function`* is one of the functions listed in [Window functions](functions-operators.html#topic30__in164369), *`expression`* is any value expression that does not contain a window expression, and *`window_specification`* is:
+
+```
+[window_name]
+[PARTITION BY expression [, ...]]
+[ORDER BY expression [ASC | DESC | USING operator] [, ...]
+    [{RANGE | ROWS} 
+       { UNBOUNDED PRECEDING
+       | expression PRECEDING
+       | CURRENT ROW
+       | BETWEEN window_frame_bound AND window_frame_bound }]]
+```
+
+and where `window_frame_bound` can be one of:
+
+``` 
+    UNBOUNDED PRECEDING
+    expression PRECEDING
+    CURRENT ROW
+    expression FOLLOWING
+    UNBOUNDED FOLLOWING
+```
+
+A window expression can appear only in the select list of a `SELECT` command. For example:
+
+``` sql
+SELECT count(*) OVER(PARTITION BY customer_id), * FROM sales;
+```
+
+The `OVER` clause differentiates window functions from other aggregate or reporting functions. The `OVER` clause defines the *`window_specification`* to which the window function is applied. A window specification has the following characteristics:
+
+-   The `PARTITION BY` clause defines the window partitions to which the window function is applied. If omitted, the entire result set is treated as one partition.
+-   The `ORDER BY` clause defines the expression(s) for sorting rows within a window partition. The `ORDER BY` clause of a window specification is separate and distinct from the `ORDER BY` clause of a regular query expression. The `ORDER BY` clause is required for the window functions that calculate rankings, as it identifies the measure(s) for the ranking values. For OLAP aggregations, the `ORDER BY` clause is required to use window frames (the `ROWS` | `RANGE` clause).
+
+**Note:** Columns of data types without a coherent ordering, such as `time`, are not good candidates for use in the `ORDER BY` clause of a window specification. The `time` type, with or without a specified time zone, lacks a coherent ordering because addition and subtraction do not have the expected effects. For example, the following is not generally true: `x::time < x::time + '2 hour'::interval`
+
+-   The `ROWS/RANGE` clause defines a window frame for aggregate (non-ranking) window functions. A window frame defines a set of rows within a window partition. When a window frame is defined, the window function computes on the contents of this moving frame rather than the fixed contents of the entire window partition. Window frames are row-based (`ROWS`) or value-based (`RANGE`).
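+
+For example, the following sketch (assuming a hypothetical `sales` table) computes a row-based moving sum over the current row and the two rows that precede it within each region, ordered by date:
+
+``` sql
+SELECT region, sale_date, amount,
+       sum(amount) OVER (PARTITION BY region ORDER BY sale_date
+                         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
+FROM sales;
+```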
+
+### <a id="topic14"></a>Type Casts
+
+A type cast specifies a conversion from one data type to another. HAWQ accepts two equivalent syntaxes for type casts:
+
+``` sql
+CAST ( expression AS type )
+expression::type
+```
+
+The `CAST` syntax conforms to SQL; the syntax with `::` is historical PostgreSQL usage.
+
+A cast applied to a value expression of a known type is a run-time type conversion. The cast succeeds only if a suitable type conversion function is defined. This differs from the use of casts with constants. A cast applied to a string literal represents the initial assignment of a type to a literal constant value, so it succeeds for any type if the contents of the string literal are acceptable input syntax for the data type.
+
+You can usually omit an explicit type cast if there is no ambiguity about the type a value expression must produce; for example, when it is assigned to a table column, the system automatically applies a type cast. The system applies automatic casting only to casts marked "OK to apply implicitly" in system catalogs. Other casts must be invoked with explicit casting syntax to prevent unexpected conversions from being applied without the user's knowledge.
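+
+For example, both of the following statements convert a string literal to the `date` type; the first uses the SQL-conforming syntax and the second the historical PostgreSQL syntax:
+
+``` sql
+SELECT CAST('2017-01-06' AS date);
+SELECT '2017-01-06'::date;
+```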
+
+### <a id="topic15"></a>Scalar Subqueries
+
+A scalar subquery is a `SELECT` query in parentheses that returns exactly one row with one column. Do not use a `SELECT` query that returns multiple rows or columns as a scalar subquery. The query runs and uses the returned value in the surrounding value expression. A correlated scalar subquery contains references to the outer query block.
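+
+For example, assuming a hypothetical `employees` table, the scalar subquery below supplies a single value to the surrounding value expression for every row of the outer query:
+
+``` sql
+SELECT name, salary, (SELECT max(salary) FROM employees) AS top_salary
+FROM employees;
+```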
+
+### <a id="topic16"></a>Correlated Subqueries
+
+A correlated subquery (CSQ) is a `SELECT` query with a `WHERE` clause or target list that contains references to the parent outer clause. CSQs efficiently express results in terms of results of another query. HAWQ supports correlated subqueries that provide compatibility with many existing applications. A CSQ is a scalar or table subquery, depending on whether it returns one or multiple rows. HAWQ does not support correlated subqueries with skip-level correlations.
+
+### <a id="topic17"></a>Correlated Subquery Examples
+
+#### <a id="topic18"></a>Example 1 – Scalar correlated subquery
+
+``` sql
+SELECT * FROM t1 WHERE t1.x >
+    (SELECT MAX(t2.x) FROM t2 WHERE t2.y = t1.y);
+```
+
+#### <a id="topic19"></a>Example 2 – Correlated EXISTS subquery
+
+``` sql
+SELECT * FROM t1 WHERE 
+EXISTS (SELECT 1 FROM t2 WHERE t2.x = t1.x);
+```
+
+HAWQ uses one of the following methods to run CSQs:
+
+-   Unnest the CSQ into join operations – This method is most efficient, and it is how HAWQ runs most CSQs, including queries from the TPC-H benchmark.
+-   Run the CSQ on every row of the outer query – This method is relatively inefficient, and it is how HAWQ runs queries that contain CSQs in the `SELECT` list or are connected by `OR` conditions.
+
+The following examples illustrate how to rewrite some of these types of queries to improve performance.
+
+#### <a id="topic20"></a>Example 3 - CSQ in the Select List
+
+*Original Query*
+
+``` sql
+SELECT T1.a,
+(SELECT COUNT(DISTINCT T2.z) FROM t2 WHERE t1.x = t2.y) dt2 
+FROM t1;
+```
+
+Rewrite this query to perform an inner join with `t1` first and then perform a left join with `t1` again. The rewrite applies only when the correlated condition is an equijoin.
+
+*Rewritten Query*
+
+``` sql
+SELECT t1.a, dt2 FROM t1
+LEFT JOIN
+(SELECT t2.y AS csq_y, COUNT(DISTINCT t2.z) AS dt2
+ FROM t1, t2 WHERE t1.x = t2.y
+ GROUP BY t2.y) csq
+ON (t1.x = csq.csq_y);
+```
+
+#### <a id="topic21"></a>Example 4 – CSQs Connected by OR Clauses
+
+*Original Query*
+
+``` sql
+SELECT * FROM t1 
+WHERE 
+x > (SELECT COUNT(*) FROM t2 WHERE t1.x = t2.x) 
+OR x < (SELECT COUNT(*) FROM t3 WHERE t1.y = t3.y)
+```
+
+Rewrite this query to separate it into two parts with a union on the `OR` conditions.
+
+*Rewritten Query*
+
+``` sql
+SELECT * FROM t1 
+WHERE x > (SELECT count(*) FROM t2 WHERE t1.x = t2.x) 
+UNION 
+SELECT * FROM t1 
+WHERE x < (SELECT count(*) FROM t3 WHERE t1.y = t3.y)
+```
+
+To view the query plan, use `EXPLAIN SELECT` or `EXPLAIN ANALYZE SELECT`. Subplan nodes in the query plan indicate that the query will run on every row of the outer query, and the query is a candidate for rewriting. For more information about these statements, see [Query Profiling](query-profiling.html#topic39).
+
+### <a id="topic22"></a>Advanced Table Functions
+
+HAWQ supports table functions with `TABLE` value expressions. You can sort input rows for advanced table functions with an `ORDER BY` clause. You can redistribute them with a `SCATTER BY` clause, specifying one or more columns or an expression so that rows with the specified characteristics are delivered to the same process. This usage is similar to using a `DISTRIBUTED BY` clause when creating a table, but the redistribution occurs when the query runs.
+
+**Note:**
+Based on the distribution of data, HAWQ automatically parallelizes table functions with `TABLE` value parameters over the nodes of the cluster.
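+
+A hedged sketch of this usage, assuming a hypothetical user-defined table function `transform` and source table `raw_data`: the input rows are sorted by `id`, and rows with the same `region` value are delivered to the same process:
+
+``` sql
+SELECT * FROM transform(
+    TABLE(SELECT * FROM raw_data ORDER BY id SCATTER BY region)
+);
+```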
+
+### <a id="topic23"></a>Array Constructors
+
+An array constructor is an expression that builds an array value from values for its member elements. A simple array constructor consists of the key word `ARRAY`, a left square bracket `[`, one or more expressions separated by commas for the array element values, and a right square bracket `]`. For example,
+
+``` sql
+SELECT ARRAY[1,2,3+4];
+```
+
+```
+  array
+---------
+ {1,2,7}
+```
+
+The array element type is the common type of its member expressions, determined using the same rules as for `UNION` or `CASE` constructs.
+
+You can build multidimensional array values by nesting array constructors. In the inner constructors, you can omit the keyword `ARRAY`. For example, the following two `SELECT` statements produce the same result:
+
+``` sql
+SELECT ARRAY[ARRAY[1,2], ARRAY[3,4]];
+SELECT ARRAY[[1,2],[3,4]];
+```
+
+```
+     array
+---------------
+ {{1,2},{3,4}}
+```
+
+Since multidimensional arrays must be rectangular, inner constructors at the same level must produce sub-arrays of identical dimensions.
+
+Multidimensional array constructor elements are not limited to a sub-`ARRAY` construct; they are anything that produces an array of the proper kind. For example:
+
+``` sql
+CREATE TABLE arr(f1 int[], f2 int[]);
+INSERT INTO arr VALUES (ARRAY[[1,2],[3,4]], 
+ARRAY[[5,6],[7,8]]);
+SELECT ARRAY[f1, f2, '{{9,10},{11,12}}'::int[]] FROM arr;
+```
+
+```
+                     array
+------------------------------------------------
+ {{{1,2},{3,4}},{{5,6},{7,8}},{{9,10},{11,12}}}
+```
+
+You can construct an array from the results of a subquery. Write the array constructor with the keyword `ARRAY` followed by a subquery in parentheses. For example:
+
+``` sql
+SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%');
+```
+
+```
+                          ?column?
+-----------------------------------------------------------
+ {2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31}
+```
+
+The subquery must return a single column. The resulting one-dimensional array has an element for each row in the subquery result, with an element type matching that of the subquery's output column. The subscripts of an array value built with `ARRAY` always begin with `1`.
+
+### <a id="topic24"></a>Row Constructors
+
+A row constructor is an expression that builds a row value (also called a composite value) from values for its member fields. For example,
+
+``` sql
+SELECT ROW(1,2.5,'this is a test');
+```
+
+A row constructor can include the syntax `rowvalue.*`, which expands to a list of the elements of the row value, just as occurs when you use the `.*` syntax at the top level of a `SELECT` list. For example, if table `t` has columns `f1` and `f2`, the following queries are the same:
+
+``` sql
+SELECT ROW(t.*, 42) FROM t;
+SELECT ROW(t.f1, t.f2, 42) FROM t;
+```
+
+By default, the value created by a `ROW` expression has an anonymous record type. If necessary, you can cast it to a named composite type, either the row type of a table or a composite type created with `CREATE TYPE AS`. An explicit cast may also be needed to avoid ambiguity. For example:
+
+``` sql
+CREATE TABLE mytable(f1 int, f2 float, f3 text);
+CREATE FUNCTION getf1(mytable) RETURNS int AS 'SELECT $1.f1' 
+LANGUAGE SQL;
+```
+
+In the following query, you do not need to cast the value because there is only one `getf1()` function and therefore no ambiguity:
+
+``` sql
+SELECT getf1(ROW(1,2.5,'this is a test'));
+```
+
+```
+ getf1
+-------
+     1
+```
+
+``` sql
+CREATE TYPE myrowtype AS (f1 int, f2 text, f3 numeric);
+CREATE FUNCTION getf1(myrowtype) RETURNS int AS 'SELECT 
+$1.f1' LANGUAGE SQL;
+```
+
+Now we need a cast to indicate which function to call:
+
+``` sql
+SELECT getf1(ROW(1,2.5,'this is a test'));
+```
+```
+ERROR:  function getf1(record) is not unique
+```
+
+``` sql
+SELECT getf1(ROW(1,2.5,'this is a test')::mytable);
+```
+
+```
+ getf1
+-------
+     1
+```
+
+``` sql
+SELECT getf1(CAST(ROW(11,'this is a test',2.5) AS myrowtype));
+```
+
+```
+ getf1
+-------
+    11
+```
+
+You can use row constructors to build composite values to be stored in a composite-type table column or to be passed to a function that accepts a composite parameter.
+
+### <a id="topic25"></a>Expression Evaluation Rules
+
+The order of evaluation of subexpressions is undefined. The inputs of an operator or function are not necessarily evaluated left-to-right or in any other fixed order.
+
+If you can determine the result of an expression by evaluating only some parts of the expression, then other subexpressions might not be evaluated at all. For example, in the following expression:
+
+``` sql
+SELECT true OR somefunc();
+```
+
+`somefunc()` would probably not be called at all. The same is true in the following expression:
+
+``` sql
+SELECT somefunc() OR true;
+```
+
+This is not the same as the left-to-right evaluation order that Boolean operators enforce in some programming languages.
+
+Do not use functions with side effects as part of complex expressions, especially in `WHERE` and `HAVING` clauses, because those clauses are extensively reprocessed when developing an execution plan. Boolean expressions (`AND`/`OR`/`NOT` combinations) in those clauses can be reorganized in any manner that Boolean algebra laws allow.
+
+Use a `CASE` construct to force evaluation order. The following example is an untrustworthy way to avoid division by zero in a `WHERE` clause:
+
+``` sql
+SELECT ... WHERE x <> 0 AND y/x > 1.5;
+```
+
+The following example shows a trustworthy evaluation order:
+
+``` sql
+SELECT ... WHERE CASE WHEN x <> 0 THEN y/x > 1.5 ELSE false 
+END;
+```
+
+This `CASE` construct usage defeats optimization attempts; use it only when necessary.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/query/functions-operators.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/query/functions-operators.html.md.erb b/markdown/query/functions-operators.html.md.erb
new file mode 100644
index 0000000..8f14ee6
--- /dev/null
+++ b/markdown/query/functions-operators.html.md.erb
@@ -0,0 +1,437 @@
+---
+title: Using Functions and Operators
+---
+
+HAWQ evaluates functions and operators used in SQL expressions.
+
+## <a id="topic27"></a>Using Functions in HAWQ
+
+In HAWQ, functions can be run only on the master.
+
+<a id="topic27__in201681"></a>
+
+<span class="tablecap">Table 1. Functions in HAWQ</span>
+
+
+| Function Type | HAWQ Support       | Description                                                                                                               | Comments                                                                                                                                               |
+|---------------|--------------------|---------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|
+| IMMUTABLE     | Yes                | Relies only on information directly in its argument list. Given the same argument values, always returns the same result. |                                                                                                                                                        |
+| STABLE        | Yes, in most cases | Within a single table scan, returns the same result for same argument values, but results change across SQL statements.   | Results depend on database lookups or parameter values. `current_timestamp` family of functions is `STABLE`; values do not change within an execution. |
+| VOLATILE      | Restricted         | Function values can change within a single table scan. For example: `random()`, `currval()`, `timeofday()`.               | Any function with side effects is volatile, even if its result is predictable. For example: `setval()`.                                                |
+
+HAWQ does not support functions that return a table reference (`rangeFuncs`) or functions that use the `refCursor` data type.
+
+## <a id="topic28"></a>User-Defined Functions
+
+HAWQ supports user-defined functions. See [Extending SQL](http://www.postgresql.org/docs/8.2/static/extend.html) in the PostgreSQL documentation for more information.
+
+In HAWQ, the shared library files for user-created functions must reside in the same library path location on every host in the HAWQ array (masters and segments).
+
+**Important:**
+HAWQ does not support the following:
+
+-   Enhanced table functions
+-   PL/Java Type Maps
+
+
+Use the `CREATE FUNCTION` statement to register user-defined functions that are used as described in [Using Functions in HAWQ](#topic27). By default, user-defined functions are declared as `VOLATILE`, so if your user-defined function is `IMMUTABLE` or `STABLE`, you must specify the correct volatility level when you register your function.
+
+### <a id="functionvolatility"></a>Function Volatility
+
+Every function has a **volatility** classification, with the possibilities being `VOLATILE`, `STABLE`, or `IMMUTABLE`. `VOLATILE` is the default if the [CREATE FUNCTION](../reference/sql/CREATE-FUNCTION.html) command does not specify a category. The volatility category is a promise to the optimizer about the behavior of the function:
+
+-   A `VOLATILE` function can do anything, including modifying the database. It can return different results on successive calls with the same arguments. The optimizer makes no assumptions about the behavior of such functions. A query using a volatile function will re-evaluate the function at every row where its value is needed.
+-   A `STABLE` function cannot modify the database and is guaranteed to return the same results given the same arguments for all rows within a single statement. This category allows the optimizer to optimize multiple calls of the function to a single call.
+-   An `IMMUTABLE` function cannot modify the database and is guaranteed to return the same results given the same arguments forever. This category allows the optimizer to pre-evaluate the function when a query calls it with constant arguments. For example, a query like `SELECT ... WHERE x = 2 + 2` can be simplified on sight to `SELECT ... WHERE x = 4`, because the function underlying the integer addition operator is marked `IMMUTABLE`.
+
+For best optimization results, you should label your functions with the strictest volatility category that is valid for them.
+
+Any function with side effects must be labeled `VOLATILE`, so that calls to it cannot be optimized away. Even a function with no side effects needs to be labeled `VOLATILE` if its value can change within a single query; some examples are `random()`, `currval()`, `timeofday()`.
+
+Another important example is that the `current_timestamp` family of functions qualifies as `STABLE`, since their values do not change within a transaction.
+
+There is relatively little difference between STABLE and IMMUTABLE categories when considering simple interactive queries that are planned and immediately executed: it doesn't matter a lot whether a function is executed once during planning or once during query execution startup. But there is a big difference if the plan is saved and reused later. Labeling a function IMMUTABLE when it really isn't might allow it to be prematurely folded to a constant during planning, resulting in a stale value being re-used during subsequent uses of the plan. This is a hazard when using prepared statements or when using function languages that cache plans (such as PL/pgSQL).
+
+For functions written in SQL or in any of the standard procedural languages, there is a second important property determined by the volatility category, namely the visibility of any data changes that have been made by the SQL command that is calling the function. A VOLATILE function will see such changes, a STABLE or IMMUTABLE function will not. STABLE and IMMUTABLE functions use a snapshot established as of the start of the calling query, whereas VOLATILE functions obtain a fresh snapshot at the start of each query they execute.
+
+Because of this snapshotting behavior, a function containing only SELECT commands can safely be marked STABLE, even if it selects from tables that might be undergoing modifications by concurrent queries. PostgreSQL will execute all commands of a STABLE function using the snapshot established for the calling query, and so it will see a fixed view of the database throughout that query.
+
+The same snapshotting behavior is used for SELECT commands within IMMUTABLE functions. It is generally unwise to select from database tables within an IMMUTABLE function at all, since the immutability will be broken if the table contents ever change. However, PostgreSQL does not enforce that you do not do that.
+
+A common error is to label a function IMMUTABLE when its results depend on a configuration parameter. For example, a function that manipulates timestamps might well have results that depend on the timezone setting. For safety, such functions should be labeled STABLE instead.
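+
+A hedged sketch of such a function, correctly labeled STABLE because casting a `timestamptz` to `date` depends on the `TimeZone` setting:
+
+``` sql
+CREATE FUNCTION local_day(timestamptz) RETURNS date AS $$
+    SELECT $1::date;  -- result varies with the TimeZone setting
+$$ LANGUAGE SQL STABLE;
+```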
+
+When you create user defined functions, avoid using fatal errors or destructive calls. HAWQ may respond to such errors with a sudden shutdown or restart.
+
+### <a id="nestedUDFs"></a>Nested Function Query Limitations
+
+HAWQ queries that invoke nested user-defined functions can fail when dispatched to segment nodes.
+
+HAWQ stores the system catalog only on the master node. User-defined functions are stored in system catalog tables. HAWQ has no built-in knowledge about how to interpret the source text of a user-defined function. Consequently, the text is not parsed by HAWQ.
+
+This behavior can be problematic for queries in which a user-defined function calls another user-defined function. When a query includes a user-defined function, the metadata passed to the query executor includes function invocation information. If the query runs on the HAWQ master node, the nested function is recognized. If the query is dispatched to a segment, the nested function is not found and the query throws an error.
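+
+The following hypothetical pair of functions illustrates the limitation:
+
+``` sql
+CREATE FUNCTION inner_fn(int) RETURNS int AS $$
+    SELECT $1 + 1;
+$$ LANGUAGE SQL;
+
+CREATE FUNCTION outer_fn(int) RETURNS int AS $$
+    SELECT inner_fn($1);  -- nested user-defined function call
+$$ LANGUAGE SQL;
+
+-- Evaluated on the master, so the nested call resolves:
+SELECT outer_fn(1);
+
+-- Dispatched to segments, where inner_fn() cannot be resolved;
+-- a query of this shape throws an error:
+SELECT outer_fn(c) FROM some_table;
+```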
+
+## <a id="userdefinedtypes"></a>User Defined Types
+
+HAWQ can be extended to support new data types. This section describes how to define new base types, which are data types defined below the level of the SQL language. Creating a new base type requires implementing functions to operate on the type in a low-level language, usually C.
+
+A user-defined type must always have input and output functions.  These functions determine how the type appears in strings (for input by the user and output to the user) and how the type is organized in memory. The input function takes a null-terminated character string as its argument and returns the internal (in memory) representation of the type. The output function takes the internal representation of the type as argument and returns a null-terminated character string. If we want to do anything more with the type than merely store it, we must provide additional functions to implement whatever operations we'd like to have for the type.
+
+You should be careful to make the input and output functions inverses of each other. If you do not, you will have severe problems when you need to dump your data into a file and then read it back in. This is a particularly common problem when floating-point numbers are involved.
+
+Optionally, a user-defined type can provide binary input and output routines. Binary I/O is normally faster but less portable than textual I/O. As with textual I/O, it is up to you to define exactly what the external binary representation is. Most of the built-in data types try to provide a machine-independent binary representation. 
+
+Once we have written the I/O functions and compiled them into a shared library, we can define the complex type in SQL. First we declare it as a shell type:
+
+``` sql
+CREATE TYPE complex;
+```
+
+This serves as a placeholder that allows us to reference the type while defining its I/O functions. Now we can define the I/O functions:
+
+``` sql
+CREATE FUNCTION complex_in(cstring)
+    RETURNS complex
+    AS 'filename'
+    LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION complex_out(complex)
+    RETURNS cstring
+    AS 'filename'
+    LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION complex_recv(internal)
+   RETURNS complex
+   AS 'filename'
+   LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION complex_send(complex)
+   RETURNS bytea
+   AS 'filename'
+   LANGUAGE C IMMUTABLE STRICT;
+```
+
+Finally, we can provide the full definition of the data type:
+
+``` sql
+CREATE TYPE complex (
+   internallength = 16, 
+   input = complex_in,
+   output = complex_out,
+   receive = complex_recv,
+   send = complex_send,
+   alignment = double
+);
+```
+
+When you define a new base type, HAWQ automatically provides support for arrays of that type. For historical reasons, the array type has the same name as the base type with the underscore character (\_) prepended.
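+
+For example, once the `complex` type exists, an array column of that type can be declared with no further setup (the table is hypothetical):
+
+``` sql
+CREATE TABLE waveform (
+    id      int,
+    samples complex[]  -- backed by the auto-created array type _complex
+);
+```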
+
+Once the data type exists, we can declare additional functions to provide useful operations on the data type. Operators can then be defined atop the functions, and if needed, operator classes can be created to support indexing of the data type. 
+
+For further details, see the description of the [CREATE TYPE](../reference/sql/CREATE-TYPE.html) command.
+
+## <a id="userdefinedoperators"></a>User Defined Operators
+
+Every operator is "syntactic sugar" for a call to an underlying function that does the real work; so you must first create the underlying function before you can create the operator. However, an operator is not merely syntactic sugar, because it carries additional information that helps the query planner optimize queries that use the operator. The next section will be devoted to explaining that additional information.
+
+HAWQ supports left unary, right unary, and binary operators. Operators can be overloaded; that is, the same operator name can be used for different operators that have different numbers and types of operands. When a query is executed, the system determines the operator to call from the number and types of the provided operands.
+
+Here is an example of creating an operator for adding two complex numbers. We assume we've already created the definition of type complex. First we need a function that does the work, then we can define the operator:
+
+``` sql
+CREATE FUNCTION complex_add(complex, complex)
+    RETURNS complex
+    AS 'filename', 'complex_add'
+    LANGUAGE C IMMUTABLE STRICT;
+
+CREATE OPERATOR + (
+    leftarg = complex,
+    rightarg = complex,
+    procedure = complex_add,
+    commutator = +
+);
+```
+
+Now we could execute a query like this:
+
+``` sql
+SELECT (a + b) AS c FROM test_complex;
+```
+
+```
+        c
+-----------------
+ (5.2,6.05)
+ (133.42,144.95)
+```
+
+We've shown how to create a binary operator here. To create unary operators, just omit one of `leftarg` (for left unary) or `rightarg` (for right unary). The `procedure` clause and the argument clauses are the only required items in `CREATE OPERATOR`. The `commutator` clause shown in the example is an optional hint to the query optimizer. Further details about `commutator` and other optimizer hints appear in the next section.
+
+## <a id="topic29"></a>Built-in Functions and Operators
+
+The following table lists the categories of built-in functions and operators supported by PostgreSQL. All functions and operators are supported in HAWQ as in PostgreSQL with the exception of `STABLE` and `VOLATILE` functions, which are subject to the restrictions noted in [Using Functions in HAWQ](#topic27). See the [Functions and Operators](http://www.postgresql.org/docs/8.2/static/functions.html) section of the PostgreSQL documentation for more information about these built-in functions and operators.
+
+<a id="topic29__in204913"></a>
+
+<table>
+<caption><span class="tablecap">Table 2. Built-in functions and operators</span></caption>
+<colgroup>
+<col width="33%" />
+<col width="33%" />
+<col width="33%" />
+</colgroup>
+<thead>
+<tr class="header">
+<th>Operator/Function Category</th>
+<th>VOLATILE Functions</th>
+<th>STABLE Functions</th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td><a href="http://www.postgresql.org/docs/8.2/static/functions.html#FUNCTIONS-LOGICAL">Logical Operators</a></td>
+<td> </td>
+<td> </td>
+</tr>
+<tr class="even">
+<td><a href="http://www.postgresql.org/docs/8.2/static/functions-comparison.html">Comparison Operators</a></td>
+<td> </td>
+<td> </td>
+</tr>
+<tr class="odd">
+<td><a href="http://www.postgresql.org/docs/8.2/static/functions-math.html">Mathematical Functions and Operators</a></td>
+<td>random
+<p>setseed</p></td>
+<td> </td>
+</tr>
+<tr class="even">
+<td><a href="http://www.postgresql.org/docs/8.2/static/functions-string.html">String Functions and Operators</a></td>
+<td><em>All built-in conversion functions</em></td>
+<td>convert
+<p>pg_client_encoding</p></td>
+</tr>
+<tr class="odd">
+<td><a href="http://www.postgresql.org/docs/8.2/static/functions-binarystring.html">Binary String Functions and Operators</a></td>
+<td> </td>
+<td> </td>
+</tr>
+<tr class="even">
+<td><a href="http://www.postgresql.org/docs/8.2/static/functions-bitstring.html">Bit String Functions and Operators</a></td>
+<td> </td>
+<td> </td>
+</tr>
+<tr class="odd">
+<td><a href="http://www.postgresql.org/docs/8.3/static/functions-matching.html">Pattern Matching</a></td>
+<td> </td>
+<td> </td>
+</tr>
+<tr class="even">
+<td><a href="http://www.postgresql.org/docs/8.2/static/functions-formatting.html">Data Type Formatting Functions</a></td>
+<td> </td>
+<td>to_char
+<p>to_timestamp</p></td>
+</tr>
+<tr class="odd">
+<td><a href="http://www.postgresql.org/docs/8.2/static/functions-datetime.html">Date/Time Functions and Operators</a></td>
+<td>timeofday</td>
+<td>age
+<p>current_date</p>
+<p>current_time</p>
+<p>current_timestamp</p>
+<p>localtime</p>
+<p>localtimestamp</p>
+<p>now</p></td>
+</tr>
+<tr class="even">
+<td><a href="http://www.postgresql.org/docs/8.2/static/functions-geometry.html">Geometric Functions and Operators</a></td>
+<td> </td>
+<td> </td>
+</tr>
+<tr class="odd">
+<td><a href="http://www.postgresql.org/docs/8.2/static/functions-net.html">Network Address Functions and Operators</a></td>
+<td> </td>
+<td> </td>
+</tr>
+<tr class="even">
+<td><a href="http://www.postgresql.org/docs/8.2/static/functions-sequence.html">Sequence Manipulation Functions</a></td>
+<td>currval
+<p>lastval</p>
+<p>nextval</p>
+<p>setval</p></td>
+<td> </td>
+</tr>
+<tr class="odd">
+<td><a href="http://www.postgresql.org/docs/8.2/static/functions-conditional.html">Conditional Expressions</a></td>
+<td> </td>
+<td> </td>
+</tr>
+<tr class="even">
+<td><a href="http://www.postgresql.org/docs/8.2/static/functions-array.html">Array Functions and Operators</a></td>
+<td> </td>
+<td><em>All array functions</em></td>
+</tr>
+<tr class="odd">
+<td><a href="http://www.postgresql.org/docs/8.2/static/functions-aggregate.html">Aggregate Functions</a></td>
+<td> </td>
+<td> </td>
+</tr>
+<tr class="even">
+<td><a href="http://www.postgresql.org/docs/8.2/static/functions-subquery.html">Subquery Expressions</a></td>
+<td> </td>
+<td> </td>
+</tr>
+<tr class="odd">
+<td><a href="http://www.postgresql.org/docs/8.2/static/functions-comparisons.html">Row and Array Comparisons</a></td>
+<td> </td>
+<td> </td>
+</tr>
+<tr class="even">
+<td><a href="http://www.postgresql.org/docs/8.2/static/functions-srf.html">Set Returning Functions</a></td>
+<td>generate_series</td>
+<td> </td>
+</tr>
+<tr class="odd">
+<td><a href="http://www.postgresql.org/docs/8.2/static/functions-info.html">System Information Functions</a></td>
+<td> </td>
+<td><em>All session information functions</em>
+<p><em>All access privilege inquiry functions</em></p>
+<p><em>All schema visibility inquiry functions</em></p>
+<p><em>All system catalog information functions</em></p>
+<p><em>All comment information functions</em></p></td>
+</tr>
+<tr class="even">
+<td><a href="http://www.postgresql.org/docs/8.2/static/functions-admin.html">System Administration Functions</a></td>
+<td>set_config
+<p>pg_cancel_backend</p>
+<p>pg_reload_conf</p>
+<p>pg_rotate_logfile</p>
+<p>pg_start_backup</p>
+<p>pg_stop_backup</p>
+<p>pg_size_pretty</p>
+<p>pg_ls_dir</p>
+<p>pg_read_file</p>
+<p>pg_stat_file</p></td>
+<td>current_setting
+<p><em>All database object size functions</em></p></td>
+</tr>
+<tr class="odd">
+<td><a href="http://www.postgresql.org/docs/9.1/interactive/functions-xml.html">XML Functions</a></td>
+<td> </td>
+<td>xmlagg(xml)
+<p>xmlexists(text, xml)</p>
+<p>xml_is_well_formed(text)</p>
+<p>xml_is_well_formed_document(text)</p>
+<p>xml_is_well_formed_content(text)</p>
+<p>xpath(text, xml)</p>
+<p>xpath(text, xml, text[])</p>
+<p>xpath_exists(text, xml)</p>
+<p>xpath_exists(text, xml, text[])</p>
+<p>xml(text)</p>
+<p>text(xml)</p>
+<p>xmlcomment(xml)</p>
+<p>xmlconcat2(xml, xml)</p></td>
+</tr>
+</tbody>
+</table>
+
+## <a id="topic30"></a>Window Functions
+
+The following built-in window functions are HAWQ extensions to the PostgreSQL database. All window functions are *immutable*. For more information about window functions, see [Window Expressions](defining-queries.html#topic13).
+
+<a id="topic30__in164369"></a>
+
+<span class="tablecap">Table 3. Window functions</span>
+
+| Function                                             | Return Type               | Full Syntax                                                                                               | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+|------------------------------------------------------|---------------------------|-----------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `cume_dist()`                                        | `double precision`        | `CUME_DIST() OVER ( [PARTITION BY ` *expr* `] ORDER BY ` *expr* ` )`                                      | Calculates the cumulative distribution of a value in a group of values. Rows with equal values always evaluate to the same cumulative distribution value.                                                                                                                                                                                                                                                                                                  |
+| `dense_rank()`                                       | `bigint`                  | `DENSE_RANK () OVER ( [PARTITION BY ` *expr* `] ORDER BY ` *expr* `)`                                     | Computes the rank of a row in an ordered group of rows without skipping rank values. Rows with equal values are given the same rank value.                                                                                                                                                                                                                                                                                                                 |
+| `first_value(expr)`                                  | same as input *expr* type | `FIRST_VALUE(expr) OVER ( [PARTITION BY expr] ORDER BY expr [ROWS\|RANGE frame_expr] )`                   | Returns the first value in an ordered set of values.                                                                                                                                                                                                                                                                                                                                                                                                       |
+| `lag(expr [,offset] [,default])`                     | same as input *expr* type | `LAG(` *expr* ` [,` *offset* `] [,` *default* `]) OVER ( [PARTITION BY ` *expr* `] ORDER BY ` *expr* ` )` | Provides access to more than one row of the same table without doing a self join. Given a series of rows returned from a query and a position of the cursor, `LAG` provides access to a row at a given physical offset prior to that position. The default `offset` is 1. *default* sets the value that is returned if the offset goes beyond the scope of the window. If *default* is not specified, the default value is null.                           |
+| `last_value(expr)`                                   | same as input *expr* type | `LAST_VALUE(expr) OVER ( [PARTITION BY expr] ORDER BY expr [ROWS\|RANGE frame_expr] )`                    | Returns the last value in an ordered set of values.                                                                                                                                                                                                                                                                                                                                                                                                        |
+| `lead(expr [,offset] [,default])`                    | same as input *expr* type | `LEAD(expr [,offset] [,default]) OVER ( [PARTITION BY expr] ORDER BY expr )`                              | Provides access to more than one row of the same table without doing a self join. Given a series of rows returned from a query and a position of the cursor, `lead` provides access to a row at a given physical offset after that position. If *offset* is not specified, the default offset is 1. *default* sets the value that is returned if the offset goes beyond the scope of the window. If *default* is not specified, the default value is null. |
+| `ntile(expr)`                                        | bigint                    | `NTILE(expr) OVER ( [PARTITION BY expr] ORDER BY expr                   )`                                | Divides an ordered data set into a number of buckets (as defined by *expr*) and assigns a bucket number to each row.                                                                                                                                                                                                                                                                                                                                       |
+| `percent_rank()`                                     | `double precision`        | `PERCENT_RANK () OVER ( [PARTITION BY expr] ORDER BY expr )`                                              | Calculates the rank of a hypothetical row `R` minus 1, divided by 1 less than the number of rows being evaluated (within a window partition).                                                                                                                                                                                                                                                                                                              |
+| `rank()`                                             | bigint                    | `RANK () OVER ( [PARTITION BY expr] ORDER BY expr )`                                                      | Calculates the rank of a row in an ordered group of values. Rows with equal values for the ranking criteria receive the same rank. The number of tied rows are added to the rank number to calculate the next rank value. Ranks may not be consecutive numbers in this case.                                                                                                                                                                               |
+| `row_number()`                                       | `bigint`                  | `ROW_NUMBER () OVER ( [PARTITION BY expr] ORDER BY expr )`                                                | Assigns a unique number to each row to which it is applied (either each row in a window partition or each row of the query).                                                                                                                                                                                                                                                                                                                               |
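+
+As a brief illustration (the `employees` table is hypothetical), `rank()` and `row_number()` differ only in how they treat ties:
+
+``` sql
+SELECT department_id, employee_id, salary,
+       rank()       OVER (PARTITION BY department_id ORDER BY salary DESC) AS salary_rank,
+       row_number() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rn
+FROM employees;
+```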
+
+
+## <a id="topic31"></a>Advanced Aggregate Functions
+
+The following built-in advanced aggregate functions are HAWQ extensions of the PostgreSQL database.
+
+<a id="topic31__in2073121"></a>
+
+<table>
+
+<caption><span class="tablecap">Table 4. Advanced Aggregate Functions</span></caption>
+<colgroup>
+<col width="25%" />
+<col width="25%" />
+<col width="25%" />
+<col width="25%" />
+</colgroup>
+<thead>
+<tr class="header">
+<th>Function</th>
+<th>Return Type</th>
+<th>Full Syntax</th>
+<th>Description</th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td><code class="ph codeph">MEDIAN (expr)</code></td>
+<td><code class="ph codeph">timestamp, timestamptz, interval, float</code></td>
+<td><code class="ph codeph">MEDIAN (expression)</code>
+<p><em>Example:</em></p>
+<pre class="pre codeblock"><code>SELECT department_id, MEDIAN(salary) 
+FROM employees 
+GROUP BY department_id; </code></pre></td>
+<td>Computes the median value of an expression; equivalent to the 50th percentile computed with <code class="ph codeph">PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY expr)</code>.</td>
+</tr>
+<tr class="even">
+<td><code class="ph codeph">PERCENTILE_CONT (expr) WITHIN GROUP (ORDER BY expr                   [DESC/ASC])</code></td>
+<td><code class="ph codeph">timestamp, timestamptz, interval, float</code></td>
+<td><code class="ph codeph">PERCENTILE_CONT(percentage) WITHIN GROUP (ORDER BY                   expression)</code>
+<p><em>Example:</em></p>
+<pre class="pre codeblock"><code>SELECT department_id,
+PERCENTILE_CONT (0.5) WITHIN GROUP (ORDER BY salary DESC)
+&quot;Median_cont&quot;
+FROM employees GROUP BY department_id;</code></pre></td>
+<td>Performs an inverse distribution function that assumes a continuous distribution model. It takes a percentile value and a sort specification, and returns a value of the same data type as the numeric data type of the argument. The returned value is computed by linear interpolation. Nulls are ignored in this calculation.</td>
+</tr>
+<tr class="odd">
+<td><code class="ph codeph">PERCENTILE_DISC (expr) WITHIN GROUP (ORDER BY                     expr [DESC/ASC]</code>)</td>
+<td><code class="ph codeph">timestamp, timestamptz, interval, float</code></td>
+<td><code class="ph codeph">PERCENTILE_DISC(percentage) WITHIN GROUP (ORDER BY                   expression)</code>
+<p><em>Example:</em></p>
+<pre class="pre codeblock"><code>SELECT department_id, 
+PERCENTILE_DISC (0.5) WITHIN GROUP (ORDER BY salary DESC)
+&quot;Median_desc&quot;
+FROM employees GROUP BY department_id;</code></pre></td>
+<td>Performs an inverse distribution function that assumes a discrete distribution model. It takes a percentile value and a sort specification. The returned value is an element from the set. Nulls are ignored in this calculation.</td>
+</tr>
+<tr class="even">
+<td><code class="ph codeph">sum(array[])</code></td>
+<td><code class="ph codeph">smallint[], int[], bigint[], float[]</code></td>
+<td><code class="ph codeph">sum(array[[1,2],[3,4]])</code>
+<p><em>Example:</em></p>
+<pre class="pre codeblock"><code>CREATE TABLE mymatrix (myvalue int[]);
+INSERT INTO mymatrix VALUES (array[[1,2],[3,4]]);
+INSERT INTO mymatrix VALUES (array[[0,1],[1,0]]);
+SELECT sum(myvalue) FROM mymatrix;
+ sum 
+---------------
+ {{1,3},{4,4}}</code></pre></td>
+<td>Performs matrix summation. Can take as input a two-dimensional array that is treated as a matrix.</td>
+</tr>
+<tr class="odd">
+<td><code class="ph codeph">pivot_sum (label[], label, expr)</code></td>
+<td><code class="ph codeph">int[], bigint[], float[]</code></td>
+<td><code class="ph codeph">pivot_sum( array['A1','A2'], attr, value)</code></td>
+<td>A pivot aggregation using sum to resolve duplicate entries.</td>
+</tr>
+</tbody>
+</table>
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/query/gporca/query-gporca-changed.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/query/gporca/query-gporca-changed.html.md.erb b/markdown/query/gporca/query-gporca-changed.html.md.erb
new file mode 100644
index 0000000..041aa4b
--- /dev/null
+++ b/markdown/query/gporca/query-gporca-changed.html.md.erb
@@ -0,0 +1,17 @@
+---
+title: Changed Behavior with GPORCA
+---
+
+<span class="shortdesc">When GPORCA is enabled, HAWQ's behavior changes. This topic describes these changes.</span>
+
+-   The command `CREATE TABLE AS` distributes table data randomly if the `DISTRIBUTED BY` clause is not specified and no primary or unique keys are specified.
+-   Statistics are required on the root table of a partitioned table. The `ANALYZE` command generates statistics on both root and individual partition tables (leaf child tables). See the `ROOTPARTITION` clause of the `ANALYZE` command.
+-   Additional Result nodes in the query plan:
+    -   Query plan `Assert` operator.
+    -   Query plan `Partition selector` operator.
+    -   Query plan `Split` operator.
+-   When running `EXPLAIN`, the query plan generated by GPORCA differs from the plan generated by the legacy query optimizer.
+-   HAWQ adds the log file message `Planner produced plan` when GPORCA is enabled and HAWQ falls back to the legacy query optimizer to generate the query plan.
+-   HAWQ issues a warning when statistics are missing from one or more table columns. When executing an SQL command with GPORCA, HAWQ issues a warning if the command performance could be improved by collecting statistics on a column or set of columns referenced by the command. The warning is issued on the command line and information is added to the HAWQ log file. For information about collecting statistics on table columns, see the `ANALYZE` command.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/query/gporca/query-gporca-enable.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/query/gporca/query-gporca-enable.html.md.erb b/markdown/query/gporca/query-gporca-enable.html.md.erb
new file mode 100644
index 0000000..e8cc93f
--- /dev/null
+++ b/markdown/query/gporca/query-gporca-enable.html.md.erb
@@ -0,0 +1,95 @@
+---
+title: Enabling GPORCA
+---
+
+<span class="shortdesc">Precompiled versions of HAWQ that include the GPORCA query optimizer enable it by default; no additional configuration is required. To use the GPORCA query optimizer in a HAWQ built from source, your build must include GPORCA. You must also enable specific HAWQ server configuration parameters at or after install time: </span>
+
+-   [Set the <code class="ph codeph">optimizer\_analyze\_root\_partition</code> parameter to <code class="ph codeph">on</code>](#topic_r5d_hv1_kr) to enable statistics collection for the root partition of a partitioned table.
+-   Set the `optimizer` parameter to `on` to enable GPORCA. You can set the parameter at these levels:
+    -   [A HAWQ system](#topic_byp_lqk_br)
+    -   [A specific HAWQ database](#topic_pzr_3db_3r)
+    -   [A session or query](#topic_lx4_vqk_br)
+
+**Important:** If you intend to execute queries on partitioned tables with GPORCA enabled, you must collect statistics on the partitioned table root partition with the `ANALYZE ROOTPARTITION` command. The `ANALYZE ROOTPARTITION` command collects statistics on the root partition of a partitioned table without collecting statistics on the leaf partitions. If you specify a list of column names for a partitioned table, the statistics for the columns and the root partition are collected. For information on the `ANALYZE` command, see [ANALYZE](../../reference/sql/ANALYZE.html).
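+
+For example, assuming a partitioned table named `sales`:
+
+``` sql
+-- Collect statistics on the root partition only:
+ANALYZE ROOTPARTITION sales;
+
+-- Restrict collection to specific root-partition columns:
+ANALYZE ROOTPARTITION sales (region, amount);
+```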
+
+You can also use the HAWQ utility `analyzedb` to update table statistics. The HAWQ utility `analyzedb` can update statistics for multiple tables in parallel. The utility can also check table statistics and update statistics only if the statistics are not current or do not exist. For information about the `analyzedb` utility, see [analyzedb](../../reference/cli/admin_utilities/analyzedb.html#topic1).
+
+As part of routine database maintenance, you should refresh statistics on the root partition when there are significant changes to child leaf partition data.
+
+## <a id="topic_r5d_hv1_kr"></a>Setting the optimizer\_analyze\_root\_partition Parameter
+
+When the configuration parameter `optimizer_analyze_root_partition` is set to `on`, root partition statistics will be collected when `ANALYZE` is run on a partitioned table. Root partition statistics are required by GPORCA.
+
+You will perform different procedures to set optimizer configuration parameters for your whole HAWQ cluster depending upon whether you manage your cluster from the command line or use Ambari. If you use Ambari to manage your HAWQ cluster, you must ensure that you update server configuration parameters only via the Ambari Web UI. If you manage your HAWQ cluster from the command line, you will use the `hawq config` command line utility to set optimizer server configuration parameters.
+
+If you use Ambari to manage your HAWQ cluster:
+
+1. Set the `optimizer_analyze_root_partition` configuration property to `on` via the HAWQ service **Configs > Advanced > Custom hawq-site** drop down. 
+2. Select **Service Actions > Restart All** to load the updated configuration.
+
+If you manage your HAWQ cluster from the command line:
+
+1.  Log in to the HAWQ master host as a HAWQ administrator and source the file `/usr/local/hawq/greenplum_path.sh`.
+
+    ``` shell
+    $ source /usr/local/hawq/greenplum_path.sh
+    ```
+
+2. Use the `hawq config` utility to set `optimizer_analyze_root_partition`:
+
+    ``` shell
+    $ hawq config -c optimizer_analyze_root_partition -v on
+    ```
+3. Reload the HAWQ configuration:
+
+    ``` shell
+    $ hawq stop cluster -u
+    ```
+
+## <a id="topic_byp_lqk_br"></a>Enabling GPORCA for a System
+
+Set the server configuration parameter `optimizer` for the HAWQ system.
+
+If you use Ambari to manage your HAWQ cluster:
+
+1. Set the `optimizer` configuration property to `on` via the HAWQ service **Configs > Advanced > Custom hawq-site** drop down. 
+2. Select **Service Actions > Restart All** to load the updated configuration.
+
+If you manage your HAWQ cluster from the command line:
+
+1.  Log in to the HAWQ master host as a HAWQ administrator and source the file `/usr/local/hawq/greenplum_path.sh`.
+
+    ``` shell
+    $ source /usr/local/hawq/greenplum_path.sh
+    ```
+
+2. Use the `hawq config` utility to set `optimizer`:
+
+    ``` shell
+    $ hawq config -c optimizer -v on
+    ```
+3. Reload the HAWQ configuration:
+
+    ``` shell
+    $ hawq stop cluster -u
+    ```
+
+## <a id="topic_pzr_3db_3r"></a>Enabling GPORCA for a Database
+
+Set the server configuration parameter `optimizer` for individual HAWQ databases with the `ALTER DATABASE` command. For example, this command enables GPORCA for the database *test\_db*.
+
+``` sql
+=> ALTER DATABASE test_db SET optimizer = on;
+```
+
+## <a id="topic_lx4_vqk_br"></a>Enabling GPORCA for a Session or a Query
+
+You can use the `SET` command to set the `optimizer` server configuration parameter for a session. For example, after you use the `psql` utility to connect to HAWQ, this `SET` command enables GPORCA:
+
+``` sql
+=> SET optimizer = on;
+```
+
+To set the parameter for a specific query, include the `SET` command prior to running the query.
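+
+For example (the `sales` table is hypothetical):
+
+``` sql
+SET optimizer = on;
+SELECT count(*) FROM sales;  -- planned by GPORCA
+SET optimizer = off;         -- subsequent queries use the legacy planner
+```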
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/query/gporca/query-gporca-fallback.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/query/gporca/query-gporca-fallback.html.md.erb b/markdown/query/gporca/query-gporca-fallback.html.md.erb
new file mode 100644
index 0000000..999e9a7
--- /dev/null
+++ b/markdown/query/gporca/query-gporca-fallback.html.md.erb
@@ -0,0 +1,142 @@
+---
+title: Determining The Query Optimizer In Use
+---
+
+<span class="shortdesc"> When GPORCA is enabled, you can determine if HAWQ is using GPORCA or is falling back to the legacy query optimizer. </span>
+
+There are two ways to determine which query optimizer HAWQ used to execute a query:
+
+-   Examine `EXPLAIN` query plan output for the query. (Your output may include other settings.)
+    -   When GPORCA generates the query plan, the GPORCA version is displayed near the end of the query plan. For example:
+
+        ``` pre
+         Settings:  optimizer=on
+         Optimizer status:  PQO version 1.627
+        ```
+
+        When HAWQ falls back to the legacy optimizer to generate the plan, `legacy query optimizer` is displayed near the end of the query plan. For example:
+
+        ``` pre
+         Settings:  optimizer=on
+         Optimizer status: legacy query optimizer
+        ```
+
+        When the server configuration parameter `optimizer` is `off`, the following lines are displayed near the end of a query plan:
+
+        ``` pre
+         Settings:  optimizer=off
+         Optimizer status: legacy query optimizer
+        ```
+
+    -   These plan items appear only in the `EXPLAIN` plan output generated by GPORCA. The items are not supported in a legacy optimizer query plan.
+        -   Assert operator
+        -   Sequence operator
+        -   DynamicIndexScan
+        -   DynamicTableScan
+        -   Table Scan
+    -   When GPORCA generates the plan for a query against a partitioned table, the `EXPLAIN` plan displays only the number of partitions that are eliminated; the scanned partitions are not shown. The `EXPLAIN` plan generated by the legacy optimizer lists the scanned partitions.
+
+-   View the log messages in the HAWQ log file.
+
+    The log file contains messages that indicate which query optimizer was used. The `[OPT]` flag appears in a log message when GPORCA attempts to optimize a query. If HAWQ falls back to the legacy optimizer, an error message indicating the unsupported feature is added to the log file. In addition, the label `Planner produced plan:` appears before the query in the log message when HAWQ falls back to the legacy optimizer.
+
+    **Note:** You can configure HAWQ to display log messages on the psql command line by setting the HAWQ server configuration parameter `client_min_messages` to `LOG`. See [Server Configuration Parameter Reference](../../reference/HAWQSiteConfig.html) for information about the parameter.
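+
+    For example, this sketch surfaces the optimizer log messages directly in the `psql` session (the `foo` table and `cube` query mirror the example below):
+
+    ``` sql
+    SET client_min_messages = LOG;
+    EXPLAIN SELECT count(*) FROM foo GROUP BY cube(a,b);  -- the fallback notice is echoed to the client
+    ```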
+
+## <a id="topic_n4w_nb5_xr"></a>Example
+
+This example shows the differences for a query that is run against partitioned tables when GPORCA is enabled.
+
+This `CREATE TABLE` statement creates a table with single level partitions:
+
+``` sql
+CREATE TABLE sales (trans_id int, date date, 
+    amount decimal(9,2), region text)
+   DISTRIBUTED BY (trans_id)
+   PARTITION BY RANGE (date)
+      (START (date '2011-01-01') INCLUSIVE
+       END (date '2012-01-01') EXCLUSIVE
+       EVERY (INTERVAL '1 month'),
+   DEFAULT PARTITION outlying_dates);
+```
+
+This query against the table is supported by GPORCA and does not generate errors in the log file:
+
+``` sql
+SELECT * FROM sales;
+```
+
+The `EXPLAIN` plan output lists only the number of selected partitions.
+
+``` 
+ ->  Partition Selector for sales (dynamic scan id: 1)  (cost=10.00..100.00 rows=50 width=4)
+       Partitions selected:  13 (out of 13)
+```
+
+Output from the log file indicates that GPORCA attempted to optimize the query:
+
+``` 
+2015-05-06 15:00:53.293451 PDT,"gpadmin","test",p2809,th297883424,"[local]",
+  ,2015-05-06 14:59:21 PDT,1120,con6,cmd1,seg-1,,dx3,x1120,sx1,"LOG","00000"
+  ,"statement: explain select * from sales
+;",,,,,,"explain select * from sales
+;",0,,"postgres.c",1566,
+
+2015-05-06 15:00:54.258412 PDT,"gpadmin","test",p2809,th297883424,"[local]",
+  ,2015-05-06 14:59:21 PDT,1120,con6,cmd1,seg-1,,dx3,x1120,sx1,"LOG","00000","
+[OPT]: Using default search strategy",,,,,,"explain select * from sales
+;",0,,"COptTasks.cpp",677,
+```
+
+The following cube query is not supported by GPORCA.
+
+``` sql
+SELECT count(*) FROM foo GROUP BY cube(a,b);
+```
+
+The following `EXPLAIN` plan output includes the log message `Feature not supported by the GPORCA: Cube`.
+
+``` sql
+postgres=# EXPLAIN SELECT count(*) FROM foo GROUP BY cube(a,b);
+```
+```
+LOG:  statement: explain select count(*) from foo group by cube(a,b);
+LOG:  2016-04-14 16:26:15:487935 PDT,THD000,NOTICE,"Feature not supported by the GPORCA: Cube",
+LOG:  Planner produced plan :0
+                                                        QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------
+ Gather Motion 3:1  (slice3; segments: 3)  (cost=9643.62..19400.26 rows=40897 width=28)
+   ->  Append  (cost=9643.62..19400.26 rows=13633 width=28)
+         ->  HashAggregate  (cost=9643.62..9993.39 rows=9328 width=28)
+               Group By: "rollup".unnamed_attr_2, "rollup".unnamed_attr_1, "rollup"."grouping", "rollup"."group_id"
+               ->  Subquery Scan "rollup"  (cost=8018.50..9589.81 rows=1435 width=28)
+                     ->  Redistribute Motion 3:3  (slice1; segments: 3)  (cost=8018.50..9546.76 rows=1435 width=28)
+                           Hash Key: "rollup".unnamed_attr_2, "rollup".unnamed_attr_1, "grouping", group_id()
+                           ->  GroupAggregate  (cost=8018.50..9460.66 rows=1435 width=28)
+                                 Group By: "rollup"."grouping", "rollup"."group_id"
+                                 ->  Subquery Scan "rollup"  (cost=8018.50..9326.13 rows=2153 width=28)
+                                       ->  GroupAggregate  (cost=8018.50..9261.56 rows=2153 width=28)
+                                             Group By: "rollup".unnamed_attr_2, "rollup"."grouping", "rollup"."group_id"
+                                             ->  Subquery Scan "rollup"  (cost=8018.50..9073.22 rows=2870 width=28)
+                                                   ->  GroupAggregate  (cost=8018.50..8987.12 rows=2870 width=28)
+                                                         Group By: public.foo.b, public.foo.a
+                                                         ->  Sort  (cost=8018.50..8233.75 rows=28700 width=8)
+                                                               Sort Key: public.foo.b, public.foo.a
+                                                               ->  Seq Scan on foo  (cost=0.00..961.00 rows=28700 width=8)
+         ->  HashAggregate  (cost=9116.27..9277.71 rows=4305 width=28)
+               Group By: "rollup".unnamed_attr_1, "rollup".unnamed_attr_2, "rollup"."grouping", "rollup"."group_id"
+               ->  Subquery Scan "rollup"  (cost=8018.50..9062.46 rows=1435 width=28)
+                     ->  Redistribute Motion 3:3  (slice2; segments: 3)  (cost=8018.50..9019.41 rows=1435 width=28)
+                           Hash Key: public.foo.a, public.foo.b, "grouping", group_id()
+                           ->  GroupAggregate  (cost=8018.50..8933.31 rows=1435 width=28)
+                                 Group By: public.foo.a
+                                 ->  Sort  (cost=8018.50..8233.75 rows=28700 width=8)
+                                       Sort Key: public.foo.a
+                                       ->  Seq Scan on foo  (cost=0.00..961.00 rows=28700 width=8)
+ Settings:  optimizer=on
+ Optimizer status: legacy query optimizer
+(30 rows)
+```
+
+Since this query is not supported by GPORCA, HAWQ falls back to the legacy optimizer.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/query/gporca/query-gporca-features.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/query/gporca/query-gporca-features.html.md.erb b/markdown/query/gporca/query-gporca-features.html.md.erb
new file mode 100644
index 0000000..4941866
--- /dev/null
+++ b/markdown/query/gporca/query-gporca-features.html.md.erb
@@ -0,0 +1,215 @@
+---
+title: GPORCA Features and Enhancements
+---
+
+GPORCA includes enhancements for specific types of queries and operations.  GPORCA also includes these optimization enhancements:
+
+-   Improved join ordering
+-   Join-Aggregate reordering
+-   Sort order optimization
+-   Data skew estimates included in query optimization
+
+## <a id="topic_dwy_zml_gr"></a>Queries Against Partitioned Tables
+
+GPORCA includes these enhancements for queries against partitioned tables:
+
+-   Partition elimination is improved.
+-   Query plan can contain the `Partition selector` operator.
+-   Partitions are not enumerated in `EXPLAIN` plans.
+
+    For queries that involve static partition selection where the partitioning key is compared to a constant, GPORCA lists the number of partitions to be scanned in the `EXPLAIN` output under the Partition Selector operator. This example Partition Selector operator shows the filter and number of partitions selected:
+
+    ``` pre
+    Partition Selector for Part_Table (dynamic scan id: 1) 
+           Filter: a > 10
+           Partitions selected:  1 (out of 3)
+    ```
+
+    For queries that involve dynamic partition selection where the partitioning key is compared to a variable, the number of partitions that are scanned will be known only during query execution. The partitions selected are not shown in the `EXPLAIN` output.
+
+-   Plan size is independent of the number of partitions.
+-   Out-of-memory errors caused by a large number of partitions are reduced.
+
+This example `CREATE TABLE` command creates a range partitioned table.
+
+``` sql
+CREATE TABLE sales(order_id int, item_id int, amount numeric(15,2), 
+      date date, yr_qtr int)
+   PARTITION BY RANGE (yr_qtr)
+      (START (201201) INCLUSIVE END (201205) EXCLUSIVE EVERY (1),
+       DEFAULT PARTITION outlying_qtrs);
+
+GPORCA improves on these types of queries against partitioned tables:
+
+-   Full table scan. Partitions are not enumerated in plans.
+
+    ``` sql
+    SELECT * FROM sales;
+    ```
+
+-   Query with a constant filter predicate. Partition elimination is performed.
+
+    ``` sql
+    SELECT * FROM sales WHERE yr_qtr = 201201;
+    ```
+
+-   Range selection. Partition elimination is performed.
+
+    ``` sql
+    SELECT * FROM sales WHERE yr_qtr BETWEEN 201301 AND 201404 ;
+    ```
+
+-   Joins involving partitioned tables. In this example, the partitioned dimension table *date\_dim* is joined with fact table *catalog\_sales*:
+
+    ``` sql
+    SELECT * FROM catalog_sales
+       WHERE date_id IN (SELECT id FROM date_dim WHERE month=12);
+    ```
+
+## <a id="topic_vph_wml_gr"></a>Queries that Contain Subqueries
+
+GPORCA handles subqueries more efficiently. A subquery is a query that is nested inside an outer query block. In the following query, the `SELECT` in the `WHERE` clause is a subquery.
+
+``` sql
+SELECT * FROM part
+  WHERE price > (SELECT avg(price) FROM part);
+```
+
+GPORCA also handles queries that contain a correlated subquery (CSQ) more efficiently. A correlated subquery is a subquery that uses values from the outer query. In the following query, the `price` column is used in both the outer query and the subquery.
+
+``` sql
+SELECT * FROM part p1
+  WHERE price > (SELECT avg(price) FROM part p2 
+  WHERE  p2.brand = p1.brand);
+```
+
+GPORCA generates more efficient plans for the following types of subqueries:
+
+-   CSQ in the `SELECT` list.
+
+    ``` sql
+    SELECT *,
+     (SELECT min(price) FROM part p2 WHERE p1.brand = p2.brand)
+     AS foo
+    FROM part p1;
+    ```
+
+-   CSQ in disjunctive (`OR`) filters.
+
+    ``` sql
+    SELECT * FROM part p1 WHERE p_size > 40 OR 
+          p_retailprice > 
+          (SELECT avg(p_retailprice) 
+              FROM part p2 
+              WHERE p2.p_brand = p1.p_brand);
+
+-   Nested CSQ with skip-level correlations.
+
+    ``` sql
+    SELECT * FROM part p1 WHERE p1.p_partkey 
+    IN (SELECT p_partkey FROM part p2 WHERE p2.p_retailprice = 
+         (SELECT min(p_retailprice)
+           FROM part p3 
+           WHERE p3.p_brand = p1.p_brand)
+    );
+    ```
+
+    **Note:** Nested CSQs with skip-level correlations are not supported by the legacy query optimizer.
+
+-   CSQ with aggregate and inequality. This example contains a CSQ with an inequality.
+
+    ``` sql
+    SELECT * FROM part p1 WHERE p1.p_retailprice =
+     (SELECT min(p_retailprice) FROM part p2 WHERE p2.p_brand <> p1.p_brand);
+    ```
+
+-   CSQ that must return one row.
+
+    ``` sql
+    SELECT p_partkey, 
+      (SELECT p_retailprice FROM part p2 WHERE p2.p_brand = p1.p_brand )
+    FROM part p1;
+    ```
+
+## <a id="topic_c3v_rml_gr"></a>Queries that Contain Common Table Expressions
+
+GPORCA handles queries that contain the `WITH` clause. The `WITH` clause, also known as a common table expression (CTE), generates temporary tables that exist only for the query. This example query contains a CTE.
+
+``` sql
+WITH v AS (SELECT a, sum(b) as s FROM T WHERE c < 10 GROUP BY a)
+  SELECT * FROM v AS v1, v AS v2
+  WHERE v1.a <> v2.a AND v1.s < v2.s;
+```
+
+As part of query optimization, GPORCA can push down predicates into a CTE. In the following example query, GPORCA pushes the equality predicates into the CTE.
+
+``` sql
+WITH v AS (SELECT a, sum(b) as s FROM T GROUP BY a)
+  SELECT *
+  FROM v as v1, v as v2, v as v3
+  WHERE v1.a < v2.a
+    AND v1.s < v3.s
+    AND v1.a = 10
+    AND v2.a = 20
+    AND v3.a = 30;
+```
+
+GPORCA can handle these types of CTEs:
+
+-   CTE that defines one or multiple tables. In this query, the CTE defines two tables.
+
+    ``` sql
+    WITH cte1 AS (SELECT a, sum(b) as s FROM T 
+                   where c < 10 GROUP BY a),
+          cte2 AS (SELECT a, s FROM cte1 where s > 1000)
+      SELECT *
+      FROM cte1 as v1, cte2 as v2, cte2 as v3
+      WHERE v1.a < v2.a AND v1.s < v3.s;
+    ```
+
+-   Nested CTEs.
+
+    ``` sql
+    WITH v AS (WITH w AS (SELECT a, b FROM foo 
+                          WHERE b < 5) 
+               SELECT w1.a, w2.b 
+               FROM w AS w1, w AS w2 
+               WHERE w1.a = w2.a AND w1.a > 2)
+      SELECT v1.a, v2.a, v2.b
+      FROM v as v1, v as v2
+      WHERE v1.a < v2.a; 
+    ```
+
+## <a id="topic_plx_mml_gr"></a>DML Operation Enhancements with GPORCA
+
+GPORCA contains enhancements for DML operations such as `INSERT`.
+
+-   A DML node in a query plan is a query plan operator.
+    -   Can appear anywhere in the plan as a regular node (currently top slice only)
+    -   Can have consumers
+-   New query plan operator `Assert` is used for constraints checking.
+
+    This example plan shows the `Assert` operator.
+
+    ```
+    QUERY PLAN
+    ------------------------------------------------------------
+     Insert  (cost=0.00..4.61 rows=3 width=8)
+       ->  Assert  (cost=0.00..3.37 rows=3 width=24)
+             Assert Cond: (dmlsource.a > 2) IS DISTINCT FROM 
+    false
+             ->  Assert  (cost=0.00..2.25 rows=3 width=24)
+                   Assert Cond: NOT dmlsource.b IS NULL
+                   ->  Result  (cost=0.00..1.14 rows=3 width=24)
+                         ->  Table Scan on dmlsource
+    ```
+
+## <a id="topic_anl_t3t_pv"></a>Queries with Distinct Qualified Aggregates (DQA)
+
+GPORCA improves performance for queries that contain distinct qualified aggregates (DQA) without a grouping column and when the table is not distributed on the columns used by the DQA. When encountering these types of queries, GPORCA uses an alternative plan that evaluates the aggregate functions in three stages (local, intermediate, and global aggregations).
+
+See [optimizer\_prefer\_scalar\_dqa\_multistage\_agg](../../reference/guc/parameter_definitions.html#optimizer_prefer_scalar_dqa_multistage_agg) for information on the configuration parameter that controls this behavior.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/query/gporca/query-gporca-limitations.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/query/gporca/query-gporca-limitations.html.md.erb b/markdown/query/gporca/query-gporca-limitations.html.md.erb
new file mode 100644
index 0000000..b63f0d2
--- /dev/null
+++ b/markdown/query/gporca/query-gporca-limitations.html.md.erb
@@ -0,0 +1,37 @@
+---
+title: GPORCA Limitations
+---
+
+<span class="shortdesc">There are limitations in HAWQ when GPORCA is enabled. GPORCA and the legacy query optimizer currently coexist in HAWQ because GPORCA does not support all HAWQ features. </span>
+
+
+## <a id="topic_kgn_vxl_vp"></a>Unsupported SQL Query Features
+
+These HAWQ features are unsupported when GPORCA is enabled:
+
+-   Indexed expressions
+-   `PERCENTILE` window function
+-   External parameters
+-   SortMergeJoin (SMJ)
+-   Ordered aggregations
+-   These analytics extensions:
+    -   CUBE
+    -   Multiple grouping sets
+-   These scalar operators:
+    -   `ROW`
+    -   `ROWCOMPARE`
+    -   `FIELDSELECT`
+-   Multiple `DISTINCT` qualified aggregate functions
+-   Inverse distribution functions
+
+## <a id="topic_u4t_vxl_vp"></a>Performance Regressions
+
+When GPORCA is enabled in HAWQ, the following features are known performance regressions:
+
+-   Short-running queries - For GPORCA, short-running queries might incur additional planning overhead because of GPORCA's enhancements for determining an optimal query execution plan.
+-   `ANALYZE` - For GPORCA, the `ANALYZE` command generates root partition statistics for partitioned tables. For the legacy optimizer, these statistics are not generated.
+-   DML operations - For GPORCA, DML enhancements, including support for updates on partition and distribution keys, might require additional overhead.
+
+In addition, the enhanced functionality that GPORCA provides for these features can increase the time required to execute SQL statements that use them.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/query/gporca/query-gporca-notes.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/query/gporca/query-gporca-notes.html.md.erb b/markdown/query/gporca/query-gporca-notes.html.md.erb
new file mode 100644
index 0000000..ed943e4
--- /dev/null
+++ b/markdown/query/gporca/query-gporca-notes.html.md.erb
@@ -0,0 +1,28 @@
+---
+title: Considerations when Using GPORCA
+---
+
+<span class="shortdesc"> To execute queries optimally with GPORCA, consider certain criteria for the query. </span>
+
+Ensure the following criteria are met:
+
+-   The table does not contain multi-column partition keys.
+-   The table does not contain multi-level partitioning.
+-   The query does not run against master-only tables, such as the system table *pg\_attribute*.
+-   Statistics have been collected on the root partition of a partitioned table.
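+
+    For example, with a partitioned table named *sales*, a plain `ANALYZE` collects table statistics; whether root-partition statistics are gathered automatically depends on the `optimizer_analyze_root_partition` server configuration parameter (verify the default for your deployment):
+
+    ``` sql
+    ANALYZE sales;
+    ```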
+
+If the partitioned table contains more than 20,000 partitions, consider a redesign of the table schema.
+
+GPORCA generates minidumps to describe the optimization context for a given query. Use the minidump files to analyze HAWQ issues. The minidump file is located under the master data directory and uses the following naming format:
+
+`Minidump_date_time.mdp`
+
+For information about the minidump file, see the server configuration parameter `optimizer_minidump`.
+
+When the `EXPLAIN ANALYZE` command uses GPORCA, the `EXPLAIN` plan shows only the number of partitions that are eliminated; the scanned partitions are not shown. To show the names of the scanned partitions in the segment logs, set the server configuration parameter `gp_log_dynamic_partition_pruning` to `on`. This example `SET` command enables the parameter.
+
+``` sql
+SET gp_log_dynamic_partition_pruning = on;
+```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/query/gporca/query-gporca-optimizer.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/query/gporca/query-gporca-optimizer.html.md.erb b/markdown/query/gporca/query-gporca-optimizer.html.md.erb
new file mode 100644
index 0000000..11814f8
--- /dev/null
+++ b/markdown/query/gporca/query-gporca-optimizer.html.md.erb
@@ -0,0 +1,39 @@
+---
+title: About GPORCA
+---
+
+In HAWQ, you can use GPORCA or the legacy query optimizer.
+
+**Note:** To use the GPORCA query optimizer, you must be running a version of HAWQ built with GPORCA, and GPORCA must be enabled in your HAWQ deployment.
+
+These sections describe GPORCA functionality and usage:
+
+-   **[Overview of GPORCA](../../query/gporca/query-gporca-overview.html)**
+
+    GPORCA extends the planning and optimization capabilities of the HAWQ legacy optimizer.
+
+-   **[GPORCA Features and Enhancements](../../query/gporca/query-gporca-features.html)**
+
+    GPORCA includes enhancements for specific types of queries and operations.
+
+-   **[Enabling GPORCA](../../query/gporca/query-gporca-enable.html)**
+
+    Precompiled versions of HAWQ that include the GPORCA query optimizer enable it by default; no additional configuration is required. To use the GPORCA query optimizer in HAWQ built from source, your build must include GPORCA. You must also enable specific HAWQ server configuration parameters at or after install time.
+
+-   **[Considerations when Using GPORCA](../../query/gporca/query-gporca-notes.html)**
+
+    To execute queries optimally with GPORCA, consider certain criteria for the query.
+
+-   **[Determining The Query Optimizer In Use](../../query/gporca/query-gporca-fallback.html)**
+
+    When GPORCA is enabled, you can determine if HAWQ is using GPORCA or is falling back to the legacy query optimizer.
+
+-   **[Changed Behavior with GPORCA](../../query/gporca/query-gporca-changed.html)**
+
+    When GPORCA is enabled, HAWQ's behavior changes. This topic describes these changes.
+
+-   **[GPORCA Limitations](../../query/gporca/query-gporca-limitations.html)**
+
+    There are limitations in HAWQ when GPORCA is enabled. GPORCA and the legacy query optimizer currently coexist in HAWQ because GPORCA does not support all HAWQ features.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/query/gporca/query-gporca-overview.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/query/gporca/query-gporca-overview.html.md.erb b/markdown/query/gporca/query-gporca-overview.html.md.erb
new file mode 100644
index 0000000..56f97eb
--- /dev/null
+++ b/markdown/query/gporca/query-gporca-overview.html.md.erb
@@ -0,0 +1,23 @@
+---
+title: Overview of GPORCA
+---
+
+<span class="shortdesc">GPORCA extends the planning and optimization capabilities of the HAWQ legacy optimizer. </span> GPORCA is extensible and achieves better optimization in multi-core architecture environments. When GPORCA is available in your HAWQ installation and enabled, HAWQ uses GPORCA to generate an execution plan for a query when possible.
+
+GPORCA also enhances HAWQ query performance tuning in the following areas:
+
+-   Queries against partitioned tables
+-   Queries that contain a common table expression (CTE)
+-   Queries that contain subqueries
+
+The legacy and GPORCA query optimizers coexist in HAWQ. The default query optimizer is GPORCA. When GPORCA is available and enabled in your HAWQ installation, HAWQ uses GPORCA to generate an execution plan for a query when possible. If GPORCA cannot be used, the legacy query optimizer is used.
+
+The following flow chart shows how GPORCA fits into the query planning architecture:
+
+<img src="../../images/gporca.png" id="topic1__image_rf5_svc_fv" class="image" width="672" />
+
+You can inspect the log to determine whether GPORCA or the legacy query optimizer produced the plan. The log message "Optimizer produced plan" indicates that GPORCA generated the plan for your query; if the legacy query optimizer generated the plan, the log message reads "Planner produced plan". See [Determining The Query Optimizer In Use](query-gporca-fallback.html#topic1).
+
+**Note:** All legacy query optimizer (planner) server configuration parameters are ignored by GPORCA. However, if HAWQ falls back to the legacy optimizer, the planner server configuration parameters will impact the query plan generation.
+
+


