impala-commits mailing list archives

From jbap...@apache.org
Subject [37/51] [partial] incubator-impala git commit: IMPALA-3398: Add docs to main Impala branch.
Date Thu, 17 Nov 2016 23:12:15 GMT
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_create_function.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_create_function.xml b/docs/topics/impala_create_function.xml
new file mode 100644
index 0000000..5b81f50
--- /dev/null
+++ b/docs/topics/impala_create_function.xml
@@ -0,0 +1,492 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="1.2" id="create_function">
+
+  <title>CREATE FUNCTION Statement</title>
+  <titlealts audience="PDF"><navtitle>CREATE FUNCTION</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="DDL"/>
+      <data name="Category" value="Schemas"/>
+      <data name="Category" value="Impala Functions"/>
+      <data name="Category" value="UDFs"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">CREATE FUNCTION statement</indexterm>
+      Creates a user-defined function (UDF), which you can use to implement custom logic during
+      <codeph>SELECT</codeph> or <codeph>INSERT</codeph> operations.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+    <p>
+      The syntax is different depending on whether you create a scalar UDF, which is called once for each row and
+      implemented by a single function, or a user-defined aggregate function (UDA), which is implemented by
+      multiple functions that compute intermediate results across sets of rows.
+    </p>
+
+    <p rev="2.5.0 IMPALA-2843 CDH-39148">
+      In <keyword keyref="impala25_full"/> and higher, the syntax is also different for creating or dropping scalar Java-based UDFs.
+      The statements for Java UDFs use a new syntax, without any argument types or return type specified. Java-based UDFs
+      created using the new syntax persist across restarts of the Impala catalog server, and can be shared transparently
+      between Impala and Hive.
+    </p>
+
+    <p>
+      To create a persistent scalar C++ UDF with <codeph>CREATE FUNCTION</codeph>:
+    </p>
+
+<codeblock>CREATE FUNCTION [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>function_name</varname>([<varname>arg_type</varname>[, <varname>arg_type</varname>...]])
+  RETURNS <varname>return_type</varname>
+  LOCATION '<varname>hdfs_path_to_dot_so</varname>'
+  SYMBOL='<varname>symbol_name</varname>'</codeblock>
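+    <p>
+      For example, the following statement registers a scalar C++ UDF. (The HDFS path,
+      function name, and symbol shown here are illustrative only; substitute the names
+      from your own shared library.)
+    </p>
+
+<codeblock>CREATE FUNCTION udfs.my_lower(STRING)
+  RETURNS STRING
+  LOCATION '/user/impala/udfs/libmyudfs.so'
+  SYMBOL='MyLower';</codeblock>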
+
+    <p rev="2.5.0 IMPALA-2843 CDH-39148">
+      To create a persistent Java UDF with <codeph>CREATE FUNCTION</codeph>:
+<codeblock>CREATE FUNCTION [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>function_name</varname>
+  LOCATION '<varname>hdfs_path_to_jar</varname>'
+  SYMBOL='<varname>class_name</varname>'</codeblock>
+    </p>
+
+<!--
+Examples:
+CREATE FUNCTION IF NOT EXISTS foo location '/path/to/jar' SYMBOL='TestUdf';
+CREATE FUNCTION bar location '/path/to/jar' SYMBOL='TestUdf2';
+DROP FUNCTION foo;
+DROP FUNCTION IF EXISTS bar;
+-->
+
+    <p>
+      To create a persistent UDA, which must be written in C++, issue a <codeph>CREATE AGGREGATE FUNCTION</codeph> statement:
+    </p>
+
+<codeblock>CREATE [AGGREGATE] FUNCTION [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>function_name</varname>([<varname>arg_type</varname>[, <varname>arg_type</varname>...]])
+  RETURNS <varname>return_type</varname>
+  LOCATION '<varname>hdfs_path</varname>'
+  [INIT_FN='<varname>function</varname>']
+  UPDATE_FN='<varname>function</varname>'
+  MERGE_FN='<varname>function</varname>'
+  [PREPARE_FN='<varname>function</varname>']
+  [CLOSE_FN='<varname>function</varname>']
+  <ph rev="2.0.0">[SERIALIZE_FN='<varname>function</varname>']</ph>
+  [FINALIZE_FN='<varname>function</varname>']
+  <ph rev="2.3.0 IMPALA-1829 CDH-30572">[INTERMEDIATE <varname>type_spec</varname>]</ph></codeblock>
+
+    <p conref="../shared/impala_common.xml#common/ddl_blurb"/>
+
+    <p>
+      <b>Varargs notation:</b>
+    </p>
+
+    <note rev="CDH-39271 CDH-38572">
+      <p rev="CDH-39271 CDH-38572">
+        Variable-length argument lists are supported for C++ UDFs, but currently not for Java UDFs.
+      </p>
+    </note>
+
+    <p>
+      If the underlying implementation of your function accepts a variable number of arguments:
+    </p>
+
+    <ul>
+      <li>
+        The variable arguments must go last in the argument list.
+      </li>
+
+      <li>
+        The variable arguments must all be of the same type.
+      </li>
+
+      <li>
+        You must include at least one instance of the variable arguments in every function call invoked from SQL.
+      </li>
+
+      <li>
+        You designate the variable portion of the argument list in the <codeph>CREATE FUNCTION</codeph> statement
+        by including <codeph>...</codeph> immediately after the type name of the first variable argument. For
+        example, to create a function that accepts an <codeph>INT</codeph> argument, followed by a
+        <codeph>BOOLEAN</codeph>, followed by one or more <codeph>STRING</codeph> arguments, your <codeph>CREATE
+        FUNCTION</codeph> statement would look like:
+<codeblock>CREATE FUNCTION <varname>func_name</varname> (INT, BOOLEAN, STRING ...)
+  RETURNS <varname>type</varname> LOCATION '<varname>path</varname>' SYMBOL='<varname>entry_point</varname>';
+</codeblock>
+      </li>
+    </ul>
+
+    <p rev="CDH-39271 CDH-38572">
+      See <xref href="impala_udf.xml#udf_varargs"/> for how to code a C++ UDF to accept
+      variable-length argument lists.
+    </p>
+
+    <p>
+      <b>Scalar and aggregate functions:</b>
+    </p>
+
+    <p>
+      The simplest kind of user-defined function returns a single scalar value each time it is called, typically
+      once for each row in the result set. This general kind of function is what is usually meant by UDF.
+      User-defined aggregate functions (UDAs) are a specialized kind of UDF that produce a single value based on
+      the contents of multiple rows. You usually use UDAs in combination with a <codeph>GROUP BY</codeph> clause to
+      condense a large result set into a smaller one, or even a single row summarizing column values across an
+      entire table.
+    </p>
+
+    <p>
+      You create UDAs by using the <codeph>CREATE AGGREGATE FUNCTION</codeph> syntax. The clauses
+      <codeph>INIT_FN</codeph>, <codeph>UPDATE_FN</codeph>, <codeph>MERGE_FN</codeph>,
+      <ph rev="2.0.0"><codeph>SERIALIZE_FN</codeph>,</ph> <codeph>FINALIZE_FN</codeph>, and
+      <codeph>INTERMEDIATE</codeph> only apply when you create a UDA rather than a scalar UDF.
+    </p>
+
+    <p>
+      The <codeph>*_FN</codeph> clauses specify functions to call at different phases of function processing.
+    </p>
+
+    <ul>
+      <li>
+        <b>Initialize:</b> The function you specify with the <codeph>INIT_FN</codeph> clause does any initial
+        setup, such as initializing member variables in internal data structures. This function is often a stub for
+        simple UDAs. You can omit this clause and a default (no-op) function will be used.
+      </li>
+
+      <li>
+        <b>Update:</b> The function you specify with the <codeph>UPDATE_FN</codeph> clause is called once for each
+        row in the original result set, that is, before any <codeph>GROUP BY</codeph> clause is applied. A separate
+        instance of the function is called for each different value returned by the <codeph>GROUP BY</codeph>
+        clause. The final argument passed to this function is a pointer, to which you write an updated value based
+        on its original value and the value of the first argument.
+      </li>
+
+      <li>
+        <b>Merge:</b> The function you specify with the <codeph>MERGE_FN</codeph> clause is called an arbitrary
+        number of times, to combine intermediate values produced by different nodes or different threads as Impala
+        reads and processes data files in parallel. The final argument passed to this function is a pointer, to
+        which you write an updated value based on its original value and the value of the first argument.
+      </li>
+
+      <li rev="2.0.0">
+        <b>Serialize:</b> The function you specify with the <codeph>SERIALIZE_FN</codeph> clause frees memory
+        allocated to intermediate results. It is required if any memory was allocated by the Allocate function in
+        the Init, Update, or Merge functions, or if the intermediate type contains any pointers. See
+        <xref href="https://github.com/cloudera/impala-udf-samples/blob/master/uda-sample.cc" scope="external" format="html">the
+        UDA code samples</xref> for details.
+      </li>
+
+      <li>
+        <b>Finalize:</b> The function you specify with the <codeph>FINALIZE_FN</codeph> clause does any required
+        teardown for resources acquired by your UDF, such as freeing memory, closing file handles if you explicitly
+        opened any files, and so on. This function is often a stub for simple UDAs. You can omit this clause and a
+        default (no-op) function will be used. It is required in UDAs where the final return type is different from
+        the intermediate type, or if any memory was allocated by the Allocate function in the Init, Update, or
+        Merge functions. See
+        <xref href="https://github.com/cloudera/impala-udf-samples/blob/master/uda-sample.cc" scope="external" format="html">the
+        UDA code samples</xref> for details.
+      </li>
+    </ul>
+
+    <p>
+      If you use a consistent naming convention for each of the underlying functions, Impala can automatically
+      determine the names based on the first such clause, so the others are optional.
+    </p>
+
+    <p audience="Cloudera">
+      The <codeph>INTERMEDIATE</codeph> clause specifies the data type of intermediate values passed from the
+      <q>update</q> phase to the <q>merge</q> phase, and from the <q>merge</q> phase to the <q>finalize</q> phase.
+      You can use any of the existing Impala data types, or the special notation
+      <codeph>CHAR(<varname>n</varname>)</codeph> to allocate a scratch area of <varname>n</varname> bytes for the
+      intermediate result. For example, if the different phases of your UDA pass strings to each other but in the
+      end the function returns a <codeph>BIGINT</codeph> value, you would specify <codeph>INTERMEDIATE
+      STRING</codeph>. Likewise, if the different phases of your UDA pass 2 separate <codeph>BIGINT</codeph> values
+      between them (8 bytes each), you would specify <codeph>INTERMEDIATE CHAR(16)</codeph> so that each function
+      could read from and write to a 16-byte buffer.
+    </p>
+
+    <p>
+      For end-to-end examples of UDAs, see <xref href="impala_udf.xml#udfs"/>.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+    <p conref="../shared/impala_common.xml#common/udfs_no_complex_types"/>
+
+    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+    <ul>
+      <li>
+        You can write Impala UDFs in either C++ or Java. C++ UDFs are new to Impala, and are the recommended choice
+        for high performance because they run as native code. Java-based UDFs are compatible between Impala and Hive, and are
+        best suited to reusing existing Hive UDFs. (Impala can run Java-based Hive UDFs but not Hive UDAs.)
+      </li>
+
+      <li rev="2.5.0 IMPALA-1748 CDH-38369 IMPALA-2843 CDH-39148">
+        <keyword keyref="impala25_full"/> introduces UDF improvements to persistence for both C++ and Java UDFs,
+        and better compatibility between Impala and Hive for Java UDFs.
+        See <xref href="impala_udf.xml#udfs"/> for details.
+      </li>
+
+      <li>
+        The body of the UDF is represented by a <codeph>.so</codeph> or <codeph>.jar</codeph> file, which you store
+        in HDFS and the <codeph>CREATE FUNCTION</codeph> statement distributes to each Impala node.
+      </li>
+
+      <li>
+        Impala calls the underlying code during SQL statement evaluation, as many times as needed to process all
+        the rows from the result set. All UDFs are assumed to be deterministic, that is, to always return the same
+        result when passed the same argument values. Impala might or might not skip some invocations of a UDF if
+        the result value is already known from a previous call. Therefore, do not rely on the UDF being called a
+        specific number of times, and do not return different result values based on some external factor such as
+        the current time, a random number function, or an external data source that could be updated while an
+        Impala query is in progress.
+      </li>
+
+      <li>
+        The names of the function arguments in the UDF are not significant, only their number, positions, and data
+        types.
+      </li>
+
+      <li>
+        You can overload the same function name by creating multiple versions of the function, each with a
+        different argument signature. For security reasons, you cannot make a UDF with the same name as any
+        built-in function.
+      </li>
+
+      <li>
+        In the UDF code, you represent the function return result as a <codeph>struct</codeph>. This
+        <codeph>struct</codeph> contains 2 fields. The first field is a <codeph>boolean</codeph> representing
+        whether the value is <codeph>NULL</codeph> or not. (When this field is <codeph>true</codeph>, the return
+        value is interpreted as <codeph>NULL</codeph>.) The second field is the same type as the specified function
+        return type, and holds the return value when the function returns something other than
+        <codeph>NULL</codeph>.
+      </li>
+
+      <li>
+        In the UDF code, you represent the function arguments as an initial pointer to a UDF context structure,
+        followed by references to zero or more <codeph>struct</codeph>s, corresponding to each of the arguments.
+        Each <codeph>struct</codeph> has the same 2 fields as with the return value, a <codeph>boolean</codeph>
+        field representing whether the argument is <codeph>NULL</codeph>, and a field of the appropriate type
+        holding any non-<codeph>NULL</codeph> argument value.
+      </li>
+
+      <li>
+        For sample code and build instructions for UDFs,
+        see <xref href="https://github.com/cloudera/impala/tree/master/be/src/udf_samples" scope="external" format="html">the sample UDFs in the Impala github repo</xref>.
+      </li>
+
+      <li>
+        Because the file representing the body of the UDF is stored in HDFS, it is automatically available to all
+        the Impala nodes. You do not need to manually copy any UDF-related files between servers.
+      </li>
+
+      <li>
+        Because Impala currently does not have any <codeph>ALTER FUNCTION</codeph> statement, if you need to rename
+        a function, move it to a different database, or change its signature or other properties, issue a
+        <codeph>DROP FUNCTION</codeph> statement for the original function followed by a <codeph>CREATE
+        FUNCTION</codeph> with the desired properties.
+      </li>
+
+      <li>
+        Because each UDF is associated with a particular database, either issue a <codeph>USE</codeph> statement
+        before doing any <codeph>CREATE FUNCTION</codeph> statements, or specify the name of the function as
+        <codeph><varname>db_name</varname>.<varname>function_name</varname></codeph>.
+      </li>
+    </ul>
+
+    <p conref="../shared/impala_common.xml#common/sync_ddl_blurb"/>
+
+    <p conref="../shared/impala_common.xml#common/compatibility_blurb"/>
+
+    <p>
+      Impala can run UDFs that were created through Hive, as long as they refer to Impala-compatible data types
+      (not composite or nested column types). Hive can run Java-based UDFs that were created through Impala, but
+      not Impala UDFs written in C++.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/current_user_caveat"/>
+
+    <p><b>Persistence:</b></p>
+
+    <p conref="../shared/impala_common.xml#common/udf_persistence_restriction"/>
+
+    <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>
+
+    <p conref="../shared/impala_common.xml#common/permissions_blurb_no"/>
+
+    <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+    <p>
+      For additional examples of all kinds of user-defined functions, see <xref href="impala_udf.xml#udfs"/>.
+    </p>
+
+    <p rev="2.5.0 IMPALA-2843 CDH-39148">
+      The following example shows how to take a Java jar file and make all the functions inside one of its classes
+      into UDFs under a single (overloaded) function name in Impala. Each <codeph>CREATE FUNCTION</codeph> or
+      <codeph>DROP FUNCTION</codeph> statement applies to all the overloaded Java functions with the same name.
+      This example uses the signatureless syntax for <codeph>CREATE FUNCTION</codeph> and <codeph>DROP FUNCTION</codeph>,
+      which is available in <keyword keyref="impala25_full"/> and higher.
+    </p>
+    <p rev="2.5.0 IMPALA-2843 CDH-39148">
+      At the start, the jar file is in the local filesystem. Then it is copied into HDFS, so that it is
+      available for Impala to reference through the <codeph>CREATE FUNCTION</codeph> statement and
+      queries that refer to the Impala function name.
+    </p>
+<codeblock rev="2.5.0 IMPALA-2843 CDH-39148">
+$ jar -tvf udf-examples-cdh570.jar
+     0 Mon Feb 22 04:06:50 PST 2016 META-INF/
+   122 Mon Feb 22 04:06:48 PST 2016 META-INF/MANIFEST.MF
+     0 Mon Feb 22 04:06:46 PST 2016 com/
+     0 Mon Feb 22 04:06:46 PST 2016 com/cloudera/
+     0 Mon Feb 22 04:06:46 PST 2016 com/cloudera/impala/
+  2460 Mon Feb 22 04:06:46 PST 2016 com/cloudera/impala/IncompatibleUdfTest.class
+   541 Mon Feb 22 04:06:46 PST 2016 com/cloudera/impala/TestUdfException.class
+  3438 Mon Feb 22 04:06:46 PST 2016 com/cloudera/impala/JavaUdfTest.class
+  5872 Mon Feb 22 04:06:46 PST 2016 com/cloudera/impala/TestUdf.class
+...
+$ hdfs dfs -put udf-examples-cdh570.jar /user/impala/udfs
+$ hdfs dfs -ls /user/impala/udfs
+Found 2 items
+-rw-r--r--   3 jrussell supergroup        853 2015-10-09 14:05 /user/impala/udfs/hello_world.jar
+-rw-r--r--   3 jrussell supergroup       7366 2016-06-08 14:25 /user/impala/udfs/udf-examples-cdh570.jar
+</codeblock>
+    <p rev="2.5.0 IMPALA-2843 CDH-39148">
+      In <cmdname>impala-shell</cmdname>, the <codeph>CREATE FUNCTION</codeph> refers to the HDFS path of the jar file
+      and the fully qualified class name inside the jar. Each of the functions inside the class becomes an
+      Impala function, each one overloaded under the specified Impala function name.
+    </p>
+<codeblock rev="2.5.0 IMPALA-2843 CDH-39148">
+[localhost:21000] > create function testudf location '/user/impala/udfs/udf-examples-cdh570.jar' symbol='com.cloudera.impala.TestUdf';
+[localhost:21000] > show functions;
++-------------+---------------------------------------+-------------+---------------+
+| return type | signature                             | binary type | is persistent |
++-------------+---------------------------------------+-------------+---------------+
+| BIGINT      | testudf(BIGINT)                       | JAVA        | true          |
+| BOOLEAN     | testudf(BOOLEAN)                      | JAVA        | true          |
+| BOOLEAN     | testudf(BOOLEAN, BOOLEAN)             | JAVA        | true          |
+| BOOLEAN     | testudf(BOOLEAN, BOOLEAN, BOOLEAN)    | JAVA        | true          |
+| DOUBLE      | testudf(DOUBLE)                       | JAVA        | true          |
+| DOUBLE      | testudf(DOUBLE, DOUBLE)               | JAVA        | true          |
+| DOUBLE      | testudf(DOUBLE, DOUBLE, DOUBLE)       | JAVA        | true          |
+| FLOAT       | testudf(FLOAT)                        | JAVA        | true          |
+| FLOAT       | testudf(FLOAT, FLOAT)                 | JAVA        | true          |
+| FLOAT       | testudf(FLOAT, FLOAT, FLOAT)          | JAVA        | true          |
+| INT         | testudf(INT)                          | JAVA        | true          |
+| DOUBLE      | testudf(INT, DOUBLE)                  | JAVA        | true          |
+| INT         | testudf(INT, INT)                     | JAVA        | true          |
+| INT         | testudf(INT, INT, INT)                | JAVA        | true          |
+| SMALLINT    | testudf(SMALLINT)                     | JAVA        | true          |
+| SMALLINT    | testudf(SMALLINT, SMALLINT)           | JAVA        | true          |
+| SMALLINT    | testudf(SMALLINT, SMALLINT, SMALLINT) | JAVA        | true          |
+| STRING      | testudf(STRING)                       | JAVA        | true          |
+| STRING      | testudf(STRING, STRING)               | JAVA        | true          |
+| STRING      | testudf(STRING, STRING, STRING)       | JAVA        | true          |
+| TINYINT     | testudf(TINYINT)                      | JAVA        | true          |
++-------------+---------------------------------------+-------------+---------------+
+</codeblock>
+    <p rev="2.5.0 IMPALA-2843 CDH-39148">
+      These are all simple functions that return their single argument, or combine
+      multiple arguments by summing, concatenating, and so on. Impala determines which
+      overloaded function to use based on the number and types of the arguments.
+    </p>
+<codeblock rev="2.5.0 IMPALA-2843 CDH-39148">
+insert into bigint_x values (1), (2), (4), (3);
+select testudf(x) from bigint_x;
++-----------------+
+| udfs.testudf(x) |
++-----------------+
+| 1               |
+| 2               |
+| 4               |
+| 3               |
++-----------------+
+
+insert into int_x values (1), (2), (4), (3);
+select testudf(x, x+1, x*x) from int_x;
++-------------------------------+
+| udfs.testudf(x, x + 1, x * x) |
++-------------------------------+
+| 4                             |
+| 9                             |
+| 25                            |
+| 16                            |
++-------------------------------+
+
+select testudf(x) from string_x;
++-----------------+
+| udfs.testudf(x) |
++-----------------+
+| one             |
+| two             |
+| four            |
+| three           |
++-----------------+
+select testudf(x,x) from string_x;
++--------------------+
+| udfs.testudf(x, x) |
++--------------------+
+| oneone             |
+| twotwo             |
+| fourfour           |
+| threethree         |
++--------------------+
+</codeblock>
+
+    <p rev="2.5.0 IMPALA-2843 CDH-39148">
+      The previous example used the same Impala function name as the name of the class.
+      This example shows how the Impala function name is independent of the underlying
+      Java class or function names. A second <codeph>CREATE FUNCTION</codeph> statement
+      results in a set of overloaded functions all named <codeph>my_func</codeph>,
+      to go along with the overloaded functions all named <codeph>testudf</codeph>.
+    </p>
+<codeblock rev="2.5.0 IMPALA-2843 CDH-39148">
+create function my_func location '/user/impala/udfs/udf-examples-cdh570.jar'
+  symbol='com.cloudera.impala.TestUdf';
+
+show functions;
++-------------+---------------------------------------+-------------+---------------+
+| return type | signature                             | binary type | is persistent |
++-------------+---------------------------------------+-------------+---------------+
+| BIGINT      | my_func(BIGINT)                       | JAVA        | true          |
+| BOOLEAN     | my_func(BOOLEAN)                      | JAVA        | true          |
+| BOOLEAN     | my_func(BOOLEAN, BOOLEAN)             | JAVA        | true          |
+...
+| BIGINT      | testudf(BIGINT)                       | JAVA        | true          |
+| BOOLEAN     | testudf(BOOLEAN)                      | JAVA        | true          |
+| BOOLEAN     | testudf(BOOLEAN, BOOLEAN)             | JAVA        | true          |
+...
+</codeblock>
+    <p rev="2.5.0 IMPALA-2843 CDH-39148">
+      The corresponding <codeph>DROP FUNCTION</codeph> statement with no signature
+      drops all the overloaded functions with that name.
+    </p>
+<codeblock rev="2.5.0 IMPALA-2843 CDH-39148">
+drop function my_func;
+show functions;
++-------------+---------------------------------------+-------------+---------------+
+| return type | signature                             | binary type | is persistent |
++-------------+---------------------------------------+-------------+---------------+
+| BIGINT      | testudf(BIGINT)                       | JAVA        | true          |
+| BOOLEAN     | testudf(BOOLEAN)                      | JAVA        | true          |
+| BOOLEAN     | testudf(BOOLEAN, BOOLEAN)             | JAVA        | true          |
+...
+</codeblock>
+    <p rev="2.5.0 IMPALA-2843 CDH-39148">
+      The signatureless <codeph>CREATE FUNCTION</codeph> syntax for Java UDFs ensures that
+      the functions shown in this example remain available after the Impala service
+      (specifically, the Catalog Server) is restarted.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+
+    <p>
+      <xref href="impala_udf.xml#udfs"/> for more background information, usage instructions, and examples for
+      Impala UDFs; <xref href="impala_drop_function.xml#drop_function"/>
+    </p>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_create_role.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_create_role.xml b/docs/topics/impala_create_role.xml
new file mode 100644
index 0000000..2308490
--- /dev/null
+++ b/docs/topics/impala_create_role.xml
@@ -0,0 +1,70 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="1.4.0" id="create_role">
+
+  <title>CREATE ROLE Statement (<keyword keyref="impala20"/> or higher only)</title>
+  <titlealts audience="PDF"><navtitle>CREATE ROLE</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="DDL"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Sentry"/>
+      <data name="Category" value="Security"/>
+      <data name="Category" value="Roles"/>
+      <data name="Category" value="Administrators"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+      <!-- Consider whether to go deeper into categories like Security for the Sentry-related statements. -->
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">CREATE ROLE statement</indexterm>
+<!-- Copied from Sentry docs. Turn into conref. -->
+      The <codeph>CREATE ROLE</codeph> statement creates a role to which privileges can be granted. Privileges can
+      be granted to roles, which can then be assigned to users. A user that has been assigned a role will only be
+      able to exercise the privileges of that role. Only users that have administrative privileges can create or drop
+      roles. By default, the <codeph>hive</codeph>, <codeph>impala</codeph>, and <codeph>hue</codeph> users have
+      administrative privileges in Sentry.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock>CREATE ROLE <varname>role_name</varname>
+</codeblock>
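+    <p>
+      For example, the following statements create a role, grant it a privilege, and
+      assign it to a group. (The role, database, and group names are illustrative only.)
+    </p>
+
+<codeblock>CREATE ROLE analyst_role;
+GRANT SELECT ON DATABASE sales_db TO ROLE analyst_role;
+GRANT ROLE analyst_role TO GROUP analysts;</codeblock>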
+
+    <p conref="../shared/impala_common.xml#common/privileges_blurb"/>
+
+    <p>
+      Only administrative users (those with <codeph>ALL</codeph> privileges on the server, defined in the Sentry
+      policy file) can use this statement.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/compatibility_blurb"/>
+
+    <p>
+      Impala makes use of any roles and privileges specified by the <codeph>GRANT</codeph> and
+      <codeph>REVOKE</codeph> statements in Hive, and Hive makes use of any roles and privileges specified by the
+      <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements in Impala. The Impala <codeph>GRANT</codeph>
+      and <codeph>REVOKE</codeph> statements for privileges do not require the <codeph>ROLE</codeph> keyword to be
+      repeated before each role name, unlike the equivalent Hive statements.
+    </p>
+
+<!-- To do: nail down the new SHOW syntax, e.g. SHOW ROLES, SHOW CURRENT ROLES, SHOW GROUPS. -->
+
+    <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>
+
+    <p conref="../shared/impala_common.xml#common/permissions_blurb_no"/>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+
+    <p>
+      <xref href="impala_authorization.xml#authorization"/>, <xref href="impala_grant.xml#grant"/>,
+      <xref href="impala_revoke.xml#revoke"/>, <xref href="impala_drop_role.xml#drop_role"/>,
+      <xref href="impala_show.xml#show"/>
+    </p>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_create_table.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_create_table.xml b/docs/topics/impala_create_table.xml
new file mode 100644
index 0000000..ba83fc5
--- /dev/null
+++ b/docs/topics/impala_create_table.xml
@@ -0,0 +1,832 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="create_table" outputclass="impala sql_statement">
+
+  <title outputclass="impala_title sql_statement_title">CREATE TABLE Statement</title>
+  <titlealts audience="PDF"><navtitle>CREATE TABLE</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="DDL"/>
+      <data name="Category" value="Impala Data Types"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="HDFS Caching"/>
+      <data name="Category" value="Tables"/>
+      <data name="Category" value="Schemas"/>
+      <data name="Category" value="S3"/>
+      <!-- <data name="Category" value="Kudu"/> -->
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">CREATE TABLE statement</indexterm>
+      Creates a new table and specifies its characteristics. While creating a table, you optionally specify aspects
+      such as:
+    </p>
+
+    <ul>
+      <li>
+        Whether the table is internal or external.
+      </li>
+
+      <li>
+        The columns and associated data types.
+      </li>
+
+      <li>
+        The columns used for physically partitioning the data.
+      </li>
+
+      <li>
+        The file format for data files.
+      </li>
+
+      <li>
+        The HDFS directory where the data files are located.
+      </li>
+    </ul>
+
+    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+    <p>
+      The general syntax for creating a table and specifying its columns is as follows:
+    </p>
+
+    <p>
+      <b>Explicit column definitions:</b>
+    </p>
+
+<codeblock>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>table_name</varname>
+  (<varname>col_name</varname> <varname>data_type</varname> [COMMENT '<varname>col_comment</varname>'], ...)
+  [PARTITIONED BY (<varname>col_name</varname> <varname>data_type</varname> [COMMENT '<varname>col_comment</varname>'], ...)]
+  [COMMENT '<varname>table_comment</varname>']
+  [WITH SERDEPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
+  [
+   [ROW FORMAT <varname>row_format</varname>] [STORED AS <varname>file_format</varname>]
+  ]
+  [LOCATION '<varname>hdfs_path</varname>']
+  [TBLPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
+<ph rev="1.4.0">  [CACHED IN '<varname>pool_name</varname>'</ph> <ph rev="2.2.0">[WITH REPLICATION = <varname>integer</varname>]</ph> | UNCACHED]
+</codeblock>
+
+    <p>
+      <b>Column definitions inferred from data file:</b>
+    </p>
+
+<codeblock>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>table_name</varname>
+  LIKE PARQUET '<varname>hdfs_path_of_parquet_file</varname>'
+  [COMMENT '<varname>table_comment</varname>']
+  [PARTITIONED BY (<varname>col_name</varname> <varname>data_type</varname> [COMMENT '<varname>col_comment</varname>'], ...)]
+  [WITH SERDEPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
+  [
+   [ROW FORMAT <varname>row_format</varname>] [STORED AS <varname>file_format</varname>]
+  ]
+  [LOCATION '<varname>hdfs_path</varname>']
+  [TBLPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
+<ph rev="1.4.0">  [CACHED IN '<varname>pool_name</varname>'</ph> <ph rev="2.2.0">[WITH REPLICATION = <varname>integer</varname>]</ph> | UNCACHED]
+data_type:
+    <varname>primitive_type</varname>
+  | array_type
+  | map_type
+  | struct_type
+</codeblock>
+
+    <p>
+      <b>CREATE TABLE AS SELECT:</b>
+    </p>
+
+<codeblock>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>table_name</varname>
+  <ph rev="2.5.0">[PARTITIONED BY (<varname>col_name</varname>[, ...])]</ph>
+  [COMMENT '<varname>table_comment</varname>']
+  [WITH SERDEPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
+  [
+   [ROW FORMAT <varname>row_format</varname>] <ph rev="CDH-41501">[STORED AS <varname>ctas_file_format</varname>]</ph>
+  ]
+  [LOCATION '<varname>hdfs_path</varname>']
+  [TBLPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
+<ph rev="1.4.0">  [CACHED IN '<varname>pool_name</varname>'</ph> <ph rev="2.2.0">[WITH REPLICATION = <varname>integer</varname>]</ph> | UNCACHED]
+AS
+  <varname>select_statement</varname></codeblock>
+
+<codeblock>primitive_type:
+    TINYINT
+  | SMALLINT
+  | INT
+  | BIGINT
+  | BOOLEAN
+  | FLOAT
+  | DOUBLE
+  <ph rev="1.4.0">| DECIMAL</ph>
+  | STRING
+  <ph rev="2.0.0">| CHAR</ph>
+  <ph rev="2.0.0">| VARCHAR</ph>
+  | TIMESTAMP
+
+<ph rev="2.3.0">complex_type:
+    struct_type
+  | array_type
+  | map_type
+
+struct_type: STRUCT &lt; <varname>name</varname> : <varname>primitive_or_complex_type</varname> [COMMENT '<varname>comment_string</varname>'], ... &gt;
+
+array_type: ARRAY &lt; <varname>primitive_or_complex_type</varname> &gt;
+
+map_type: MAP &lt; <varname>primitive_type</varname>, <varname>primitive_or_complex_type</varname> &gt;
+</ph>
+row_format:
+  DELIMITED [FIELDS TERMINATED BY '<varname>char</varname>' [ESCAPED BY '<varname>char</varname>']]
+  [LINES TERMINATED BY '<varname>char</varname>']
+
+file_format:
+    PARQUET
+  | TEXTFILE
+  | AVRO
+  | SEQUENCEFILE
+  | RCFILE
+
+<ph rev="CDH-41501">ctas_file_format:
+    PARQUET
+  | TEXTFILE</ph>
+</codeblock>
+
+    <p conref="../shared/impala_common.xml#common/ddl_blurb"/>
+
+    <!-- Should really have some info up front about all the data types and file formats.
+         Consider adding here, or at least making inline links to the relevant keywords
+         in the syntax spec above. -->
+
+    <p>
+      <b>Column definitions:</b>
+    </p>
+
+    <p>
+      Depending on the form of the <codeph>CREATE TABLE</codeph> statement, the column definitions are
+      either required or not allowed.
+    </p>
+
+    <p>
+      With the <codeph>CREATE TABLE AS SELECT</codeph> and <codeph>CREATE TABLE LIKE</codeph>
+      syntax, you do not specify the columns at all; the column names and types are derived from the source table, query,
+      or data file.
+    </p>
+
+    <p>
+      With the basic <codeph>CREATE TABLE</codeph> syntax, you must list one or more columns, specifying
+      for each column its name, type, and optionally a comment, in addition to any columns used as partitioning keys.
+      There is one exception where the column list is not required: when creating an Avro table with the
+      <codeph>STORED AS AVRO</codeph> clause, you can omit the list of columns and specify the same metadata
+      as part of the <codeph>TBLPROPERTIES</codeph> clause.
+    </p>
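+    <p>
+      For example, the following sketch (the table name and Avro schema are hypothetical) creates an Avro table
+      without an explicit column list, deriving the columns from the schema embedded in the
+      <codeph>TBLPROPERTIES</codeph> clause:
+    </p>
+
+<codeblock>CREATE TABLE avro_from_schema
+  STORED AS AVRO
+  TBLPROPERTIES ('avro.schema.literal'='
+    {"type": "record",
+     "name": "event_record",
+     "fields": [
+       {"name": "id", "type": "int"},
+       {"name": "name", "type": "string"}]}');
+</codeblock>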
+
+    <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+    <p rev="2.3.0">
+      The Impala complex types (<codeph>STRUCT</codeph>, <codeph>ARRAY</codeph>, or <codeph>MAP</codeph>)
+      are available in <keyword keyref="impala23_full"/> and higher.
+      Because you can nest these types (for example, to make an array of maps or a struct
+      with an array field), these types are also sometimes referred to as nested types.
+      See <xref href="impala_complex_types.xml#complex_types"/> for usage details.
+    </p>
+
+    <!-- This is kind of an obscure and rare usage scenario. Consider moving all the complex type stuff further down
+         after some of the more common clauses. -->
+    <p rev="2.3.0">
+      Impala can create tables containing complex type columns, with any supported file format.
+      Because currently Impala can only query complex type columns in Parquet tables, creating
+      tables with complex type columns and other file formats such as text is of limited use.
+      For example, you might use Impala to create a text table including some columns with complex types, and use Hive
+      as part of your ETL pipeline to ingest the nested type data and copy it to an identical Parquet table.
+      Or you might create a partitioned table containing complex type columns using one file format, and
+      use <codeph>ALTER TABLE</codeph> to change the file format of individual partitions to Parquet; Impala
+      can then query only the Parquet-format partitions in that table.
+    </p>
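+    <p rev="2.3.0">
+      As a sketch using hypothetical table and column names, the following statement creates a Parquet table
+      containing one column of each complex type, which Impala can both create and query:
+    </p>
+
+<codeblock rev="2.3.0">CREATE TABLE complex_demo
+(
+  id BIGINT,
+  phone_numbers ARRAY &lt;STRING&gt;,
+  address STRUCT &lt;street: STRING, city: STRING&gt;,
+  attributes MAP &lt;STRING, INT&gt;
+)
+STORED AS PARQUET;
+</codeblock>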
+
+    <p conref="../shared/impala_common.xml#common/complex_types_partitioning"/>
+
+    <p>
+      <b>Internal and external tables (EXTERNAL and LOCATION clauses):</b>
+    </p>
+
+    <p>
+      By default, Impala creates an <q>internal</q> table, where Impala manages the underlying data files for the
+      table, and physically deletes the data files when you drop the table. If you specify the
+      <codeph>EXTERNAL</codeph> clause, Impala treats the table as an <q>external</q> table, where the data files
+      are typically produced outside Impala and queried from their original locations in HDFS, and Impala leaves
+      the data files in place when you drop the table. For details about internal and external tables, see
+      <xref href="impala_tables.xml#tables"/>.
+    </p>
+
+    <p>
+      Typically, for an external table you include a <codeph>LOCATION</codeph> clause to specify the path to the
+      HDFS directory where Impala reads and writes files for the table. For example, if your data pipeline produces
+      Parquet files in the HDFS directory <filepath>/user/etl/destination</filepath>, you might create an external
+      table as follows:
+    </p>
+
+<codeblock>CREATE EXTERNAL TABLE external_parquet (c1 INT, c2 STRING, c3 TIMESTAMP)
+  STORED AS PARQUET LOCATION '/user/etl/destination';
+</codeblock>
+
+    <p>
+      Although the <codeph>EXTERNAL</codeph> and <codeph>LOCATION</codeph> clauses are often specified together,
+      <codeph>LOCATION</codeph> is optional for external tables, and you can also specify <codeph>LOCATION</codeph>
+      for internal tables. The difference is all about whether Impala <q>takes control</q> of the underlying data
+      files and moves them when you rename the table, or deletes them when you drop the table. For more about
+      internal and external tables and how they interact with the <codeph>LOCATION</codeph> attribute, see
+      <xref href="impala_tables.xml#tables"/>.
+    </p>
+
+    <p>
+      <b>Partitioned tables (PARTITIONED BY clause):</b>
+    </p>
+
+    <p>
+      The <codeph>PARTITIONED BY</codeph> clause divides the data files based on the values from one or more
+      specified columns. Impala queries can use the partition metadata to minimize the amount of data that is read
+      from disk or transmitted across the network, particularly during join queries. For details about
+      partitioning, see <xref href="impala_partitioning.xml#partitioning"/>.
+    </p>
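+    <p>
+      For example, the following sketch (table and column names are hypothetical) creates a table partitioned
+      by year and month; each distinct combination of those values is stored in its own HDFS subdirectory:
+    </p>
+
+<codeblock>CREATE TABLE census_data (name STRING, assessment DOUBLE)
+  PARTITIONED BY (year SMALLINT, month TINYINT)
+  STORED AS PARQUET;
+</codeblock>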
+
+    <p rev="2.5.0">
+      Prior to <keyword keyref="impala25_full"/> you could use a partitioned table
+      as the source and copy data from it, but could not specify any partitioning clauses for the new table.
+      In <keyword keyref="impala25_full"/> and higher, you can now use the <codeph>PARTITIONED BY</codeph> clause with a
+      <codeph>CREATE TABLE AS SELECT</codeph> statement. See the examples under the following discussion of
+      the <codeph>CREATE TABLE AS SELECT</codeph> syntax variation.
+    </p>
+
+<!--
+    <p rev="kudu">
+      <b>Partitioning for Kudu tables (DISTRIBUTE BY clause)</b>
+    </p>
+
+    <p rev="kudu">
+      For Kudu tables, you specify logical partitioning across one or more columns using the
+      <codeph>DISTRIBUTE BY</codeph> clause. In contrast to partitioning for HDFS-based tables,
+      multiple values for a partition key column can be located in the same partition.
+      The optional <codeph>HASH</codeph> clause lets you divide one or a set of partition key columns
+      into a specified number of buckets; you can use more than one <codeph>HASH</codeph>
+      clause, specifying a distinct set of partition key columns for each.
+      The optional <codeph>RANGE</codeph> clause further subdivides the partitions, based on
+      a set of literal values for the partition key columns.
+    </p>
+-->
+
+    <p>
+      <b>Specifying file format (STORED AS and ROW FORMAT clauses):</b>
+    </p>
+
+    <p rev="DOCS-1523">
+      The <codeph>STORED AS</codeph> clause identifies the format of the underlying data files. Currently, Impala
+      can query more types of file formats than it can create or insert into. Use Hive to perform any create or
+      data load operations that are not currently available in Impala. For example, Impala can create an Avro,
+      SequenceFile, or RCFile table but cannot insert data into it. There are also Impala-specific procedures for using
+      compression with each kind of file format. For details about working with data files of various formats, see
+      <xref href="impala_file_formats.xml#file_formats"/>.
+    </p>
+
+    <note>
+      In Impala 1.4.0 and higher, Impala can create Avro tables, which formerly required doing the <codeph>CREATE
+      TABLE</codeph> statement in Hive. See <xref href="impala_avro.xml#avro"/> for details and examples.
+    </note>
+
+    <p>
+      By default (when no <codeph>STORED AS</codeph> clause is specified), data files in Impala tables are created
+      as text files with Ctrl-A (hex 01) characters as the delimiter.
+<!-- Verify if ROW FORMAT is entirely ignored outside of text tables, or does it apply somehow to SequenceFile and/or RCFile too? -->
+      Specify the <codeph>ROW FORMAT DELIMITED</codeph> clause to produce or ingest data files that use a different
+      delimiter character such as tab or <codeph>|</codeph>, or a different line end character such as carriage
+      return or newline. When specifying delimiter and line end characters with the <codeph>FIELDS TERMINATED
+      BY</codeph> and <codeph>LINES TERMINATED BY</codeph> clauses, use <codeph>'\t'</codeph> for tab,
+      <codeph>'\n'</codeph> for newline or linefeed, <codeph>'\r'</codeph> for carriage return, and
+      <codeph>'\0'</codeph> for ASCII <codeph>nul</codeph> (hex 00). For more examples of text
+      tables, see <xref href="impala_txtfile.xml#txtfile"/>.
+    </p>
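+    <p>
+      For example, the following sketch (with hypothetical names) creates a text table intended for
+      comma-delimited data files with newline line endings:
+    </p>
+
+<codeblock>CREATE TABLE csv_style_data (id INT, name STRING)
+  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+  LINES TERMINATED BY '\n'
+  STORED AS TEXTFILE;
+</codeblock>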
+
+    <p>
+      The <codeph>ESCAPED BY</codeph> clause applies both to text files that you create through an
+      <codeph>INSERT</codeph> statement to an Impala <codeph>TEXTFILE</codeph> table, and to existing data files
+      that you put into an Impala table directory. (You can ingest existing data files either by creating the table
+      with <codeph>CREATE EXTERNAL TABLE ... LOCATION</codeph>, the <codeph>LOAD DATA</codeph> statement, or
+      through an HDFS operation such as <codeph>hdfs dfs -put <varname>file</varname>
+      <varname>hdfs_path</varname></codeph>.) Choose an escape character that is not used anywhere else in the
+      file, and put it in front of each instance of the delimiter character that occurs within a field value.
+      Surrounding field values with quotation marks does not help Impala to parse fields with embedded delimiter
+      characters; the quotation marks are considered to be part of the column value. If you want to use
+      <codeph>\</codeph> as the escape character, specify the clause in <cmdname>impala-shell</cmdname> as
+      <codeph>ESCAPED BY '\\'</codeph>.
+    </p>
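+    <p>
+      For example, the following sketch (with hypothetical names) defines a comma-delimited table whose data
+      files represent an embedded comma within a field value as <codeph>\,</codeph>:
+    </p>
+
+<codeblock>CREATE TABLE escaped_fields (s1 STRING, s2 STRING)
+  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+  ESCAPED BY '\\'
+  STORED AS TEXTFILE;
+</codeblock>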
+
+    <note conref="../shared/impala_common.xml#common/thorn"/>
+
+    <p>
+      <b>Cloning tables (LIKE clause):</b>
+    </p>
+
+    <p>
+      To create an empty table with the same columns, comments, and other attributes as another table, use the
+      following variation. The <codeph>CREATE TABLE ... LIKE</codeph> form allows a restricted set of clauses,
+      currently only the <codeph>LOCATION</codeph>, <codeph>COMMENT</codeph>, and <codeph>STORED AS</codeph>
+      clauses.
+    </p>
+
+<codeblock>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>table_name</varname>
+  <ph rev="1.4.0">LIKE { [<varname>db_name</varname>.]<varname>table_name</varname> | PARQUET '<varname>hdfs_path_of_parquet_file</varname>' }</ph>
+  [COMMENT '<varname>table_comment</varname>']
+  [STORED AS <varname>file_format</varname>]
+  [LOCATION '<varname>hdfs_path</varname>']</codeblock>
+
+    <note rev="1.2.0">
+      <p rev="1.2.0">
+        To clone the structure of a table and transfer data into it in a single operation, use the <codeph>CREATE
+        TABLE AS SELECT</codeph> syntax described in the next subsection.
+      </p>
+    </note>
+
+    <p>
+      When you clone the structure of an existing table using the <codeph>CREATE TABLE ... LIKE</codeph> syntax,
+      the new table keeps the same file format as the original one, so you only need to specify the <codeph>STORED
+      AS</codeph> clause if you want to use a different file format, or when specifying a view as the original
+      table. (Creating a table <q>like</q> a view produces a text table by default.)
+    </p>
+
+    <p>
+      Although normally Impala cannot create an HBase table directly, Impala can clone the structure of an existing
+      HBase table with the <codeph>CREATE TABLE ... LIKE</codeph> syntax, preserving the file format and metadata
+      from the original table.
+    </p>
+
+    <p>
+      There are some exceptions to the ability to use <codeph>CREATE TABLE ... LIKE</codeph> with an Avro table.
+      For example, you cannot use this technique for an Avro table that is specified with an Avro schema but no
+      columns. When in doubt, check if a <codeph>CREATE TABLE ... LIKE</codeph> operation works in Hive; if not, it
+      typically will not work in Impala either.
+    </p>
+
+    <p>
+      If the original table is partitioned, the new table inherits the same partition key columns. Because the new
+      table is initially empty, it does not inherit the actual partitions that exist in the original one. To create
+      partitions in the new table, insert data or issue <codeph>ALTER TABLE ... ADD PARTITION</codeph> statements.
+    </p>
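+    <p>
+      For example, assuming a hypothetical partitioned table <codeph>census_data</codeph> with
+      <codeph>YEAR</codeph> and <codeph>MONTH</codeph> partition key columns, the clone starts with no
+      partitions, and you create them afterward:
+    </p>
+
+<codeblock>CREATE TABLE census_clone LIKE census_data;
+-- The clone inherits the partition key columns but no actual partitions.
+ALTER TABLE census_clone ADD PARTITION (year=2016, month=1);
+</codeblock>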
+
+    <p conref="../shared/impala_common.xml#common/create_table_like_view"/>
+
+    <p>
+      Because <codeph>CREATE TABLE ... LIKE</codeph> only manipulates table metadata, not the physical data of the
+      table, issue <codeph>INSERT INTO TABLE</codeph> statements afterward to copy any data from the original table
+      into the new one, optionally converting the data to a new file format. (For some file formats, Impala can do
+      a <codeph>CREATE TABLE ... LIKE</codeph> to create the table, but Impala cannot insert data in that file
+      format; in these cases, you must load the data in Hive. See
+      <xref href="impala_file_formats.xml#file_formats"/> for details.)
+    </p>
+
+    <p rev="1.2" id="ctas">
+      <b>CREATE TABLE AS SELECT:</b>
+    </p>
+
+    <p>
+      The <codeph>CREATE TABLE AS SELECT</codeph> syntax is a shorthand notation to create a table based on column
+      definitions from another table, and copy data from the source table to the destination table without issuing
+      any separate <codeph>INSERT</codeph> statement. This idiom is so popular that it has its own acronym,
+      <q>CTAS</q>.
+    </p>
+
+    <p>
+      The following examples show how to copy data from a source table <codeph>T1</codeph>
+      to a variety of destination tables, applying various transformations to the table
+      properties, table layout, or the data itself as part of the operation:
+    </p>
+
+<codeblock>
+-- Sample table to be the source of CTAS operations.
+CREATE TABLE t1 (x INT, y STRING);
+INSERT INTO t1 VALUES (1, 'one'), (2, 'two'), (3, 'three');
+
+-- Clone all the columns and data from one table to another.
+CREATE TABLE clone_of_t1 AS SELECT * FROM t1;
++-------------------+
+| summary           |
++-------------------+
+| Inserted 3 row(s) |
++-------------------+
+
+-- Clone the columns and data, and convert the data to a different file format.
+CREATE TABLE parquet_version_of_t1 STORED AS PARQUET AS SELECT * FROM t1;
++-------------------+
+| summary           |
++-------------------+
+| Inserted 3 row(s) |
++-------------------+
+
+-- Copy only some rows to the new table.
+CREATE TABLE subset_of_t1 AS SELECT * FROM t1 WHERE x >= 2;
++-------------------+
+| summary           |
++-------------------+
+| Inserted 2 row(s) |
++-------------------+
+
+-- Same idea as CREATE TABLE LIKE: clone table layout but do not copy any data.
+CREATE TABLE empty_clone_of_t1 AS SELECT * FROM t1 WHERE 1=0;
++-------------------+
+| summary           |
++-------------------+
+| Inserted 0 row(s) |
++-------------------+
+
+-- Reorder and rename columns and transform the data.
+CREATE TABLE t5 AS SELECT upper(y) AS s, x+1 AS a, 'Entirely new column' AS n FROM t1;
++-------------------+
+| summary           |
++-------------------+
+| Inserted 3 row(s) |
++-------------------+
+SELECT * FROM t5;
++-------+---+---------------------+
+| s     | a | n                   |
++-------+---+---------------------+
+| ONE   | 2 | Entirely new column |
+| TWO   | 3 | Entirely new column |
+| THREE | 4 | Entirely new column |
++-------+---+---------------------+
+</codeblock>
+
+<!-- These are a little heavyweight to get into here. Therefore commenting out.
+     Some overlap with the new column-changing examples in the code listing above.
+Create tables with different column order, names, or types than the original.
+CREATE TABLE some_columns_from_t1 AS SELECT c1, c3, c5 FROM t1;
+CREATE TABLE reordered_columns_from_t1 AS SELECT c4, c3, c1, c2 FROM t1;
+CREATE TABLE synthesized_columns AS SELECT upper(c1) AS all_caps, c2+c3 AS total, "California" AS state FROM t1;</codeblock>
+-->
+
+<!-- CREATE TABLE AS <select> now incorporated up higher in the original syntax diagram. -->
+
+    <p rev="1.2">
+      See <xref href="impala_select.xml#select"/> for details about query syntax for the <codeph>SELECT</codeph>
+      portion of a <codeph>CREATE TABLE AS SELECT</codeph> statement.
+    </p>
+
+    <p rev="1.2">
+      The newly created table inherits the column names that you select from the original table, which you can
+      override by specifying column aliases in the query. Any column or table comments from the original table are
+      not carried over to the new table.
+    </p>
+
+    <note rev="DOCS-1523">
+      When using the <codeph>STORED AS</codeph> clause with a <codeph>CREATE TABLE AS SELECT</codeph>
+      statement, the destination table must use a file format that Impala can write to: currently,
+      text or Parquet. You cannot specify an Avro, SequenceFile, or RCFile table as the destination
+      table for a CTAS operation.
+    </note>
+
+    <p rev="2.5.0">
+      Prior to <keyword keyref="impala25_full"/> you could use a partitioned table
+      as the source and copy data from it, but could not specify any partitioning clauses for the new table.
+      In <keyword keyref="impala25_full"/> and higher, you can now use the <codeph>PARTITIONED BY</codeph> clause with a
+      <codeph>CREATE TABLE AS SELECT</codeph> statement. The following example demonstrates how you can copy
+      data from an unpartitioned table in a <codeph>CREATE TABLE AS SELECT</codeph> operation, creating a new
+      partitioned table in the process. The main syntax consideration is the column order in the <codeph>PARTITIONED BY</codeph>
+      clause and the select list: the partition key columns must be listed last in the select list, in the same
+      order as in the <codeph>PARTITIONED BY</codeph> clause. Therefore, in this case, the column order in the
+      destination table is different from the source table. You also only specify the column names in the
+      <codeph>PARTITIONED BY</codeph> clause, not the data types or column comments.
+    </p>
+
+<codeblock rev="2.5.0">
+create table partitions_no (year smallint, month tinyint, s string);
+insert into partitions_no values (2016, 1, 'January 2016'),
+  (2016, 2, 'February 2016'), (2016, 3, 'March 2016');
+
+-- Prove that the source table is not partitioned.
+show partitions partitions_no;
+ERROR: AnalysisException: Table is not partitioned: ctas_partition_by.partitions_no
+
+-- Create new table with partitions based on column values from source table.
+<b>create table partitions_yes partitioned by (year, month)
+  as select s, year, month from partitions_no;</b>
++-------------------+
+| summary           |
++-------------------+
+| Inserted 3 row(s) |
++-------------------+
+
+-- Prove that the destination table is partitioned.
+show partitions partitions_yes;
++-------+-------+-------+--------+------+...
+| year  | month | #Rows | #Files | Size |...
++-------+-------+-------+--------+------+...
+| 2016  | 1     | -1    | 1      | 13B  |...
+| 2016  | 2     | -1    | 1      | 14B  |...
+| 2016  | 3     | -1    | 1      | 11B  |...
+| Total |       | -1    | 3      | 38B  |...
++-------+-------+-------+--------+------+...
+</codeblock>
+
+  <p rev="2.5.0">
+    The most convenient layout for partitioned tables is with all the
+    partition key columns at the end. The CTAS <codeph>PARTITIONED BY</codeph> syntax
+    requires that column order in the select list, which produces the same
+    column order in the destination table.
+  </p>
+
+<codeblock rev="2.5.0">
+describe partitions_no;
++-------+----------+---------+
+| name  | type     | comment |
++-------+----------+---------+
+| year  | smallint |         |
+| month | tinyint  |         |
+| s     | string   |         |
++-------+----------+---------+
+
+-- The CTAS operation forced us to put the partition key columns last.
+-- Having those columns last works better with idioms such as SELECT *
+-- for partitioned tables.
+describe partitions_yes;
++-------+----------+---------+
+| name  | type     | comment |
++-------+----------+---------+
+| s     | string   |         |
+| year  | smallint |         |
+| month | tinyint  |         |
++-------+----------+---------+
+</codeblock>
+
+  <p rev="2.5.0">
+    Attempting to use a select list with the partition key columns
+    not at the end results in an error due to a column name mismatch:
+  </p>
+
+<codeblock rev="2.5.0">
+-- We expect this CTAS to fail because non-key column S
+-- comes after key columns YEAR and MONTH in the select list.
+create table partitions_maybe partitioned by (year, month)
+  as select year, month, s from partitions_no;
+ERROR: AnalysisException: Partition column name mismatch: year != month
+</codeblock>
+
+    <p rev="1.2">
+      The examples earlier in this section show how you can clone all the data in a table, or a subset of the
+      columns and/or rows, or reorder columns, rename them, or construct them out of expressions.
+    </p>
+
+    <p rev="1.2">
+      As part of a CTAS operation, you can convert the data to any file format that Impala can write (currently,
+      <codeph>TEXTFILE</codeph> and <codeph>PARQUET</codeph>). You cannot specify the lower-level properties of a
+      text table, such as the delimiter.
+    </p>
+
+    <p rev="obwl" conref="../shared/impala_common.xml#common/insert_sort_blurb"/>
+
+    <p rev="1.4.0">
+      <b>CREATE TABLE LIKE PARQUET:</b>
+    </p>
+
+    <p rev="1.4.0">
+      The variation <codeph>CREATE TABLE ... LIKE PARQUET '<varname>hdfs_path_of_parquet_file</varname>'</codeph>
+      lets you skip the column definitions of the <codeph>CREATE TABLE</codeph> statement. The column names and
+      data types are automatically configured based on the organization of the specified Parquet data file, which
+      must already reside in HDFS. You can use a data file located outside the Impala database directories, or a
+      file from an existing Impala Parquet table; either way, Impala only uses the column definitions from the file
+      and does not use the HDFS location for the <codeph>LOCATION</codeph> attribute of the new table. (You can,
+      however, also specify the enclosing directory with the <codeph>LOCATION</codeph> attribute, to both use the
+      same schema as the data file and point the Impala table at the associated directory for querying.)
+    </p>
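+    <p rev="1.4.0">
+      For example, the following sketch (the HDFS path and names are hypothetical) derives the column
+      definitions from an existing Parquet data file, and also points the new table at the directory
+      containing that file:
+    </p>
+
+<codeblock rev="1.4.0">CREATE TABLE schema_from_data_file
+  LIKE PARQUET '/user/etl/destination/datafile1.parq'
+  STORED AS PARQUET
+  LOCATION '/user/etl/destination';
+</codeblock>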
+
+    <p rev="1.4.0">
+      The following considerations apply when you use the <codeph>CREATE TABLE LIKE PARQUET</codeph> technique:
+    </p>
+
+    <ul rev="1.4.0">
+      <li>
+        Any column comments from the original table are not preserved in the new table. Each column in the new
+        table has a comment stating the low-level Parquet field type used to deduce the appropriate SQL column
+        type.
+      </li>
+
+      <li>
+        If you use a data file from a partitioned Impala table, any partition key columns from the original table
+        are left out of the new table, because they are represented in HDFS directory names rather than stored in
+        the data file. To preserve the partition information, repeat the same <codeph>PARTITION</codeph> clause as
+        in the original <codeph>CREATE TABLE</codeph> statement.
+      </li>
+
+      <li>
+        The file format of the new table defaults to text, as with other kinds of <codeph>CREATE TABLE</codeph>
+        statements. To make the new table also use Parquet format, include the clause <codeph>STORED AS
+        PARQUET</codeph> in the <codeph>CREATE TABLE LIKE PARQUET</codeph> statement.
+      </li>
+
+      <li>
+        If the Parquet data file comes from an existing Impala table, currently, any <codeph>TINYINT</codeph> or
+        <codeph>SMALLINT</codeph> columns are turned into <codeph>INT</codeph> columns in the new table.
+        Internally, Parquet stores such values as 32-bit integers.
+      </li>
+
+      <li>
+        When the destination table uses the Parquet file format, the <codeph>CREATE TABLE AS SELECT</codeph> and
+        <codeph>INSERT ... SELECT</codeph> statements always create at least one data file, even if the
+        <codeph>SELECT</codeph> part of the statement does not match any rows. You can use such an empty Parquet
+        data file as a template for subsequent <codeph>CREATE TABLE LIKE PARQUET</codeph> statements.
+      </li>
+    </ul>
+
+    <p>
+      For more details about creating Parquet tables, and examples of the <codeph>CREATE TABLE LIKE
+      PARQUET</codeph> syntax, see <xref href="impala_parquet.xml#parquet"/>.
+    </p>
+
+    <p>
+      <b>Visibility and Metadata (TBLPROPERTIES and WITH SERDEPROPERTIES clauses):</b>
+    </p>
+
+    <p rev="1.2">
+      You can associate arbitrary items of metadata with a table by specifying the <codeph>TBLPROPERTIES</codeph>
+      clause. This clause takes a comma-separated list of key-value pairs and stores those items in the metastore
+      database. You can also change the table properties later with an <codeph>ALTER TABLE</codeph> statement. You
+      can observe the table properties for different delimiter and escape characters using the <codeph>DESCRIBE
+      FORMATTED</codeph> command, and change those settings for an existing table with <codeph>ALTER TABLE ... SET
+      TBLPROPERTIES</codeph>.
+    </p>
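+    <p rev="1.2">
+      For example, the following sketch (property names and values are hypothetical) attaches arbitrary
+      key-value metadata at table creation time, then updates one of the properties later:
+    </p>
+
+<codeblock>CREATE TABLE annotated_table (x INT)
+  TBLPROPERTIES ('data_source'='nightly_etl', 'version'='1');
+ALTER TABLE annotated_table SET TBLPROPERTIES ('version'='2');
+-- The properties appear in the Table Parameters section of the output.
+DESCRIBE FORMATTED annotated_table;
+</codeblock>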
+
+    <p rev="1.2">
+      You can also associate SerDes properties with the table by specifying key-value pairs through the
+      <codeph>WITH SERDEPROPERTIES</codeph> clause. This metadata is not used by Impala, which has its own built-in
+      serializer and deserializer for the file formats it supports. Particular property values might be needed for
+      Hive compatibility with certain variations of file formats, particularly Avro.
+    </p>
+
+    <p>
+      Some DDL operations that interact with other Hadoop components require specifying particular values in the
+      <codeph>SERDEPROPERTIES</codeph> or <codeph>TBLPROPERTIES</codeph> fields, such as creating an Avro table or
+      an HBase table. (You typically create HBase tables in Hive, because they require additional clauses not
+      currently available in Impala.)
+<!-- Haven't got a working example from Lenni, so suppressing this recommendation for now.
+        The Avro schema properties can be specified through either
+        <codeph>TBLPROPERTIES</codeph> or <codeph>SERDEPROPERTIES</codeph>;
+        for best compatibility with future versions of Hive,
+        use <codeph>SERDEPROPERTIES</codeph> in this case.
+-->
+    </p>
+
+    <p>
+      To see the column definitions and column comments for an existing table, for example before issuing a
+      <codeph>CREATE TABLE ... LIKE</codeph> or a <codeph>CREATE TABLE ... AS SELECT</codeph> statement, issue the
+      statement <codeph>DESCRIBE <varname>table_name</varname></codeph>. To see even more detail, such as the
+      location of data files and the values for clauses such as <codeph>ROW FORMAT</codeph> and <codeph>STORED
+      AS</codeph>, issue the statement <codeph>DESCRIBE FORMATTED <varname>table_name</varname></codeph>.
+      <codeph>DESCRIBE FORMATTED</codeph> is also needed to see any overall table comment (as opposed to individual
+      column comments).
+    </p>
+
+    <p>
+      After creating a table, your <cmdname>impala-shell</cmdname> session or another
+      <cmdname>impala-shell</cmdname> connected to the same node can immediately query that table. There might be a
+      brief interval (one statestore heartbeat) before the table can be queried through a different Impala node. To
+      make the <codeph>CREATE TABLE</codeph> statement return only when the table is recognized by all Impala nodes
+      in the cluster, enable the <codeph>SYNC_DDL</codeph> query option.
+    </p>
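+    <p>
+      For example, in <cmdname>impala-shell</cmdname> you might enable the option before issuing the DDL
+      (the table name is hypothetical):
+    </p>
+
+<codeblock>SET SYNC_DDL=1;
+-- The statement returns only after all Impala nodes recognize the new table.
+CREATE TABLE visible_to_all_nodes (x INT);
+</codeblock>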
+
+    <p rev="1.4.0">
+      <b>HDFS caching (CACHED IN clause):</b>
+    </p>
+
+    <p rev="1.4.0">
+      If you specify the <codeph>CACHED IN</codeph> clause, any existing or future data files in the table
+      directory or the partition subdirectories are designated to be loaded into memory with the HDFS caching
+      mechanism. See <xref href="impala_perf_hdfs_caching.xml#hdfs_caching"/> for details about using the HDFS
+      caching feature.
+    </p>
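+
+    <p>
+      For example, assuming an HDFS cache pool named <codeph>pool1</codeph> was previously
+      set up with the <cmdname>hdfs cacheadmin</cmdname> command (the table and pool names here are hypothetical):
+    </p>
+
+<codeblock>-- Data files for this table are designated to be loaded into memory
+-- through the HDFS caching mechanism.
+create table cached_t (id int, s string) cached in 'pool1';</codeblock>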
+
+    <p conref="../shared/impala_common.xml#common/impala_cache_replication_factor"/>
+
+<!-- Say something in here about the SHOW statement, e.g. SHOW TABLES, SHOW TABLE/COLUMN STATS, SHOW PARTITIONS. -->
+
+    <p>
+      <b>Column order</b>:
+    </p>
+
+    <p>
+      If you intend to use the table to hold data files produced by some external source, specify the columns in
+      the same order as they appear in the data files.
+    </p>
+
+    <p>
+      If you intend to insert or copy data into the table through Impala, or if you have control over the way
+      externally produced data files are arranged, use your judgment to specify columns in the most convenient
+      order:
+    </p>
+
+    <ul>
+      <li>
+        <p>
+          If certain columns are often <codeph>NULL</codeph>, specify those columns last. You might produce data
+          files that omit these trailing columns entirely; Impala automatically fills in
+          <codeph>NULL</codeph> values for any trailing columns missing from the data files.
+        </p>
+      </li>
+
+      <li>
+        <p>
+          If an unpartitioned table will be used as the source for an <codeph>INSERT ... SELECT</codeph> operation
+          into a partitioned table, specify last in the unpartitioned table any columns that correspond to
+          partition key columns in the partitioned table, and in the same order as the partition key columns are
+          declared in the partitioned table. This technique lets you use <codeph>INSERT ... SELECT *</codeph> when
+          copying data to the partitioned table, rather than specifying each column name individually.
+        </p>
+      </li>
+
+      <li>
+        <p>
+          If you specify columns in an order that you later discover is suboptimal, you can sometimes work around
+          the problem without recreating the table. You can create a view that selects columns from the original
+          table in a permuted order, then do a <codeph>SELECT *</codeph> from the view. When inserting data into a
+          table, you can specify a permuted order for the inserted columns to match the order in the destination
+          table.
+        </p>
+      </li>
+    </ul>
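+
+    <p>
+      For example, these hypothetical tables show the <codeph>INSERT ... SELECT *</codeph> technique,
+      with the trailing columns of the unpartitioned table matching the partition key columns:
+    </p>
+
+<codeblock>create table staging (id int, val string, year int, month int);
+create table partitioned_t (id int, val string) partitioned by (year int, month int);
+
+-- The trailing STAGING columns line up with the partition key columns,
+-- so SELECT * works without naming each column individually.
+insert into partitioned_t partition (year, month) select * from staging;</codeblock>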
+
+    <p conref="../shared/impala_common.xml#common/hive_blurb"/>
+
+    <p>
+      Impala queries can make use of metadata about the table and columns, such as the number of rows in a table or
+      the number of different values in a column. Prior to Impala 1.2.2, you gathered this metadata by issuing the
+      <codeph>ANALYZE TABLE</codeph> statement in Hive, after creating the table and
+      loading representative data into it. In Impala 1.2.2 and higher, the <codeph>COMPUTE STATS</codeph> statement
+      produces these statistics within Impala, without needing to use Hive at all.
+    </p>
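+
+    <p>
+      For example (the table name here is hypothetical):
+    </p>
+
+<codeblock>-- Gather table and column statistics within Impala.
+compute stats t1;
+
+-- Confirm that the statistics are in place.
+show table stats t1;
+show column stats t1;</codeblock>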
+
+    <p conref="../shared/impala_common.xml#common/hbase_blurb"/>
+
+    <note>
+      <p>
+        The Impala <codeph>CREATE TABLE</codeph> statement cannot create an HBase table, because it currently does
+        not support the <codeph>STORED BY</codeph> clause needed for HBase tables. Create such tables in Hive, then
+        query them through Impala. For information on using Impala with HBase tables, see
+        <xref href="impala_hbase.xml#impala_hbase"/>.
+      </p>
+    </note>
+
+    <p conref="../shared/impala_common.xml#common/s3_blurb"/>
+    <p rev="2.2.0">
+      To create a table where the data resides in the Amazon Simple Storage Service (S3),
+      specify a <codeph>LOCATION</codeph> attribute with an <codeph>s3a://</codeph> prefix pointing to the data files in S3.
+    </p>
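+
+    <p>
+      For example (the table name, bucket name, and path here are hypothetical placeholders):
+    </p>
+
+<codeblock>create external table s3_t (id int, s string)
+  location 's3a://bucket_name/path/to/data';</codeblock>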
+
+    <p rev="2.6.0 CDH-39913 IMPALA-1878">
+      In <keyword keyref="impala26_full"/> and higher, you can
+      use this special <codeph>LOCATION</codeph> syntax
+      as part of a <codeph>CREATE TABLE AS SELECT</codeph> statement.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/s3_ddl"/>
+
+    <p conref="../shared/impala_common.xml#common/insert_sort_blurb"/>
+
+    <p conref="../shared/impala_common.xml#common/hdfs_blurb"/>
+
+    <p>
+      The <codeph>CREATE TABLE</codeph> statement for an internal table creates a directory in HDFS. The
+      <codeph>CREATE EXTERNAL TABLE</codeph> statement associates the table with an existing HDFS directory, and
+      does not create any new directory in HDFS. To locate the HDFS data directory for a table, issue a
+      <codeph>DESCRIBE FORMATTED <varname>table</varname></codeph> statement. To examine the contents of that HDFS
+      directory, use an OS command such as <codeph>hdfs dfs -ls hdfs://<varname>path</varname></codeph>, either
+      from the OS command line or through the <codeph>shell</codeph> or <codeph>!</codeph> commands in
+      <cmdname>impala-shell</cmdname>.
+    </p>
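+
+    <p>
+      For example (the table name and the path shown are hypothetical):
+    </p>
+
+<codeblock>-- The Location: field in the output shows the HDFS data directory.
+describe formatted t1;
+
+-- From inside impala-shell, list the contents of that directory,
+-- substituting the path reported by DESCRIBE FORMATTED.
+shell hdfs dfs -ls hdfs://<varname>path</varname>;</codeblock>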
+
+    <p>
+      The <codeph>CREATE TABLE AS SELECT</codeph> syntax creates data files under the table data directory to hold
+      any data copied by the <codeph>INSERT</codeph> portion of the statement. (Even if no data is copied, Impala
+      might create one or more empty data files.)
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/permissions_blurb"/>
+    <p rev="CDH-19187">
+      The user ID that the <cmdname>impalad</cmdname> daemon runs under,
+      typically the <codeph>impala</codeph> user, must have both execute and write
+      permission for the database directory where the table is being created.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/security_blurb"/>
+    <p conref="../shared/impala_common.xml#common/redaction_yes"/>
+
+    <p conref="../shared/impala_common.xml#common/cancel_blurb_maybe"/>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+
+    <p>
+      <xref href="impala_tables.xml#tables"/>,
+      <xref href="impala_alter_table.xml#alter_table"/>, <xref href="impala_drop_table.xml#drop_table"/>,
+      <xref href="impala_partitioning.xml#partitioning"/>, <xref href="impala_tables.xml#internal_tables"/>,
+      <xref href="impala_tables.xml#external_tables"/>, <xref href="impala_compute_stats.xml#compute_stats"/>,
+      <xref href="impala_sync_ddl.xml#sync_ddl"/>, <xref href="impala_show.xml#show_tables"/>,
+      <xref href="impala_show.xml#show_create_table"/>, <xref href="impala_describe.xml#describe"/>
+    </p>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_create_view.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_create_view.xml b/docs/topics/impala_create_view.xml
new file mode 100644
index 0000000..446acf2
--- /dev/null
+++ b/docs/topics/impala_create_view.xml
@@ -0,0 +1,139 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="1.1" id="create_view">
+
+  <title>CREATE VIEW Statement</title>
+  <titlealts audience="PDF"><navtitle>CREATE VIEW</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="DDL"/>
+      <data name="Category" value="Tables"/>
+      <data name="Category" value="Schemas"/>
+      <data name="Category" value="Views"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">CREATE VIEW statement</indexterm>
+      The <codeph>CREATE VIEW</codeph> statement lets you create a shorthand for a more complicated
+      query. The base query can involve joins, expressions, reordered columns, column aliases, and other SQL
+      features that can make a query hard to understand or maintain.
+    </p>
+
+    <p>
+      Because a view is purely a logical construct (an alias for a query) with no physical data behind it,
+      <codeph>CREATE VIEW</codeph> only involves changes to metadata in the metastore database, not any data files
+      in HDFS.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock>CREATE VIEW [IF NOT EXISTS] <varname>view_name</varname> [(<varname>column_list</varname>)]
+  AS <varname>select_statement</varname></codeblock>
+
+    <p conref="../shared/impala_common.xml#common/ddl_blurb"/>
+
+    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+    <p>
+      The <codeph>CREATE VIEW</codeph> statement can be useful in scenarios such as the following:
+    </p>
+
+    <ul>
+      <li>
+        To turn even the most lengthy and complicated SQL query into a one-liner. You can issue simple queries
+        against the view from applications, scripts, or interactive queries in <cmdname>impala-shell</cmdname>.
+        For example:
+<codeblock>select * from <varname>view_name</varname>;
+select * from <varname>view_name</varname> order by c1 desc limit 10;</codeblock>
+        The more complicated and hard-to-read the original query, the more benefit there is to simplifying the
+        query using a view.
+      </li>
+
+      <li>
+        To hide the underlying table and column names, to minimize maintenance problems if those names change. In
+        that case, you re-create the view using the new names, and all queries that use the view rather than the
+        underlying tables keep running with no changes.
+      </li>
+
+      <li>
+        To experiment with optimization techniques and make the optimized queries available to all applications.
+        For example, if you find a combination of <codeph>WHERE</codeph> conditions, join order, join hints, and so
+        on that works the best for a class of queries, you can establish a view that incorporates the
+        best-performing techniques. Applications can then make relatively simple queries against the view, without
+        repeating the complicated and optimized logic over and over. If you later find a better way to optimize the
+        original query, when you re-create the view, all the applications immediately take advantage of the
+        optimized base query.
+      </li>
+
+      <li>
+        To simplify a whole class of related queries, especially complicated queries involving joins between
+        multiple tables, complicated expressions in the column list, and other SQL syntax that makes the query
+        difficult to understand and debug. For example, you might create a view that joins several tables, filters
+        using several <codeph>WHERE</codeph> conditions, and selects several columns from the result set.
+        Applications might issue queries against this view that only vary in their <codeph>LIMIT</codeph>,
+        <codeph>ORDER BY</codeph>, and similar simple clauses.
+      </li>
+    </ul>
+
+    <p>
+      For queries that require repeating complicated clauses over and over again, for example in the select list,
+      <codeph>ORDER BY</codeph>, and <codeph>GROUP BY</codeph> clauses, you can use the <codeph>WITH</codeph>
+      clause as an alternative to creating a view.
+    </p>
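+
+    <p>
+      For example, this hypothetical query uses a <codeph>WITH</codeph> clause to define the complicated
+      part once, within a single statement, rather than creating a persistent view:
+    </p>
+
+<codeblock>with t as (select c1, c3, c7 from t1 where c1 is not null and c5 &gt; 0)
+  select c1, c3 from t order by c1 limit 10;</codeblock>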
+
+    <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+    <p conref="../shared/impala_common.xml#common/complex_types_views"/>
+    <p conref="../shared/impala_common.xml#common/complex_types_views_caveat"/>
+
+    <p conref="../shared/impala_common.xml#common/sync_ddl_blurb"/>
+
+    <p conref="../shared/impala_common.xml#common/security_blurb"/>
+    <p conref="../shared/impala_common.xml#common/redaction_yes"/>
+
+    <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>
+
+    <p conref="../shared/impala_common.xml#common/permissions_blurb_no"/>
+
+    <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+<!-- TK: Elaborate on these, show queries and real output. -->
+
+<codeblock>-- Create a view that is exactly the same as the underlying table.
+create view v1 as select * from t1;
+
+-- Create a view that includes only certain columns from the underlying table.
+create view v2 as select c1, c3, c7 from t1;
+
+-- Create a view that filters the values from the underlying table.
+create view v3 as select distinct c1, c3, c7 from t1 where c1 is not null and c5 &gt; 0;
+
+-- Create a view that reorders and renames columns from the underlying table.
+create view v4 as select c4 as last_name, c6 as address, c2 as birth_date from t1;
+
+-- Create a view that runs functions to convert or transform certain columns.
+create view v5 as select c1, cast(c3 as string) c3, concat(c4,c5) c5, trim(c6) c6, "Constant" c8 from t1;
+
+-- Create a view that hides the complexity of a view query.
+create view v6 as select t1.c1, t2.c2 from t1 join t2 on t1.id = t2.id;
+</codeblock>
+
+<!-- These examples show CREATE VIEW and corresponding DROP VIEW statements, with different combinations
+     of qualified and unqualified names. -->
+
+    <p conref="../shared/impala_common.xml#common/create_drop_view_examples"/>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+
+    <p>
+      <xref href="impala_views.xml#views"/>, <xref href="impala_alter_view.xml#alter_view"/>,
+      <xref href="impala_drop_view.xml#drop_view"/>
+    </p>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_data_sources.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_data_sources.xml b/docs/topics/impala_data_sources.xml
new file mode 100644
index 0000000..8fb46e1
--- /dev/null
+++ b/docs/topics/impala_data_sources.xml
@@ -0,0 +1,22 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="1.4.0" id="data_sources">
+
+  <title>Data Sources</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <xref href="impala_create_data_source.xml#create_data_source"/>
+      <xref href="impala_drop_data_source.xml#drop_data_source"/>
+      <xref href="impala_create_table.xml#create_table"/>
+    </p>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_databases.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_databases.xml b/docs/topics/impala_databases.xml
new file mode 100644
index 0000000..3253e75
--- /dev/null
+++ b/docs/topics/impala_databases.xml
@@ -0,0 +1,65 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="databases">
+
+  <title>Overview of Impala Databases</title>
+  <titlealts audience="PDF"><navtitle>Databases</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Databases"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Querying"/>
+      <data name="Category" value="Schemas"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      In Impala, a database is a logical container for a group of tables. Each database defines a separate
+      namespace. Within a database, you can refer to the tables inside it using their unqualified names. Different
+      databases can contain tables with identical names.
+    </p>
+
+    <p>
+      Creating a database is a lightweight operation. There are minimal database-specific properties to configure,
+      only <codeph>LOCATION</codeph> and <codeph>COMMENT</codeph>.  There is no <codeph>ALTER DATABASE</codeph> statement.
+    </p>
+
+    <p>
+      Typically, you create a separate database for each project or application, to avoid naming conflicts between
+      tables and to make clear which tables are related to each other. The <codeph>USE</codeph> statement lets
+      you switch between databases. Unqualified references to tables, views, and functions refer to objects
+      within the current database. You can also refer to objects in other databases by using qualified names
+      of the form <codeph><varname>dbname</varname>.<varname>object_name</varname></codeph>.
+    </p>
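+
+    <p>
+      For example (the database and table names here are hypothetical):
+    </p>
+
+<codeblock>-- Unqualified names now refer to objects in db1.
+use db1;
+select * from t1;
+
+-- A qualified name refers to a table in another database.
+select * from db2.t1;</codeblock>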
+
+    <p>
+      Each database is physically represented by a directory in HDFS. When you do not specify a <codeph>LOCATION</codeph>
+      attribute, the database directory is created within the Impala data directory, and Impala manages the associated tables there.
+      When you do specify a <codeph>LOCATION</codeph> attribute, any read and write operations for tables in that
+      database are relative to the specified HDFS directory.
+    </p>
+
+    <p>
+      There is a special database, named <codeph>default</codeph>, where you begin when you connect to Impala.
+      Tables created in <codeph>default</codeph> are physically located one level higher in HDFS than all the
+      user-created databases.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/builtins_db"/>
+
+    <p>
+      <b>Related statements:</b>
+    </p>
+
+    <p>
+      <xref href="impala_create_database.xml#create_database"/>,
+      <xref href="impala_drop_database.xml#drop_database"/>, <xref href="impala_use.xml#use"/>,
+      <xref href="impala_show.xml#show_databases"/>
+    </p>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_datatypes.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_datatypes.xml b/docs/topics/impala_datatypes.xml
new file mode 100644
index 0000000..e45867e
--- /dev/null
+++ b/docs/topics/impala_datatypes.xml
@@ -0,0 +1,43 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="datatypes">
+
+  <title>Data Types</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Data Types"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Schemas"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">data types</indexterm>
+      Impala supports a set of data types that you can use for table columns, expression values, and function
+      arguments and return values.
+    </p>
+
+    <note>
+      Currently, Impala supports only scalar types, not composite or nested types. Accessing a table containing any
+      columns with unsupported types causes an error.
+    </note>
+
+    <p outputclass="toc"/>
+
+    <p>
+      For the notation to write literals of each of these data types, see
+      <xref href="impala_literals.xml#literals"/>.
+    </p>
+
+    <p>
+      See <xref href="impala_langref_unsupported.xml#langref_hiveql_delta"/> for differences between Impala and
+      Hive data types.
+    </p>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_date.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_date.xml b/docs/topics/impala_date.xml
new file mode 100644
index 0000000..ae17b9a
--- /dev/null
+++ b/docs/topics/impala_date.xml
@@ -0,0 +1,104 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept audience="Cloudera" id="date" rev="2.0.0">
+
+  <title>DATE Data Type (<keyword keyref="impala21"/> or higher only)</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Data Types"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Dates and Times"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">DATE data type</indexterm>
+      A type representing the date (year, month, and day) as a single numeric value. It represents a broader
+      date range than is possible with the <codeph>TIMESTAMP</codeph> type, with fewer distinct values than
+      <codeph>TIMESTAMP</codeph>, and in a more compact and efficient form than a <codeph>STRING</codeph>
+      representation such as <codeph>'2014-12-31'</codeph>.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock><varname>column_name</varname> DATE</codeblock>
+
+    <p>
+      <b>Range:</b> January 1, -4712 BC .. December 31, 9999 AD.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/hbase_ok"/>
+
+    <p conref="../shared/impala_common.xml#common/parquet_blurb"/>
+
+    <ul>
+      <li>
+        This type can be read from and written to Parquet files.
+      </li>
+
+      <li>
+        There is no requirement for a particular level of Parquet.
+      </li>
+
+      <li>
+        Parquet files generated by Impala and containing this type can be freely interchanged with other components
+        such as Hive and MapReduce.
+      </li>
+    </ul>
+
+    <p conref="../shared/impala_common.xml#common/hive_blurb"/>
+
+    <p>
+      TK.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/conversion_blurb"/>
+
+    <p>
+      TK.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/partitioning_blurb"/>
+
+    <p>
+      This type can be used for partition key columns. Because it has less granularity (and thus fewer distinct
+      values) than an equivalent <codeph>TIMESTAMP</codeph> column, and numeric columns are more efficient as
+      partition keys than strings, prefer to partition by a <codeph>DATE</codeph> column rather than a
+      <codeph>TIMESTAMP</codeph> column or a <codeph>STRING</codeph> representation of a date.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/compatibility_blurb"/>
+
+    <p>
+      This type is available on CDH 5.2 or higher.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/internals_2_bytes"/>
+
+    <p conref="../shared/impala_common.xml#common/added_in_20"/>
+
+    <p conref="../shared/impala_common.xml#common/column_stats_constant"/>
+
+    <p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
+
+    <p>
+      Things happen when converting <codeph>TIMESTAMP</codeph> to <codeph>DATE</codeph> or <codeph>DATE</codeph> to
+      <codeph>TIMESTAMP</codeph>. TK.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+
+    <p>
+      The <xref href="impala_timestamp.xml#timestamp">TIMESTAMP</xref> data type is closely related. Some functions
+      from <xref href="impala_datetime_functions.xml#datetime_functions"/> accept and return <codeph>DATE</codeph>
+      values.
+    </p>
+  </conbody>
+</concept>

