kudu-commits mailing list archives

From danburk...@apache.org
Subject [2/3] kudu git commit: Update Impala docs for Impala 2.8 release
Date Thu, 02 Feb 2017 00:23:31 GMT
Update Impala docs for Impala 2.8 release

* No longer needs to document the special 'IMPALA_KUDU' build.
* Fixed syntax for new style
* Added back CTAS, which has new syntax (we'd mistakenly thought it was
  removed)
* Fixed a few typos/issues elsewhere (eg use of 'int32' instead of 'int'
  type)
* Removed the docs for composite range partitioning, which seems to
  no longer be supported in Impala.

Change-Id: Ia43d18e8d92c52e5868e1d48b91351bca41b53f8
Reviewed-on: http://gerrit.cloudera.org:8080/5733
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Reviewed-by: Jean-Daniel Cryans <jdcryans@apache.org>
Tested-by: Jean-Daniel Cryans <jdcryans@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/b30d68a9
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/b30d68a9
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/b30d68a9

Branch: refs/heads/master
Commit: b30d68a9bb8880edd43caaac9de34351571c8edb
Parents: 8d9fd9c
Author: Todd Lipcon <todd@apache.org>
Authored: Wed Jan 18 12:07:20 2017 -0800
Committer: Jean-Daniel Cryans <jdcryans@apache.org>
Committed: Wed Feb 1 22:11:36 2017 +0000

----------------------------------------------------------------------
 docs/kudu_impala_integration.adoc | 532 +++++++++------------------------
 docs/quickstart.adoc              |  27 +-
 2 files changed, 156 insertions(+), 403 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/b30d68a9/docs/kudu_impala_integration.adoc
----------------------------------------------------------------------
diff --git a/docs/kudu_impala_integration.adoc b/docs/kudu_impala_integration.adoc
index db4d86f..9383124 100755
--- a/docs/kudu_impala_integration.adoc
+++ b/docs/kudu_impala_integration.adoc
@@ -27,304 +27,54 @@
 :sectlinks:
 :experimental:
 
-Kudu has tight integration with Impala, allowing you to use Impala
+Kudu has tight integration with Apache Impala (incubating), allowing you to use Impala
 to insert, query, update, and delete data from Kudu tablets using Impala's SQL
 syntax, as an alternative to using the link:installation.html#view_api[Kudu APIs]
 to build a custom Kudu application. In addition, you can use JDBC or ODBC to connect
 existing or new applications written in any language, framework, or business intelligence
 tool to your Kudu data, using Impala as the broker.
 
-NOTE: The following instructions assume a
-link:http://www.cloudera.com/content/www/en-us/products/cloudera-manager.html[Cloudera Manager]
-deployment. However, you can use Kudu with Impala without Cloudera Manager.
+== Requirements
 
-== Requirements and Implications
+* This documentation is specific to certain versions of Impala. The syntax
+described will work only in the following releases:
+** The version of Impala 2.7.0 that ships with CDH 5.10. `SELECT VERSION()` will
+report `impalad version 2.7.0-cdh5.10.0`.
+** Apache Impala 2.8.0 releases compiled from source. `SELECT VERSION()` will
+report `impalad version 2.8.0`.
 
-This integration relies on features that released versions of Impala do not have yet.
-In the interim, you need
-to install a fork of Impala, which this document will refer to as _Impala_Kudu_.
+Older versions of Impala 2.7 (including the special `IMPALA_KUDU` releases
+previously available) have incompatible syntax. Future versions are likely to be
+compatible with this syntax, but we recommend checking that this is the latest
+available documentation corresponding to the appropriate version you have
+installed.
 
-* You can install Impala_Kudu using parcels or packages.
+* This documentation does not describe Impala installation procedures. Please
+refer to the Impala documentation and be sure that you are able to run simple
+queries against Impala tables on HDFS before proceeding.
 
-* Kudu itself requires CDH 5.4.3 or later. To use Cloudera Manager with Impala_Kudu,
-you need Cloudera Manager 5.4.3 or later. Cloudera Manager 5.4.7 is recommended, as
-it adds support for collecting metrics from Kudu.
+== Configuration
 
-* If you have an existing Impala instance on your cluster, you can install Impala_Kudu
-alongside the existing Impala instance *if you use parcels*. The new instance does
-not share configurations with the existing instance and is completely independent.
-A script is provided to automate this type of installation. See <<install_impala_kudu_parcels_side_by_side>>.
+No configuration changes are required within Kudu to enable access from Impala.
 
-* It is especially important that the cluster has adequate
-unreserved RAM for the Impala_Kudu instance.
+Although not strictly necessary, it is recommended to configure Impala with the
+locations of the Kudu Master servers:
 
-* Consider shutting down the original Impala service when testing Impala_Kudu if you
-want to be sure it is not impacted.
+* Set the `--kudu_master_hosts=<master1>[:port],<master2>[:port],<master3>[:port]`
+  flag in the Impala service configuration. If you are using Cloudera Manager,
+  please refer to the appropriate Cloudera Manager documentation to do so.
 
-* Before installing Impala_Kudu, you must have already installed and configured
-services for HDFS (though it is not used by Kudu), the Hive Metastore (where Impala
-stores its metadata), and link:installation.html[Kudu]. You may need HBase, YARN,
-Sentry, and ZooKeeper services as well. Meeting the Impala installation requirements
-is out of the scope of this document. See
-link:http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/impala_prereqs.html[Impala Prerequisites]
-in the official Impala documentation for more information.
+If this flag is not set within the Impala service, it will be necessary to manually
+provide this configuration each time you create a table, by specifying the
+`kudu.master_addresses` property inside a `TBLPROPERTIES` clause.
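+
+For example (the table, column, and host names below are placeholders):
+
+[source,sql]
+----
+CREATE EXTERNAL TABLE my_mapping_table
+STORED AS KUDU
+TBLPROPERTIES (
+  'kudu.table_name' = 'my_kudu_table',
+  'kudu.master_addresses' = 'master-1.example.com:7051,master-2.example.com:7051'
+);
+----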
 
-
-== Installing Impala_Kudu Using Cloudera Manager
-
-If you use Cloudera Manager, you can install Impala_Kudu using
-<<install_impala_kudu_parcels,parcels>> or
-<<install_impala_kudu_packages,packages>>. However, if you have an existing Impala
-instance, you must use parcels and you should use the instructions provided in
-<<install_impala_kudu_parcels_side_by_side,procedure>>, rather than these instructions.
-
-[[install_impala_kudu_parcels]]
-=== Installing the Impala_Kudu Service Using Parcels
-
-[[install_impala_kudu_parcels_side_by_side]]
-==== Manual Installation
-
-NOTE: Manual installation of Impala_Kudu is only supported where there is no other Impala
-service already running in the cluster, and when you use parcels.
-
-. Obtain the Impala_Kudu parcel either by using the parcel repository or downloading it manually.
-  * To use the parcel repository:
-  ** Go to *Hosts / Parcels*.
-  ** Click *Edit Settings*. Add http://archive.cloudera.com/beta/impala-kudu/parcels/latest/
-      as a *Remote Parcel Repository URL*. Click *Save Changes*.
-  * To download the parcel manually:
-  ** Download the parcel for your operating system from
-    http://archive.cloudera.com/beta/impala-kudu/parcels/latest/ and upload
-    it to `/opt/cloudera/parcel-repo/` on the Cloudera Manager server.
-  ** Create a SHA1 file for the parcel. Cloudera Manager expects the SHA1 to be named
-    with the exact same name as the parcel, with a `.sha` ending added, and to only
-    contain the SHA1 itself, not the name of the parcel.
-+
-----
-sha1sum <name_of_parcel_file> | awk {'print $1'} > <name_of_parcel_file>.sha
-----
-+
-. Go to *Hosts / Parcels*. Click *Check for New Parcels.* Verify that *Impala_Kudu*
-is in the list.
-. Download (if necessary), distribute, and activate the *Impala_Kudu* parcel.
-. Add a new Impala service. This service will use the Impala_Kudu parcel.
-  * Go to the cluster and click *Actions / Add a Service*.
-  * Choose one host to run the Catalog Server, one to run the StateServer, and one
-or more to run Impala Daemon instances. Click *Continue*.
-  * Choose one or more Impala scratch directories. Click *Continue*. The Impala service
-  starts. *However, the features that Impala needs in order to work with Kudu are not
-  enabled yet.*
-. Enable the features that allow Impala to work with Kudu.
-  * Go to the new Impala service. Click *Configuration*.
-  * Search for the *Impala Service Environment Advanced Configuration Snippet (Safety
-  Valve)* configuration item. Add the following to the text field and save your changes:
-  `IMPALA_KUDU=1`
-  * Restart the Impala service.
-  * You can verify that the Kudu features are available to Impala by running the following
-  query in Impala Shell:
-+
-[source,sql]
-----
-select if(version() like '%KUDU%', "all set to go!", "check your configs") as s;
-
-Query: select if(version() like '%KUDU%', "all set to go!", "check your configs") as s
-+----------------+
-| s              |
-+----------------+
-| all set to go! |
-+----------------+
-Fetched 1 row(s) in 0.02s
-----
-+
-If you do not see 'all set to go!', carefully review the previous instructions to be sure
-that you have not missed a step.
-
-
-==== Installation using the `deploy.py` Script
-
-If you use parcels, Cloudera recommends using the included `deploy.py` script to
-install and deploy the Impala_Kudu service into your cluster. If your cluster does
-not have an existing Impala instance, the script is optional. However, if you do
-have an existing Impala instance and want to install Impala_Kudu side-by-side,
-you must use the script.
-
-.Prerequisites
-* The script depends upon the Cloudera Manager API Python bindings. Install the bindings
-using `sudo pip install cm-api` (or as an unprivileged user, with the `--user`
-option to `pip`), or see http://cloudera.github.io/cm_api/docs/python-client/
-for more details.
-* You need the following information to run the script:
-** The IP address or fully-qualified domain name of the Cloudera Manager server.
-** The IP address or fully-qualified domain name of the host that should run the Kudu
-master process, if different from the Cloudera Manager server.
-** The cluster name, if Cloudera Manager manages multiple clusters.
-** If you have an existing Impala service and want to clone its configuration, you
-  need to know the name of the existing service.
-** If your cluster has more than one instance of a HDFS, Hive, HBase, or other CDH
-  service that this Impala_Kudu service depends upon, the name of the service this new
-  Impala_Kudu service should use.
-** A name for the new Impala service.
-** A user name and password with *Full Administrator* privileges in Cloudera Manager.
-** The IP address or host name of the host where the new Impala_Kudu service's master role
-  should be deployed, if not the Cloudera Manager server.
-** A comma-separated list of local (not HDFS) scratch directories which the new
-Impala_Kudu service should use, if you are not cloning an existing Impala service.
-* Your Cloudera Manager server needs network access to reach the parcel repository
-hosted on `cloudera.com`.
-
-.Procedure
-
-- Download the `deploy.py` from https://github.com/apache/incubator-impala/blob/master/infra/deploy/deploy.py
-using `curl` or another utility of your choice.
-+
-[source,bash]
-----
-$ curl -O https://raw.githubusercontent.com/apache/incubator-impala/master/infra/deploy/deploy.py
-----
-+
-- Run the `deploy.py` script. The syntax below creates a standalone IMPALA_KUDU
-service called `IMPALA_KUDU-1` on a cluster called `Cluster 1`. Exactly one HDFS, Hive,
-and HBase service exist in Cluster 1, so service dependencies are not required.
-The cluster should not already have an Impala instance.
-+
-[source,bash]
-----
-$ python deploy.py create IMPALA_KUDU-1 --cluster 'Cluster 1' \
-  --master_host <FQDN_of_Kudu_master_server> \
-  --host <FQDN_of_cloudera_manager_server>
-----
-
-NOTE: If you do not specify `--master_host`, the Kudu master is configured to run
-on the Cloudera Manager server (the value specified by the `--host` parameter).
-
-- If two HDFS services are available, called `HDFS-1` and `HDFS-2`, use the following
-syntax to create the same `IMPALA_KUDU-1` service using `HDFS-2`. You can specify
-multiple types of dependencies; use the `deploy.py create -h` command for details.
-+
-[source,bash]
-----
-$ python deploy.py create IMPALA_KUDU-1 --cluster 'Cluster 1' --hdfs_dependency HDFS-2 \
-  --host <FQDN_of_cloudera_manager_server>
-----
-
-- Run the `deploy.py` script with the following syntax to clone an existing IMPALA
-service called `IMPALA-1` to a new IMPALA_KUDU service called `IMPALA_KUDU-1`, where
-Cloudera Manager only manages a single cluster.  This new `IMPALA_KUDU-1` service
-can run side by side with the `IMPALA-1` service if there is sufficient RAM for both.
-`IMPALA_KUDU-1` should be given at least 16 GB of RAM and possibly more depending
-on the complexity of the workload and the query concurrency level.
-+
-[source,bash]
-----
-$ python deploy.py clone IMPALA_KUDU-1 IMPALA-1 --host <FQDN_of_cloudera_manager_server>
-----
-
-- Additional parameters are available for `deploy.py`. To view them, use the `-h`
-argument.  You can also use commands such as `deploy.py create -h` or
-`deploy.py clone -h` to get information about additional arguments for individual operations.
-
-- The service is created *but not started*. Review the configuration in Cloudera Manager
-and start the service.
-
-[[install_impala_kudu_packages]]
-=== Installing Impala_Kudu Using Packages
-
-Before installing Impala_Kudu packages, you need to uninstall any existing Impala
-packages, using operating system utilities. For this reason, you cannot use Impala_Kudu
-alongside another Impala instance if you use packages.
-
-[[impala_kudu_package_locations]]
-.Impala_Kudu Package Locations
-[cols=">s,<,<",options="header"]
-|===
-| OS  | Repository  | Individual Packages
-| RHEL or CentOS | link:http://archive.cloudera.com/beta/impala-kudu/redhat/6/x86_64/impala-kudu/cloudera-impala-kudu.repo[RHEL 6 or CentOS 6],
-                   link:http://archive.cloudera.com/beta/impala-kudu/redhat/7/x86_64/impala-kudu/cloudera-impala-kudu.repo[RHEL 7 or CentOS 7] |
-                   link:http://archive.cloudera.com/beta/impala-kudu/redhat/6/x86_64/impala-kudu/0/RPMS/x86_64/[RHEL 6 or CentOS 6],
-                   link:http://archive.cloudera.com/beta/impala-kudu/redhat/7/x86_64/impala-kudu/0/RPMS/x86_64/[RHEL 7 or CentOS 7]
-| Ubuntu | link:http://archive.cloudera.com/beta/impala-kudu/ubuntu/trusty/amd64/impala-kudu/cloudera.list[Trusty] | http://archive.cloudera.com/beta/impala-kudu/ubuntu/trusty/amd64/impala-kudu/pool/contrib/i/impala-kudu/[Trusty]
-|===
-
-. Download and configure the Impala_Kudu repositories for your operating system, or manually
-download individual RPMs, the appropriate link from <<impala_kudu_package_locations>>.
-
-. An Impala cluster has at least one `impala-kudu-server` and at most one `impala-kudu-catalog`
-and `impala-kudu-state-store`.  To connect to Impala from the command line, install
-the `impala-kudu-shell` package.
-
-=== Adding Impala service in Cloudera Manager
-. Add a new Impala service in Cloudera Manager.
-** Go to the cluster and click *Actions / Add a Service*.
-** Choose one host to run the Catalog Server, one to run the Statestore, and at
-  least three to run Impala Daemon instances. Click *Continue*.
-** Choose one or more Impala scratch directories. Click *Continue*.
-. The Impala service starts.
-
-== Installing Impala_Kudu Without Cloudera Manager
-
-Before installing Impala_Kudu packages, you need to uninstall any existing Impala
-packages, using operating system utilities. For this reason, you cannot use Impala_Kudu
-alongside another Impala instance if you use packages.
-
-IMPORTANT: Do not use these command-line instructions if you use Cloudera Manager.
-Instead, follow <<install_impala_kudu_packages>>.
-
-[[impala_kudu_non-cm_locations]]
-.Impala_Kudu Package Locations
-[cols=">s,<,<",options="header"]
-|===
-| OS  | Repository  | Individual Packages
-| RHEL or CentOS | link:http://archive.cloudera.com/beta/impala-kudu/redhat/6/x86_64/impala-kudu/cloudera-impala-kudu.repo[RHEL 6 or CentOS 6],
-                   link:http://archive.cloudera.com/beta/impala-kudu/redhat/7/x86_64/impala-kudu/cloudera-impala-kudu.repo[RHEL 7 or CentOS 7] |
-                   link:http://archive.cloudera.com/beta/impala-kudu/redhat/6/x86_64/impala-kudu/0/RPMS/x86_64/[RHEL 6 or CentOS 6],
-                   link:http://archive.cloudera.com/beta/impala-kudu/redhat/7/x86_64/impala-kudu/0/RPMS/x86_64/[RHEL 7 or CentOS 7]
-| Ubuntu | link:http://archive.cloudera.com/beta/impala-kudu/ubuntu/trusty/amd64/impala-kudu/cloudera.list[Trusty] | http://archive.cloudera.com/beta/impala-kudu/ubuntu/trusty/amd64/impala-kudu/pool/contrib/i/impala-kudu/[Trusty]
-|===
-
-. Download and configure the Impala_Kudu repositories for your operating system, or manually
-download individual RPMs, the appropriate link from <<impala_kudu_non-cm_locations>>.
-
-. An Impala cluster has at least one `impala-kudu-server` and at most one `impala-kudu-catalog`
-and `impala-kudu-state-store`.  To connect to Impala from the command line, install
-the `impala-kudu-shell` package.
-
-=== Starting Impala_Kudu Services
-. Use the Impala start-up scripts to start each service on the relevant hosts:
-+
-----
-$ sudo service impala-state-store start
-
-$ sudo service impala-catalog start
-
-$ sudo service impala-server start
-----
+The rest of this guide assumes that the configuration has been set.
 
 == Using the Impala Shell
 
 NOTE: This is only a small sub-set of Impala Shell functionality. For more details, see the
 link:http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/impala_impala_shell.html[Impala Shell] documentation.
 
-Neither Kudu nor Impala need special configuration in order for you to use the Impala
-Shell or the Impala API to insert, update, delete, or query Kudu data using Impala.
-However, you do need to create a mapping between the Impala and Kudu tables. Kudu
-provides the Impala query to map to an existing Kudu table in the web UI.
-
-- Be sure you are using the `impala-shell` binary provided by the Impala_Kudu package,
-rather than the default CDH Impala binary. The following shows how to verify this
-using the `alternatives` command on a RHEL or CentOS host.
-+
-[source,bash]
-----
-$ sudo alternatives --display impala-shell
-
-impala-shell - status is auto.
- link currently points to /opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.1007/bin/impala-shell
-/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.1007/bin/impala-shell - priority 10
-/opt/cloudera/parcels/IMPALA_KUDU-2.3.0-1.cdh5.5.0.p0.119/bin/impala-shell - priority 5
-Current `best' version is /opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.1007/bin/impala-shell.
-
-$ sudo alternatives --set impala-shell /opt/cloudera/parcels/IMPALA_KUDU-2.3.0-1.cdh5.5.0.p0.119/bin/impala-shell
-----
 - Start Impala Shell using the `impala-shell` command. By default, `impala-shell`
 attempts to connect to the Impala daemon on `localhost` on port 21000. To connect
 to a different host, use the `-i <host:port>` option. To automatically connect to
@@ -351,17 +101,26 @@ http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/imp
 for more information about internal and external tables.
 
 === Querying an Existing Kudu Table In Impala
-. Go to http://kudu-master.example.com:8051/tables/, where _kudu-master.example.com_
-is the address of your Kudu master.
-. Click the table ID for the relevant table.
-. Scroll to the bottom of the page, or search for `Impala CREATE TABLE statement`.
-Copy the entire statement.
-. Paste the statement into Impala. Impala now has a mapping to your Kudu table.
+
+Tables created through the Kudu API or other integrations such as Apache Spark
+are not automatically visible in Impala. To query them, you must first create
+an external table within Impala to map the Kudu table into an Impala database:
+
+[source,sql]
+----
+CREATE EXTERNAL TABLE my_mapping_table
+STORED AS KUDU
+TBLPROPERTIES (
+  'kudu.table_name' = 'my_kudu_table'
+);
+----
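+
+Once the mapping exists, the table can be queried like any other Impala table
+(using the names from the example above):
+
+[source,sql]
+----
+SELECT COUNT(*) FROM my_mapping_table;
+----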
 
 [[kudu_impala_create_table]]
 === Creating a New Kudu Table From Impala
 Creating a new table in Kudu from Impala is similar to mapping an existing Kudu table
-to an Impala table, except that you need to write the `CREATE` statement yourself.
+to an Impala table, except that you need to specify the schema and partitioning
+information yourself.
+
 Use the following example as a guideline. Impala first creates the table, then creates
 the mapping.
 
@@ -373,46 +132,33 @@ CREATE TABLE my_first_table
   name STRING,
   PRIMARY KEY(id)
 )
-DISTRIBUTE BY HASH INTO 16 BUCKETS
-STORED AS KUDU
-TBLPROPERTIES(
-  'kudu.master_addresses' = 'kudu-master.example.com:7051',
-);
+PARTITION BY HASH PARTITIONS 16
+STORED AS KUDU;
 ----
 
 In the `CREATE TABLE` statement, the columns that comprise the primary key must
 be listed first. Additionally, primary key columns are implicitly marked `NOT NULL`.
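 
 For example, because `id` is part of the primary key in the earlier
 `my_first_table` example, an insert supplying a `NULL` key would be rejected
 (a hypothetical illustration):
 
 [source,sql]
 ----
 -- Fails: `id` is a primary key column and therefore implicitly NOT NULL.
 INSERT INTO my_first_table VALUES (NULL, 'sam');
 ----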
 
-The following table property is required unless the `kudu_master_hosts` configuration
-flag has been specified for Impala:
-
-`kudu.master_addresses`:: the list of Kudu masters Impala should communicate with.
-
 When creating a new Kudu table, you are required to specify a distribution scheme.
 See <<partitioning_tables>>. The table creation example above is distributed into
-16 buckets by hashing the `id` column, for simplicity. See
+16 partitions by hashing the `id` column, for simplicity. See
 <<partitioning_rules_of_thumb>> for guidelines on partitioning.
 
 ==== `CREATE TABLE AS SELECT`
 
-`CREATE TABLE AS SELECT` does not offer syntax to specify a primary key, and
-thus cannot be used to create a table in Kudu from Impala.
-
-Instead, a `CREATE TABLE` statement may be issued, followed by an `INSERT ... SELECT`
-statement. for example:
+You can create a table by querying any other table or tables in Impala, using a `CREATE
+TABLE ... AS SELECT` statement. The following example imports all rows from an existing table
+`old_table` into a Kudu table `new_table`. The names and types of columns in `new_table`
+will be determined from the columns in the result set of the `SELECT` statement. Note
+that you must additionally specify the primary key and partitioning.
 
 [source,sql]
 ----
 CREATE TABLE new_table
-(
-  ts BIGINT,
-  name STRING,
-  value DOUBLE,
-  PRIMARY KEY (ts, name)
-)
-DISTRIBUTE BY HASH(name) INTO 8 BUCKETS
-STORED AS KUDU;
-INSERT INTO new_table SELECT ts, name, value FROM old_table;
+PRIMARY KEY (ts, name)
+PARTITION BY HASH(name) PARTITIONS 8
+STORED AS KUDU
+AS SELECT ts, name, value FROM old_table;
 ----
 
 ==== Specifying Tablet Partitioning
@@ -423,7 +169,7 @@ has no mechanism for automatically (or manually) splitting a pre-existing tablet
 Until this feature has been implemented, *you must specify your partitioning when
 creating a table*. When designing your table schema, consider primary keys that will allow you to
 split your table into partitions which grow at similar rates. You can designate
-partitions using a `DISTRIBUTE BY` clause when creating a table using Impala:
+partitions using a `PARTITION BY` clause when creating a table using Impala:
 
 NOTE: Impala keywords, such as `group`, are enclosed by back-tick characters when
 they are not used in their keyword sense.
@@ -431,7 +177,7 @@ they are not used in their keyword sense.
 [source,sql]
 ----
 CREATE TABLE cust_behavior (
-  _id BIGINT,
+  _id BIGINT PRIMARY KEY,
   salary STRING,
   edu_level INT,
   usergender STRING,
@@ -442,10 +188,10 @@ CREATE TABLE cust_behavior (
   last_purchase_date BIGINT,
   category STRING,
   sku STRING,
-  rating INT,
+  rating INT,
   fulfilled_date BIGINT
 )
-DISTRIBUTE BY RANGE (_id)
+PARTITION BY RANGE (_id)
 (
     PARTITION VALUES < 1439560049342,
     PARTITION 1439560049342 <= VALUES < 1439566253755,
@@ -504,20 +250,17 @@ been created. You must provide a partition schema for your table when you create
 When designing your tables, consider using primary keys that will allow you to partition
 your table into tablets which grow at similar rates.
 
-You can partition your table using Impala's `DISTRIBUTE BY` keyword, which
+You can partition your table using Impala's `PARTITION BY` keyword, which
 supports distribution by `RANGE` or `HASH`. The partition scheme can contain zero
 or more `HASH` definitions, followed by an optional `RANGE` definition. The `RANGE`
 definition can refer to one or more primary key columns.
 Examples of <<basic_partitioning,basic>> and <<advanced_partitioning, advanced>>
 partitioning are shown below.
 
-NOTE: Impala keywords, such as `group`, are enclosed by back-tick characters when
-they are used as identifiers, rather than as keywords.
-
 [[basic_partitioning]]
 ==== Basic Partitioning
 
-.`DISTRIBUTE BY RANGE`
+.`PARTITION BY RANGE`
 You can specify range partitions for one or more primary key columns.
 Range partitioning in Kudu allows splitting a table based on
 specific values or ranges of values of the chosen partition keys. This allows
@@ -541,23 +284,23 @@ addition to, `RANGE`.
 CREATE TABLE customers (
   state STRING,
   name STRING,
-  purchase_count int32,
+  purchase_count int,
   PRIMARY KEY (state, name)
 )
-DISTRIBUTE BY RANGE (state)
+PARTITION BY RANGE (state)
 (
-  VALUES = 'al',
-  VALUES = 'ak',
-  VALUES = 'ar',
-  ...
-  VALUES = 'wv',
-  VALUES = 'wy'
+  PARTITION VALUE = 'al',
+  PARTITION VALUE = 'ak',
+  PARTITION VALUE = 'ar',
+  -- ... etc ...
+  PARTITION VALUE = 'wv',
+  PARTITION VALUE = 'wy'
 )
 STORED AS KUDU;
 ----
 
 [[distribute_by_hash]]
-.`DISTRIBUTE BY HASH`
+.`PARTITION BY HASH`
 
 Instead of distributing by an explicit range, or in combination with range distribution,
 you can distribute into a specific number of 'buckets' by hash. You specify the primary
@@ -573,7 +316,7 @@ definitions. Consider two columns, `a` and `b`:
 * icon:check[pro, role="green"] `HASH(a,b)`
 * icon:times[pro, role="red"] `HASH(a), HASH(a,b)`
 
-NOTE: `DISTRIBUTE BY HASH` with no column specified is a shortcut to create the desired
+NOTE: `PARTITION BY HASH` with no column specified is a shortcut to create the desired
 number of buckets by hashing all primary key columns.
 
 Hash partitioning is a reasonable approach if primary key values are evenly
@@ -603,7 +346,7 @@ CREATE TABLE cust_behavior (
   fulfilled_date BIGINT,
   PRIMARY KEY (id, sku)
 )
-DISTRIBUTE BY HASH INTO 16 BUCKETS
+PARTITION BY HASH PARTITIONS 16
 STORED AS KUDU;
 ----
 
@@ -617,37 +360,7 @@ Each definition can encompass one or more columns. While enumerating every possi
 schema is out of the scope of this document, a few examples illustrate some of the
 possibilities.
 
-.`DISTRIBUTE BY RANGE` Using Composite Partition Keys
-
-This example creates 100 tablets, two for each US state. Per state, the first tablet
-holds names starting with characters before 'm', and the second tablet holds names
-starting with 'm'-'z'. Writes are spread across at least 50 tablets, and possibly
-up to 100. A query for a range of names in a given state is likely to only need to read from
-one tablet, while a query for a range of names across every state will likely
-read from at most 50 tablets.
-
-[source,sql]
-----
-CREATE TABLE customers (
-  state STRING,
-  name STRING,
-  purchase_count int32,
-  PRIMARY KEY (state, name)
-)
-DISTRIBUTE BY RANGE (state, name)
-(
-  PARTITION ('al', '')  <= VALUES < ('al', 'm'),
-  PARTITION ('al', 'm') <= VALUES < ('ak', '')
-  PARTITION ('ak', '')  <= VALUES < ('ak', 'm'),
-  PARTITION ('ak', 'm') <= VALUES < ('ar', ''),
-  ...
-  PARTITION ('wy', '')  <= VALUES < ('wy', 'm'),
-  PARTITION ('wy', 'm') <= VALUES
-)
-STORED AS KUDU;
-----
-
-==== `DISTRIBUTE BY HASH` and `RANGE`
+==== `PARTITION BY HASH` and `RANGE`
 
 Consider the <<distribute_by_hash,simple hashing>> example above. If you often query for a range of `sku`
 values, you can optimize the example by combining hash partitioning with range partitioning.
@@ -658,8 +371,8 @@ based upon the value of the `sku` string. Writes are spread across at least four
 (and possibly up to 16). When you query for a contiguous range of `sku` values, you have a
 good chance of only needing to read from a quarter of the tablets to fulfill the query.
 
-NOTE: By default, the entire primary key is hashed when you use `DISTRIBUTE BY HASH`.
-To hash on only part of the primary key, specify it by using syntax like `DISTRIBUTE
+NOTE: By default, the entire primary key is hashed when you use `PARTITION BY HASH`.
+To hash on only part of the primary key, specify it by using syntax like `PARTITION
 BY HASH (id, sku)`.
 
 [source,sql]
@@ -680,7 +393,7 @@ CREATE TABLE cust_behavior (
   fulfilled_date BIGINT,
   PRIMARY KEY (id, sku)
 )
-DISTRIBUTE BY HASH (id) INTO 4 BUCKETS,
+PARTITION BY HASH (id) PARTITIONS 4,
 RANGE (sku)
 (
   PARTITION VALUES < 'g',
@@ -691,7 +404,7 @@ RANGE (sku)
 STORED AS KUDU;
 ----
 
-.Multiple `DISTRIBUTE BY HASH` Definitions
+.Multiple `PARTITION BY HASH` Definitions
 Again expanding the example above, suppose that the query pattern will be unpredictable,
 but you want to ensure that writes are spread across a large number of tablets.
 You can achieve maximum distribution across the entire primary key by hashing on
@@ -715,13 +428,13 @@ CREATE TABLE cust_behavior (
   fulfilled_date BIGINT,
   PRIMARY KEY (id, sku)
 )
-DISTRIBUTE BY HASH (id) INTO 4 BUCKETS,
-              HASH (sku) INTO 4 BUCKETS
+PARTITION BY HASH (id) PARTITIONS 4,
+             HASH (sku) PARTITIONS 4
 STORED AS KUDU;
 ----
 
-The example creates 16 buckets. You could also use `HASH (id, sku) INTO 16 BUCKETS`.
-However, a scan for `sku` values would almost always impact all 16 buckets, rather
+The example creates 16 partitions. You could also use `HASH (id, sku) PARTITIONS 16`.
+However, a scan for `sku` values would almost always impact all 16 partitions, rather
 than possibly being limited to 4.
 
 .Non-Covering Range Partitions
@@ -748,21 +461,40 @@ ranges will be rejected.
 
 [source,sql]
 ----
-CREATE TABLE sales_by_year (year INT32, sale_id INT32, amount INT32)
-PRIMARY KEY (sale_id, year)
-DISTRIBUTE BY RANGE (year)
-(
-  PARTITION VALUES = 2012,
-  PARTITION VALUES = 2013,
-  PARTITION VALUES = 2014,
-  PARTITION VALUES = 2015,
-  PARTITION VALUES = 2016
-);
+CREATE TABLE sales_by_year (
+  year INT, sale_id INT, amount INT,
+  PRIMARY KEY (sale_id, year)
+)
+PARTITION BY RANGE (year) (
+  PARTITION VALUE = 2012,
+  PARTITION VALUE = 2013,
+  PARTITION VALUE = 2014,
+  PARTITION VALUE = 2015,
+  PARTITION VALUE = 2016
+)
+STORED AS KUDU;
 ----
 
 When records start coming in for 2017, they will be rejected. At that point, the `2017`
-range should be added. Impala 2.8 and higher support this functionality using
-the `ALTER TABLE [ADD|DROP] RANGE PARTITION` statements.
+range should be added as follows:
+
+[source,sql]
+----
+ALTER TABLE sales_by_year ADD RANGE PARTITION VALUE = 2017;
+----
+
+In use cases where a rolling window of data retention is required, range partitions
+may also be dropped. For example, if data from 2012 should no longer be retained,
+it may be deleted in bulk:
+
+[source,sql]
+----
+ALTER TABLE sales_by_year DROP RANGE PARTITION VALUE = 2012;
+----
+
+Note that, just like dropping a table, this irrecoverably deletes all data
+stored in the dropped partition.
+
 
 [[partitioning_rules_of_thumb]]
 ==== Partitioning Rules of Thumb
@@ -924,16 +656,40 @@ You can change Impala's metadata relating to a given Kudu table by altering the
 properties. These properties include the table name, the list of Kudu master addresses,
 and whether the table is managed by Impala (internal) or externally.
 
-IMPORTANT: Altering table properties only changes Impala's metadata about the table,
-not the underlying table itself. These statements do not modify any table metadata
-in Kudu.
 
-.Rename a Table
+.Rename an Impala Mapping Table
 [source,sql]
 ----
 ALTER TABLE my_table RENAME TO my_new_table;
 ----
 
+NOTE: Renaming a table using the `ALTER TABLE ... RENAME` statement only renames
+the Impala mapping table, regardless of whether the table is an internal or external
+table. This avoids disruption to other applications that may be accessing the
+underlying Kudu table.
+
+.Rename the underlying Kudu table for an internal table
+
+If a table is an internal table, the underlying Kudu table may be renamed by
+changing the `kudu.table_name` property:
+
+[source,sql]
+----
+ALTER TABLE my_internal_table
+SET TBLPROPERTIES('kudu.table_name' = 'new_name');
+----
+
+.Remapping an external table to a different Kudu table
+
+If another application has renamed a Kudu table under Impala, it is possible to
+re-map an external table to point to a different Kudu table name.
+
+[source,sql]
+----
+ALTER TABLE my_external_table
+SET TBLPROPERTIES('kudu.table_name' = 'some_other_kudu_table');
+----
+
 .Change the Kudu Master Address
 [source,sql]
 ----
@@ -981,9 +737,9 @@ The examples above have only explored a fraction of what you can do with Impala
   primary key columns before other columns, in primary key order.
 - Kudu tables containing `UNIXTIME_MICROS`-typed columns may not be used as an
   external table in Impala.
-- Impala can not create Kudu tables with `TIMESTAMP` or nested-typed columns.
+- Impala can not create Kudu tables with `TIMESTAMP`, `DECIMAL`, `VARCHAR`,
+  or nested-typed columns.
 - Impala can not update values in primary key columns.
 - `NULL`, `NOT NULL`, `!=`, and `LIKE` predicates are not pushed to Kudu, and
-  instead will be evaluated by the Impala scan node.
-- Impala can not create Kudu tables with bounded range partitions, and can not
-  alter a table to add or remove range partitions.
+  instead will be evaluated by the Impala scan node. This may decrease performance
+  relative to other types of predicates.

http://git-wip-us.apache.org/repos/asf/kudu/blob/b30d68a9/docs/quickstart.adoc
----------------------------------------------------------------------
diff --git a/docs/quickstart.adoc b/docs/quickstart.adoc
index 86c64d6..0ed8245 100644
--- a/docs/quickstart.adoc
+++ b/docs/quickstart.adoc
@@ -153,20 +153,12 @@ storage.
 +
 [source,sql]
 ----
-CREATE TABLE sfmta (
-  report_time BIGINT NOT NULL,
-  vehicle_tag STRING NOT NULL,
-  longitude FLOAT NOT NULL,
-  latitude FLOAT NOT NULL,
-  speed FLOAT NOT NULL,
-  heading FLOAT NOT NULL,
-  PRIMARY KEY (report_time, vehicle_tag)
-)
-DISTRIBUTE BY HASH(report_time) INTO 8 BUCKETS
-STORED AS KUDU;
-
-INSERT INTO sfmta SELECT
-  UNIX_TIMESTAMP(report_time,  'MM/dd/yyyy HH:mm:ss'),
+CREATE TABLE sfmta
+PRIMARY KEY (report_time, vehicle_tag)
+PARTITION BY HASH(report_time) PARTITIONS 8
+STORED AS KUDU
+AS SELECT
+  UNIX_TIMESTAMP(report_time,  'MM/dd/yyyy HH:mm:ss') AS report_time,
   vehicle_tag,
   longitude,
   latitude,
@@ -174,7 +166,12 @@ INSERT INTO sfmta SELECT
   heading
 FROM sfmta_raw;
 
--- Modified 859086 row(s), 0 row error(s) in 8.55s
++------------------------+
+| summary                |
++------------------------+
+| Inserted 859086 row(s) |
++------------------------+
+Fetched 1 row(s) in 5.75s
 ----
 +
 The created table uses a composite primary key. See

