Kudu Snippets

- In CDH 5.8 / Impala 2.6 and higher, Impala recognizes the auth_to_local setting,
+ In and higher, Impala recognizes the auth_to_local setting,
  specified through the HDFS configuration setting hadoop.security.auth_to_local or the Cloudera Manager setting
@@ -780,17 +780,6 @@ select concat('abc','mno','xyz');
-

  SQL Language Reference Snippets
@@ -873,7 +862,7 @@ select * from t2;

  The Avro specification allows string values up to 2**64 bytes in length. Impala queries for Avro tables use 32-bit integers to hold string lengths.
- In CDH 5.7 / Impala 2.5 and higher, Impala truncates CHAR
+ In and higher, Impala truncates CHAR
  and VARCHAR values in Avro tables to (2**31)-1 bytes. If a query encounters a STRING value longer than (2**31)-1 bytes in an Avro table, the query fails. In earlier releases,
@@ -932,7 +921,7 @@ alter table partitioned_data set tblproperties ('numRows'='1030000', 'STATS_GENE
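The Avro string-length rule described in the hunk above can be modeled in a few lines. This is a toy sketch of the documented behavior only, not Impala source code; the function name `read_avro_value` and the `max_len` parameter are hypothetical and exist so the rule can be exercised with small inputs.

```python
# Toy model of the documented Avro string-length rule; not Impala source code.
AVRO_MAX_LEN = 2**31 - 1  # largest length a signed 32-bit integer can hold


def read_avro_value(value: bytes, sql_type: str, max_len: int = AVRO_MAX_LEN) -> bytes:
    """Truncate oversized CHAR/VARCHAR values; fail on an oversized STRING."""
    if len(value) <= max_len:
        return value
    if sql_type in ("CHAR", "VARCHAR"):
        # Documented behavior: the value is truncated to the limit.
        return value[:max_len]
    if sql_type == "STRING":
        # Documented behavior: the query fails.
        raise ValueError(f"STRING value exceeds {max_len} bytes; query fails")
    raise ValueError(f"unsupported type: {sql_type}")
```

Passing a small `max_len` (for example 4) makes it possible to try both branches without allocating a 2 GB value.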

  If you frequently run aggregate functions such as MIN(), MAX(), and COUNT(DISTINCT) on partition key columns, consider enabling the OPTIMIZE_PARTITION_KEY_SCANS
- query option, which optimizes such queries. This feature is available in CDH 5.7 / Impala 2.5 and higher.
+ query option, which optimizes such queries. This feature is available in and higher.
  See for the kinds of queries that this option applies to, and slight differences in how partitions are evaluated when this query option is enabled.
@@ -996,7 +985,7 @@ alter table partitioned_data set tblproperties ('numRows'='1030000', 'STATS_GENE

  Likewise, the impala-shell command relies on
- some information only available in Impala 2.3 / CDH 5.5 and higher
+ some information only available in and higher
  to prepare live progress reports and query summaries. The LIVE_PROGRESS and LIVE_SUMMARY query options have no effect when impala-shell connects
@@ -1036,7 +1025,7 @@ drop database temp;
 use default;
 -- Before dropping a database, first drop all the tables inside it,
--- or in CDH 5.5 and higher use the CASCADE clause.
+-- or in and higher use the CASCADE clause.
 drop database temp;
 ERROR: ImpalaRuntimeException: Error making 'dropDatabase' RPC to Hive Metastore:
 CAUSED BY: InvalidOperationException: Database temp is not empty
@@ -1047,7 +1036,7 @@ show tables in temp;
 | t3 |
 +------+
--- CDH 5.5 and higher:
+-- and higher:
 drop database temp cascade;
 -- CDH 5.4 and lower:
@@ -1115,7 +1104,7 @@ drop database temp;

- In CDH 5.8 / Impala 2.6 and higher, Impala queries are optimized for files stored in Amazon S3.
+ In and higher, Impala queries are optimized for files stored in Amazon S3.
  For Impala tables that use the file formats Parquet, RCFile, SequenceFile, Avro, and uncompressed text, the setting fs.s3a.block.size in the core-site.xml configuration file determines
@@ -1131,7 +1120,7 @@ drop database temp;

- In CDH 5.8 / Impala 2.6 and higher, Impala supports both queries (SELECT)
+ In and higher, Impala supports both queries (SELECT)
  and DML (INSERT, LOAD DATA, CREATE TABLE AS SELECT) for data residing on Amazon S3. With the inclusion of write support,
@@ -1148,7 +1137,7 @@ drop database temp;

- In CDH 5.8 / Impala 2.6 and higher, Impala DDL statements such as
+ In and higher, Impala DDL statements such as
  CREATE DATABASE, CREATE TABLE, DROP DATABASE CASCADE, DROP TABLE, and ALTER TABLE [ADD|DROP] PARTITION can create or remove folders as needed in the Amazon S3 system. Prior to CDH 5.8 / Impala 2.6, you had to create folders yourself and point
@@ -1157,7 +1146,7 @@ drop database temp;

- In CDH 5.8 / Impala 2.6 and higher, the Impala DML statements (INSERT, LOAD DATA,
+ In and higher, the Impala DML statements (INSERT, LOAD DATA,
  and CREATE TABLE AS SELECT) can write data into a table or partition that resides in the Amazon Simple Storage Service (S3). The syntax of the DML statements is the same as for any other tables, because the S3 location for tables and
@@ -1227,7 +1216,7 @@ drop database temp;

- In CDH 5.7 / Impala 2.5 and higher, Impala UDFs and UDAs written in C++ are persisted in the metastore database.
+ In and higher, Impala UDFs and UDAs written in C++ are persisted in the metastore database.
  Java UDFs are also persisted, if they were created with the new CREATE FUNCTION syntax for Java UDFs, where the Java function argument and return types are omitted. Java-based UDFs created with the old CREATE FUNCTION syntax do not persist across restarts
@@ -1235,7 +1224,7 @@ drop database temp;
  Until you re-create such Java UDFs using the new CREATE FUNCTION syntax, you must reload those Java-based UDFs by running the original CREATE FUNCTION statements again each time you restart the catalogd daemon.
- Prior to CDH 5.7 / Impala 2.5, the requirement to reload functions after a restart applied to both C++ and Java functions.
+ Prior to the requirement to reload functions after a restart applied to both C++ and Java functions.

@@ -1317,7 +1306,7 @@ select c_first_name, c_last_name from customer where lower(trim(c_last_name)) rl

- In CDH 5.7 / Impala 2.5 and higher, you can simplify queries that
+ In and higher, you can simplify queries that
  use many UPPER() and LOWER() calls to do case-insensitive comparisons, by using the ILIKE or IREGEXP operators instead. See
@@ -1857,11 +1846,11 @@ show functions in _impala_builtins like '*substring*';
  Complex type considerations: Although you can create tables in this file format using the complex types (ARRAY, STRUCT,
- and MAP) available in CDH 5.5 / Impala 2.3 and higher,
+ and MAP) available in and higher,
  currently, Impala can query these types only in Parquet tables. The one exception to the preceding rule is COUNT(*) queries on RCFile tables that include complex types.
- Such queries are allowed in CDH 5.8 / Impala 2.6 and higher.
+ Such queries are allowed in and higher.

@@ -1906,7 +1895,7 @@ show functions in _impala_builtins like '*substring*';

  The Impala complex types (STRUCT, ARRAY, or MAP)
- are available in CDH 5.5 / Impala 2.3 and higher.
+ are available in and higher.
  To use these types with JDBC requires version 2.5.28 or higher of the Cloudera JDBC Connector for Impala. To use these types with ODBC requires version 2.5.30 or higher of the Cloudera ODBC Connector for Impala. Consider upgrading all JDBC and ODBC drivers at the same time you upgrade from CDH 5.5 or higher.
@@ -2117,7 +2106,7 @@ order by r_name;
  The arguments to this command let you perform operations such as:

- cat: Print a file's contents to standard out. In CDH 5.5 and higher, you can use
+ cat: Print a file's contents to standard out. In CDH 5.5 and higher, you can use
  the -j option to output JSON.
@@ -2430,6 +2419,10 @@ flight_num: INT32 SNAPPY DO:83456393 FPO:83488603 SZ:10216514/11474301
  HBase considerations:
+
+ The LOAD DATA statement cannot be used with HBase tables. +
+
HBase considerations: This data type is fully compatible with HBase tables.
@@ -2782,7 +2775,7 @@ select max(height), avg(height) from census_data where age > 20;

- In Impala 2.2 / CDH 5.4 and higher, the optional WITH REPLICATION clause
+ In and higher, the optional WITH REPLICATION clause
  for CREATE TABLE and ALTER TABLE lets you specify a replication factor, the number of hosts on which to cache the same data blocks. When Impala processes a cached data block, where the cache replication factor is greater than 1, Impala randomly
@@ -2961,7 +2954,7 @@ Query finished, fetching results ...

- In CDH 5.8 / Impala 2.6 and higher, Impala can optionally
+ In and higher, Impala can optionally
  skip an arbitrary number of header lines from text input files on HDFS based on the skip.header.line.count value in the TBLPROPERTIES field of the table metadata. For example:
@@ -3198,7 +3191,7 @@ sudo pip-python install ssl
  Prior to CDH 5.5 / Impala 2.3, the impala user was required to be a member of the hdfs group for the resource management feature to work (in combination with CDH 5 and the YARN and Llama components).
- This requirement has been lifted in CDH 5.5 / Impala 2.3 and higher. The impala
+ This requirement has been lifted in and higher. The impala
  user remains in the hdfs group on upgraded systems if it was already there, but is no longer put into that group during new installs.
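As an illustration of the skip.header.line.count table property mentioned earlier in this hunk, the following sketch models the effect on a single text file: the first N lines are dropped before rows are read. It is a simplified model under stated assumptions, not Impala's scanner; the helper name `skip_header_lines` and the sample data are hypothetical.

```python
import io
from itertools import islice


def skip_header_lines(lines, tblproperties):
    """Return an iterator over data rows, dropping the first N lines,
    where N is the skip.header.line.count table property (default 0)."""
    count = int(tblproperties.get("skip.header.line.count", "0"))
    return islice(lines, count, None)


# Hypothetical single text file with one header line.
sample = io.StringIO("first_name,age\nalice,30\nbob,25\n")
props = {"skip.header.line.count": "1"}
rows = [line.rstrip("\n") for line in skip_header_lines(sample, props)]
print(rows)  # ['alice,30', 'bob,25']
```

The property value is kept as a string in the sketch because TBLPROPERTIES values are written as quoted strings in the statements shown elsewhere in this diff.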
@@ -3673,6 +3666,25 @@ sudo pip-python install ssl

- In CDH 5.7 / Impala 2.5 and higher, you can specify these limits and thresholds for each
+ In and higher, you can specify these limits and thresholds for each
  pool rather than globally. That way, you can balance the resource usage and throughput between steady well-defined workloads, rare resource-intensive queries, and ad hoc exploratory queries.
@@ -388,9 +388,9 @@
  Although the following options are still present in the Cloudera Manager interface under the Admission Control configuration settings dialog,
- Cloudera recommends you not use them in CDH 5.7 / Impala 2.5 and higher.
+ Cloudera recommends you not use them in and higher.
  These settings only apply if you enable admission control but leave dynamic resource pools disabled.
- In CDH 5.7 / Impala 2.5 and higher, prefer to set up dynamic resource pools and
+ In and higher, prefer to set up dynamic resource pools and
  customize the settings for each pool, as described in and .
@@ -441,7 +441,7 @@

  Default:
- -1, meaning unlimited (prior to CDH 5.7 / Impala 2.5, the default was 200)
+ -1, meaning unlimited (prior to the default was 200)