From: jrussell@apache.org
To: commits@impala.incubator.apache.org
Reply-To: dev@impala.incubator.apache.org
Date: Wed, 12 Jul 2017 07:17:50 -0000
Message-Id: <72c04bd7b37940df9cf256d0d33801b7@git.apache.org>
In-Reply-To: <3cccc765403a41298af377e72e24aff3@git.apache.org>
References: <3cccc765403a41298af377e72e24aff3@git.apache.org>
Subject: [5/6] incubator-impala git commit: Add Impala 2.9 docs from master branch, with commit hash f1a3d8e14dae4948ce77e2f85e036d83f2d8b246

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_auditing.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_auditing.html b/docs/build/html/topics/impala_auditing.html
index bcd6d9f..eb6f450 100644
--- a/docs/build/html/topics/impala_auditing.html
+++ b/docs/build/html/topics/impala_auditing.html
@@ -25,17 +25,27 @@
  • - Decide how many queries will be represented in each log file. By default, - Impala starts a new log file every 5000 queries. To specify a different number, + Decide how many queries will be represented in each audit event log file. By default, + Impala starts a new audit event log file every 5000 queries. To specify a different number, include - the option -max_audit_event_log_file_size=number_of_queries + the option --max_audit_event_log_file_size=number_of_queries in the impalad startup options.
  • -
  • +
  • + In Impala 2.9 and higher, you can control how many + audit event log files are kept on each host. Specify the option + --max_audit_event_log_files=number_of_log_files + in the impalad startup options. Once the limit is reached, older + files are rotated out using the same mechanism as for other Impala log files. + The default value for this setting is 0, representing an unlimited number of audit + event log files. +
  • + +
  • Use a cluster manager with governance capabilities to filter, visualize, and produce reports based on the audit logs collected - from all the hosts in the cluster. + from all the hosts in the cluster.
  • @@ -61,18 +71,18 @@ fsync() system call) to avoid loss of audit data in case of a crash.

    -

    +

    The runtime overhead of auditing applies to whichever host serves as the coordinator for the query, that is, the host you connect to when you issue the query. This might be the same host for all queries, or different applications or users might connect to - and issue queries through different hosts. + and issue queries through different hosts.

    -

    +

    To avoid excessive I/O overhead on busy coordinator hosts, Impala syncs the audit log data (using the fsync() system call) periodically rather than after every query. Currently, the fsync() calls are issued at a fixed - interval, every 5 seconds. + interval, every 5 seconds.

    @@ -92,12 +102,12 @@

    -

    +

    The audit log files represent the query information in JSON format, one query per line. Typically, rather than looking at the log files themselves, you should use cluster-management software to consolidate the log data from all Impala hosts and filter and visualize the results in useful ways. (If you do examine the raw log data, you might run the files through - a JSON pretty-printer first.) + a JSON pretty-printer first.)
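As a rough illustration of the pretty-printing step mentioned above, the following Python sketch reformats a file that holds one JSON record per line. The sample record and its field names are invented for the example; they are not the actual Impala audit event schema.

```python
import json

def pretty_print_audit_lines(lines):
    """Return one indented JSON document per non-empty input line."""
    return [json.dumps(json.loads(line), indent=2, sort_keys=True)
            for line in lines if line.strip()]

# Hypothetical record; real audit events use Impala's own field names.
sample = ['{"user": "alice", "sql_statement": "select 1", "status": ""}']
for doc in pretty_print_audit_lines(sample):
    print(doc)
```

In practice the `lines` argument could be an open file object for a single audit event log file, since iterating over a file yields one line at a time.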

    http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_authorization.html
    ----------------------------------------------------------------------
    diff --git a/docs/build/html/topics/impala_authorization.html b/docs/build/html/topics/impala_authorization.html
    index 13a8fb4..6b66523 100644
    --- a/docs/build/html/topics/impala_authorization.html
    +++ b/docs/build/html/topics/impala_authorization.html
    @@ -682,8 +682,7 @@ sales = hdfs://ha-nn-uri/etc/access/sales.ini

    To enable URIs in per-DB policy files, the Java configuration option sentry.allow.uri.db.policyfile - must be set to true. - For example: + must be set to true. For example:

    JAVA_TOOL_OPTIONS="-Dsentry.allow.uri.db.policyfile=true"
    
    http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_char.html
    ----------------------------------------------------------------------
    diff --git a/docs/build/html/topics/impala_char.html b/docs/build/html/topics/impala_char.html
    index e0b4cb9..62ab8ef 100644
    --- a/docs/build/html/topics/impala_char.html
    +++ b/docs/build/html/topics/impala_char.html
    @@ -240,7 +240,7 @@ select concat('[',a,']') as a, concat('[',b,']') as b, concat('[',c,']') as c fr
             Kudu considerations:
           

    - Currently, the data types DECIMAL, TIMESTAMP, CHAR, VARCHAR, + Currently, the data types DECIMAL, CHAR, VARCHAR, ARRAY, MAP, and STRUCT cannot be used with Kudu tables.

    http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_components.html
    ----------------------------------------------------------------------
    diff --git a/docs/build/html/topics/impala_components.html b/docs/build/html/topics/impala_components.html
    index d3d210d..c7245c3 100644
    --- a/docs/build/html/topics/impala_components.html
    +++ b/docs/build/html/topics/impala_components.html
    @@ -53,6 +53,12 @@

    + In Impala 2.9 and higher, you can control which hosts act as query coordinators + and which act as query executors, to improve scalability for highly concurrent workloads on large clusters. + See Scalability Considerations for Impala for details. +

    + +

    Related information: Modifying Impala Startup Options, Starting Impala, Setting the Idle Query and Idle Session Timeouts for impalad, Ports Used by Impala, Using Impala through a Proxy for High Availability http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_compute_stats.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_compute_stats.html b/docs/build/html/topics/impala_compute_stats.html index fcba3d6..a20c0b2 100644 --- a/docs/build/html/topics/impala_compute_stats.html +++ b/docs/build/html/topics/impala_compute_stats.html @@ -543,7 +543,7 @@ show table stats item_partitioned; Kudu tables. Therefore, you do not need to re-run the operation when you see -1 in the # Rows column of the output from SHOW TABLE STATS. That column always shows -1 for - all Kudu tables. + all Kudu tables.

    http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_conditional_functions.html
    ----------------------------------------------------------------------
    diff --git a/docs/build/html/topics/impala_conditional_functions.html b/docs/build/html/topics/impala_conditional_functions.html
    index 713946b..7490c1a 100644
    --- a/docs/build/html/topics/impala_conditional_functions.html
    +++ b/docs/build/html/topics/impala_conditional_functions.html
    @@ -488,6 +488,60 @@ END

    +
    + nvl2(type a, type ifNotNull, type ifNull) +
    + +
    + + Purpose: Enhanced variant of the nvl() function. Tests an expression + and returns different result values depending on whether it is NULL or not. + If the first argument is not NULL, returns the second argument. + If the first argument is NULL, returns the third argument. + Equivalent to the nvl2() function from Oracle Database. +

    + Return type: Same as the first argument value +

    +

    + Added in: Impala 2.9.0 +

    +

    + Examples: +

    +

    + The following examples show how a query can use special indicator values + to represent null and not-null expression values. The first example tests + an INT column and so uses special integer values. + The second example tests a STRING column and so uses + special string values. +

    +
    
    +select x, nvl2(x, 999, 0) from nvl2_demo;
    ++------+---------------------------+
    +| x    | if(x is not null, 999, 0) |
    ++------+---------------------------+
    +| NULL | 0                         |
    +| 1    | 999                       |
    +| NULL | 0                         |
    +| 2    | 999                       |
    ++------+---------------------------+
    +
    +select s, nvl2(s, 'is not null', 'is null') from nvl2_demo;
    ++------+---------------------------------------------+
    +| s    | if(s is not null, 'is not null', 'is null') |
    ++------+---------------------------------------------+
    +| NULL | is null                                     |
    +| one  | is not null                                 |
    +| NULL | is null                                     |
    +| two  | is not null                                 |
    ++------+---------------------------------------------+
    +
    +
    + + + + +
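The selection logic shown in the example output above, where `nvl2(x, 999, 0)` yields 999 for non-NULL `x` and 0 for NULL, can be modeled in a few lines of Python. This is only a sketch of the semantics, with `None` standing in for SQL NULL; it is not how Impala implements the function.

```python
def nvl2(a, if_not_null, if_null):
    """Model of nvl2(): None stands in for SQL NULL."""
    return if_null if a is None else if_not_null

# Same inputs as the x column in the first example query.
rows = [None, 1, None, 2]
print([nvl2(x, 999, 0) for x in rows])
```

Note that only NULL triggers the third argument; a value such as 0 or an empty string is still treated as not NULL.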
    zeroifnull(numeric_expr)
    http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_create_table.html
    ----------------------------------------------------------------------
    diff --git a/docs/build/html/topics/impala_create_table.html b/docs/build/html/topics/impala_create_table.html
    index 2f88c58..1d890d8 100644
    --- a/docs/build/html/topics/impala_create_table.html
    +++ b/docs/build/html/topics/impala_create_table.html
    @@ -56,6 +56,7 @@
         [, ...]
       )
       [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
    +  [SORT BY ([column [, column ...]])]
       [COMMENT 'table_comment']
       [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
       [
    @@ -72,6 +73,7 @@
    CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
       [PARTITIONED BY (col_name[, ...])]
    +  [SORT BY ([column [, column ...]])]
       [COMMENT 'table_comment']
       [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
       [
    @@ -130,6 +132,7 @@ file_format:
     
     
    CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
       LIKE PARQUET 'hdfs_path_of_parquet_file'
    +  [SORT BY ([column [, column ...]])]
       [COMMENT 'table_comment']
       [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
       [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
    @@ -346,6 +349,83 @@ AS
         

    + Sorted tables (SORT BY clause): +

    + +

    + The optional SORT BY clause lets you specify zero or more columns + that are sorted in the data files created by each Impala INSERT or + CREATE TABLE AS SELECT operation. Creating data files that are + sorted is most useful for Parquet tables, where the metadata stored inside each file includes + the minimum and maximum values for each column in the file. (The statistics apply to each row group + within the file; for simplicity, Impala writes a single row group in each file.) Grouping + data values together in relatively narrow ranges within each data file makes it possible + for Impala to quickly skip over data files that do not contain value ranges indicated in + the WHERE clause of a query, and can improve the effectiveness + of Parquet encoding and compression. +

    + +

    + This clause is not applicable for Kudu tables or HBase tables. Although it works + for other HDFS file formats besides Parquet, the more efficient layout is most + evident with Parquet tables, because each Parquet data file includes statistics + about the data values in that file. +

    + +

    + The SORT BY columns cannot include any partition key columns + for a partitioned table, because those column values are not represented in + the underlying data files. +

    + +

    + Because data files can arrive in Impala tables by mechanisms that do not respect + the SORT BY clause, such as LOAD DATA or ETL + tools that create HDFS files, Impala does not guarantee or rely on the data being + sorted. The sorting aspect is only used to create a more efficient layout for + Parquet files generated by Impala, which helps to optimize the processing of + those Parquet files during Impala queries. During an INSERT + or CREATE TABLE AS SELECT operation, the sorting occurs + when the SORT BY clause applies to the destination table + for the data, regardless of whether the source table has a SORT BY + clause. +

    + +

    + For example, when creating a table intended to contain census data, you might define + sort columns such as last name and state. If a data file in this table contains a + narrow range of last names, for example from Smith to Smythe, + Impala can quickly detect that this data file contains no matches for a WHERE + clause such as WHERE last_name = 'Jones' and avoid reading the entire file. +

    + +
    CREATE TABLE census_data (last_name STRING, first_name STRING, state STRING, address STRING)
    +  SORT BY (last_name, state)
    +  STORED AS PARQUET;
    +
    + +

    + Likewise, if an existing table contains data without any sort order, you can reorganize + the data in a more efficient way by using INSERT or + CREATE TABLE AS SELECT to copy that data into a new table with a + SORT BY clause: +

    + +
    CREATE TABLE sorted_census_data
    +  SORT BY (last_name, state)
    +  STORED AS PARQUET
    +  AS SELECT last_name, first_name, state, address
    +    FROM unsorted_census_data;
    +
    + +

    + The metadata for the SORT BY clause is stored in the TBLPROPERTIES + fields for the table. Other SQL engines that can interoperate with Impala tables, such as Hive + and Spark SQL, do not recognize this property when inserting into a table that has a SORT BY + clause. +

    + +
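To make the file-skipping idea from the SORT BY discussion concrete, here is a minimal Python sketch of min/max pruning. It assumes each data file carries a minimum and maximum value for the filtered column, as the text describes for Parquet; the `FileStats` class, the file names, and the value ranges are all hypothetical, and Impala's real logic operates on Parquet statistics rather than on anything like this code.

```python
from dataclasses import dataclass

@dataclass
class FileStats:
    name: str        # hypothetical data file name
    min_value: str   # smallest last_name stored in the file
    max_value: str   # largest last_name stored in the file

def files_to_scan(files, wanted):
    """Keep only files whose [min, max] range could contain `wanted`."""
    return [f.name for f in files if f.min_value <= wanted <= f.max_value]

files = [
    FileStats("part-0", "Adams", "Jones"),
    FileStats("part-1", "Jordan", "Smart"),
    FileStats("part-2", "Smith", "Smythe"),
]
# Only part-0 has a range that can include 'Jones'.
print(files_to_scan(files, "Jones"))
```

The narrower the range in each file, the more files an equality or range predicate can rule out, which is why sorting before writing helps.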

    Kudu considerations:

    http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_datetime_functions.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_datetime_functions.html b/docs/build/html/topics/impala_datetime_functions.html index 222ae8c..1649c0a 100644 --- a/docs/build/html/topics/impala_datetime_functions.html +++ b/docs/build/html/topics/impala_datetime_functions.html @@ -169,7 +169,7 @@ select now(), current_timestamp(); | 2016-05-19 16:10:14.237849000 | 2016-05-19 16:10:14.237849000 | +-------------------------------+-------------------------------+ -select current_timestamp() as right_now, +select current_timestamp() as right_now, current_timestamp() + interval 3 hours as in_three_hours; +-------------------------------+-------------------------------+ | right_now | in_three_hours | @@ -391,7 +391,7 @@ select date_sub(cast('2016-05-31' as timestamp), interval 1 months) as 'april_31 Examples:

    - The following example shows how comparing a "late" value with + The following example shows how comparing a "late" value with an "earlier" value produces a positive number. In this case, the result is (365 * 5) + 1, because one of the intervening years is a leap year. @@ -713,9 +713,10 @@ select now() as right_now, days_sub(now(), 31) as 31_days_ago; Purpose: Returns one of the numeric date or time fields from a TIMESTAMP value.

    - Unit argument: The unit string can be one of year, - month, day, hour, minute, - second, or millisecond. This argument value is case-insensitive. + Unit argument: The unit string can be one of epoch, + year, month, day, hour, + minute, second, or millisecond. + This argument value is case-insensitive.

    In Impala 2.0 and higher, you can use special syntax rather than a regular function call, for @@ -754,8 +755,8 @@ select now() as right_now, +-------------------------------+-----------+------------+ select now() as right_now, - extract(day from now()) as this_day, - extract(hour from now()) as this_hour; + extract(day from now()) as this_day, + extract(hour from now()) as this_hour; +-------------------------------+----------+-----------+ | right_now | this_day | this_hour | +-------------------------------+----------+-----------+ @@ -1696,6 +1697,14 @@ with t1 as (select trunc(now(), 'dd') as today) Return type: timestamp

    + Kudu considerations: +

    +

    + The nanosecond portion of an Impala TIMESTAMP value + is rounded to the nearest microsecond when that value is stored in a + Kudu table. +
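The rounding described above can be sketched in Python as follows. The text says only that values are rounded to the nearest microsecond; the tie-breaking rule used here (exact halves round up) is an assumption made for the example.

```python
def round_nanos_to_micros(nanos):
    """Round a nanosecond count to the nearest microsecond.

    Assumption: exact halves (…500 ns) round up.
    """
    return (nanos + 500) // 1000

print(round_nanos_to_micros(237_849_499))  # 237849
print(round_nanos_to_micros(237_849_500))  # 237850
```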

    +

    Examples:

    
    @@ -1731,6 +1740,14 @@ select now() as right_now, nanoseconds_add(now(), 1e9) as 1_second_later;
               

    Return type: timestamp

    +

    + Kudu considerations: +

    +

    + The nanosecond portion of an Impala TIMESTAMP value + is rounded to the nearest microsecond when that value is stored in a + Kudu table. +

    
     select now() as right_now, nanoseconds_sub(now(), 1) as 1_nanosecond_earlier;
     +-------------------------------+-------------------------------+
    
    http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_decimal.html
    ----------------------------------------------------------------------
    diff --git a/docs/build/html/topics/impala_decimal.html b/docs/build/html/topics/impala_decimal.html
    index 8cec53e..9604c5f 100644
    --- a/docs/build/html/topics/impala_decimal.html
    +++ b/docs/build/html/topics/impala_decimal.html
    @@ -807,7 +807,7 @@ SELECT CAST(1000.5 AS DECIMAL);
             Kudu considerations:
           

    - Currently, the data types DECIMAL, TIMESTAMP, CHAR, VARCHAR, + Currently, the data types DECIMAL, CHAR, VARCHAR, ARRAY, MAP, and STRUCT cannot be used with Kudu tables.

    http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_decimal_v2.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_decimal_v2.html b/docs/build/html/topics/impala_decimal_v2.html new file mode 100644 index 0000000..4f1b5ea --- /dev/null +++ b/docs/build/html/topics/impala_decimal_v2.html @@ -0,0 +1,32 @@ + +DECIMAL_V2 Query Option
    + +

    DECIMAL_V2 Query Option

    + + + +
    + +

    + A query option that changes behavior related to the DECIMAL + data type. +

    + +
    Important: +

    + This query option is currently unsupported. + Its precise behavior is currently undefined and might change + in the future. +

    +
    + +

    + Type: Boolean; recognized values are 1 and 0, or true and false; + any other value is interpreted as false +

    +

    + Default: false (shown as 0 in output of SET statement) +

    +
    +
    \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_default_join_distribution_mode.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_default_join_distribution_mode.html b/docs/build/html/topics/impala_default_join_distribution_mode.html new file mode 100644 index 0000000..c866519 --- /dev/null +++ b/docs/build/html/topics/impala_default_join_distribution_mode.html @@ -0,0 +1,113 @@ + +DEFAULT_JOIN_DISTRIBUTION_MODE Query Option
    + +

    DEFAULT_JOIN_DISTRIBUTION_MODE Query Option

    + + + +
    + +

    + + This option determines the join distribution that Impala uses when any of the tables + involved in a join query is missing statistics. +

    + +

    + Impala optimizes join queries based on the presence of table statistics, + which are produced by the Impala COMPUTE STATS statement. + By default, when a table involved in the join query does not have statistics, + Impala uses the "broadcast" technique that transmits the entire contents + of the table to all executor nodes participating in the query. If one table + involved in a join has statistics and the other does not, the table without + statistics is broadcast. If both tables are missing statistics, the table + that is referenced second in the join order is broadcast. This behavior + is appropriate when the table involved is relatively small, but can lead to + excessive network, memory, and CPU overhead if the table being broadcast is + large. +

    + +

    + Because Impala queries frequently involve very large tables, and suboptimal + joins for such tables could result in spilling or out-of-memory errors, + the setting DEFAULT_JOIN_DISTRIBUTION_MODE=SHUFFLE lets you + override the default behavior. The shuffle join mechanism divides the corresponding rows + of each table involved in a join query using a hashing algorithm, and transmits + subsets of the rows to other nodes for processing. Typically, this kind of join is + more efficient for joins between large tables of similar size. +

    + +

    + The setting DEFAULT_JOIN_DISTRIBUTION_MODE=SHUFFLE is + recommended when setting up and deploying new clusters, because it is less likely + to result in serious consequences such as spilling or out-of-memory errors if + the query plan is based on incomplete information. This setting is not the default, + to avoid changing the performance characteristics of join queries for clusters that + are already tuned for their existing workloads. +
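The difference between the two distribution modes can be sketched in Python. This is a toy model, not Impala's implementation: the function names are hypothetical and `zlib.crc32` merely stands in for whatever hash function the engine uses. The point is that a shuffle sends each row to exactly one node chosen by its join key, while a broadcast copies the whole table to every node.

```python
import zlib
from collections import defaultdict

def shuffle(rows, key_index, num_nodes):
    """Assign each row to one node by hashing its join key."""
    buckets = defaultdict(list)
    for row in rows:
        key = str(row[key_index]).encode()
        buckets[zlib.crc32(key) % num_nodes].append(row)
    return dict(buckets)

def broadcast(rows, num_nodes):
    """Send a full copy of the table to every node."""
    return {node: list(rows) for node in range(num_nodes)}

orders = [(1, "book"), (2, "pen"), (1, "lamp"), (3, "desk")]
shuffled = shuffle(orders, key_index=0, num_nodes=3)
copied = broadcast(orders, num_nodes=3)
print(sum(len(v) for v in shuffled.values()))  # rows moved by shuffle
print(sum(len(v) for v in copied.values()))    # rows moved by broadcast
```

With a shuffle the total number of rows transmitted equals the table size, whereas with a broadcast it grows with the number of nodes, which is why broadcasting a large table without statistics can be costly.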

    + +

    + Type: integer +

    +

    + The allowed values are BROADCAST (equivalent to 0) + or SHUFFLE (equivalent to 1). +

    + +

    + Examples: +

    +

    + The following examples demonstrate appropriate scenarios for each + setting of this query option. +

    + +
    
    +-- Create a billion-row table.
    +create table big_table stored as parquet
    +  as select * from huge_table limit 1e9;
    +
    +-- For a big table with no statistics, the
    +-- shuffle join mechanism is appropriate.
    +set default_join_distribution_mode=shuffle;
    +
    +...join queries involving the big table...
    +
    + +
    
    +-- Create a hundred-row table.
    +create table tiny_table stored as parquet
    +  as select * from huge_table limit 100;
    +
    +-- For a tiny table with no statistics, the
    +-- broadcast join mechanism is appropriate.
    +set default_join_distribution_mode=broadcast;
    +
    +...join queries involving the tiny table...
    +
    + +
    
    +compute stats tiny_table;
    +compute stats big_table;
    +
    +-- Once the stats are computed, the query option has
    +-- no effect on join queries involving these tables.
    +-- Impala can determine the absolute and relative sizes
    +-- of each side of the join query by examining the
    +-- row size, cardinality, and so on of each table.
    +
    +...join queries involving both of these tables...
    +
    + +

    + Related information: +

    +

    + COMPUTE STATS Statement, + Joins in Impala SELECT Statements, + Performance Considerations for Join Queries +

    + +
    +
    \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_describe.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_describe.html b/docs/build/html/topics/impala_describe.html index 963ef6e..0c20071 100644 --- a/docs/build/html/topics/impala_describe.html +++ b/docs/build/html/topics/impala_describe.html @@ -745,7 +745,7 @@ Returned 27 row(s) in 0.17s

    - The following example shows DESCRIBE output for a simple Kudu table, with + The following example shows DESCRIBE output for a simple Kudu table, with a single-column primary key and all column attributes left with their default values:

    http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_double.html
    ----------------------------------------------------------------------
    diff --git a/docs/build/html/topics/impala_double.html b/docs/build/html/topics/impala_double.html
    index b87994c..a1b87fb 100644
    --- a/docs/build/html/topics/impala_double.html
    +++ b/docs/build/html/topics/impala_double.html
    @@ -59,6 +59,17 @@ The data type REAL is an alias for DOUBLE.

    + +

    + Impala does not evaluate NaN (not a number) as equal to any other numeric values, + including other NaN values. For example, the following statement, which evaluates equality + between two NaN values, returns false: +

    + +
    
    +SELECT CAST('nan' AS DOUBLE)=CAST('nan' AS DOUBLE);
    +
    +
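The NaN comparison described above mirrors the IEEE 754 rule that NaN compares unequal to everything, including itself. Python floats follow the same rule, so the behavior can be checked outside Impala; the snippet below is an analogy in Python, not Impala code.

```python
import math

nan = float("nan")
print(nan == nan)        # False: NaN is not equal to itself
print(math.isnan(nan))   # True: use an explicit NaN test instead of '='
```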

    Examples:

    http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_explain.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_explain.html b/docs/build/html/topics/impala_explain.html index 473a94d..0de916d 100644 --- a/docs/build/html/topics/impala_explain.html +++ b/docs/build/html/topics/impala_explain.html @@ -248,7 +248,7 @@ EXPLAIN_LEVEL set to extended against HDFS-based tables.

    -
    +

    To see which predicates Impala can "push down" to Kudu for efficient evaluation, without transmitting unnecessary rows back to Impala, look for the kudu predicates item in @@ -260,22 +260,27 @@ EXPLAIN_LEVEL set to extended and non-primary key column Y, you can see that some operators in the WHERE clause are evaluated immediately by Kudu and others are evaluated later by Impala: +

    +
    
     EXPLAIN SELECT x,y from kudu_table WHERE
    -  x = 1 AND x NOT IN (2,3) AND y = 1
    -  AND x IS NOT NULL AND x > 0;
    +  x = 1 AND y NOT IN (2,3) AND z = 1
    +  AND a IS NOT NULL AND b > 0 AND length(s) > 5;
     +----------------
     | Explain String
     +----------------
     ...
    -| 00:SCAN KUDU [jrussell.hash_only]
    -|    predicates: x IS NOT NULL, x NOT IN (2, 3)
    -|    kudu predicates: x = 1, x > 0, y = 1
    +| 00:SCAN KUDU [kudu_table]
    +|    predicates: y NOT IN (2, 3), length(s) > 5
    +|    kudu predicates: a IS NOT NULL, b > 0, x = 1, z = 1
     
    - Only binary predicates and IN predicates containing - literal values that exactly match the types in the Kudu table, and do not + +

    + Only binary predicates, IS NULL and IS NOT NULL + (in Impala 2.9 and higher), and IN predicates + containing literal values that exactly match the types in the Kudu table, and do not require any casting, can be pushed to Kudu. -

    +

    Related information: http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_explain_plan.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_explain_plan.html b/docs/build/html/topics/impala_explain_plan.html index bcd0855..e749869 100644 --- a/docs/build/html/topics/impala_explain_plan.html +++ b/docs/build/html/topics/impala_explain_plan.html @@ -111,8 +111,8 @@

    The amount of detail displayed in the EXPLAIN output is controlled by the EXPLAIN_LEVEL query option. You typically - increase this setting from normal to verbose (or from 0 - to 1) when doublechecking the presence of table and column statistics during performance + increase this setting from standard to extended (or from 1 + to 2) when doublechecking the presence of table and column statistics during performance tuning, or when estimating query resource usage in conjunction with the resource management features.