impala-commits mailing list archives

From jruss...@apache.org
Subject [5/6] incubator-impala git commit: Add Impala 2.9 docs from master branch, with commit hash f1a3d8e14dae4948ce77e2f85e036d83f2d8b246
Date Wed, 12 Jul 2017 07:17:50 GMT
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_auditing.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_auditing.html b/docs/build/html/topics/impala_auditing.html
index bcd6d9f..eb6f450 100644
--- a/docs/build/html/topics/impala_auditing.html
+++ b/docs/build/html/topics/impala_auditing.html
@@ -25,17 +25,27 @@
       </li>
 
       <li class="li">
-        Decide how many queries will be represented in each log file. By default,
-        Impala starts a new log file every 5000 queries. To specify a different number,
+        Decide how many queries will be represented in each audit event log file. By default,
+        Impala starts a new audit event log file every 5000 queries. To specify a different
number,
         <span class="ph">include
-        the option <code class="ph codeph">-max_audit_event_log_file_size=<var class="keyword
varname">number_of_queries</var></code>
+        the option <code class="ph codeph">--max_audit_event_log_file_size=<var
class="keyword varname">number_of_queries</var></code>
         in the <span class="keyword cmdname">impalad</span> startup options</span>.
       </li>
 
-      <li class="li"> 
+      <li class="li">
+        In <span class="keyword">Impala 2.9</span> and higher, you can control
how many
+        audit event log files are kept on each host. Specify the option
+        <code class="ph codeph">--max_audit_event_log_files=<var class="keyword
varname">number_of_log_files</var></code>
+        in the <span class="keyword cmdname">impalad</span> startup options.
Once the limit is reached, older
+        files are rotated out using the same mechanism as for other Impala log files.
+        The default value for this setting is 0, representing an unlimited number of audit
+        event log files.
+      </li>
+
+      <li class="li">
         Use a cluster manager with governance capabilities to filter, visualize,
         and produce reports based on the audit logs collected
-        from all the hosts in the cluster. 
+        from all the hosts in the cluster.
       </li>
     </ul>
 
@@ -61,18 +71,18 @@
         <code class="ph codeph">fsync()</code> system call) to avoid loss of
audit data in case of a crash.
       </p>
 
-      <p class="p"> 
+      <p class="p">
         The runtime overhead of auditing applies to whichever host serves as the coordinator
         for the query, that is, the host you connect to when you issue the query. This might
         be the same host for all queries, or different applications or users might connect
to
-        and issue queries through different hosts. 
+        and issue queries through different hosts.
       </p>
 
-      <p class="p"> 
+      <p class="p">
         To avoid excessive I/O overhead on busy coordinator hosts, Impala syncs the audit
log
         data (using the <code class="ph codeph">fsync()</code> system call) periodically
rather than after
         every query. Currently, the <code class="ph codeph">fsync()</code> calls
are issued at a fixed
-        interval, every 5 seconds. 
+        interval, every 5 seconds.
       </p>
 
       <p class="p">
@@ -92,12 +102,12 @@
 
     <div class="body conbody">
 
-      <p class="p"> 
+      <p class="p">
         The audit log files represent the query information in JSON format, one query per
line.
         Typically, rather than looking at the log files themselves, you should use cluster-management
         software to consolidate the log data from all Impala hosts and filter and visualize
the results
         in useful ways. (If you do examine the raw log data, you might run the files through
-        a JSON pretty-printer first.) 
+        a JSON pretty-printer first.)
      </p>
 
       <p class="p">
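
The audit hunk above notes that each audit event log file holds one JSON record per line and suggests running raw files through a JSON pretty-printer. A minimal sketch of that step in Python follows. The helper name `pretty_print_audit_lines` and the sample record fields are hypothetical placeholders, not the real audit schema.

```python
import json


def pretty_print_audit_lines(lines):
    """Re-serialize each non-empty JSON line with indentation.

    Blank lines are skipped; each remaining line must be one complete
    JSON document, matching the one-query-per-line layout.
    """
    pretty = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        pretty.append(json.dumps(json.loads(line), indent=2, sort_keys=True))
    return pretty


# Hypothetical record: these field names are illustrative only.
sample = ['{"user": "alice", "statement": "select 1"}', ""]
for block in pretty_print_audit_lines(sample):
    print(block)
```

For a real file, the same function can be fed an open file object, since iterating over a file yields one line at a time.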

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_authorization.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_authorization.html b/docs/build/html/topics/impala_authorization.html
index 13a8fb4..6b66523 100644
--- a/docs/build/html/topics/impala_authorization.html
+++ b/docs/build/html/topics/impala_authorization.html
@@ -682,8 +682,7 @@ sales = hdfs://ha-nn-uri/etc/access/sales.ini
 
         <p class="p">
           To enable URIs in per-DB policy files, the Java configuration option <code class="ph
codeph">sentry.allow.uri.db.policyfile</code>
-          must be set to <code class="ph codeph">true</code>.
-	  For example:
+          must be set to <code class="ph codeph">true</code>. For example:
         </p>
 
 <pre class="pre codeblock"><code>JAVA_TOOL_OPTIONS="-Dsentry.allow.uri.db.policyfile=true"

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_char.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_char.html b/docs/build/html/topics/impala_char.html
index e0b4cb9..62ab8ef 100644
--- a/docs/build/html/topics/impala_char.html
+++ b/docs/build/html/topics/impala_char.html
@@ -240,7 +240,7 @@ select concat('[',a,']') as a, concat('[',b,']') as b, concat('[',c,']')
as c fr
         <strong class="ph b">Kudu considerations:</strong>
       </p>
     <p class="p">
-        Currently, the data types <code class="ph codeph">DECIMAL</code>, <code
class="ph codeph">TIMESTAMP</code>, <code class="ph codeph">CHAR</code>,
<code class="ph codeph">VARCHAR</code>,
+        Currently, the data types <code class="ph codeph">DECIMAL</code>, <code
class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>,
         <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>,
and <code class="ph codeph">STRUCT</code> cannot be used with Kudu tables.
       </p>
 

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_components.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_components.html b/docs/build/html/topics/impala_components.html
index d3d210d..c7245c3 100644
--- a/docs/build/html/topics/impala_components.html
+++ b/docs/build/html/topics/impala_components.html
@@ -53,6 +53,12 @@
       </p>
 
       <p class="p">
+        In <span class="keyword">Impala 2.9</span> and higher, you can control
which hosts act as query coordinators
+        and which act as query executors, to improve scalability for highly concurrent workloads
on large clusters.
+        See <a class="xref" href="impala_scalability.html">Scalability Considerations
for Impala</a> for details.
+      </p>
+
+      <p class="p">
         <strong class="ph b">Related information:</strong> <a class="xref"
href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>,
         <a class="xref" href="impala_processes.html#processes">Starting Impala</a>,
<a class="xref" href="impala_timeouts.html#impalad_timeout">Setting the Idle Query and
Idle Session Timeouts for impalad</a>,
         <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a>,
<a class="xref" href="impala_proxy.html#proxy">Using Impala through a Proxy for High
Availability</a>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_compute_stats.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_compute_stats.html b/docs/build/html/topics/impala_compute_stats.html
index fcba3d6..a20c0b2 100644
--- a/docs/build/html/topics/impala_compute_stats.html
+++ b/docs/build/html/topics/impala_compute_stats.html
@@ -543,7 +543,7 @@ show table stats item_partitioned;
       Kudu tables. Therefore, you do not need to re-run the operation when
       you see -1 in the <code class="ph codeph"># Rows</code> column of the output
from
       <code class="ph codeph">SHOW TABLE STATS</code>. That column always shows
-1 for
-      all Kudu tables. 
+      all Kudu tables.
     </p>
 
     <p class="p">

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_conditional_functions.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_conditional_functions.html b/docs/build/html/topics/impala_conditional_functions.html
index 713946b..7490c1a 100644
--- a/docs/build/html/topics/impala_conditional_functions.html
+++ b/docs/build/html/topics/impala_conditional_functions.html
@@ -488,6 +488,60 @@ END</code></pre>
 
       
 
+        <dt class="dt dlterm" id="conditional_functions__nvl2">
+          <code class="ph codeph">nvl2(type a, type ifNotNull, type ifNull)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Enhanced variant of the <code
class="ph codeph">nvl()</code> function. Tests an expression
+          and returns different result values depending on whether it is <code class="ph
codeph">NULL</code> or not.
+          If the first argument is not <code class="ph codeph">NULL</code>, returns
the second argument.
+          If the first argument is <code class="ph codeph">NULL</code>, returns
the third argument.
+          Equivalent to the <code class="ph codeph">nvl2()</code> function from
Oracle Database.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> Same as the first argument
value
+          </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala
2.9.0</span>
+      </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following examples show how a query can use special indicator values
+            to represent null and not-null expression values. The first example tests
+            an <code class="ph codeph">INT</code> column and so uses special
integer values.
+            The second example tests a <code class="ph codeph">STRING</code>
column and so uses
+            special string values.
+          </p>
+<pre class="pre codeblock"><code>
+select x, nvl2(x, 999, 0) from nvl2_demo;
++------+---------------------------+
+| x    | if(x is not null, 999, 0) |
++------+---------------------------+
+| NULL | 0                         |
+| 1    | 999                       |
+| NULL | 0                         |
+| 2    | 999                       |
++------+---------------------------+
+
+select s, nvl2(s, 'is not null', 'is null') from nvl2_demo;
++------+---------------------------------------------+
+| s    | if(s is not null, 'is not null', 'is null') |
++------+---------------------------------------------+
+| NULL | is null                                     |
+| one  | is not null                                 |
+| NULL | is null                                     |
+| two  | is not null                                 |
++------+---------------------------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
         <dt class="dt dlterm" id="conditional_functions__zeroifnull">
           <code class="ph codeph">zeroifnull(<var class="keyword varname">numeric_expr</var>)</code>
         </dt>
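
The result headers in the `nvl2()` examples above show the call `nvl2(x, 999, 0)` being evaluated as `if(x is not null, 999, 0)`. The sketch below mirrors that behavior in Python as an illustration only. The parameter names are assumptions, and Python's `None` stands in for SQL `NULL`.

```python
def nvl2(a, if_not_null, if_null):
    """Return if_not_null when a is not None, otherwise if_null.

    Mirrors the if(a is not null, ...) rewrite shown in the example
    output headers; None plays the role of SQL NULL.
    """
    return if_not_null if a is not None else if_null


# Same column values as the x column in the first example.
rows = [None, 1, None, 2]
results = [(x, nvl2(x, 999, 0)) for x in rows]
for x, value in results:
    print(x, value)
```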

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_create_table.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_create_table.html b/docs/build/html/topics/impala_create_table.html
index 2f88c58..1d890d8 100644
--- a/docs/build/html/topics/impala_create_table.html
+++ b/docs/build/html/topics/impala_create_table.html
@@ -56,6 +56,7 @@
     [, ...]
   )
   [PARTITIONED BY (<var class="keyword varname">col_name</var> <var class="keyword
varname">data_type</var> [COMMENT '<var class="keyword varname">col_comment</var>'],
...)]
+  <span class="ph">[SORT BY ([<var class="keyword varname">column</var>
[, <var class="keyword varname">column</var> ...]])]</span>
   [COMMENT '<var class="keyword varname">table_comment</var>']
   [WITH SERDEPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword
varname">value1</var>', '<var class="keyword varname">key2</var>'='<var
class="keyword varname">value2</var>', ...)]
   [
@@ -72,6 +73,7 @@
 
 <pre class="pre codeblock"><code>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<var
class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>
   <span class="ph">[PARTITIONED BY (<var class="keyword varname">col_name</var>[,
...])]</span>
+  <span class="ph">[SORT BY ([<var class="keyword varname">column</var>
[, <var class="keyword varname">column</var> ...]])]</span>
   [COMMENT '<var class="keyword varname">table_comment</var>']
   [WITH SERDEPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword
varname">value1</var>', '<var class="keyword varname">key2</var>'='<var
class="keyword varname">value2</var>', ...)]
   [
@@ -130,6 +132,7 @@ file_format:
 
 <pre class="pre codeblock"><code>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<var
class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>
   LIKE PARQUET '<var class="keyword varname">hdfs_path_of_parquet_file</var>'
+  <span class="ph">[SORT BY ([<var class="keyword varname">column</var>
[, <var class="keyword varname">column</var> ...]])]</span>
   [COMMENT '<var class="keyword varname">table_comment</var>']
   [PARTITIONED BY (<var class="keyword varname">col_name</var> <var class="keyword
varname">data_type</var> [COMMENT '<var class="keyword varname">col_comment</var>'],
...)]
   [WITH SERDEPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword
varname">value1</var>', '<var class="keyword varname">key2</var>'='<var
class="keyword varname">value2</var>', ...)]
@@ -346,6 +349,83 @@ AS
     </p>
 
     <p class="p">
+      <strong class="ph b">Sorted tables (SORT BY clause):</strong>
+    </p>
+
+    <p class="p">
+      The optional <code class="ph codeph">SORT BY</code> clause lets you specify
zero or more columns
+      that are sorted in the data files created by each Impala <code class="ph codeph">INSERT</code>
or
+      <code class="ph codeph">CREATE TABLE AS SELECT</code> operation. Creating
data files that are
+      sorted is most useful for Parquet tables, where the metadata stored inside each file
includes
+      the minimum and maximum values for each column in the file. (The statistics apply to
each row group
+      within the file; for simplicity, Impala writes a single row group in each file.) Grouping
+      data values together in relatively narrow ranges within each data file makes it possible
+      for Impala to quickly skip over data files that do not contain value ranges indicated
in
+      the <code class="ph codeph">WHERE</code> clause of a query, and can improve
the effectiveness
+      of Parquet encoding and compression.
+    </p>
+
+    <p class="p">
+      This clause is not applicable for Kudu tables or HBase tables. Although it works
+      for other HDFS file formats besides Parquet, the more efficient layout is most
+      evident with Parquet tables, because each Parquet data file includes statistics
+      about the data values in that file.
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">SORT BY</code> columns cannot include any partition
key columns
+      for a partitioned table, because those column values are not represented in
+      the underlying data files.
+    </p>
+
+    <p class="p">
+      Because data files can arrive in Impala tables by mechanisms that do not respect
+      the <code class="ph codeph">SORT BY</code> clause, such as <code class="ph
codeph">LOAD DATA</code> or ETL
+      tools that create HDFS files, Impala does not guarantee or rely on the data being
+      sorted. The sorting aspect is only used to create a more efficient layout for
+      Parquet files generated by Impala, which helps to optimize the processing of
+      those Parquet files during Impala queries. During an <code class="ph codeph">INSERT</code>
+      or <code class="ph codeph">CREATE TABLE AS SELECT</code> operation, the
sorting occurs
+      when the <code class="ph codeph">SORT BY</code> clause applies to the destination
table
+      for the data, regardless of whether the source table has a <code class="ph codeph">SORT
BY</code>
+      clause.
+    </p>
+
+    <p class="p">
+      For example, when creating a table intended to contain census data, you might define
+      sort columns such as last name and state. If a data file in this table contains a
+      narrow range of last names, for example from <code class="ph codeph">Smith</code>
to <code class="ph codeph">Smythe</code>,
+      Impala can quickly detect that this data file contains no matches for a <code class="ph
codeph">WHERE</code>
+      clause such as <code class="ph codeph">WHERE last_name = 'Jones'</code>
and avoid reading the entire file.
+    </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE census_data (last_name STRING,
first_name STRING, state STRING, address STRING)
+  SORT BY (last_name, state)
+  STORED AS PARQUET;
+</code></pre>
+
+    <p class="p">
+      Likewise, if an existing table contains data without any sort order, you can reorganize
+      the data in a more efficient way by using <code class="ph codeph">INSERT</code>
or
+      <code class="ph codeph">CREATE TABLE AS SELECT</code> to copy that data
into a new table with a
+      <code class="ph codeph">SORT BY</code> clause:
+    </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE sorted_census_data
+  SORT BY (last_name, state)
+  STORED AS PARQUET
+  AS SELECT last_name, first_name, state, address
+    FROM unsorted_census_data;
+</code></pre>
+
+    <p class="p">
+      The metadata for the <code class="ph codeph">SORT BY</code> clause is stored
in the <code class="ph codeph">TBLPROPERTIES</code>
+      fields for the table. Other SQL engines that can interoperate with Impala tables, such
as Hive
+      and Spark SQL, do not recognize this property when inserting into a table that has
a <code class="ph codeph">SORT BY</code>
+      clause.
+    </p>
+
+    <p class="p">
         <strong class="ph b">Kudu considerations:</strong>
       </p>
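
The SORT BY text above explains that narrow per-file value ranges let Impala skip Parquet files whose minimum and maximum values rule out a `WHERE` match. The following Python sketch illustrates that pruning idea under simplifying assumptions: one statistics range per file, a single string column, and an equality predicate. The `FileStats` class and the file names are hypothetical, and this is not Impala's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Minimum and maximum value of one column within one data file."""
    name: str
    min_value: str
    max_value: str


def files_to_scan(files, wanted):
    """Keep only files whose [min, max] range could contain `wanted`."""
    return [f.name for f in files if f.min_value <= wanted <= f.max_value]


# Sorted layout: each file covers a narrow range of last names.
sorted_files = [
    FileStats("f1", "Adams", "Garcia"),
    FileStats("f2", "Harris", "Rogers"),
    FileStats("f3", "Smith", "Smythe"),
]

# Unsorted layout: every file spans almost the whole alphabet.
unsorted_files = [
    FileStats("u1", "Adams", "Young"),
    FileStats("u2", "Baker", "Zhang"),
    FileStats("u3", "Allen", "Wright"),
]

print(files_to_scan(sorted_files, "Jones"))
print(files_to_scan(unsorted_files, "Jones"))
```

With the sorted layout only one file can contain `Jones`, while the unsorted layout forces a scan of every file, which is the effect the census example describes.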
 

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_datetime_functions.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_datetime_functions.html b/docs/build/html/topics/impala_datetime_functions.html
index 222ae8c..1649c0a 100644
--- a/docs/build/html/topics/impala_datetime_functions.html
+++ b/docs/build/html/topics/impala_datetime_functions.html
@@ -169,7 +169,7 @@ select now(), current_timestamp();
 | 2016-05-19 16:10:14.237849000 | 2016-05-19 16:10:14.237849000 |
 +-------------------------------+-------------------------------+
 
-select current_timestamp() as right_now,            
+select current_timestamp() as right_now,
   current_timestamp() + interval 3 hours as in_three_hours;
 +-------------------------------+-------------------------------+
 | right_now                     | in_three_hours                |
@@ -391,7 +391,7 @@ select date_sub(cast('2016-05-31' as timestamp), interval 1 months) as
'april_31
         <strong class="ph b">Examples:</strong>
       </p>
           <p class="p">
-            The following example shows how comparing a <span class="q">"late"</span>
value with 
+            The following example shows how comparing a <span class="q">"late"</span>
value with
             an <span class="q">"earlier"</span> value produces a positive number.
In this case,
             the result is (365 * 5) + 1, because one of the intervening years is
             a leap year.
@@ -713,9 +713,10 @@ select now() as right_now, days_sub(now(), 31) as 31_days_ago;
           
           <strong class="ph b">Purpose:</strong> Returns one of the numeric date
or time fields from a <code class="ph codeph">TIMESTAMP</code> value.
           <p class="p">
-            <strong class="ph b">Unit argument:</strong> The <code class="ph
codeph">unit</code> string can be one of <code class="ph codeph">year</code>,
-            <code class="ph codeph">month</code>, <code class="ph codeph">day</code>,
<code class="ph codeph">hour</code>, <code class="ph codeph">minute</code>,
-            <code class="ph codeph">second</code>, or <code class="ph codeph">millisecond</code>.
This argument value is case-insensitive.
+            <strong class="ph b">Unit argument:</strong> The <code class="ph
codeph">unit</code> string can be one of <code class="ph codeph">epoch</code>,
+            <code class="ph codeph">year</code>, <code class="ph codeph">month</code>,
<code class="ph codeph">day</code>, <code class="ph codeph">hour</code>,
+            <code class="ph codeph">minute</code>, <code class="ph codeph">second</code>,
or <code class="ph codeph">millisecond</code>.
+            This argument value is case-insensitive.
           </p>
           <div class="p">
             In Impala 2.0 and higher, you can use special syntax rather than a regular function
call, for
@@ -754,8 +755,8 @@ select now() as right_now,
 +-------------------------------+-----------+------------+
 
 select now() as right_now,
-  extract(day from now()) as this_day,  
-  extract(hour from now()) as this_hour;  
+  extract(day from now()) as this_day,
+  extract(hour from now()) as this_hour;
 +-------------------------------+----------+-----------+
 | right_now                     | this_day | this_hour |
 +-------------------------------+----------+-----------+
@@ -1696,6 +1697,14 @@ with t1 as (select trunc(now(), 'dd') as today)
             <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
           </p>
           <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+          <p class="p">
+        The nanosecond portion of an Impala <code class="ph codeph">TIMESTAMP</code>
value
+        is rounded to the nearest microsecond when that value is stored in a
+        Kudu table.
+      </p>
+          <p class="p">
         <strong class="ph b">Examples:</strong>
       </p>
 <pre class="pre codeblock"><code>
@@ -1731,6 +1740,14 @@ select now() as right_now, nanoseconds_add(now(), 1e9) as 1_second_later;
           <p class="p">
             <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
           </p>
+          <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+          <p class="p">
+        The nanosecond portion of an Impala <code class="ph codeph">TIMESTAMP</code>
value
+        is rounded to the nearest microsecond when that value is stored in a
+        Kudu table.
+      </p>
 <pre class="pre codeblock"><code>
 select now() as right_now, nanoseconds_sub(now(), 1) as 1_nanosecond_earlier;
 +-------------------------------+-------------------------------+
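
The Kudu considerations added above state that the nanosecond portion of a `TIMESTAMP` is rounded to the nearest microsecond when stored in a Kudu table. A small arithmetic sketch of that rounding follows. The function name is hypothetical, and the round-half-up treatment of exact ties is an assumption, since the text does not say how ties are handled.

```python
def round_nanos_to_micros(nanos):
    """Round a non-negative nanosecond count to the nearest microsecond.

    The result is still expressed in nanoseconds. Ties (exactly 500 ns
    past a microsecond) round up here, which is an assumption.
    """
    if nanos < 0:
        raise ValueError("nanos must be non-negative")
    return ((nanos + 500) // 1000) * 1000


# Fractional-second parts expressed in nanoseconds.
for value in (237849000, 237849499, 237849500):
    print(value, "->", round_nanos_to_micros(value))
```

A fractional part of 999999500 ns or more rounds up to a full second, so a complete implementation would also need to carry into the seconds field.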

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_decimal.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_decimal.html b/docs/build/html/topics/impala_decimal.html
index 8cec53e..9604c5f 100644
--- a/docs/build/html/topics/impala_decimal.html
+++ b/docs/build/html/topics/impala_decimal.html
@@ -807,7 +807,7 @@ SELECT CAST(1000.5 AS DECIMAL);
         <strong class="ph b">Kudu considerations:</strong>
       </p>
     <p class="p">
-        Currently, the data types <code class="ph codeph">DECIMAL</code>, <code
class="ph codeph">TIMESTAMP</code>, <code class="ph codeph">CHAR</code>,
<code class="ph codeph">VARCHAR</code>,
+        Currently, the data types <code class="ph codeph">DECIMAL</code>, <code
class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>,
         <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>,
and <code class="ph codeph">STRUCT</code> cannot be used with Kudu tables.
       </p>
 

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_decimal_v2.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_decimal_v2.html b/docs/build/html/topics/impala_decimal_v2.html
new file mode 100644
index 0000000..4f1b5ea
--- /dev/null
+++ b/docs/build/html/topics/impala_decimal_v2.html
@@ -0,0 +1,32 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html;
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright
2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type"
content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta
name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta
name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="decimal_v2"><link
rel="stylesheet" type="text/css" href="../commonltr.css"><title>DECIMAL_V2 Query
Option</title></head><body id="decimal_v2"><main role="main"><article
role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">DECIMAL_V2 Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      A query option that changes behavior related to the <code class="ph codeph">DECIMAL</code>
+      data type.
+    </p>
+
+    <div class="note important note_important"><span class="note__title importanttitle">Important:</span>

+      <p class="p">
+        This query option is currently unsupported.
+        Its precise behavior is currently undefined and might change
+        in the future.
+      </p>
+    </div>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> Boolean; recognized values are 1
and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+        any other value interpreted as <code class="ph codeph">false</code>
+      </p>
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph codeph">false</code>
(shown as 0 in output of <code class="ph codeph">SET</code> statement)
+      </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div
class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query
Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_default_join_distribution_mode.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_default_join_distribution_mode.html b/docs/build/html/topics/impala_default_join_distribution_mode.html
new file mode 100644
index 0000000..c866519
--- /dev/null
+++ b/docs/build/html/topics/impala_default_join_distribution_mode.html
@@ -0,0 +1,113 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html;
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright
2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type"
content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta
name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta
name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta
name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="default_join_distribution_mode"><link
rel="stylesheet" type="text/css" href="../commonltr.css"><title>DEFAULT_JOIN_DISTRIBUTION_MODE
Query Option</title></head><body id="default_join_distribution_mode"><main
role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">DEFAULT_JOIN_DISTRIBUTION_MODE Query
Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      This option determines the join distribution that Impala uses when any of the tables
+      involved in a join query is missing statistics.
+    </p>
+
+    <p class="p">
+      Impala optimizes join queries based on the presence of table statistics,
+      which are produced by the Impala <code class="ph codeph">COMPUTE STATS</code>
statement.
+      By default, when a table involved in the join query does not have statistics,
+      Impala uses the <span class="q">"broadcast"</span> technique that transmits
the entire contents
+      of the table to all executor nodes participating in the query. If one table
+      involved in a join has statistics and the other does not, the table without
+      statistics is broadcast. If both tables are missing statistics, the table
+      that is referenced second in the join order is broadcast. This behavior
+      is appropriate when the table involved is relatively small, but can lead to
+      excessive network, memory, and CPU overhead if the table being broadcast is
+      large.
+    </p>
+
+    <p class="p">
+      Because Impala queries frequently involve very large tables, and suboptimal
+      joins for such tables could result in spilling or out-of-memory errors,
+      the setting <code class="ph codeph">DEFAULT_JOIN_DISTRIBUTION_MODE=SHUFFLE</code>
lets you
+      override the default behavior. The shuffle join mechanism divides the corresponding
rows
+      of each table involved in a join query using a hashing algorithm, and transmits
+      subsets of the rows to other nodes for processing. Typically, this kind of join is
+      more efficient for joins between large tables of similar size.
+    </p>
+
+    <p class="p">
+      The setting <code class="ph codeph">DEFAULT_JOIN_DISTRIBUTION_MODE=SHUFFLE</code>
is
+      recommended when setting up and deploying new clusters, because it is less likely
+      to result in serious consequences such as spilling or out-of-memory errors if
+      the query plan is based on incomplete information. This setting is not the default,
+      to avoid changing the performance characteristics of join queries for clusters that
+      are already tuned for their existing workloads.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> integer
+      </p>
+    <p class="p">
+      The allowed values are <code class="ph codeph">BROADCAST</code> (equivalent
to 0)
+      or <code class="ph codeph">SHUFFLE</code> (equivalent to 1).
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+    <p class="p">
+      The following examples demonstrate appropriate scenarios for each
+      setting of this query option.
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Create a billion-row table.
+create table big_table stored as parquet
+  as select * from huge_table limit 1e9;
+
+-- For a big table with no statistics, the
+-- shuffle join mechanism is appropriate.
+set default_join_distribution_mode=shuffle;
+
+...join queries involving the big table...
+</code></pre>
+
+<pre class="pre codeblock"><code>
+-- Create a hundred-row table.
+create table tiny_table stored as parquet
+  as select * from huge_table limit 100;
+
+-- For a tiny table with no statistics, the
+-- broadcast join mechanism is appropriate.
+set default_join_distribution_mode=broadcast;
+
+...join queries involving the tiny table...
+</code></pre>
+
+<pre class="pre codeblock"><code>
+compute stats tiny_table;
+compute stats big_table;
+
+-- Once the stats are computed, the query option has
+-- no effect on join queries involving these tables.
+-- Impala can determine the absolute and relative sizes
+-- of each side of the join query by examining the
+-- row size, cardinality, and so on of each table.
+
+...join queries involving both of these tables...
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_compute_stats.html">COMPUTE STATS Statement</a>,
+      <a class="xref" href="impala_joins.html">Joins in Impala SELECT Statements</a>,
+      <a class="xref" href="impala_perf_joins.html">Performance Considerations for Join Queries</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_describe.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_describe.html b/docs/build/html/topics/impala_describe.html
index 963ef6e..0c20071 100644
--- a/docs/build/html/topics/impala_describe.html
+++ b/docs/build/html/topics/impala_describe.html
@@ -745,7 +745,7 @@ Returned 27 row(s) in 0.17s</code></pre>
     </ul>
 
     <p class="p">
-      The following example shows <code class="ph codeph">DESCRIBE</code> output for a simple Kudu table, with 
+      The following example shows <code class="ph codeph">DESCRIBE</code> output for a simple Kudu table, with
       a single-column primary key and all column attributes left with their default values:
     </p>
 

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_double.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_double.html b/docs/build/html/topics/impala_double.html
index b87994c..a1b87fb 100644
--- a/docs/build/html/topics/impala_double.html
+++ b/docs/build/html/topics/impala_double.html
@@ -59,6 +59,17 @@
      The data type <code class="ph codeph">REAL</code> is an alias for <code class="ph codeph">DOUBLE</code>.
     </p>
 
+    
+    <p class="p">
+        Impala does not evaluate NaN (not a number) as equal to any other numeric values,
+        including other NaN values. For example, the following statement, which evaluates equality
+        between two NaN values, returns <code class="ph codeph">false</code>:
+      </p>
+
+<pre class="pre codeblock"><code>
+SELECT CAST('nan' AS DOUBLE)=CAST('nan' AS DOUBLE);
+</code></pre>
+
     <p class="p">
         <strong class="ph b">Examples:</strong>
       </p>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_explain.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_explain.html b/docs/build/html/topics/impala_explain.html
index 473a94d..0de916d 100644
--- a/docs/build/html/topics/impala_explain.html
+++ b/docs/build/html/topics/impala_explain.html
@@ -248,7 +248,7 @@ EXPLAIN_LEVEL set to extended
       against HDFS-based tables.
     </p>
 
-    <div class="p">
+    <p class="p">
      To see which predicates Impala can <span class="q">"push down"</span> to Kudu for
       efficient evaluation, without transmitting unnecessary rows back
      to Impala, look for the <code class="ph codeph">kudu predicates</code> item in
@@ -260,22 +260,27 @@ EXPLAIN_LEVEL set to extended
      and non-primary key column <code class="ph codeph">Y</code>, you can see that
       some operators in the <code class="ph codeph">WHERE</code> clause are evaluated
       immediately by Kudu and others are evaluated later by Impala:
+    </p>
+
 <pre class="pre codeblock"><code>
 EXPLAIN SELECT x,y from kudu_table WHERE
-  x = 1 AND x NOT IN (2,3) AND y = 1
-  AND x IS NOT NULL AND x &gt; 0;
+  x = 1 AND y NOT IN (2,3) AND z = 1
+  AND a IS NOT NULL AND b &gt; 0 AND length(s) &gt; 5;
 +----------------
 | Explain String
 +----------------
 ...
-| 00:SCAN KUDU [jrussell.hash_only]
-|    predicates: x IS NOT NULL, x NOT IN (2, 3)
-|    kudu predicates: x = 1, x &gt; 0, y = 1
+| 00:SCAN KUDU [kudu_table]
+|    predicates: y NOT IN (2, 3), length(s) &gt; 5
+|    kudu predicates: a IS NOT NULL, b &gt; 0, x = 1, z = 1
 </code></pre>
-      Only binary predicates and <code class="ph codeph">IN</code> predicates containing
-      literal values that exactly match the types in the Kudu table, and do not
+
+    <p class="p">
+      Only binary predicates, <code class="ph codeph">IS NULL</code> and <code class="ph codeph">IS NOT NULL</code>
+      (in <span class="keyword">Impala 2.9</span> and higher), and <code class="ph codeph">IN</code> predicates
+      containing literal values that exactly match the types in the Kudu table, and do not
       require any casting, can be pushed to Kudu.
-    </div>
+    </p>
 
     <p class="p">
         <strong class="ph b">Related information:</strong>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_explain_plan.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_explain_plan.html b/docs/build/html/topics/impala_explain_plan.html
index bcd0855..e749869 100644
--- a/docs/build/html/topics/impala_explain_plan.html
+++ b/docs/build/html/topics/impala_explain_plan.html
@@ -111,8 +111,8 @@
       <p class="p">
         The amount of detail displayed in the <code class="ph codeph">EXPLAIN</code> output is controlled by the
         <a class="xref" href="impala_explain_level.html#explain_level">EXPLAIN_LEVEL</a> query option. You typically
-        increase this setting from <code class="ph codeph">normal</code> to <code class="ph codeph">verbose</code> (or from <code class="ph codeph">0</code>
-        to <code class="ph codeph">1</code>) when doublechecking the presence of table and column statistics during performance
+        increase this setting from <code class="ph codeph">standard</code> to <code class="ph codeph">extended</code> (or from <code class="ph codeph">1</code>
+        to <code class="ph codeph">2</code>) when doublechecking the presence of table and column statistics during performance
         tuning, or when estimating query resource usage in conjunction with the resource management features.
       </p>
 


