impala-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From jbap...@apache.org
Subject [22/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site
Date Wed, 12 Apr 2017 18:25:26 GMT
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_mem_limit.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_mem_limit.html b/docs/build/html/topics/impala_mem_limit.html
new file mode 100644
index 0000000..704dae9
--- /dev/null
+++ b/docs/build/html/topics/impala_mem_limit.html
@@ -0,0 +1,206 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="mem_limit"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MEM_LIMIT Query Option</title></head><body id="mem_limit"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">MEM_LIMIT Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      When resource management is not enabled, defines the maximum amount of memory a query can allocate on each node.
+      Therefore, the total memory that can be used by a query is the <code class="ph codeph">MEM_LIMIT</code> times the number of nodes.
+    </p>
+
+    <p class="p">
+      There are two levels of memory limit for Impala.
+      The <code class="ph codeph">-mem_limit</code> startup option sets an overall limit for the <span class="keyword cmdname">impalad</span> process
+      (which handles multiple queries concurrently).
+      That limit is typically expressed in terms of a percentage of the RAM available on the host, such as <code class="ph codeph">-mem_limit=70%</code>.
+      The <code class="ph codeph">MEM_LIMIT</code> query option, which you set through <span class="keyword cmdname">impala-shell</span>
+      or the <code class="ph codeph">SET</code> statement in a JDBC or ODBC application, applies to each individual query.
+      The <code class="ph codeph">MEM_LIMIT</code> query option is usually expressed as a fixed size such as <code class="ph codeph">10gb</code>,
+      and must always be less than the <span class="keyword cmdname">impalad</span> memory limit.
+    </p>
+
+    <p class="p">
+      If query processing exceeds the specified memory limit on any node, either the per-query limit or the
+      <span class="keyword cmdname">impalad</span> limit, Impala cancels the query automatically.
+      Memory limits are checked periodically during query processing, so the actual memory in use
+      might briefly exceed the limit without the query being cancelled.
+    </p>
+
+    <p class="p">
+      When resource management is enabled, the mechanism for this option changes. If set, it overrides the
+      automatic memory estimate from Impala. Impala requests this amount of memory from YARN on each node, and the
+      query does not proceed until that much memory is available. The actual memory used by the query could be
+      lower, since some queries use much less memory than others. With resource management, the
+      <code class="ph codeph">MEM_LIMIT</code> setting acts both as a hard limit on the amount of memory a query can use on any
+      node (enforced by YARN) and a guarantee that that much memory will be available on each node while the query
+      is being executed. When resource management is enabled but no <code class="ph codeph">MEM_LIMIT</code> setting is
+      specified, Impala estimates the amount of memory needed on each node for each query, requests that much
+      memory from YARN before starting the query, and then internally sets the <code class="ph codeph">MEM_LIMIT</code> on each
+      node to the requested amount of memory during the query. Thus, if the query takes more memory than was
+      originally estimated, Impala detects that the <code class="ph codeph">MEM_LIMIT</code> is exceeded and cancels the query
+      itself.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Type:</strong> numeric
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Units:</strong> A numeric argument represents memory size in bytes; you can also use a suffix of <code class="ph codeph">m</code> or <code class="ph codeph">mb</code>
+      for megabytes, or more commonly <code class="ph codeph">g</code> or <code class="ph codeph">gb</code> for gigabytes. If you specify a value with unrecognized
+      formats, subsequent queries fail with an error.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> 0 (unlimited)
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      The <code class="ph codeph">MEM_LIMIT</code> setting is primarily useful in a high-concurrency setting,
+      or on a cluster with a workload shared between Impala and other data processing components.
+      You can prevent any query from accidentally using much more memory than expected,
+      which could negatively impact other Impala queries.
+    </p>
+
+    <p class="p">
+      Use the output of the <code class="ph codeph">SUMMARY</code> command in <span class="keyword cmdname">impala-shell</span>
+      to get a report of memory used for each phase of your most heavyweight queries on each node,
+      and then set a <code class="ph codeph">MEM_LIMIT</code> somewhat higher than that.
+      See <a class="xref" href="impala_explain_plan.html#perf_summary">Using the SUMMARY Report for Performance Tuning</a> for usage information about
+      the <code class="ph codeph">SUMMARY</code> command.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following examples show how to set the <code class="ph codeph">MEM_LIMIT</code> query option
+      using a fixed number of bytes, or suffixes representing gigabytes or megabytes.
+    </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] &gt; set mem_limit=3000000000;
+MEM_LIMIT set to 3000000000
+[localhost:21000] &gt; select 5;
+Query: select 5
++---+
+| 5 |
++---+
+| 5 |
++---+
+
+[localhost:21000] &gt; set mem_limit=3g;
+MEM_LIMIT set to 3g
+[localhost:21000] &gt; select 5;
+Query: select 5
++---+
+| 5 |
++---+
+| 5 |
++---+
+
+[localhost:21000] &gt; set mem_limit=3gb;
+MEM_LIMIT set to 3gb
+[localhost:21000] &gt; select 5;
++---+
+| 5 |
++---+
+| 5 |
++---+
+
+[localhost:21000] &gt; set mem_limit=3m;
+MEM_LIMIT set to 3m
+[localhost:21000] &gt; select 5;
++---+
+| 5 |
++---+
+| 5 |
++---+
+[localhost:21000] &gt; set mem_limit=3mb;
+MEM_LIMIT set to 3mb
+[localhost:21000] &gt; select 5;
++---+
+| 5 |
++---+
+</code></pre>
+
+    <p class="p">
+      The following examples show how unrecognized <code class="ph codeph">MEM_LIMIT</code>
+      values lead to errors for subsequent queries.
+    </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] &gt; set mem_limit=3tb;
+MEM_LIMIT set to 3tb
+[localhost:21000] &gt; select 5;
+ERROR: Failed to parse query memory limit from '3tb'.
+
+[localhost:21000] &gt; set mem_limit=xyz;
+MEM_LIMIT set to xyz
+[localhost:21000] &gt; select 5;
+Query: select 5
+ERROR: Failed to parse query memory limit from 'xyz'.
+</code></pre>
+
+    <p class="p">
+      The following examples shows the automatic query cancellation
+      when the <code class="ph codeph">MEM_LIMIT</code> value is exceeded
+      on any host involved in the Impala query. First it runs a
+      successful query and checks the largest amount of memory
+      used on any node for any stage of the query.
+      Then it sets an artificially low <code class="ph codeph">MEM_LIMIT</code>
+      setting so that the same query cannot run.
+    </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] &gt; select count(*) from customer;
+Query: select count(*) from customer
++----------+
+| count(*) |
++----------+
+| 150000   |
++----------+
+
+[localhost:21000] &gt; select count(distinct c_name) from customer;
+Query: select count(distinct c_name) from customer
++------------------------+
+| count(distinct c_name) |
++------------------------+
+| 150000                 |
++------------------------+
+
+[localhost:21000] &gt; summary;
++--------------+--------+----------+----------+---------+------------+----------+---------------+---------------+
+| Operator     | #Hosts | Avg Time | Max Time | #Rows   | Est. #Rows | Peak Mem | Est. Peak Mem | Detail        |
++--------------+--------+----------+----------+---------+------------+----------+---------------+---------------+
+| 06:AGGREGATE | 1      | 230.00ms | 230.00ms | 1       | 1          | 16.00 KB | -1 B          | FINALIZE      |
+| 05:EXCHANGE  | 1      | 43.44us  | 43.44us  | 1       | 1          | 0 B      | -1 B          | UNPARTITIONED |
+| 02:AGGREGATE | 1      | 227.14ms | 227.14ms | 1       | 1          | 12.00 KB | 10.00 MB      |               |
+| 04:AGGREGATE | 1      | 126.27ms | 126.27ms | 150.00K | 150.00K    | 15.17 MB | 10.00 MB      |               |
+| 03:EXCHANGE  | 1      | 44.07ms  | 44.07ms  | 150.00K | 150.00K    | 0 B      | 0 B           | HASH(c_name)  |
+<strong class="ph b">| 01:AGGREGATE | 1      | 361.94ms | 361.94ms | 150.00K | 150.00K    | 23.04 MB | 10.00 MB      |               |</strong>
+| 00:SCAN HDFS | 1      | 43.64ms  | 43.64ms  | 150.00K | 150.00K    | 24.19 MB | 64.00 MB      | tpch.customer |
++--------------+--------+----------+----------+---------+------------+----------+---------------+---------------+
+
+[localhost:21000] &gt; set mem_limit=15mb;
+MEM_LIMIT set to 15mb
+[localhost:21000] &gt; select count(distinct c_name) from customer;
+Query: select count(distinct c_name) from customer
+ERROR:
+Memory limit exceeded
+Query did not have enough memory to get the minimum required buffers in the block manager.
+</code></pre>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_min.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_min.html b/docs/build/html/topics/impala_min.html
new file mode 100644
index 0000000..88a39ab
--- /dev/null
+++ b/docs/build/html/topics/impala_min.html
@@ -0,0 +1,297 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="min"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MIN Function</title></head><body id="min"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">MIN Function</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      An aggregate function that returns the minimum value from a set of numbers. Opposite of the
+      <code class="ph codeph">MAX</code> function. Its single argument can be numeric column, or the numeric result of a function
+      or expression applied to the column value. Rows with a <code class="ph codeph">NULL</code> value for the specified column
+      are ignored. If the table is empty, or all the values supplied to <code class="ph codeph">MIN</code> are
+      <code class="ph codeph">NULL</code>, <code class="ph codeph">MIN</code> returns <code class="ph codeph">NULL</code>.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>MIN([DISTINCT | ALL] <var class="keyword varname">expression</var>) [OVER (<var class="keyword varname">analytic_clause</var>)]</code></pre>
+
+    <p class="p">
+      When the query contains a <code class="ph codeph">GROUP BY</code> clause, returns one value for each combination of
+      grouping values.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong> In Impala 2.0 and higher, this function can be used as an analytic function, but with restrictions on any window clause.
+        For <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>, the window clause is only allowed if the start
+        bound is <code class="ph codeph">UNBOUNDED PRECEDING</code>.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Return type:</strong> Same as the input value, except for <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code>
+        arguments which produce a <code class="ph codeph">STRING</code> result
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+        If you frequently run aggregate functions such as <code class="ph codeph">MIN()</code>, <code class="ph codeph">MAX()</code>, and
+        <code class="ph codeph">COUNT(DISTINCT)</code> on partition key columns, consider enabling the <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code>
+        query option, which optimizes such queries. This feature is available in <span class="keyword">Impala 2.5</span> and higher.
+        See <a class="xref" href="../shared/../topics/impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a>
+        for the kinds of queries that this option applies to, and slight differences in how partitions are
+        evaluated when this query option is enabled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+    <p class="p">
+        To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+        in an aggregation function, you unpack the individual elements using join notation in the query,
+        and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column.
+        See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala.
+      </p>
+
+    <div class="p">
+The following example demonstrates calls to several aggregation functions
+using values from a column containing nested complex types
+(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items).
+The array is unpacked inside the query using join notation.
+The array elements are referenced using the <code class="ph codeph">ITEM</code>
+pseudocolumn, and the structure fields inside the array elements
+are referenced using dot notation.
+Numeric values such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code>
+are computed using the numeric <code class="ph codeph">R_NATIONKEY</code> field, and
+the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>
+values are computed from the string <code class="ph codeph">N_NAME</code> field.
+<pre class="pre codeblock"><code>describe region;
++-------------+-------------------------+---------+
+| name        | type                    | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint                |         |
+| r_name      | string                  |         |
+| r_comment   | string                  |         |
+| r_nations   | array&lt;struct&lt;           |         |
+|             |   n_nationkey:smallint, |         |
+|             |   n_name:string,        |         |
+|             |   n_comment:string      |         |
+|             | &gt;&gt;                      |         |
++-------------+-------------------------+---------+
+
+select r_name, r_nations.item.n_nationkey
+  from region, region.r_nations as r_nations
+order by r_name, r_nations.item.n_nationkey;
++-------------+------------------+
+| r_name      | item.n_nationkey |
++-------------+------------------+
+| AFRICA      | 0                |
+| AFRICA      | 5                |
+| AFRICA      | 14               |
+| AFRICA      | 15               |
+| AFRICA      | 16               |
+| AMERICA     | 1                |
+| AMERICA     | 2                |
+| AMERICA     | 3                |
+| AMERICA     | 17               |
+| AMERICA     | 24               |
+| ASIA        | 8                |
+| ASIA        | 9                |
+| ASIA        | 12               |
+| ASIA        | 18               |
+| ASIA        | 21               |
+| EUROPE      | 6                |
+| EUROPE      | 7                |
+| EUROPE      | 19               |
+| EUROPE      | 22               |
+| EUROPE      | 23               |
+| MIDDLE EAST | 4                |
+| MIDDLE EAST | 10               |
+| MIDDLE EAST | 11               |
+| MIDDLE EAST | 13               |
+| MIDDLE EAST | 20               |
++-------------+------------------+
+
+select
+  r_name,
+  count(r_nations.item.n_nationkey) as count,
+  sum(r_nations.item.n_nationkey) as sum,
+  avg(r_nations.item.n_nationkey) as avg,
+  min(r_nations.item.n_name) as minimum,
+  max(r_nations.item.n_name) as maximum,
+  ndv(r_nations.item.n_nationkey) as distinct_vals
+from
+  region, region.r_nations as r_nations
+group by r_name
+order by r_name;
++-------------+-------+-----+------+-----------+----------------+---------------+
+| r_name      | count | sum | avg  | minimum   | maximum        | distinct_vals |
++-------------+-------+-----+------+-----------+----------------+---------------+
+| AFRICA      | 5     | 50  | 10   | ALGERIA   | MOZAMBIQUE     | 5             |
+| AMERICA     | 5     | 47  | 9.4  | ARGENTINA | UNITED STATES  | 5             |
+| ASIA        | 5     | 68  | 13.6 | CHINA     | VIETNAM        | 5             |
+| EUROPE      | 5     | 77  | 15.4 | FRANCE    | UNITED KINGDOM | 5             |
+| MIDDLE EAST | 5     | 58  | 11.6 | EGYPT     | SAUDI ARABIA   | 5             |
++-------------+-------+-----+------+-----------+----------------+---------------+
+</code></pre>
+</div>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>-- Find the smallest value for this column in the table.
+select min(c1) from t1;
+-- Find the smallest value for this column from a subset of the table.
+select min(c1) from t1 where month = 'January' and year = '2013';
+-- Find the smallest value from a set of numeric function results.
+select min(length(s)) from t1;
+-- Can also be used in combination with DISTINCT and/or GROUP BY.
+-- Return more than one result.
+select month, year, min(purchase_price) from store_stats group by month, year;
+-- Filter the input to eliminate duplicates before performing the calculation.
+select min(distinct x) from t1;
+</code></pre>
+
+    <div class="p">
+      The following examples show how to use <code class="ph codeph">MIN()</code> in an analytic context. They use a table
+      containing integers from 1 to 10. Notice how the <code class="ph codeph">MIN()</code> is reported for each input value, as
+      opposed to the <code class="ph codeph">GROUP BY</code> clause which condenses the result set.
+<pre class="pre codeblock"><code>select x, property, min(x) over (partition by property) as min from int_t where property in ('odd','even');
++----+----------+-----+
+| x  | property | min |
++----+----------+-----+
+| 2  | even     | 2   |
+| 4  | even     | 2   |
+| 6  | even     | 2   |
+| 8  | even     | 2   |
+| 10 | even     | 2   |
+| 1  | odd      | 1   |
+| 3  | odd      | 1   |
+| 5  | odd      | 1   |
+| 7  | odd      | 1   |
+| 9  | odd      | 1   |
++----+----------+-----+
+</code></pre>
+
+Adding an <code class="ph codeph">ORDER BY</code> clause lets you experiment with results that are cumulative or apply to a moving
+set of rows (the <span class="q">"window"</span>). The following examples use <code class="ph codeph">MIN()</code> in an analytic context
+(that is, with an <code class="ph codeph">OVER()</code> clause) to display the smallest value of <code class="ph codeph">X</code>
+encountered up to each row in the result set. The examples use two columns in the <code class="ph codeph">ORDER BY</code>
+clause to produce a sequence of values that rises and falls, to illustrate how the <code class="ph codeph">MIN()</code>
+result only decreases or stays the same throughout each partition within the result set.
+The basic <code class="ph codeph">ORDER BY x</code> clause implicitly
+activates a window clause of <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+which is effectively the same as <code class="ph codeph">ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+therefore all of these examples produce the same results:
+
+<pre class="pre codeblock"><code>select x, property, min(x) <strong class="ph b">over (order by property, x desc)</strong> as 'minimum to this point'
+  from int_t where property in ('prime','square');
++---+----------+-----------------------+
+| x | property | minimum to this point |
++---+----------+-----------------------+
+| 7 | prime    | 7                     |
+| 5 | prime    | 5                     |
+| 3 | prime    | 3                     |
+| 2 | prime    | 2                     |
+| 9 | square   | 2                     |
+| 4 | square   | 2                     |
+| 1 | square   | 1                     |
++---+----------+-----------------------+
+
+select x, property,
+  min(x) over
+  (
+    <strong class="ph b">order by property, x desc</strong>
+    <strong class="ph b">range between unbounded preceding and current row</strong>
+  ) as 'minimum to this point'
+from int_t where property in ('prime','square');
++---+----------+-----------------------+
+| x | property | minimum to this point |
++---+----------+-----------------------+
+| 7 | prime    | 7                     |
+| 5 | prime    | 5                     |
+| 3 | prime    | 3                     |
+| 2 | prime    | 2                     |
+| 9 | square   | 2                     |
+| 4 | square   | 2                     |
+| 1 | square   | 1                     |
++---+----------+-----------------------+
+
+select x, property,
+  min(x) over
+  (
+    <strong class="ph b">order by property, x desc</strong>
+    <strong class="ph b">rows between unbounded preceding and current row</strong>
+  ) as 'minimum to this point'
+from int_t where property in ('prime','square');
++---+----------+-----------------------+
+| x | property | minimum to this point |
++---+----------+-----------------------+
+| 7 | prime    | 7                     |
+| 5 | prime    | 5                     |
+| 3 | prime    | 3                     |
+| 2 | prime    | 2                     |
+| 9 | square   | 2                     |
+| 4 | square   | 2                     |
+| 1 | square   | 1                     |
++---+----------+-----------------------+
+</code></pre>
+
+The following examples show how to construct a moving window, with a running minimum taking into account all rows before
+and 1 row after the current row.
+Because of a restriction in the Impala <code class="ph codeph">RANGE</code> syntax, this type of
+moving window is possible with the <code class="ph codeph">ROWS BETWEEN</code> clause but not the <code class="ph codeph">RANGE BETWEEN</code> clause.
+Because of an extra Impala restriction on the <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code> functions in an
+analytic context, the lower bound must be <code class="ph codeph">UNBOUNDED PRECEDING</code>.
+<pre class="pre codeblock"><code>select x, property,
+  min(x) over
+  (
+    <strong class="ph b">order by property, x desc</strong>
+    <strong class="ph b">rows between unbounded preceding and 1 following</strong>
+  ) as 'local minimum'
+from int_t where property in ('prime','square');
++---+----------+---------------+
+| x | property | local minimum |
++---+----------+---------------+
+| 7 | prime    | 5             |
+| 5 | prime    | 3             |
+| 3 | prime    | 2             |
+| 2 | prime    | 2             |
+| 9 | square   | 2             |
+| 4 | square   | 1             |
+| 1 | square   | 1             |
++---+----------+---------------+
+
+-- Doesn't work because of syntax restriction on RANGE clause.
+select x, property,
+  min(x) over
+  (
+    <strong class="ph b">order by property, x desc</strong>
+    <strong class="ph b">range between unbounded preceding and 1 following</strong>
+  ) as 'local minimum'
+from int_t where property in ('prime','square');
+ERROR: AnalysisException: RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW.
+</code></pre>
+    </div>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a>, <a class="xref" href="impala_max.html#max">MAX Function</a>,
+      <a class="xref" href="impala_avg.html#avg">AVG Function</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_misc_functions.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_misc_functions.html b/docs/build/html/topics/impala_misc_functions.html
new file mode 100644
index 0000000..bf8c990
--- /dev/null
+++ b/docs/build/html/topics/impala_misc_functions.html
@@ -0,0 +1,175 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="misc_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Miscellaneous Functions</title></head><body id="misc_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala Miscellaneous Functions</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Impala supports the following utility functions that do not operate on a particular column or data type:
+    </p>
+
+    <dl class="dl">
+      
+
+        <dt class="dt dlterm" id="misc_functions__current_database">
+          <code class="ph codeph">current_database()</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the database that the session is currently using, either <code class="ph codeph">default</code>
+          if no database has been selected, or whatever database the session switched to through a
+          <code class="ph codeph">USE</code> statement or the <span class="keyword cmdname">impalad</span><code class="ph codeph">-d</code> option.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="misc_functions__effective_user">
+          <code class="ph codeph">effective_user()</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Typically returns the same value as <code class="ph codeph">user()</code>,
+          except if delegation is enabled, in which case it returns the ID of the delegated user.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
+          <p class="p">
+            <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.5</span>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="misc_functions__pid">
+          <code class="ph codeph">pid()</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the process ID of the <span class="keyword cmdname">impalad</span> daemon that the session is
+          connected to. You can use it during low-level debugging, to issue Linux commands that trace, show the
+          arguments, and so on the <span class="keyword cmdname">impalad</span> process.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+      
+
+        <dt class="dt dlterm" id="misc_functions__user">
+          <code class="ph codeph">user()</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the username of the Linux user who is connected to the <span class="keyword cmdname">impalad</span>
+          daemon. Typically called a single time, in a query without any <code class="ph codeph">FROM</code> clause, to
+          understand how authorization settings apply in a security context; once you know the logged-in username,
+          you can check which groups that user belongs to, and from the list of groups you can check which roles
+          are available to those groups through the authorization policy file.
+          <p class="p">
+        In Impala 2.0 and later, <code class="ph codeph">user()</code> returns the full Kerberos principal string, such as
+        <code class="ph codeph">user@example.com</code>, in a Kerberized environment.
+      </p>
+          <p class="p">
+            When delegation is enabled, consider calling the <code class="ph codeph">effective_user()</code> function instead.
+          </p>
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="misc_functions__uuid">
+          <code class="ph codeph">uuid()</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns a <a class="xref" href="https://en.wikipedia.org/wiki/Universally_unique_identifier" target="_blank">universal unique identifier</a>, a 128-bit value encoded as a string with groups of hexadecimal digits separated by dashes.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+      </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+            Ascending numeric sequences of type <code class="ph codeph">BIGINT</code> are often used
+            as identifiers within a table, and as join keys across multiple tables.
+            The <code class="ph codeph">uuid()</code> value is a convenient alternative that does not
+            require storing or querying the highest sequence number. For example, you
+            can use it to quickly construct new unique identifiers during a data import job,
+            or to combine data from different tables without the likelihood of ID collisions.
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+-- Each call to uuid() produces a new arbitrary value.
+select uuid();
++--------------------------------------+
+| uuid()                               |
++--------------------------------------+
+| c7013e25-1455-457f-bf74-a2046e58caea |
++--------------------------------------+
+
+-- If you get a UUID for each row of a result set, you can use it as a
+-- unique identifier within a table, or even a unique ID across tables.
+select uuid() from four_row_table;
++--------------------------------------+
+| uuid()                               |
++--------------------------------------+
+| 51d3c540-85e5-4cb9-9110-604e53999e2e |
+| 0bb40071-92f6-4a59-a6a4-60d46e9703e2 |
+| 5e9d7c36-9842-4a96-862d-c13cd0457c02 |
+| cae29095-0cc0-4053-a5ea-7fcd3c780861 |
++--------------------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="misc_functions__version">
+          <code class="ph codeph">version()</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns information such as the precise version number and build date for the
+          <code class="ph codeph">impalad</code> daemon that you are currently connected to. Typically used to confirm that you
+          are connected to the expected level of Impala to use a particular feature, or to connect to several nodes
+          and confirm they are all running the same level of <span class="keyword cmdname">impalad</span>.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> (with one or more embedded newlines)
+          </p>
+        </dd>
+
+      
+    </dl>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_mixed_security.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_mixed_security.html b/docs/build/html/topics/impala_mixed_security.html
new file mode 100644
index 0000000..b48fa8e
--- /dev/null
+++ b/docs/build/html/topics/impala_mixed_security.html
@@ -0,0 +1,26 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_authentication.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="mixed_security"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Multiple Authentication Methods with Impala</title></head><body id="mixed_security"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Using Multiple Authentication Methods with Impala</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Impala 2.0 and later automatically handles both Kerberos and LDAP authentication. Each
+      <span class="keyword cmdname">impalad</span> daemon can accept both Kerberos and LDAP requests through the same port. No
+      special actions need to be taken if some users authenticate through Kerberos and some through LDAP.
+    </p>
+
+    <p class="p">
+      Prior to Impala 2.0, you had to configure each <span class="keyword cmdname">impalad</span> to listen on a specific port
+      depending on the kind of authentication, then configure your network load balancer to forward each kind of
+      request to a DataNode that was set up with the appropriate authentication type. Once the initial request was
+      made using either Kerberos or LDAP authentication, Impala automatically handled the process of coordinating
+      the work across multiple nodes and transmitting intermediate results back to the coordinator node.
+    </p>
+
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_authentication.html">Impala Authentication</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_mt_dop.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_mt_dop.html b/docs/build/html/topics/impala_mt_dop.html
new file mode 100644
index 0000000..adb4f0f
--- /dev/null
+++ b/docs/build/html/topics/impala_mt_dop.html
@@ -0,0 +1,190 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="mt_dop"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MT_DOP Query Option</title></head><body id="mt_dop"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">MT_DOP Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Sets the degree of intra-node parallelism used for certain operations that
+      can benefit from multithreaded execution. You can specify values
+      higher than zero to find the ideal balance of response time,
+      memory usage, and CPU usage during statement processing.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        The Impala execution engine is being revamped incrementally to add
+        additional parallelism within a single host for certain statements and
+        kinds of operations. The setting <code class="ph codeph">MT_DOP=0</code> uses the
+        <span class="q">"old"</span> code path with limited intra-node parallelism.
+      </p>
+
+      <p class="p">
+        Currently, the operations affected by the <code class="ph codeph">MT_DOP</code>
+        query option are:
+      </p>
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            <code class="ph codeph">COMPUTE [INCREMENTAL] STATS</code>. Impala automatically sets
+            <code class="ph codeph">MT_DOP=4</code> for <code class="ph codeph">COMPUTE STATS</code> and
+            <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statements on Parquet tables.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Queries with execution plans containing only scan and aggregation operators,
+            or local joins that do not need data exchanges (such as for nested types).
+            Other queries produce an error if <code class="ph codeph">MT_DOP</code> is set to a non-zero
+            value. Therefore, this query option is typically only set for the duration of
+            specific long-running, CPU-intensive queries.
+          </p>
+        </li>
+      </ul>
+
+    </div>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> integer
+      </p>
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph codeph">0</code>
+      </p>
+    <p class="p">
+      Because <code class="ph codeph">COMPUTE STATS</code> and <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+      statements for Parquet tables benefit substantially from extra intra-node
+      parallelism, Impala automatically sets <code class="ph codeph">MT_DOP=4</code> when computing stats
+      for Parquet tables.
+    </p>
+    <p class="p">
+      <strong class="ph b">Range:</strong> 0 to 64
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        Any timing figures in the following examples are on a small, lightly loaded development cluster.
+        Your mileage may vary. Speedups depend on many factors, including the number of rows, columns, and
+        partitions within each table.
+      </p>
+    </div>
+
+    <p class="p">
+      The following example shows how to run a <code class="ph codeph">COMPUTE STATS</code>
+      statement against a Parquet table with or without an explicit <code class="ph codeph">MT_DOP</code>
+      setting:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Explicitly setting MT_DOP to 0 selects the old code path.
+set mt_dop = 0;
+MT_DOP set to 0
+
+-- The analysis for the billion rows is distributed among hosts,
+-- but uses only a single core on each host.
+compute stats billion_rows_parquet;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+
+drop stats billion_rows_parquet;
+
+-- Using 4 logical processors per host is faster.
+set mt_dop = 4;
+MT_DOP set to 4
+
+compute stats billion_rows_parquet;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+
+drop stats billion_rows_parquet;
+
+-- Unsetting the option reverts back to its default.
+-- Which for COMPUTE STATS and a Parquet table is 4,
+-- so again it uses the fast path.
+unset MT_DOP;
+Unsetting option MT_DOP
+
+compute stats billion_rows_parquet;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+
+</code></pre>
+
+    <p class="p">
+      The following example shows the effects of setting <code class="ph codeph">MT_DOP</code>
+      for a query involving only scan and aggregation operations for a Parquet table:
+    </p>
+
+<pre class="pre codeblock"><code>
+set mt_dop = 0;
+MT_DOP set to 0
+
+-- COUNT(DISTINCT) for a unique column is CPU-intensive.
+select count(distinct id) from billion_rows_parquet;
++--------------------+
+| count(distinct id) |
++--------------------+
+| 1000000000         |
++--------------------+
+Fetched 1 row(s) in 67.20s
+
+set mt_dop = 16;
+MT_DOP set to 16
+
+-- Introducing more intra-node parallelism for the aggregation
+-- speeds things up, and potentially reduces memory overhead by
+-- reducing the number of scanner threads.
+select count(distinct id) from billion_rows_parquet;
++--------------------+
+| count(distinct id) |
++--------------------+
+| 1000000000         |
++--------------------+
+Fetched 1 row(s) in 17.19s
+
+</code></pre>
+
+    <p class="p">
+      The following example shows how queries that are not compatible with non-zero
+      <code class="ph codeph">MT_DOP</code> settings produce an error when <code class="ph codeph">MT_DOP</code>
+      is set:
+    </p>
+
+<pre class="pre codeblock"><code>
+set mt_dop=1;
+MT_DOP set to 1
+
+select * from a1 inner join a2
+  on a1.id = a2.id limit 4;
+ERROR: NotImplementedException: MT_DOP not supported for plans with
+  base table joins or table sinks.
+
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_compute_stats.html">COMPUTE STATS Statement</a>,
+      <a class="xref" href="impala_aggregate_functions.html">Impala Aggregate Functions</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_ndv.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_ndv.html b/docs/build/html/topics/impala_ndv.html
new file mode 100644
index 0000000..d6c3117
--- /dev/null
+++ b/docs/build/html/topics/impala_ndv.html
@@ -0,0 +1,226 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="ndv"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>NDV Function</title></head><body id="ndv"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">NDV Function</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      An aggregate function that returns an approximate value similar to the result of <code class="ph codeph">COUNT(DISTINCT
+      <var class="keyword varname">col</var>)</code>, the <span class="q">"number of distinct values"</span>. It is much faster than the
+      combination of <code class="ph codeph">COUNT</code> and <code class="ph codeph">DISTINCT</code>, and uses a constant amount of memory and
+      thus is less memory-intensive for columns with high cardinality.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>NDV([DISTINCT | ALL] <var class="keyword varname">expression</var>)</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      This is the mechanism used internally by the <code class="ph codeph">COMPUTE STATS</code> statement for computing the
+      number of distinct values in a column.
+    </p>
+
+    <p class="p">
+      Because this number is an estimate, it might not reflect the precise number of different values in the
+      column, especially if the cardinality is very low or very high. If the estimated number is higher than the
+      number of rows in the table, Impala adjusts the value internally during query planning.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Return type:</strong> <code class="ph codeph">DOUBLE</code> in Impala 2.0 and higher; <code class="ph codeph">STRING</code> in earlier
+        releases
+      </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+    <p class="p">
+        To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+        in an aggregation function, you unpack the individual elements using join notation in the query,
+        and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column.
+        See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala.
+      </p>
+
+    <div class="p">
+The following example demonstrates calls to several aggregation functions
+using values from a column containing nested complex types
+(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items).
+The array is unpacked inside the query using join notation.
+The array elements are referenced using the <code class="ph codeph">ITEM</code>
+pseudocolumn, and the structure fields inside the array elements
+are referenced using dot notation.
+Numeric values such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code>
+are computed using the numeric <code class="ph codeph">R_NATIONKEY</code> field, and
+the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>
+values are computed from the string <code class="ph codeph">N_NAME</code> field.
+<pre class="pre codeblock"><code>describe region;
++-------------+-------------------------+---------+
+| name        | type                    | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint                |         |
+| r_name      | string                  |         |
+| r_comment   | string                  |         |
+| r_nations   | array&lt;struct&lt;           |         |
+|             |   n_nationkey:smallint, |         |
+|             |   n_name:string,        |         |
+|             |   n_comment:string      |         |
+|             | &gt;&gt;                      |         |
++-------------+-------------------------+---------+
+
+select r_name, r_nations.item.n_nationkey
+  from region, region.r_nations as r_nations
+order by r_name, r_nations.item.n_nationkey;
++-------------+------------------+
+| r_name      | item.n_nationkey |
++-------------+------------------+
+| AFRICA      | 0                |
+| AFRICA      | 5                |
+| AFRICA      | 14               |
+| AFRICA      | 15               |
+| AFRICA      | 16               |
+| AMERICA     | 1                |
+| AMERICA     | 2                |
+| AMERICA     | 3                |
+| AMERICA     | 17               |
+| AMERICA     | 24               |
+| ASIA        | 8                |
+| ASIA        | 9                |
+| ASIA        | 12               |
+| ASIA        | 18               |
+| ASIA        | 21               |
+| EUROPE      | 6                |
+| EUROPE      | 7                |
+| EUROPE      | 19               |
+| EUROPE      | 22               |
+| EUROPE      | 23               |
+| MIDDLE EAST | 4                |
+| MIDDLE EAST | 10               |
+| MIDDLE EAST | 11               |
+| MIDDLE EAST | 13               |
+| MIDDLE EAST | 20               |
++-------------+------------------+
+
+select
+  r_name,
+  count(r_nations.item.n_nationkey) as count,
+  sum(r_nations.item.n_nationkey) as sum,
+  avg(r_nations.item.n_nationkey) as avg,
+  min(r_nations.item.n_name) as minimum,
+  max(r_nations.item.n_name) as maximum,
+  ndv(r_nations.item.n_nationkey) as distinct_vals
+from
+  region, region.r_nations as r_nations
+group by r_name
+order by r_name;
++-------------+-------+-----+------+-----------+----------------+---------------+
+| r_name      | count | sum | avg  | minimum   | maximum        | distinct_vals |
++-------------+-------+-----+------+-----------+----------------+---------------+
+| AFRICA      | 5     | 50  | 10   | ALGERIA   | MOZAMBIQUE     | 5             |
+| AMERICA     | 5     | 47  | 9.4  | ARGENTINA | UNITED STATES  | 5             |
+| ASIA        | 5     | 68  | 13.6 | CHINA     | VIETNAM        | 5             |
+| EUROPE      | 5     | 77  | 15.4 | FRANCE    | UNITED KINGDOM | 5             |
+| MIDDLE EAST | 5     | 58  | 11.6 | EGYPT     | SAUDI ARABIA   | 5             |
++-------------+-------+-----+------+-----------+----------------+---------------+
+</code></pre>
+</div>
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+    <p class="p">
+        This function cannot be used in an analytic context. That is, the <code class="ph codeph">OVER()</code> clause is not allowed at all with this function.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following example queries a billion-row table to illustrate the relative performance of
+      <code class="ph codeph">COUNT(DISTINCT)</code> and <code class="ph codeph">NDV()</code>. It shows how <code class="ph codeph">COUNT(DISTINCT)</code>
+      gives a precise answer, but is inefficient for large-scale data where an approximate result is sufficient.
+      The <code class="ph codeph">NDV()</code> function gives an approximate result but is much faster.
+    </p>
+
+<pre class="pre codeblock"><code>select count(distinct col1) from sample_data;
++---------------------+
+| count(distinct col1)|
++---------------------+
+| 100000              |
++---------------------+
+Fetched 1 row(s) in 20.13s
+
+select cast(ndv(col1) as bigint) as col1 from sample_data;
++----------+
+| col1     |
++----------+
+| 139017   |
++----------+
+Fetched 1 row(s) in 8.91s
+</code></pre>
+
+    <p class="p">
+      The following example shows how you can code multiple <code class="ph codeph">NDV()</code> calls in a single query, to
+      easily learn which columns have substantially more or fewer distinct values. This technique is faster than
+      running a sequence of queries with <code class="ph codeph">COUNT(DISTINCT)</code> calls.
+    </p>
+
+<pre class="pre codeblock"><code>select cast(ndv(col1) as bigint) as col1, cast(ndv(col2) as bigint) as col2,
+    cast(ndv(col3) as bigint) as col3, cast(ndv(col4) as bigint) as col4
+  from sample_data;
++----------+-----------+------------+-----------+
+| col1     | col2      | col3       | col4      |
++----------+-----------+------------+-----------+
+| 139017   | 282       | 46         | 145636240 |
++----------+-----------+------------+-----------+
+Fetched 1 row(s) in 34.97s
+
+select count(distinct col1) from sample_data;
++---------------------+
+| count(distinct col1)|
++---------------------+
+| 100000              |
++---------------------+
+Fetched 1 row(s) in 20.13s
+
+select count(distinct col2) from sample_data;
++----------------------+
+| count(distinct col2) |
++----------------------+
+| 278                  |
++----------------------+
+Fetched 1 row(s) in 20.09s
+
+select count(distinct col3) from sample_data;
++-----------------------+
+| count(distinct col3)  |
++-----------------------+
+| 46                    |
++-----------------------+
+Fetched 1 row(s) in 19.12s
+
+select count(distinct col4) from sample_data;
++----------------------+
+| count(distinct col4) |
++----------------------+
+| 147135880            |
++----------------------+
+Fetched 1 row(s) in 266.95s
+</code></pre>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
\ No newline at end of file


Mime
View raw message