impala-commits mailing list archives

From jruss...@apache.org
Subject [2/6] incubator-impala git commit: Add files that weren't needed during initial build testing of SQL Reference.
Date Mon, 31 Oct 2016 05:53:37 GMT
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/bb88fdc0/docs/topics/impala_known_issues.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_known_issues.xml b/docs/topics/impala_known_issues.xml
new file mode 100644
index 0000000..7b9ec2b
--- /dev/null
+++ b/docs/topics/impala_known_issues.xml
@@ -0,0 +1,1812 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="ver" id="known_issues">
+
+  <title><ph audience="standalone">Known Issues and Workarounds in Impala</ph><ph audience="integrated">Apache Impala (incubating) Known Issues</ph></title>
+
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Release Notes"/>
+      <data name="Category" value="Known Issues"/>
+      <data name="Category" value="Troubleshooting"/>
+      <data name="Category" value="Upgrading"/>
+      <data name="Category" value="Administrators"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      The following sections describe known issues and workarounds in Impala, as of the current production release. This page summarizes the
+      most serious or frequently encountered issues in the current release, to help you make planning decisions about installing and
+      upgrading. Any workarounds are listed here. The bug links take you to the Impala issues site, where you can see the diagnosis and
+      whether a fix is in the pipeline.
+    </p>
+
+    <note>
+      The online issue tracking system for Impala contains comprehensive information and is updated in real time. To check whether an issue
+      you are experiencing has already been reported, or to find the release in which an issue was fixed, search the
+      <xref href="https://issues.cloudera.org/" scope="external" format="html">issues.cloudera.org JIRA tracker</xref>.
+    </note>
+
+    <p outputclass="toc inpage"/>
+
+    <p>
+      For issues fixed in various Impala releases, see <xref href="impala_fixed_issues.xml#fixed_issues"/>.
+    </p>
+
+<!-- Use as a template for new issues.
+    <concept id="">
+      <title></title>
+      <conbody>
+        <p>
+        </p>
+        <p><b>Bug:</b> <xref href="https://issues.cloudera.org/browse/" scope="external" format="html"></xref></p>
+        <p><b>Severity:</b> High</p>
+        <p><b>Resolution:</b> </p>
+        <p><b>Workaround:</b> </p>
+      </conbody>
+    </concept>
+
+-->
+
+  </conbody>
+
+<!-- New known issues for CDH 5.5 / Impala 2.3.
+
+Title: Server-to-server SSL and Kerberos do not work together
+Description: If server<->server SSL is enabled (with ssl_client_ca_certificate), and Kerberos auth is used between servers, the cluster will fail to start.
+Upstream & Internal JIRAs: https://issues.cloudera.org/browse/IMPALA-2598
+Severity: Medium.  Server-to-server SSL is practically unusable but this is a new feature.
+Workaround: No known workaround.
+
+Title: Queries may hang on server-to-server exchange errors
+Description: The DataStreamSender::Channel::CloseInternal() does not close the channel on an error. This will cause the node on the other side of the channel to wait indefinitely causing a hang.
+Upstream & Internal JIRAs: https://issues.cloudera.org/browse/IMPALA-2592
+Severity: Low.  This does not occur frequently.
+Workaround: No known workaround.
+
+Title: Catalogd may crash when loading metadata for tables with many partitions, many columns and with incremental stats
+Description: Incremental stats use up about 400 bytes per partition X column.  So for a table with 20K partitions and 100 columns this is about 800 MB.  When serialized this goes past the 2 GB Java array size limit and leads to a catalog crash.
+Upstream & Internal JIRAs: https://issues.cloudera.org/browse/IMPALA-2648, IMPALA-2647, IMPALA-2649.
+Severity: Low.  This does not occur frequently.
+Workaround:  Reduce the number of partitions.
+
+More from: https://issues.cloudera.org/browse/IMPALA-2093?filter=11278&jql=project%20%3D%20IMPALA%20AND%20priority%20in%20(blocker%2C%20critical)%20AND%20status%20in%20(open%2C%20Reopened)%20AND%20labels%20%3D%20correctness%20ORDER%20BY%20priority%20DESC
+
+IMPALA-2093
+Wrong plan of NOT IN aggregate subquery when a constant is used in subquery predicate
+IMPALA-1652
+Incorrect results with basic predicate on CHAR typed column.
+IMPALA-1459
+Incorrect assignment of predicates through an outer join in an inline view.
+IMPALA-2665
+Incorrect assignment of On-clause predicate inside inline view with an outer join.
+IMPALA-2603
+Crash: impala::Coordinator::ValidateCollectionSlots
+IMPALA-2375
+Fix issues with the legacy join and agg nodes using enable_partitioned_hash_join=false and enable_partitioned_aggregation=false
+IMPALA-1862
+Invalid bool value not reported as a scanner error
+IMPALA-1792
+ImpalaODBC: Can not get the value in the SQLGetData(m-x th column) after the SQLBindCol(m th column)
+IMPALA-1578
+Impala incorrectly handles text data when the new line character \n\r is split between different HDFS block
+IMPALA-2643
+Duplicated column in inline view causes dropping null slots during scan
+IMPALA-2005
+A failed CTAS does not drop the table if the insert fails.
+IMPALA-1821
+Casting scenarios with invalid/inconsistent results
+
+Another list from Alex, of correctness problems with predicates; might overlap with ones I already have:
+
+https://issues.cloudera.org/browse/IMPALA-2665 - Already have
+https://issues.cloudera.org/browse/IMPALA-2643 - Already have
+https://issues.cloudera.org/browse/IMPALA-1459 - Already have
+https://issues.cloudera.org/browse/IMPALA-2144 - Don't have
+
+-->
+
+  <concept id="known_issues_crash">
+
+    <title>Impala Known Issues: Crashes and Hangs</title>
+
+    <conbody>
+
+      <p>
+        These issues can cause Impala to quit or become unresponsive.
+      </p>
+
+    </conbody>
+
+    <concept id="IMPALA-3069" rev="IMPALA-3069">
+
+      <title>Setting the BATCH_SIZE query option too large can cause a crash</title>
+
+      <conbody>
+
+        <p>
+          Using a value in the millions for the <codeph>BATCH_SIZE</codeph> query option, together with wide rows or large string values in
+          columns, could cause a memory allocation of more than 2 GB resulting in a crash.
+        </p>
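+
+        <p>
+          On releases without the fix, keeping the option at a moderate value avoids the oversized allocation. For example, in
+          <cmdname>impala-shell</cmdname> (the value shown is illustrative):
+        </p>
+
+<codeblock>
+set batch_size=4096;
+</codeblock>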
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3069" scope="external" format="html">IMPALA-3069</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p><b>Resolution:</b> Fixed in CDH 5.9.0 / Impala 2.7.0.</p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-3441" rev="IMPALA-3441">
+
+      <title>Queries against malformed Avro data can cause a crash</title>
+
+      <conbody>
+
+        <p>
+          Malformed Avro data, such as out-of-bounds integers or values in the wrong format, could cause a crash when queried.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3441" scope="external" format="html">IMPALA-3441</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p><b>Resolution:</b> Fixed in CDH 5.9.0 / Impala 2.7.0 and CDH 5.8.2 / Impala 2.6.2.</p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-2592" rev="IMPALA-2592">
+
+      <title>Queries may hang on server-to-server exchange errors</title>
+
+      <conbody>
+
+        <p>
+          The <codeph>DataStreamSender::Channel::CloseInternal()</codeph> does not close the channel on an error. This causes the node on
+          the other side of the channel to wait indefinitely, causing a hang.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2592" scope="external" format="html">IMPALA-2592</xref>
+        </p>
+
+        <p>
+          <b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0.
+        </p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-2365" rev="IMPALA-2365">
+
+      <title>impalad crashes if a UDF JAR is not available in its HDFS location</title>
+
+      <conbody>
+
+        <p>
+          If the JAR file corresponding to a Java UDF is removed from HDFS after the Impala <codeph>CREATE FUNCTION</codeph> statement is
+          issued, the <cmdname>impalad</cmdname> daemon crashes.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2365" scope="external" format="html">IMPALA-2365</xref>
+        </p>
+
+        <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0.</p>
+
+      </conbody>
+
+    </concept>
+
+  </concept>
+
+  <concept id="known_issues_performance">
+
+    <title id="ki_performance">Impala Known Issues: Performance</title>
+
+    <conbody>
+
+      <p>
+        These issues involve the performance of operations such as queries or DDL statements.
+      </p>
+
+    </conbody>
+
+    <concept id="IMPALA-1480" rev="IMPALA-1480">
+
+<!-- Not part of Alex's spreadsheet. Spreadsheet has IMPALA-1423 which mentions it's similar to this one but not a duplicate. -->
+
+      <title>Slow DDL statements for tables with large number of partitions</title>
+
+      <conbody>
+
+        <p>
+          DDL statements for tables with a large number of partitions might be slow.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1480" scope="external" format="html">IMPALA-1480</xref>
+        </p>
+
+        <p>
+          <b>Workaround:</b> Run the DDL statement in Hive if the slowness is an issue.
+        </p>
+
+        <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0.</p>
+
+      </conbody>
+
+    </concept>
+
+  </concept>
+
+  <concept id="known_issues_usability">
+
+    <title id="ki_usability">Impala Known Issues: Usability</title>
+
+    <conbody>
+
+      <p>
+        These issues affect the convenience of interacting directly with Impala, typically through the Impala shell or Hue.
+      </p>
+
+    </conbody>
+
+    <concept id="IMPALA-3133" rev="IMPALA-3133">
+
+      <title>Unexpected privileges in show output</title>
+
+      <conbody>
+
+        <p>
+          Due to a timing condition in updating cached policy data from Sentry, the <codeph>SHOW</codeph> statements for Sentry roles could
+          sometimes display out-of-date role settings. Because Impala rechecks authorization for each SQL statement, this discrepancy does
+          not represent a security issue for other statements.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3133" scope="external" format="html">IMPALA-3133</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p><b>Resolution:</b> Fixed in CDH 5.8.0 / Impala 2.6.0 and CDH 5.7.1 / Impala 2.5.1.</p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-1776" rev="IMPALA-1776">
+
+      <title>Less than 100% progress on completed simple SELECT queries</title>
+
+      <conbody>
+
+        <p>
+          Simple <codeph>SELECT</codeph> queries show less than 100% progress even though they are already completed.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1776" scope="external" format="html">IMPALA-1776</xref>
+        </p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="concept_lmx_dk5_lx">
+
+      <title>Unexpected column overflow behavior with INT datatypes</title>
+
+      <conbody>
+
+        <p conref="../shared/impala_common.xml#common/int_overflow_behavior" />
+
+        <p>
+          <b>Bug:</b>
+          <xref href="https://issues.cloudera.org/browse/IMPALA-3123"
+            scope="external" format="html">IMPALA-3123</xref>
+        </p>
+
+      </conbody>
+
+    </concept>
+
+  </concept>
+
+  <concept id="known_issues_drivers">
+
+    <title id="ki_drivers">Impala Known Issues: JDBC and ODBC Drivers</title>
+
+    <conbody>
+
+      <p>
+        These issues affect applications that use the JDBC or ODBC APIs, such as business intelligence tools or custom-written applications
+        in languages such as Java or C++.
+      </p>
+
+    </conbody>
+
+    <concept id="IMPALA-1792" rev="IMPALA-1792">
+
+<!-- Not part of Alex's spreadsheet -->
+
+      <title>ImpalaODBC: Can not get the value in the SQLGetData(m-x th column) after the SQLBindCol(m th column)</title>
+
+      <conbody>
+
+        <p>
+          If the ODBC function <codeph>SQLGetData</codeph> is called on a series of columns, the calls must follow the same order as the
+          columns. For example, if data is fetched from column 2 and then from column 1, the <codeph>SQLGetData</codeph> call for column 1
+          returns <codeph>NULL</codeph>.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1792" scope="external" format="html">IMPALA-1792</xref>
+        </p>
+
+        <p>
+          <b>Workaround:</b> Fetch columns in the same order they are defined in the table.
+        </p>
+
+      </conbody>
+
+    </concept>
+
+  </concept>
+
+  <concept id="known_issues_security">
+
+    <title id="ki_security">Impala Known Issues: Security</title>
+
+    <conbody>
+
+      <p>
+        These issues relate to security features, such as Kerberos authentication, Sentry authorization, encryption, auditing, and
+        redaction.
+      </p>
+
+    </conbody>
+
+<!-- To do: Hiding for the moment. https://jira.cloudera.com/browse/CDH-38736 reports the issue is fixed. -->
+
+    <concept id="impala-shell_ssl_dependency" audience="Cloudera" rev="impala-shell_ssl_dependency">
+
+      <title>impala-shell requires Python with ssl module</title>
+
+      <conbody>
+
+        <p>
+          On CentOS 5.10 and Oracle Linux 5.11 using the built-in Python 2.4, invoking the <cmdname>impala-shell</cmdname> with the
+          <codeph>--ssl</codeph> option might fail with the following error:
+        </p>
+
+<codeblock>
+Unable to import the python 'ssl' module. It is required for an SSL-secured connection.
+</codeblock>
+
+<!-- No associated IMPALA-* JIRA... It is the internal JIRA CDH-38736. -->
+
+        <p>
+          <b>Severity:</b> Low, workaround available
+        </p>
+
+        <p>
+          <b>Resolution:</b> Customers are less likely to experience this issue over time, because the <codeph>ssl</codeph> module is included
+          in the newer Python releases packaged with recent Linux distributions.
+        </p>
+
+        <p>
+          <b>Workaround:</b> To use SSL with <cmdname>impala-shell</cmdname> on these platform versions, install the <codeph>ssl</codeph>
+          Python module:
+        </p>
+
+<codeblock>
+yum install python-ssl
+</codeblock>
+
+        <p>
+          Then <cmdname>impala-shell</cmdname> can run with SSL enabled. For example:
+        </p>
+
+<codeblock>
+impala-shell -s impala --ssl --ca_cert /path_to_truststore/truststore.pem
+</codeblock>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="renewable_kerberos_tickets">
+
+<!-- Not part of Alex's spreadsheet. Not associated with a JIRA number AFAIK. -->
+
+      <title>Kerberos tickets must be renewable</title>
+
+      <conbody>
+
+        <p>
+          In a Kerberos environment, the <cmdname>impalad</cmdname> daemon might not start if Kerberos tickets are not renewable.
+        </p>
+
+        <p>
+          <b>Workaround:</b> Configure your KDC to allow tickets to be renewed, and configure <filepath>krb5.conf</filepath> to request
+          renewable tickets.
+        </p>
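+
+        <p>
+          For example, a minimal <filepath>krb5.conf</filepath> fragment that requests renewable tickets (the lifetimes shown are
+          illustrative, and the KDC must also be configured to permit renewal):
+        </p>
+
+<codeblock>
+[libdefaults]
+  ticket_lifetime = 24h
+  renew_lifetime = 7d
+</codeblock>
+
+        <p>
+          An existing ticket can then be renewed with <codeph>kinit -R</codeph>.
+        </p>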
+
+      </conbody>
+
+    </concept>
+
+<!-- To do: Fixed in 2.5.0, 2.3.2. Commenting out until I see how it can fix into "known issues now fixed" convention.
+     That set of fix releases looks incomplete so probably have to do some detective work with the JIRA.
+     https://issues.cloudera.org/browse/IMPALA-2598
+    <concept id="IMPALA-2598" rev="IMPALA-2598">
+
+      <title>Server-to-server SSL and Kerberos do not work together</title>
+
+      <conbody>
+
+        <p>
+          If SSL is enabled between internal Impala components (with <codeph>ssl_client_ca_certificate</codeph>), and Kerberos
+          authentication is used between servers, the cluster fails to start.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2598" scope="external" format="html">IMPALA-2598</xref>
+        </p>
+
+        <p>
+          <b>Workaround:</b> Do not use the new <codeph>ssl_client_ca_certificate</codeph> setting on Kerberos-enabled clusters until this
+          issue is resolved.
+        </p>
+
+        <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0 and CDH 5.5.2 / Impala 2.3.2.</p>
+
+      </conbody>
+
+    </concept>
+-->
+
+  </concept>
+
+<!--
+  <concept id="known_issues_supportability">
+
+    <title id="ki_supportability">Impala Known Issues: Supportability</title>
+
+    <conbody>
+
+      <p>
+        These issues affect the ability to debug and troubleshoot Impala, such as incorrect output in query profiles or the query state
+        shown in monitoring applications.
+      </p>
+
+    </conbody>
+
+  </concept>
+-->
+
+  <concept id="known_issues_resources">
+
+    <title id="ki_resources">Impala Known Issues: Resources</title>
+
+    <conbody>
+
+      <p>
+        These issues involve memory or disk usage, including out-of-memory conditions, the spill-to-disk feature, and resource management
+        features.
+      </p>
+
+    </conbody>
+
+    <concept id="TSB-168">
+
+      <title>Impala catalogd heap issues when upgrading to 5.7</title>
+
+      <conbody>
+
+        <p>
+          The default heap size for Impala <cmdname>catalogd</cmdname> has changed in CDH 5.7 / Impala 2.5 and higher:
+        </p>
+
+        <ul>
+          <li>
+            <p>
+              Before CDH 5.7, <cmdname>catalogd</cmdname> by default used the JVM's default heap size, which is the smaller of 1/4 of
+              physical memory and 32 GB.
+            </p>
+          </li>
+
+          <li>
+            <p>
+              Starting with CDH 5.7.0, the default <cmdname>catalogd</cmdname> heap size is 4 GB.
+            </p>
+          </li>
+        </ul>
+
+        <p>
+          For example, on a host with 128 GB of physical memory, the default <cmdname>catalogd</cmdname> heap decreases from 32 GB to 4 GB.
+          This can cause out-of-memory errors in <cmdname>catalogd</cmdname> and lead to query failures.
+        </p>
+
+        <p audience="Cloudera">
+          <b>Bug:</b> <xref href="https://jira.cloudera.com/browse/TSB-168" scope="external" format="html">TSB-168</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Workaround:</b> Increase the <cmdname>catalogd</cmdname> memory limit as follows.
+<!-- See <xref href="impala_scalability.xml#scalability_catalog"/> for the procedure. -->
+<!-- Including full details here via conref, for benefit of PDF readers or anyone else
+             who might have trouble seeing or following the link. -->
+        </p>
+
+        <p conref="../shared/impala_common.xml#common/increase_catalogd_heap_size"/>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-3509" rev="IMPALA-3509">
+
+      <title>Breakpad minidumps can be very large when the thread count is high</title>
+
+      <conbody>
+
+        <p>
+          The size of the Breakpad minidump files grows linearly with the number of threads. By default, each thread adds 8 KB to the
+          minidump size. Minidump files could consume significant disk space when the daemons have a high number of threads.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3509" scope="external" format="html">IMPALA-3509</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Workaround:</b> Add <codeph>--minidump_size_limit_hint_kb=<varname>size</varname></codeph> to set a soft upper limit on the
+          size of each minidump file. If the minidump file would exceed that limit, Impala reduces the amount of information for each thread
+          from 8 KB to 2 KB. (Full thread information is captured for the first 20 threads, then 2 KB per thread after that.) The minidump
+          file can still grow larger than the <q>hinted</q> size. For example, if you have 10,000 threads, the minidump file can be more
+          than 20 MB.
+        </p>
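+
+        <p>
+          For example, to cap each minidump at roughly 20 MB, start the daemon with the following option (the value shown is illustrative):
+        </p>
+
+<codeblock>
+impalad --minidump_size_limit_hint_kb=20480
+</codeblock>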
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-3662" rev="IMPALA-3662">
+
+      <title>Parquet scanner memory increase after IMPALA-2736</title>
+
+      <conbody>
+
+        <p>
+          The initial release of CDH 5.8 / Impala 2.6 sometimes has a higher peak memory usage than in previous releases while reading
+          Parquet files.
+        </p>
+
+        <p>
+          CDH 5.8 / Impala 2.6 addresses the issue IMPALA-2736, which improves the efficiency of Parquet scans by up to 2x. The faster scans
+          may result in a higher peak memory consumption compared to earlier versions of Impala due to the new column-wise row
+          materialization strategy. You are likely to experience higher memory consumption in any of the following scenarios:
+          <ul>
+            <li>
+              <p>
+                Very wide rows due to projecting many columns in a scan.
+              </p>
+            </li>
+
+            <li>
+              <p>
+                Very large rows due to big column values, for example, long strings or nested collections with many items.
+              </p>
+            </li>
+
+            <li>
+              <p>
+                Producer/consumer speed imbalances, leading to more rows being buffered between a scan (producer) and downstream (consumer)
+                plan nodes.
+              </p>
+            </li>
+          </ul>
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3662" scope="external" format="html">IMPALA-3662</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Workaround:</b> The following query options might help to reduce memory consumption in the Parquet scanner:
+          <ul>
+            <li>
+              Reduce the number of scanner threads, for example: <codeph>set num_scanner_threads=30</codeph>
+            </li>
+
+            <li>
+              Reduce the batch size, for example: <codeph>set batch_size=512</codeph>
+            </li>
+
+            <li>
+              Increase the memory limit, for example: <codeph>set mem_limit=64g</codeph>
+            </li>
+          </ul>
+        </p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-691" rev="IMPALA-691">
+
+      <title>Process mem limit does not account for the JVM's memory usage</title>
+
+<!-- Supposed to be resolved for Impala 2.3.0. -->
+
+      <conbody>
+
+        <p>
+          Some memory allocated by the JVM used internally by Impala is not counted against the memory limit for the
+          <cmdname>impalad</cmdname> daemon.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-691" scope="external" format="html">IMPALA-691</xref>
+        </p>
+
+        <p>
+          <b>Workaround:</b> To monitor overall memory usage, use the <cmdname>top</cmdname> command, or add the memory figures in the
+          Impala web UI <uicontrol>/memz</uicontrol> tab to JVM memory usage shown on the <uicontrol>/metrics</uicontrol> tab.
+        </p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-2375" rev="IMPALA-2375">
+
+<!-- Not part of Alex's spreadsheet -->
+
+      <title>Fix issues with the legacy join and agg nodes using --enable_partitioned_hash_join=false and --enable_partitioned_aggregation=false</title>
+
+      <conbody>
+
+        <p>
+          Queries could encounter problems when run with the legacy join and aggregation mechanism, enabled through the
+          <codeph>--enable_partitioned_hash_join=false</codeph> and <codeph>--enable_partitioned_aggregation=false</codeph> startup options.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2375" scope="external" format="html">IMPALA-2375</xref>
+        </p>
+
+        <p>
+          <b>Workaround:</b> Transition away from the <q>old-style</q> join and aggregation mechanism if practical.
+        </p>
+
+        <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0.</p>
+
+      </conbody>
+
+    </concept>
+
+  </concept>
+
+  <concept id="known_issues_correctness">
+
+    <title id="ki_correctness">Impala Known Issues: Correctness</title>
+
+    <conbody>
+
+      <p>
+        These issues can cause incorrect or unexpected results from queries. They typically only arise in very specific circumstances.
+      </p>
+
+    </conbody>
+
+    <concept id="IMPALA-3084" rev="IMPALA-3084">
+
+      <title>Incorrect assignment of NULL checking predicate through an outer join of a nested collection</title>
+
+      <conbody>
+
+        <p>
+          A query could return wrong results (too many or too few <codeph>NULL</codeph> values) if it referenced an outer-joined nested
+          collection and also contained a null-checking predicate (<codeph>IS NULL</codeph>, <codeph>IS NOT NULL</codeph>, or the
+          <codeph>&lt;=&gt;</codeph> operator) in the <codeph>WHERE</codeph> clause.
+        </p>
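+
+        <p>
+          The following hypothetical query shows the affected pattern; the table and column names are illustrative, with
+          <codeph>c.orders</codeph> standing in for a nested collection column:
+        </p>
+
+<codeblock>
+-- The IS NULL predicate on the outer-joined nested collection
+-- could produce wrong results in affected releases.
+SELECT c.id
+FROM customers c LEFT OUTER JOIN c.orders o
+WHERE o.amount IS NULL;
+</codeblock>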
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3084" scope="external" format="html">IMPALA-3084</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p><b>Resolution:</b> Fixed in CDH 5.9.0 / Impala 2.7.0.</p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-3094" rev="IMPALA-3094">
+
+      <title>Incorrect result due to constant evaluation in query with outer join</title>
+
+      <conbody>
+
+        <p>
+          An <codeph>OUTER JOIN</codeph> query could omit some expected result rows due to a constant such as <codeph>FALSE</codeph> in
+          another join clause. For example:
+        </p>
+
+<codeblock><![CDATA[
+explain SELECT 1 FROM alltypestiny a1
+  INNER JOIN alltypesagg a2 ON a1.smallint_col = a2.year AND false
+  RIGHT JOIN alltypes a3 ON a1.year = a1.bigint_col;
++---------------------------------------------------------+
+| Explain String                                          |
++---------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=1.00KB VCores=1 |
+|                                                         |
+| 00:EMPTYSET                                             |
++---------------------------------------------------------+
+]]>
+</codeblock>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3094" scope="external" format="html">IMPALA-3094</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-3126" rev="IMPALA-3126">
+
+      <title>Incorrect assignment of an inner join On-clause predicate through an outer join</title>
+
+      <conbody>
+
+        <p>
+          Impala may return incorrect results for queries that have the following properties:
+        </p>
+
+        <ul>
+          <li>
+            <p>
+              There is an INNER JOIN following a series of OUTER JOINs.
+            </p>
+          </li>
+
+          <li>
+            <p>
+              The INNER JOIN has an On-clause with a predicate that references at least two tables that are on the nullable side of the
+              preceding OUTER JOINs.
+            </p>
+          </li>
+        </ul>
+
+        <p>
+          The following query demonstrates the issue:
+        </p>
+
+<codeblock>
+select 1 from functional.alltypes a left outer join
+  functional.alltypes b on a.id = b.id left outer join
+  functional.alltypes c on b.id = c.id right outer join
+  functional.alltypes d on c.id = d.id inner join functional.alltypes e
+on b.int_col = c.int_col;
+</codeblock>
+
+        <p>
+          The following listing shows the incorrect <codeph>EXPLAIN</codeph> plan:
+        </p>
+
+<codeblock><![CDATA[
++-----------------------------------------------------------+
+| Explain String                                            |
++-----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=480.04MB VCores=4 |
+|                                                           |
+| 14:EXCHANGE [UNPARTITIONED]                               |
+| |                                                         |
+| 08:NESTED LOOP JOIN [CROSS JOIN, BROADCAST]               |
+| |                                                         |
+| |--13:EXCHANGE [BROADCAST]                                |
+| |  |                                                      |
+| |  04:SCAN HDFS [functional.alltypes e]                   |
+| |     partitions=24/24 files=24 size=478.45KB             |
+| |                                                         |
+| 07:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]              |
+| |  hash predicates: c.id = d.id                           |
+| |  runtime filters: RF000 <- d.id                         |
+| |                                                         |
+| |--12:EXCHANGE [HASH(d.id)]                               |
+| |  |                                                      |
+| |  03:SCAN HDFS [functional.alltypes d]                   |
+| |     partitions=24/24 files=24 size=478.45KB             |
+| |                                                         |
+| 06:HASH JOIN [LEFT OUTER JOIN, PARTITIONED]               |
+| |  hash predicates: b.id = c.id                           |
+| |  other predicates: b.int_col = c.int_col     <--- incorrect placement; should be at node 07 or 08
+| |  runtime filters: RF001 <- c.int_col                    |
+| |                                                         |
+| |--11:EXCHANGE [HASH(c.id)]                               |
+| |  |                                                      |
+| |  02:SCAN HDFS [functional.alltypes c]                   |
+| |     partitions=24/24 files=24 size=478.45KB             |
+| |     runtime filters: RF000 -> c.id                      |
+| |                                                         |
+| 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]              |
+| |  hash predicates: b.id = a.id                           |
+| |  runtime filters: RF002 <- a.id                         |
+| |                                                         |
+| |--10:EXCHANGE [HASH(a.id)]                               |
+| |  |                                                      |
+| |  00:SCAN HDFS [functional.alltypes a]                   |
+| |     partitions=24/24 files=24 size=478.45KB             |
+| |                                                         |
+| 09:EXCHANGE [HASH(b.id)]                                  |
+| |                                                         |
+| 01:SCAN HDFS [functional.alltypes b]                      |
+|    partitions=24/24 files=24 size=478.45KB                |
+|    runtime filters: RF001 -> b.int_col, RF002 -> b.id     |
++-----------------------------------------------------------+
+]]>
+</codeblock>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3126" scope="external" format="html">IMPALA-3126</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Workaround:</b> For some queries, this problem can be worked around by placing the problematic <codeph>ON</codeph> clause
+          predicate in the <codeph>WHERE</codeph> clause instead, or by changing the preceding <codeph>OUTER JOIN</codeph>s to
+          <codeph>INNER JOIN</codeph>s (if the <codeph>ON</codeph> clause predicate would discard <codeph>NULL</codeph>s). For example, to
+          fix the problematic query above:
+        </p>
+
+<codeblock><![CDATA[
+select 1 from functional.alltypes a
+  left outer join functional.alltypes b
+    on a.id = b.id
+  left outer join functional.alltypes c
+    on b.id = c.id
+  right outer join functional.alltypes d
+    on c.id = d.id
+  inner join functional.alltypes e
+where b.int_col = c.int_col
+
++-----------------------------------------------------------+
+| Explain String                                            |
++-----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=480.04MB VCores=4 |
+|                                                           |
+| 14:EXCHANGE [UNPARTITIONED]                               |
+| |                                                         |
+| 08:NESTED LOOP JOIN [CROSS JOIN, BROADCAST]               |
+| |                                                         |
+| |--13:EXCHANGE [BROADCAST]                                |
+| |  |                                                      |
+| |  04:SCAN HDFS [functional.alltypes e]                   |
+| |     partitions=24/24 files=24 size=478.45KB             |
+| |                                                         |
+| 07:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]              |
+| |  hash predicates: c.id = d.id                           |
+| |  other predicates: b.int_col = c.int_col          <-- correct assignment
+| |  runtime filters: RF000 <- d.id                         |
+| |                                                         |
+| |--12:EXCHANGE [HASH(d.id)]                               |
+| |  |                                                      |
+| |  03:SCAN HDFS [functional.alltypes d]                   |
+| |     partitions=24/24 files=24 size=478.45KB             |
+| |                                                         |
+| 06:HASH JOIN [LEFT OUTER JOIN, PARTITIONED]               |
+| |  hash predicates: b.id = c.id                           |
+| |                                                         |
+| |--11:EXCHANGE [HASH(c.id)]                               |
+| |  |                                                      |
+| |  02:SCAN HDFS [functional.alltypes c]                   |
+| |     partitions=24/24 files=24 size=478.45KB             |
+| |     runtime filters: RF000 -> c.id                      |
+| |                                                         |
+| 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]              |
+| |  hash predicates: b.id = a.id                           |
+| |  runtime filters: RF001 <- a.id                         |
+| |                                                         |
+| |--10:EXCHANGE [HASH(a.id)]                               |
+| |  |                                                      |
+| |  00:SCAN HDFS [functional.alltypes a]                   |
+| |     partitions=24/24 files=24 size=478.45KB             |
+| |                                                         |
+| 09:EXCHANGE [HASH(b.id)]                                  |
+| |                                                         |
+| 01:SCAN HDFS [functional.alltypes b]                      |
+|    partitions=24/24 files=24 size=478.45KB                |
+|    runtime filters: RF001 -> b.id                         |
++-----------------------------------------------------------+
+]]>
+</codeblock>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-3006" rev="IMPALA-3006">
+
+      <title>Impala may use incorrect bit order with BIT_PACKED encoding</title>
+
+      <conbody>
+
+        <p>
+          Parquet <codeph>BIT_PACKED</codeph> encoding as implemented by Impala is LSB first. The parquet standard says it is MSB first.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3006" scope="external" format="html">IMPALA-3006</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High, but rare in practice because BIT_PACKED is infrequently used, is not written by Impala, and is deprecated
+          in Parquet 2.0.
+        </p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-3082" rev="IMPALA-3082">
+
+      <title>Incorrect start and end times for BST (British Summer Time) between 1972 and 1995</title>
+
+      <conbody>
+
+        <p>
+          The calculation of start and end times for the BST (British Summer Time) time zone could be incorrect between 1972 and 1995.
+          Between 1972 and 1995, BST began and ended at 02:00 GMT on the third Sunday in March (or second Sunday when Easter fell on the
+          third) and fourth Sunday in October. For example, both function calls should return 13, but actually return 12, in a query such
+          as:
+        </p>
+
+<codeblock>
+select
+  extract(from_utc_timestamp(cast('1970-01-01 12:00:00' as timestamp), 'Europe/London'), "hour") summer70start,
+  extract(from_utc_timestamp(cast('1970-12-31 12:00:00' as timestamp), 'Europe/London'), "hour") summer70end;
+</codeblock>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3082" scope="external" format="html">IMPALA-3082</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-1170" rev="IMPALA-1170">
+
+      <title>parse_url() returns incorrect result if @ character in URL</title>
+
+      <conbody>
+
+        <p>
+          If a URL contains an <codeph>@</codeph> character, the <codeph>parse_url()</codeph> function could return an incorrect value for
+          the hostname field.
+        </p>
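+
+        <p>
+          For example, a call such as the following (a hypothetical illustration; the exact misparsed value varies) is affected because of
+          the <codeph>@</codeph> sign in the authority portion of the URL:
+        </p>
+
+<codeblock>-- Expected hostname: example.com
+SELECT parse_url('http://user@example.com/docs/index.html', 'HOST');</codeblock>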
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1170" scope="external" format="html">IMPALA-1170</xref>
+        </p>
+
+        <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0 and CDH 5.5.4 / Impala 2.3.4.</p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-2422" rev="IMPALA-2422">
+
+      <title>% escaping does not work correctly when it occurs at the end of a LIKE clause</title>
+
+      <conbody>
+
+        <p>
+          If the final character in the RHS argument of a <codeph>LIKE</codeph> operator is an escaped <codeph>\%</codeph> character, it
+          does not match a <codeph>%</codeph> final character of the LHS argument.
+        </p>
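+
+        <p>
+          For example, a comparison such as the following (a hypothetical illustration) is affected, because the escaped
+          <codeph>\%</codeph> is the final character of the pattern:
+        </p>
+
+<codeblock>-- Intended to match strings ending in a literal '%' character.
+SELECT 'discount 50%' LIKE '%50\%';</codeblock>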
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2422" scope="external" format="html">IMPALA-2422</xref>
+        </p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-397" rev="IMPALA-397">
+
+      <title>ORDER BY rand() does not work.</title>
+
+      <conbody>
+
+        <p>
+          Because the value for <codeph>rand()</codeph> is computed early in a query, using an <codeph>ORDER BY</codeph> expression
+          involving a call to <codeph>rand()</codeph> does not actually randomize the results.
+        </p>
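+
+        <p>
+          The affected pattern looks like the following (hypothetical table and column names); the rows are not returned in random order
+          despite the <codeph>ORDER BY</codeph> clause:
+        </p>
+
+<codeblock>SELECT id FROM t1 ORDER BY rand();</codeblock>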
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-397" scope="external" format="html">IMPALA-397</xref>
+        </p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-2643" rev="IMPALA-2643">
+
+      <title>Duplicated column in inline view causes dropping null slots during scan</title>
+
+      <conbody>
+
+        <p>
+          If the same column is queried twice within a view, <codeph>NULL</codeph> values for that column are omitted. For example, the
+          result of <codeph>COUNT(*)</codeph> on the view could be less than expected.
+        </p>
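+
+        <p>
+          The affected pattern looks like the following (hypothetical table and column names), where the inline view selects
+          <codeph>int_col</codeph> twice; rows where <codeph>int_col</codeph> is <codeph>NULL</codeph> could be omitted from the count:
+        </p>
+
+<codeblock>SELECT COUNT(*) FROM (SELECT int_col AS c1, int_col AS c2 FROM t1) v;</codeblock>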
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2643" scope="external" format="html">IMPALA-2643</xref>
+        </p>
+
+        <p>
+          <b>Workaround:</b> Avoid selecting the same column twice within an inline view.
+        </p>
+
+        <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0, CDH 5.5.2 / Impala 2.3.2, and CDH 5.4.10 / Impala 2.2.10.</p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-1459" rev="IMPALA-1459">
+
+<!-- Not part of Alex's spreadsheet -->
+
+      <title>Incorrect assignment of predicates through an outer join in an inline view.</title>
+
+      <conbody>
+
+        <p>
+          A query involving an <codeph>OUTER JOIN</codeph> clause where one of the table references is an inline view might apply predicates
+          from the <codeph>ON</codeph> clause incorrectly.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1459" scope="external" format="html">IMPALA-1459</xref>
+        </p>
+
+        <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0, CDH 5.5.2 / Impala 2.3.2, and CDH 5.4.9 / Impala 2.2.9.</p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-2603" rev="IMPALA-2603">
+
+      <title>Crash: impala::Coordinator::ValidateCollectionSlots</title>
+
+      <conbody>
+
+        <p>
+          A query could encounter a serious error if it includes multiple nested levels of <codeph>INNER JOIN</codeph> clauses involving
+          subqueries.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2603" scope="external" format="html">IMPALA-2603</xref>
+        </p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-2665" rev="IMPALA-2665">
+
+      <title>Incorrect assignment of On-clause predicate inside inline view with an outer join.</title>
+
+      <conbody>
+
+        <p>
+          A query might return incorrect results due to wrong predicate assignment in the following scenario:
+        </p>
+
+        <ol>
+          <li>
+            There is an inline view that contains an outer join
+          </li>
+
+          <li>
+            That inline view is joined with another table in the enclosing query block
+          </li>
+
+          <li>
+            That join has an On-clause containing a predicate that only references columns originating from the outer-joined tables inside
+            the inline view
+          </li>
+        </ol>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2665" scope="external" format="html">IMPALA-2665</xref>
+        </p>
+
+        <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0, CDH 5.5.2 / Impala 2.3.2, and CDH 5.4.9 / Impala 2.2.9.</p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-2144" rev="IMPALA-2144">
+
+      <title>Wrong assignment of having clause predicate across outer join</title>
+
+      <conbody>
+
+        <p>
+          In an <codeph>OUTER JOIN</codeph> query with a <codeph>HAVING</codeph> clause, the comparison from the <codeph>HAVING</codeph>
+          clause might be applied at the wrong stage of query processing, leading to incorrect results.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2144" scope="external" format="html">IMPALA-2144</xref>
+        </p>
+
+        <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0.</p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-2093" rev="IMPALA-2093">
+
+      <title>Wrong plan of NOT IN aggregate subquery when a constant is used in subquery predicate</title>
+
+      <conbody>
+
+        <p>
+          A <codeph>NOT IN</codeph> operator with a subquery that calls an aggregate function, such as <codeph>NOT IN (SELECT
+          SUM(...))</codeph>, could return incorrect results.
+        </p>
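+
+        <p>
+          The affected pattern looks like the following (hypothetical table and column names), with a constant predicate inside the
+          aggregate subquery:
+        </p>
+
+<codeblock>SELECT id FROM t1 WHERE id NOT IN (SELECT SUM(x) FROM t2 WHERE y = 10);</codeblock>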
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2093" scope="external" format="html">IMPALA-2093</xref>
+        </p>
+
+        <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0 and CDH 5.5.4 / Impala 2.3.4.</p>
+
+      </conbody>
+
+    </concept>
+
+  </concept>
+
+  <concept id="known_issues_metadata">
+
+    <title id="ki_metadata">Impala Known Issues: Metadata</title>
+
+    <conbody>
+
+      <p>
+        These issues affect how Impala interacts with metadata. They cover areas such as the metastore database, the <codeph>COMPUTE
+        STATS</codeph> statement, and the Impala <cmdname>catalogd</cmdname> daemon.
+      </p>
+
+    </conbody>
+
+    <concept id="IMPALA-2648" rev="IMPALA-2648">
+
+      <title>Catalogd may crash when loading metadata for tables with many partitions, many columns, and incremental stats</title>
+
+      <conbody>
+
+        <p>
+          Incremental stats use up about 400 bytes per partition for each column. For example, for a table with 20K partitions and 100
+          columns, the memory overhead from incremental statistics is about 800 MB. When serialized for transmission across the network,
+          this metadata exceeds the 2 GB Java array size limit and leads to a <codeph>catalogd</codeph> crash.
+        </p>
+
+        <p>
+          <b>Bugs:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2647" scope="external" format="html">IMPALA-2647</xref>,
+          <xref href="https://issues.cloudera.org/browse/IMPALA-2648" scope="external" format="html">IMPALA-2648</xref>,
+          <xref href="https://issues.cloudera.org/browse/IMPALA-2649" scope="external" format="html">IMPALA-2649</xref>
+        </p>
+
+        <p>
+          <b>Workaround:</b> If feasible, compute full stats periodically and avoid computing incremental stats for that table. The
+          scalability of incremental stats computation is a continuing work item.
+        </p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-1420" rev="IMPALA-1420 2.0.0">
+
+<!-- Not part of Alex's spreadsheet -->
+
+      <title>Can't update stats manually via alter table after upgrading to CDH 5.2</title>
+
+      <conbody>
+
+        <p></p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1420" scope="external" format="html">IMPALA-1420</xref>
+        </p>
+
+        <p>
+          <b>Workaround:</b> On CDH 5.2, when adjusting table statistics manually by setting the <codeph>numRows</codeph> property, you must also
+          enable the Boolean property <codeph>STATS_GENERATED_VIA_STATS_TASK</codeph>. For example, use a statement like the following to
+          set both properties with a single <codeph>ALTER TABLE</codeph> statement:
+        </p>
+
+<codeblock>ALTER TABLE <varname>table_name</varname> SET TBLPROPERTIES('numRows'='<varname>new_value</varname>', 'STATS_GENERATED_VIA_STATS_TASK' = 'true');</codeblock>
+
+        <p>
+          <b>Resolution:</b> The underlying cause is the issue
+          <xref href="https://issues.apache.org/jira/browse/HIVE-8648" scope="external" format="html">HIVE-8648</xref> that affects the
+          metastore in Hive 0.13. The workaround is only needed until the fix for this issue is incorporated into a CDH release.
+        </p>
+
+      </conbody>
+
+    </concept>
+
+  </concept>
+
+  <concept id="known_issues_interop">
+
+    <title id="ki_interop">Impala Known Issues: Interoperability</title>
+
+    <conbody>
+
+      <p>
+        These issues affect the ability to interchange data between Impala and other database systems. They cover areas such as data types
+        and file formats.
+      </p>
+
+    </conbody>
+
+<!-- Opened based on CDH-41605. Not part of Alex's spreadsheet AFAIK. -->
+
+    <concept id="CDH-41605">
+
+      <title>DESCRIBE FORMATTED gives error on Avro table</title>
+
+      <conbody>
+
+        <p>
+          This issue can occur either on old Avro tables (created prior to Hive 1.1 / CDH 5.4) or when changing the Avro schema file by
+          adding or removing columns. Columns added to the schema file will not show up in the output of the <codeph>DESCRIBE
+          FORMATTED</codeph> command. Removing columns from the schema file will trigger a <codeph>NullPointerException</codeph>.
+        </p>
+
+        <p>
+          <b>Workaround:</b> Use the output of <codeph>SHOW CREATE TABLE</codeph> to drop and recreate the table. This populates
+          the Hive metastore database with the correct column definitions.
+        </p>
+
+        <note type="warning">
+          Only use this for external tables, or Impala will remove the data files. In case of an internal table, set it to external first:
+<codeblock>
+ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
+</codeblock>
+          (The part in parentheses is case sensitive.) Make sure to pick the right choice between internal and external when recreating the
+          table. See <xref href="impala_tables.xml#tables"/> for the differences between internal and external tables.
+        </note>
+
+        <p audience="Cloudera">
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/CDH-41605" scope="external" format="html">CDH-41605</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMP-469">
+
+<!-- Not part of Alex's spreadsheet. Perhaps it really is a permanent limitation and nobody is tracking it? -->
+
+      <title>Deviation from Hive behavior: Impala does not do implicit casts between string and numeric or boolean types.</title>
+
+      <conbody>
+
+        <p audience="Cloudera">
+          <b>Cloudera Bug:</b> <xref href="https://jira.cloudera.com/browse/IMP-469" scope="external" format="html"/>; KI added 0.1
+          <i>Cloudera internal only</i>
+        </p>
+
+        <p>
+          <b>Anticipated Resolution:</b> None
+        </p>
+
+        <p>
+          <b>Workaround:</b> Use explicit casts.
+        </p>
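+
+        <p>
+          For example, where Hive implicitly converts the string to a number, in Impala you write the conversion explicitly (a
+          hypothetical illustration):
+        </p>
+
+<codeblock>-- Implicit cast from STRING to INT: accepted by Hive, rejected by Impala.
+-- SELECT '100' + 1;
+SELECT CAST('100' AS INT) + 1;</codeblock>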
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMP-175">
+
+<!-- Not part of Alex's spreadsheet. Perhaps it really is a permanent limitation and nobody is tracking it? -->
+
+      <title>Deviation from Hive behavior: Out-of-range float/double values are returned as maximum allowed value of type (Hive returns NULL)</title>
+
+      <conbody>
+
+        <p>
+          Impala behavior differs from Hive with respect to out-of-range float/double values. Out-of-range values are returned as the
+          maximum allowed value for the type (Hive returns <codeph>NULL</codeph>).
+        </p>
+
+        <p audience="Cloudera">
+          <b>Cloudera Bug:</b> <xref href="https://jira.cloudera.com/browse/IMP-175" scope="external" format="html">IMP-175</xref>; KI
+          added 0.1 <i>Cloudera internal only</i>
+        </p>
+
+        <p>
+          <b>Workaround:</b> None
+        </p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="CDH-13199">
+
+<!-- Not part of Alex's spreadsheet. The CDH- prefix makes it an oddball. -->
+
+      <title>Configuration needed for Flume to be compatible with Impala</title>
+
+      <conbody>
+
+        <p>
+          For compatibility with Impala, the value for the Flume HDFS Sink <codeph>hdfs.writeFormat</codeph> must be set to
+          <codeph>Text</codeph>, rather than its default value of <codeph>Writable</codeph>. The <codeph>hdfs.writeFormat</codeph> setting
+          must be changed to <codeph>Text</codeph> before creating data files with Flume; otherwise, those files cannot be read by either
+          Impala or Hive.
+        </p>
+
+        <p>
+          <b>Resolution:</b> This information has been requested to be added to the upstream Flume documentation.
+        </p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-635" rev="IMPALA-635">
+
+<!-- Not part of Alex's spreadsheet -->
+
+      <title>Avro Scanner fails to parse some schemas</title>
+
+      <conbody>
+
+        <p>
+          Querying certain Avro tables could cause a crash or return no rows, even though Impala could <codeph>DESCRIBE</codeph> the table.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-635" scope="external" format="html">IMPALA-635</xref>
+        </p>
+
+        <p>
+          <b>Workaround:</b> Swap the order of the fields in the schema specification. For example, <codeph>["null", "string"]</codeph>
+          instead of <codeph>["string", "null"]</codeph>.
+        </p>
+
+        <p>
+          <b>Resolution:</b> Not allowing this syntax agrees with the Avro specification, so it may still cause an error even when the
+          crashing issue is resolved.
+        </p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-1024" rev="IMPALA-1024">
+
+<!-- Not part of Alex's spreadsheet -->
+
+      <title>Impala BE cannot parse Avro schema that contains a trailing semi-colon</title>
+
+      <conbody>
+
+        <p>
+          If an Avro table has a schema definition with a trailing semicolon, Impala encounters an error when the table is queried.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1024" scope="external" format="html">IMPALA-1024</xref>
+        </p>
+
+        <p>
+          <b>Workaround:</b> Remove the trailing semicolon from the Avro schema.
+        </p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-2154" rev="IMPALA-2154">
+
+<!-- Not part of Alex's spreadsheet -->
+
+      <title>Fix decompressor to allow parsing gzips with multiple streams</title>
+
+      <conbody>
+
+        <p>
+          Currently, Impala can only read gzipped files containing a single stream. If a gzipped file contains multiple concatenated
+          streams, the Impala query only processes the data from the first stream.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2154" scope="external" format="html">IMPALA-2154</xref>
+        </p>
+
+        <p>
+          <b>Workaround:</b> Use a different gzip tool to compress the file into a single stream.
+        </p>
+
+        <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0.</p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-1578" rev="IMPALA-1578">
+
+<!-- Not part of Alex's spreadsheet -->
+
+      <title>Impala incorrectly handles text data when the new line character \n\r is split between different HDFS blocks</title>
+
+      <conbody>
+
+        <p>
+          If a carriage return / newline pair of characters in a text table is split between HDFS data blocks, Impala incorrectly processes
+          the row following the <codeph>\n\r</codeph> pair twice.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1578" scope="external" format="html">IMPALA-1578</xref>
+        </p>
+
+        <p>
+          <b>Workaround:</b> Use the Parquet format for large volumes of data where practical.
+        </p>
+
+        <p><b>Resolution:</b> Fixed in CDH 5.8.0 / Impala 2.6.0.</p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-1862" rev="IMPALA-1862">
+
+<!-- Not part of Alex's spreadsheet -->
+
+      <title>Invalid bool value not reported as a scanner error</title>
+
+      <conbody>
+
+        <p>
+          In some cases, an invalid <codeph>BOOLEAN</codeph> value read from a table does not produce a warning message about the bad value.
+          The result is still <codeph>NULL</codeph> as expected. Therefore, this is not a query correctness issue, but it could lead to
+          overlooking the presence of invalid data.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1862" scope="external" format="html">IMPALA-1862</xref>
+        </p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-1652" rev="IMPALA-1652">
+
+<!-- To do: Isn't this more a correctness issue? -->
+
+      <title>Incorrect results with basic predicate on CHAR typed column.</title>
+
+      <conbody>
+
+        <p>
+          When comparing a <codeph>CHAR</codeph> column value to a string literal, the literal value is not blank-padded and so the
+          comparison might fail when it should match.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1652" scope="external" format="html">IMPALA-1652</xref>
+        </p>
+
+        <p>
+          <b>Workaround:</b> Use the <codeph>RPAD()</codeph> function to blank-pad literals compared with <codeph>CHAR</codeph> columns to
+          the expected length.
+        </p>
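+
+        <p>
+          For example, for a column declared as <codeph>CHAR(5)</codeph> (hypothetical table and column names):
+        </p>
+
+<codeblock>-- Pad the literal to the declared length of the CHAR column.
+SELECT * FROM t1 WHERE char_col = RPAD('abc', 5, ' ');</codeblock>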
+
+      </conbody>
+
+    </concept>
+
+  </concept>
+
+  <concept id="known_issues_limitations">
+
+    <title>Impala Known Issues: Limitations</title>
+
+    <conbody>
+
+      <p>
+        These issues are current limitations of Impala that require evaluation as you plan how to integrate Impala into your data management
+        workflow.
+      </p>
+
+    </conbody>
+
+    <concept id="IMPALA-77" rev="IMPALA-77">
+
+<!-- Not part of Alex's spreadsheet. Perhaps it really is a permanent limitation and nobody is tracking it? -->
+
+      <title>Impala does not support running on clusters with federated namespaces</title>
+
+      <conbody>
+
+        <p>
+          Impala does not support running on clusters with federated namespaces. The <codeph>impalad</codeph> process will not start on a
+          node running such a filesystem based on the <codeph>org.apache.hadoop.fs.viewfs.ViewFs</codeph> class.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-77" scope="external" format="html">IMPALA-77</xref>
+        </p>
+
+        <p>
+          <b>Anticipated Resolution:</b> Limitation
+        </p>
+
+        <p>
+          <b>Workaround:</b> Use standard HDFS on all Impala nodes.
+        </p>
+
+      </conbody>
+
+    </concept>
+
+  </concept>
+
+  <concept id="known_issues_misc">
+
+    <title>Impala Known Issues: Miscellaneous / Older Issues</title>
+
+    <conbody>
+
+      <p>
+        These issues do not fall into one of the above categories or have not been categorized yet.
+      </p>
+
+    </conbody>
+
+    <concept id="IMPALA-2005" rev="IMPALA-2005">
+
+<!-- Not part of Alex's spreadsheet -->
+
+      <title>A failed CTAS does not drop the table if the insert fails.</title>
+
+      <conbody>
+
+        <p>
+          If a <codeph>CREATE TABLE AS SELECT</codeph> operation successfully creates the target table but an error occurs while querying
+          the source table or copying the data, the new table is left behind rather than being dropped.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2005" scope="external" format="html">IMPALA-2005</xref>
+        </p>
+
+        <p>
+          <b>Workaround:</b> Drop the new table manually after a failed <codeph>CREATE TABLE AS SELECT</codeph>.
+        </p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-1821" rev="IMPALA-1821">
+
+<!-- Not part of Alex's spreadsheet -->
+
+      <title>Casting scenarios with invalid/inconsistent results</title>
+
+      <conbody>
+
+        <p>
+          Using a <codeph>CAST()</codeph> function to convert large literal values to smaller types, or to convert special values such as
+          <codeph>NaN</codeph> or <codeph>Inf</codeph>, produces values not consistent with other database systems. This could lead to
+          unexpected results from queries.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1821" scope="external" format="html">IMPALA-1821</xref>
+        </p>
+
+<!-- <p><b>Workaround:</b> Doublecheck that <codeph>CAST()</codeph> operations work as expect. The issue applies to expressions involving literals, not values read from table columns.</p> -->
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-1619" rev="IMPALA-1619">
+
+<!-- Not part of Alex's spreadsheet -->
+
+      <title>Support individual memory allocations larger than 1 GB</title>
+
+      <conbody>
+
+        <p>
+          The largest single block of memory that Impala can allocate during a query is 1 GiB. Therefore, a query could fail or Impala could
+          crash if a compressed text file resulted in more than 1 GiB of data in uncompressed form, or if a string function such as
+          <codeph>group_concat()</codeph> returned a value greater than 1 GiB.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1619" scope="external" format="html">IMPALA-1619</xref>
+        </p>
+
+        <p><b>Resolution:</b> Fixed in CDH 5.9.0 / Impala 2.7.0 and CDH 5.8.3 / Impala 2.6.3.</p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-941" rev="IMPALA-941">
+
+<!-- Not part of Alex's spreadsheet. Maybe this is interop? -->
+
+      <title>Impala Parser issue when using fully qualified table names that start with a number.</title>
+
+      <conbody>
+
+        <p>
+          A fully qualified table name starting with a number could cause a parsing error. In a name such as <codeph>db.571_market</codeph>,
+          the decimal point followed by digits is interpreted as a floating-point number.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-941" scope="external" format="html">IMPALA-941</xref>
+        </p>
+
+        <p>
+          <b>Workaround:</b> Surround each part of the fully qualified name with backticks (<codeph>``</codeph>).
+        </p>
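+
+        <p>
+          For example, using the name from above:
+        </p>
+
+<codeblock>SELECT COUNT(*) FROM `db`.`571_market`;</codeblock>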
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMPALA-532" rev="IMPALA-532">
+
+<!-- Not part of Alex's spreadsheet. Perhaps it really is a permanent limitation and nobody is tracking it? -->
+
+      <title>Impala should tolerate bad locale settings</title>
+
+      <conbody>
+
+        <p>
+          If the <codeph>LC_*</codeph> environment variables specify an unsupported locale, Impala does not start.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-532" scope="external" format="html">IMPALA-532</xref>
+        </p>
+
+        <p>
+          <b>Workaround:</b> Add <codeph>LC_ALL="C"</codeph> to the environment settings for both the Impala daemon and the Statestore
+          daemon. See <xref href="impala_config_options.xml#config_options"/> for details about modifying these environment settings.
+        </p>
+
+        <p>
+          <b>Resolution:</b> Fixing this issue would require an upgrade to Boost 1.47 in the Impala distribution.
+        </p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="IMP-1203">
+
+<!-- Not part of Alex's spreadsheet. Perhaps it really is a permanent limitation and nobody is tracking it? -->
+
+      <title>Log Level 3 Not Recommended for Impala</title>
+
+      <conbody>
+
+        <p>
+          The extensive logging produced by log level 3 can cause serious performance overhead and capacity issues.
+        </p>
+
+        <p>
+          <b>Workaround:</b> Reduce the log level to its default value of 1, that is, <codeph>GLOG_v=1</codeph>. See
+          <xref href="impala_logging.xml#log_levels"/> for details about the effects of setting different logging levels.
+        </p>
+
+      </conbody>
+
+    </concept>
+
+  </concept>
+
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/bb88fdc0/docs/topics/impala_max_block_mgr_memory.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_max_block_mgr_memory.xml b/docs/topics/impala_max_block_mgr_memory.xml
new file mode 100644
index 0000000..3bf8ac8
--- /dev/null
+++ b/docs/topics/impala_max_block_mgr_memory.xml
@@ -0,0 +1,30 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="2.1.0" id="max_block_mgr_memory">
+
+  <title>MAX_BLOCK_MGR_MEMORY</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Query Options"/>
+      <data name="Category" value="Memory"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p rev="2.1.0">
+      <indexterm audience="Cloudera">MAX_BLOCK_MGR_MEMORY query option</indexterm>
+    </p>
+
+    <p></p>
+
+    <p>
+      <b>Default:</b>
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/added_in_20"/>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/bb88fdc0/docs/topics/impala_max_num_runtime_filters.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_max_num_runtime_filters.xml b/docs/topics/impala_max_num_runtime_filters.xml
new file mode 100644
index 0000000..90e91dc
--- /dev/null
+++ b/docs/topics/impala_max_num_runtime_filters.xml
@@ -0,0 +1,61 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="max_num_runtime_filters" rev="2.5.0">
+
+  <title>MAX_NUM_RUNTIME_FILTERS Query Option (CDH 5.7 or higher only)</title>
+  <titlealts audience="PDF"><navtitle>MAX_NUM_RUNTIME_FILTERS</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Query Options"/>
+      <data name="Category" value="Performance"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p rev="2.5.0">
+      <indexterm audience="Cloudera">MAX_NUM_RUNTIME_FILTERS query option</indexterm>
+      The <codeph>MAX_NUM_RUNTIME_FILTERS</codeph> query option
+      sets an upper limit on the number of runtime filters that can be produced for each query.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/type_integer"/>
+
+    <p>
+      <b>Default:</b> 10
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/added_in_250"/>
+
+    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+    <p>
+      Each runtime filter imposes some memory overhead on the query.
+      Depending on the setting of the <codeph>RUNTIME_BLOOM_FILTER_SIZE</codeph>
+      query option, each filter might consume between 1 and 16 megabytes
+      per plan fragment. There are typically 5 or fewer filters per plan fragment.
+    </p>
+
+    <p>
+      Impala evaluates the effectiveness of each filter, and keeps the
+      ones that eliminate the largest number of partitions or rows.
+      Therefore, this setting can protect against
+      potential problems due to excessive memory overhead for filter production,
+      while still allowing a high level of optimization for suitable queries.
+    </p>
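+
+    <p>
+      For example, to lower the limit for a memory-constrained workload, you might issue
+      the following statement in <codeph>impala-shell</codeph> before running the query
+      (the value 5 here is purely illustrative):
+    </p>
+
+<codeblock>set max_num_runtime_filters=5;</codeblock>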
+
+    <p conref="../shared/impala_common.xml#common/runtime_filtering_option_caveat"/>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+    <p>
+      <xref href="impala_runtime_filtering.xml"/>,
+      <!-- <xref href="impala_partitioning.xml#dynamic_partition_pruning"/>, -->
+      <xref href="impala_runtime_bloom_filter_size.xml#runtime_bloom_filter_size"/>,
+      <xref href="impala_runtime_filter_mode.xml#runtime_filter_mode"/>
+    </p>
+
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/bb88fdc0/docs/topics/impala_optimize_partition_key_scans.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_optimize_partition_key_scans.xml b/docs/topics/impala_optimize_partition_key_scans.xml
new file mode 100644
index 0000000..60635ff
--- /dev/null
+++ b/docs/topics/impala_optimize_partition_key_scans.xml
@@ -0,0 +1,180 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="2.5.0 IMPALA-2499" id="optimize_partition_key_scans">
+
+  <title>OPTIMIZE_PARTITION_KEY_SCANS Query Option (CDH 5.7 or higher only)</title>
+  <titlealts audience="PDF"><navtitle>OPTIMIZE_PARTITION_KEY_SCANS</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Query Options"/>
+      <data name="Category" value="Querying"/>
+      <data name="Category" value="Performance"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p rev="2.5.0 IMPALA-2499">
+      <indexterm audience="Cloudera">OPTIMIZE_PARTITION_KEY_SCANS query option</indexterm>
+      Enables a fast code path for queries that apply simple aggregate functions to partition key
+      columns: <codeph>MIN(<varname>key_column</varname>)</codeph>, <codeph>MAX(<varname>key_column</varname>)</codeph>,
+      or <codeph>COUNT(DISTINCT <varname>key_column</varname>)</codeph>.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/type_boolean"/>
+    <p conref="../shared/impala_common.xml#common/default_false_0"/>
+
+    <note conref="../shared/impala_common.xml#common/one_but_not_true"/>
+
+    <p conref="../shared/impala_common.xml#common/added_in_250"/>
+
+    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+    <p>
+      This optimization speeds up common <q>introspection</q> operations when using queries
+      to calculate the cardinality and range for partition key columns.
+    </p>
+
+    <p>
+      This optimization does not apply if the queries contain any <codeph>WHERE</codeph>,
+      <codeph>GROUP BY</codeph>, or <codeph>HAVING</codeph> clause. The relevant queries
+      should only compute the minimum, maximum, or number of distinct values for the
+      partition key columns across the whole table.
+    </p>
+
+    <p>
+      This optimization is enabled by a query option because it skips some consistency checks
+      and therefore can return slightly different partition values if partitions are in the
+      process of being added, dropped, or loaded outside of Impala. Queries might exhibit different
+      behavior depending on the setting of this option in the following cases:
+    </p>
+
+    <ul>
+      <li>
+        <p>
+          If files are removed from a partition using HDFS or other non-Impala operations,
+          there is a period until the next <codeph>REFRESH</codeph> of the table during which regular
+          queries fail at run time because they detect the missing files. With this optimization
+          enabled, queries that evaluate only the partition key column values (not the contents of
+          the partition itself) succeed, and treat the partition as if it still exists.
+        </p>
+      </li>
+      <li>
+        <p>
+          If a partition contains any data files, but the data files do not contain any rows,
+          a regular query considers that the partition does not exist. With this optimization
+          enabled, the partition is treated as if it exists.
+        </p>
+        <p>
+          If the partition includes no files at all, this optimization does not change the query
+          behavior: the partition is considered not to exist whether or not this optimization is enabled.
+        </p>
+      </li>
+    </ul>
+
+    <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+    <p>
+      The following example shows initial schema setup and the default behavior of queries that
+      return just the partition key column for a table:
+    </p>
+
+<codeblock>
+-- Make a partitioned table with 3 partitions.
+create table t1 (s string) partitioned by (year int);
+insert into t1 partition (year=2015) values ('last year');
+insert into t1 partition (year=2016) values ('this year');
+insert into t1 partition (year=2017) values ('next year');
+
+-- Regardless of the option setting, this query must read the
+-- data files to know how many rows to return for each year value.
+explain select year from t1;
++-----------------------------------------------------+
+| Explain String                                      |
++-----------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=0B VCores=0 |
+|                                                     |
+| F00:PLAN FRAGMENT [UNPARTITIONED]                   |
+|   00:SCAN HDFS [key_cols.t1]                        |
+|      partitions=3/3 files=4 size=40B                |
+|      table stats: 3 rows total                      |
+|      column stats: all                              |
+|      hosts=3 per-host-mem=unavailable               |
+|      tuple-ids=0 row-size=4B cardinality=3          |
++-----------------------------------------------------+
+
+-- The aggregation operation means the query does not need to read
+-- the data within each partition: the result set contains exactly 1 row
+-- per partition, derived from the partition key column value.
+-- By default, Impala still includes a 'scan' operation in the query.
+explain select distinct year from t1;
++------------------------------------------------------------------------------------+
+| Explain String                                                                     |
++------------------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=0B VCores=0                                |
+|                                                                                    |
+| 01:AGGREGATE [FINALIZE]                                                            |
+| |  group by: year                                                                  |
+| |                                                                                  |
+| 00:SCAN HDFS [key_cols.t1]                                                         |
+|    partitions=0/0 files=0 size=0B                                                  |
++------------------------------------------------------------------------------------+
+</codeblock>
+
+    <p>
+      The following examples show how the plan is made more efficient when the
+      <codeph>OPTIMIZE_PARTITION_KEY_SCANS</codeph> option is enabled:
+    </p>
+
+<codeblock>
+set optimize_partition_key_scans=1;
+OPTIMIZE_PARTITION_KEY_SCANS set to 1
+
+-- The aggregation operation is turned into a UNION internally,
+-- with constant values known in advance based on the metadata
+-- for the partitioned table.
+explain select distinct year from t1;
++-----------------------------------------------------+
+| Explain String                                      |
++-----------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=0B VCores=0 |
+|                                                     |
+| F00:PLAN FRAGMENT [UNPARTITIONED]                   |
+|   01:AGGREGATE [FINALIZE]                           |
+|   |  group by: year                                 |
+|   |  hosts=1 per-host-mem=unavailable               |
+|   |  tuple-ids=1 row-size=4B cardinality=3          |
+|   |                                                 |
+|   00:UNION                                          |
+|      constant-operands=3                            |
+|      hosts=1 per-host-mem=unavailable               |
+|      tuple-ids=0 row-size=4B cardinality=3          |
++-----------------------------------------------------+
+
+-- The same optimization applies to other aggregation queries
+-- that only return values based on partition key columns:
+-- MIN, MAX, COUNT(DISTINCT), and so on.
+explain select min(year) from t1;
++-----------------------------------------------------+
+| Explain String                                      |
++-----------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=0B VCores=0 |
+|                                                     |
+| F00:PLAN FRAGMENT [UNPARTITIONED]                   |
+|   01:AGGREGATE [FINALIZE]                           |
+|   |  output: min(year)                              |
+|   |  hosts=1 per-host-mem=unavailable               |
+|   |  tuple-ids=1 row-size=4B cardinality=1          |
+|   |                                                 |
+|   00:UNION                                          |
+|      constant-operands=3                            |
+|      hosts=1 per-host-mem=unavailable               |
+|      tuple-ids=0 row-size=4B cardinality=3          |
++-----------------------------------------------------+
+</codeblock>
+
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/bb88fdc0/docs/topics/impala_parquet_annotate_strings_utf8.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_parquet_annotate_strings_utf8.xml b/docs/topics/impala_parquet_annotate_strings_utf8.xml
new file mode 100644
index 0000000..cd5b578
--- /dev/null
+++ b/docs/topics/impala_parquet_annotate_strings_utf8.xml
@@ -0,0 +1,50 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="parquet_annotate_strings_utf8" rev="2.6.0 IMPALA-2069">
+
+  <title>PARQUET_ANNOTATE_STRINGS_UTF8 Query Option (CDH 5.8 or higher only)</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Query Options"/>
+      <data name="Category" value="Parquet"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p rev="2.6.0 IMPALA-2069">
+      <indexterm audience="Cloudera">PARQUET_ANNOTATE_STRINGS_UTF8 query option</indexterm>
+      Causes Impala <codeph>INSERT</codeph> and <codeph>CREATE TABLE AS SELECT</codeph> statements
+      to write Parquet files that use the UTF-8 annotation for <codeph>STRING</codeph> columns.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+    <p>
+      By default, Impala represents a <codeph>STRING</codeph> column in Parquet as an unannotated binary field.
+    </p>
+    <p>
+      Impala always uses the UTF-8 annotation when writing <codeph>CHAR</codeph> and <codeph>VARCHAR</codeph>
+      columns to Parquet files. An alternative to using the query option is to cast <codeph>STRING</codeph>
+      values to <codeph>VARCHAR</codeph>.
+    </p>
+    <p>
+      This option helps make data written by Impala more interoperable with other data processing engines.
+      Impala itself currently does not support all operations on UTF-8 data.
+      Although data processed by Impala is typically represented in ASCII, it is valid to designate the
+      data as UTF-8 when storing it on disk, because ASCII is a subset of UTF-8.
+    </p>
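+    <p>
+      For example, either of the following techniques produces UTF-8-annotated string data
+      in Parquet (the table and column names are illustrative):
+    </p>
+
+<codeblock>-- Enable the query option so STRING columns get the UTF-8 annotation.
+set parquet_annotate_strings_utf8=1;
+create table annotated_strings stored as parquet as select s from text_data;
+
+-- Alternative: cast STRING to VARCHAR, which always gets the annotation.
+create table annotated_varchars stored as parquet
+  as select cast(s as varchar(255)) v from text_data;</codeblock>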
+    <p conref="../shared/impala_common.xml#common/type_boolean"/>
+    <p conref="../shared/impala_common.xml#common/default_false_0"/>
+
+    <p conref="../shared/impala_common.xml#common/added_in_260"/>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+    <p>
+      <xref href="impala_parquet.xml#parquet"/>
+    </p>
+
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/bb88fdc0/docs/topics/impala_parquet_fallback_schema_resolution.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_parquet_fallback_schema_resolution.xml b/docs/topics/impala_parquet_fallback_schema_resolution.xml
new file mode 100644
index 0000000..06b1a28
--- /dev/null
+++ b/docs/topics/impala_parquet_fallback_schema_resolution.xml
@@ -0,0 +1,49 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="parquet_fallback_schema_resolution" rev="2.6.0 IMPALA-2835 CDH-33330">
+
+  <title>PARQUET_FALLBACK_SCHEMA_RESOLUTION Query Option (CDH 5.8 or higher only)</title>
+  <titlealts audience="PDF"><navtitle>PARQUET_FALLBACK_SCHEMA_RESOLUTION</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Query Options"/>
+      <data name="Category" value="Parquet"/>
+      <data name="Category" value="Schemas"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p rev="2.6.0 IMPALA-2835 CDH-33330">
+      <indexterm audience="Cloudera">PARQUET_FALLBACK_SCHEMA_RESOLUTION query option</indexterm>
+      Allows Impala to look up columns within Parquet files by column name, rather than column order,
+      when necessary.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+    <p>
+      By default, Impala looks up columns within a Parquet file based on
+      the order of columns in the table.
+      The <codeph>name</codeph> setting for this option enables behavior for Impala queries
+      similar to the Hive setting <codeph>parquet.column.index.access=false</codeph>.
+      It also allows Impala to query Parquet files created by Hive with the
+      <codeph>parquet.column.index.access=false</codeph> setting in effect.
+    </p>
+
+    <p>
+      <b>Type:</b> integer or string.
+      Allowed values are 0 or <codeph>position</codeph> (default), 1 or <codeph>name</codeph>.
+    </p>
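+
+    <p>
+      For example, to resolve columns by name when querying Parquet files whose column
+      order differs from the table definition:
+    </p>
+
+<codeblock>set parquet_fallback_schema_resolution=name;</codeblock>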
+
+    <p conref="../shared/impala_common.xml#common/added_in_260"/>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+    <p>
+      <xref href="impala_parquet.xml#parquet_schema_evolution"/>
+    </p>
+
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/bb88fdc0/docs/topics/impala_perf_ddl.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_perf_ddl.xml b/docs/topics/impala_perf_ddl.xml
new file mode 100644
index 0000000..d075cd2
--- /dev/null
+++ b/docs/topics/impala_perf_ddl.xml
@@ -0,0 +1,42 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="perf_ddl">
+
+  <title>Performance Considerations for DDL Statements</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Performance"/>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="DDL"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      These tips and guidelines apply to the Impala DDL statements, which are listed in
+      <xref href="impala_ddl.xml#ddl"/>.
+    </p>
+
+    <p>
+      Because Impala DDL statements operate on the metastore database, the performance considerations for those
+      statements are quite different from those for distributed queries that operate on HDFS
+      <ph rev="2.2.0">or S3</ph> data files, or on HBase tables.
+    </p>
+
+    <p>
+      Each DDL statement makes a relatively small update to the metastore database. The overhead for each statement
+      is proportional to the overall number of Impala and Hive tables, and (for a partitioned table) to the overall
+      number of partitions in that table. Issuing large numbers of DDL statements (such as one for each table or
+      one for each partition) also has the potential to encounter a bottleneck with access to the metastore
+      database. Therefore, for efficient DDL, try to design your application logic and ETL pipeline to avoid a huge
+      number of tables and a huge number of partitions within each table. In this context, <q>huge</q> is in the
+      range of tens of thousands or hundreds of thousands.
+    </p>
+
+    <note conref="../shared/impala_common.xml#common/add_partition_set_location"/>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/bb88fdc0/docs/topics/impala_prefetch_mode.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_prefetch_mode.xml b/docs/topics/impala_prefetch_mode.xml
new file mode 100644
index 0000000..30dd116
--- /dev/null
+++ b/docs/topics/impala_prefetch_mode.xml
@@ -0,0 +1,49 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="prefetch_mode" rev="2.6.0 IMPALA-3286">
+
+  <title>PREFETCH_MODE Query Option (CDH 5.8 or higher only)</title>
+  <titlealts audience="PDF"><navtitle>PREFETCH_MODE</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Query Options"/>
+      <data name="Category" value="Performance"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p rev="2.6.0 IMPALA-3286">
+      <indexterm audience="Cloudera">PREFETCH_MODE query option</indexterm>
+      Determines whether the prefetching optimization is applied during
+      join query processing.
+    </p>
+
+    <p>
+      <b>Type:</b> numeric (0, 1)
+      or corresponding mnemonic strings (<codeph>NONE</codeph>, <codeph>HT_BUCKET</codeph>).
+    </p>
+
+    <p>
+      <b>Default:</b> 1 (equivalent to <codeph>HT_BUCKET</codeph>)
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/added_in_260"/>
+
+    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+    <p>
+      The default mode is 1, which means that hash table buckets are
+      prefetched during join query processing.
+    </p>
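+
+    <p>
+      For example, to turn off prefetching while investigating the performance of a
+      join query:
+    </p>
+
+<codeblock>set prefetch_mode=none;</codeblock>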
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+    <p>
+      <xref href="impala_joins.xml#joins"/>,
+      <xref href="impala_perf_joins.xml#perf_joins"/>.
+    </p>
+
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/bb88fdc0/docs/topics/impala_query_lifetime.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_query_lifetime.xml b/docs/topics/impala_query_lifetime.xml
new file mode 100644
index 0000000..2f46d21
--- /dev/null
+++ b/docs/topics/impala_query_lifetime.xml
@@ -0,0 +1,31 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="query_lifetime">
+
+  <title>Impala Query Lifetime</title>
+
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Concepts"/>
+      <data name="Category" value="Querying"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      Impala queries progress through a series of stages from the time they are initiated to the time
+      they are completed. A query can also be cancelled before it is entirely finished, either
+      because of an explicit cancellation, or because of a timeout, out-of-memory, or other error condition.
+      Understanding the query lifecycle can help you manage the throughput and resource usage of Impala
+      queries, especially in a high-concurrency or multi-workload environment.
+    </p>
+
+    <p outputclass="toc"/>
+  </conbody>
+
+
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/bb88fdc0/docs/topics/impala_relnotes.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_relnotes.xml b/docs/topics/impala_relnotes.xml
new file mode 100644
index 0000000..5c53a21
--- /dev/null
+++ b/docs/topics/impala_relnotes.xml
@@ -0,0 +1,34 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="relnotes" audience="standalone">
+
+  <title>Impala Release Notes</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Release Notes"/>
+      <data name="Category" value="Administrators"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody id="relnotes_intro">
+
+    <p>
+      These release notes provide information on the <xref href="impala_new_features.xml#new_features">new
+      features</xref> and <xref href="impala_known_issues.xml#known_issues">known issues and limitations</xref> for
+      Impala versions up to <ph conref="../shared/ImpalaVariables.xml#impala_vars/ReleaseVersion"/>. For users
+      upgrading from earlier Impala releases, or using Impala in combination with specific versions of other
+      Cloudera software, <xref href="impala_incompatible_changes.xml#incompatible_changes"/> lists any changes to
+      file formats, SQL syntax, or software dependencies to take into account.
+    </p>
+
+    <p>
+      Once you are finished reviewing these release notes, for more information about using Impala, see
+      <xref audience="integrated" href="impala.xml"/><xref audience="standalone" href="http://www.cloudera.com/documentation/enterprise/latest/topics/impala.html" scope="external" format="html"/>.
+    </p>
+
+    <p outputclass="toc"/>
+  </conbody>
+</concept>


