impala-reviews mailing list archives

From "John Russell (Code Review)" <>
Subject [Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page
Date Mon, 30 Jan 2017 17:59:57 GMT
John Russell has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page

Patch Set 7:


Addressed all comments, esp. splitting CREATE TABLE syntax for HDFS vs. Kudu.
File docs/topics/impala_explain.xml:

PS7, Line 250:  
> "in a SCAN KUDU node"

PS7, Line 251: , and might involve transmitting
             :       non-matching rows that are filtered out on the Impala side.
> remove
File docs/topics/impala_kudu.xml:

PS7, Line 76: scan performance close to that of Parquet
> Make sure you check with Mostafa about this claim.

PS7, Line 147: The work is parallelized
             :               across units of computing called <term>tablet servers</term>.
> I believe the unit of computing is the tablet not the tablet server, unless
The tablet server is the compute resource; the tablet holds the actual data.

PS7, Line 149: between the tablets and tablet servers.
> I think it would be nice to mention that the user is not responsible for ma

> It is weird to mention CREATE without DROP. Please remove this and mention 
Since this section is only talking about the delta in syntax, I'll leave as-is.

PS7, Line 397: <p>
             :             For non-Kudu tables, Impala allows any column to contain <codeph>NULL</codeph>
             :             values, because it is not practical to enforce a <q>not null</q> constraint on HDFS
             :             data files that could be prepared using external tools and ETL
             :           </p>
             :           <p conref="../shared/impala_common.xml#common/pk_implies_not_null"/>
             :           <p>
             :             For example, a table containing geographic information might require the latitude
             :             and longitude coordinates to always be specified. Other attributes might be allowed
             :             to be <codeph>NULL</codeph>. For example, a location might not have a designated
             :             place name, its altitude might be unimportant, and its population might be initially
             :             unknown, to be filled in later.
             :           </p>
> You may want to swap these two paragraphs.

> Impala still uses "PARTITIONED BY" for HDFS tables.

PS7, Line 727: . By setting
             :           up an effective partitioning scheme for a Kudu table, you can ensure that the work for
             :           a query can be parallelized evenly across the hosts in a cluster.
> Remove. Sometimes the goal is to scan as little as possible. You can say th
Whole paragraph was hidden per comment on a previous patch set. (Instead, we link people to
the Kudu white paper for those kinds of details.)

PS7, Line 936: To see the distribution of data in a Kudu table across the underlying buckets
             :             partitions, use the <codeph>SHOW TABLE STATS</codeph>
> This is unfortunately not accurate. SHOW TABLE STATS will only show the dis
Done. Reworded rather than deleting entirely.

PS7, Line 1122: change
> "changes"

PS7, Line 1159: strong consistency for order of operations
> I am not sure I know what that means.

PS7, Line 1159: total
              :         success or total failure of a multi-row statement
> This is "atomicity". Maybe just mention atomic multi-row statements.

PS7, Line 1160: or data that is read while a write
              :         operation is in progress
> Isolation.
I added the technical terminology into my wording.

PS7, Line 1288: <title>Memory Usage for Operations on Kudu Tables</title>
              :       <conbody>
              :         <p>
              :           The Apache Kudu architecture, topology, and data storage techniques result in
              :           different patterns of memory usage for Impala statements than with HDFS-backed tables.
              :         </p>
> I don't find this particularly informative and suggest we remove it unless 
Yes, audience="hidden" earlier in the <concept> tag will make it invisible.
File docs/topics/impala_literals.xml:

PS7, Line 408: most Impala tables
> Impala tables backed by HDFS or S3? "most" is kind of vague
File docs/topics/impala_revoke.xml:

PS7, Line 115: access to a Kudu table is <q>all or nothing</q>.
> "only table-level permissions are enforced in Kudu tables. Column-level per
File docs/topics/impala_tables.xml:

PS7, Line 293: using the Apache Kudu storage system
> "stored in Apache Kudu"

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <>
Gerrit-Reviewer: Ambreen Kazi <>
Gerrit-Reviewer: Dimitris Tsirogiannis <>
Gerrit-Reviewer: Jean-Daniel Cryans <>
Gerrit-Reviewer: John Russell <>
Gerrit-Reviewer: Matthew Jacobs <>
Gerrit-Reviewer: Todd Lipcon <>
Gerrit-HasComments: Yes
