impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "John Russell (Code Review)" <>
Subject [Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page
Date Tue, 24 Jan 2017 19:35:48 GMT
John Russell has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page

Patch Set 8:


Addressed some early comments from Dimitris and Todd that I skipped over before. (Some had
already been fixed based on comments on subsequent patch sets.)
File docs/topics/impala_kudu.xml:

PS4, Line 697: 
             :         </conbody>
             :       </concept>
             :     </concept>
> Remove this paragraph. You can use Kudu's white paper (section 3.2) for a b
I'll hide the paragraph and link to the white paper. (The white paper uses the old DISTRIBUTE
BY syntax so it's not perfect as background reading for users of Impala 2.8.)

PS4, Line 706: 
             :       <title>Partitioning for Kudu Tables</title>
             :       <conbody>
             :         <p>
             :           Kudu tables use special mechanisms to distribute data among the underlying
             :           tablet servers. Although we refer to such tables as partitioned tables,
they are
             :           distinguished from traditional Impala partitioned tables by use of
different clauses
             :           on the <codeph>CREATE TABLE</codeph> statement. Kudu
tables use
             :           <code
> I don't understand the point of this paragraph before even presenting the f
Since now I'm referring people to the Kudu white paper which mentions the old syntax, let's
leave this here for the moment.

PS4, Line 727: 
> remove

PS4, Line 731: rent ways to divide the data for each
             :           column, or even for different value ranges within a column. This
flexibility le
> You need to mention the drawback of using hash partitioning as well. i.e. q

PS4, Line 743:             which used an experimental fork of the Impala code. For example,
             :             <codeph>DISTRIBUTE BY</codeph> clause is now <codeph>PA
> That claim in not universally true. I would just remove it or make it case 

PS4, Line 751: 
> this is cluster-dependent, and based on a Kudu configuration. Would not doc

PS4, Line 765: ibuted, instead of
             :             clumping together all in the same bucket. Spreading new rows across
the buckets this
             :             way lets insertion 
> That's not necessarily the primary reason for using range partitioning. You

PS4, Line 767: in parallel across multiple tablet servers.
             :             Separating the hashed values can impose additional overhead on
queries, where
> Can we add a formal syntax here?
For the moment I'm going to link over to the CREATE TABLE page. Later I'll see if I can extract
just the pieces for the relevant clauses and reuse them here.

PS4, Line 774: y 20,000 rows per partition.
> You need to talk when VALUE and VALUES is used (single vs multi-column rang
Isn't VALUE / VALUES independent of single or multiple columns in the range? I thought it
was dependent on whether there was only 1 comparison operator or 2:

Single column case:
VALUE <= constant
constant <= VALUES < constant

Multiple column case:
VALUE <= (constant,constant)
(constant,constant) <= VALUES < (constant,constant)

PS4, Line 788:               The largest number of buckets that you can create with a <codeph>PARTITIONS</codeph>
             :               clause varies depending on the number of tablet servers in the
cluster, while the smallest is 2.
> might be worth a note in the text above that this is multiplicative with an

PS4, Line 874:           </p>
             :           <p>
             :             Ra
> I don't believe this is true (or at least it's not our intention that it is

PS4, Line 884:   partition value = 'C', partition value = 'D', partition value = 'F')
             : ]]>
             : </codeblock>
> fill this out?
For this high-level overview stuff, I will use simple toy tables just to illustrate the syntax
without the trappings of a real enterprise-class table.

PS4, Line 908: alter table year_ranges add range partition 1890 <= values < 1893;
             : ]]>
             : </codeblock>
> I don't think this will show per-tablet sizes currently

PS4, Line 977: 
             : <!-- To do: fill in example. -->
             : <codeblock><![CDATA[
> not true anymore. now it's my_db::my_table

PS4, Line 1007: 
              :         <p conref="../shared/impala_common.xml#common/kudu_metadata_intro"/>
> some missing words in this sentence

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <>
Gerrit-Reviewer: Ambreen Kazi <>
Gerrit-Reviewer: Dimitris Tsirogiannis <>
Gerrit-Reviewer: Jean-Daniel Cryans <>
Gerrit-Reviewer: John Russell <>
Gerrit-Reviewer: Matthew Jacobs <>
Gerrit-Reviewer: Todd Lipcon <>
Gerrit-HasComments: Yes

View raw message