impala-reviews mailing list archives

From "Matthew Jacobs (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page
Date Fri, 20 Jan 2017 21:36:11 GMT
Matthew Jacobs has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................


Patch Set 6:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/5649/6/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

PS6, Line 3741:         Run <codeph>REFRESH <varname>table_name</varname></codeph> or
              :         <codeph>INVALIDATE METADATA <varname>table_name</varname></codeph>
              :         for a Kudu table only after making a change to the Kudu table schema,
              :         such as adding or dropping a column, by a mechanism other than
              :         Impala.
I think this is wrong: they're not needed. We always reload Kudu metadata.


http://gerrit.cloudera.org:8080/#/c/5649/6/docs/topics/impala_grant.xml
File docs/topics/impala_grant.xml:

Line 147:       authorization, currently the Sentry support is considered preliminary.
Suggest appending "and subject to change".


http://gerrit.cloudera.org:8080/#/c/5649/6/docs/topics/impala_kudu.xml
File docs/topics/impala_kudu.xml:

PS6, Line 266: expression
constant expression


PS6, Line 490: within the database world
Colloquial phrasing. How about "among relational database management systems"?


PS6, Line 561: 
             : UNKNOWN, AUTO_ENCODING, PLAIN_ENCODING, PREFIX_ENCODING, GROUP_VARINT, RLE, DICT_ENCODING, BIT_SHUFFLE
             : 
             : No joy trying keywords UNKNOWN, or GROUP_VARINT with TINYINT and BIGINT.
?


PS6, Line 677: behind the scenes
internally


Line 740:             <codeph>PARTITIONS <varname>n</varname></codeph>and the range partitioning syntax
Missing space before "and".


PS6, Line 755: , default
there is no default


PS6, Line 760: all
multiple

We can't promise it's all (there may also be skew in how partitions get mapped to tablet servers).


PS6, Line 778: he largest number of buckets that you can create with a <codeph>PARTITIONS</codeph>
             :               clause is 60
I don't think this is a limitation


PS6, Line 856: For range-partitioned Kudu tables, the range clauses must cover all the possible data
             :             values for the applicable columns.
I see what this is saying, but I think this sentence will be confusing. It makes it sound
like you can't have gaps in the space.
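
To make the point about gaps concrete, here is a toy Python model. It is only a sketch under stated assumptions, not how Kudu or Impala implement range partitioning: the ranges, `find_range`, and `insert_row` are hypothetical names invented for illustration. It shows a set of ranges that deliberately leaves a gap, and a value in the gap being rejected rather than the table definition being invalid.

```python
from bisect import bisect_right

# Hypothetical half-open ranges [lower, upper), sorted and non-overlapping.
# There is a deliberate gap between 1990 and 2000.
RANGES = [(1980, 1990), (2000, 2010)]


def find_range(value, ranges=RANGES):
    """Return the range that contains value, or None if it falls in a gap."""
    lowers = [lo for lo, _ in ranges]
    # Index of the last range whose lower bound is <= value.
    i = bisect_right(lowers, value) - 1
    if i >= 0 and value < ranges[i][1]:
        return ranges[i]
    return None


def insert_row(rows, value):
    """Store value only if some range covers it; otherwise reject it."""
    if find_range(value) is None:
        return False
    rows.append(value)
    return True
```

In this model the table is perfectly usable with the gap in place; only rows whose values fall outside every declared range are turned away.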


PS6, Line 903: When a range is removed, no data can exist in the table within that range. If some
             :             rows do have column values within the removed range, the operation fails.
This makes it sound like you can't drop a range unless it's empty, which is not true.
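
As a rough illustration of the behavior the doc text should describe, here is a toy Python model. It assumes, as I understand Kudu to behave, that dropping a range succeeds even when rows exist in it and that those rows are discarded along with the range. `RangeTable` and its methods are hypothetical and are not part of any Kudu or Impala API.

```python
class RangeTable:
    """Toy table: rows are ints, partitioned by half-open ranges [lo, hi)."""

    def __init__(self, ranges):
        self.ranges = list(ranges)
        self.rows = []

    def insert(self, value):
        """Accept value only if some existing range covers it."""
        if not any(lo <= value < hi for lo, hi in self.ranges):
            raise ValueError(f"no range partition covers {value}")
        self.rows.append(value)

    def drop_range(self, lo, hi):
        """Drop a range even if it holds rows; return how many rows went with it."""
        self.ranges.remove((lo, hi))  # raises ValueError if no such range exists
        dropped = [v for v in self.rows if lo <= v < hi]
        self.rows = [v for v in self.rows if not (lo <= v < hi)]
        return len(dropped)
```

The operation does not fail on a non-empty range in this sketch; the rows in the dropped range simply disappear, which is the wording the paragraph should convey.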


PS6, Line 993:         <p conref="../shared/impala_common.xml#common/kudu_metadata_intro"/>
             :         <p conref="../shared/impala_common.xml#common/kudu_metadata_details"/>
One of these says:
"Run REFRESH table_name or INVALIDATE METADATA table_name for a Kudu table only after
making a change to the Kudu table schema, such as adding or dropping a column, by a
mechanism other than Impala."

But Impala will always load the latest metadata from Kudu, so REFRESH / INVALIDATE METADATA
are not required.


PS6, Line 1099: :
              :       </p>
              : 
              : <codeblock><![CDATA[
              : 
              : ]]>
Looks like something is missing.


PS6, Line 1207: :
              :       </p>
              : 
              : <codeblock><![CDATA[
              : 
              : ]]>
empty code block?


PS6, Line 1250: Sentry authorization.
list the limitations?


PS6, Line 1308:  This section describes how Kudu stores and
              :           retrieves columnar data, to help you understand performance and storage considerations
              :           of Kudu tables as compared with Parquet tables.
I don't see any other content.


PS6, Line 1323: 
              :           The Apache Kudu architecture, topology, and data storage techniques result in
              :           different patterns of memory usage for Impala statements than with HDFS-backed tables.
Not sure if this section is useful as-is.


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jrussell@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <ambreen.kazi@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jdcryans@apache.org>
Gerrit-Reviewer: John Russell <jrussell@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <todd@apache.org>
Gerrit-HasComments: Yes
