impala-reviews mailing list archives

From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
Date Wed, 23 Aug 2017 03:55:13 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
......................................................................


Patch Set 1:

(8 comments)

Looks good, just minor comments

http://gerrit.cloudera.org:8080/#/c/7680/1/docs/topics/impala_scalability.xml
File docs/topics/impala_scalability.xml:

Line 863:   queries to understand the data distribution and plan a partitioning strategy,
I'd leave out the "to understand the data distribution and plan a partitioning strategy" because
that already supposes a certain use case in the user's mind. I'd not make any assumptions
about what the user wants to do with TABLESAMPLE.


Line 865:   to only a percentage of data within the table. This technique reduces the overhead
Nice!


http://gerrit.cloudera.org:8080/#/c/7680/1/docs/topics/impala_select.xml
File docs/topics/impala_select.xml:

Line 175:         clause immediately after a table reference, to specify that the query only processes an
How about "a certain percentage of the table data"? An "arbitrary portion" sounds strange, and it's not
really completely arbitrary.


http://gerrit.cloudera.org:8080/#/c/7680/1/docs/topics/impala_tablesample.xml
File docs/topics/impala_tablesample.xml:

Line 57:       The <codeph>TABLESAMPLE</codeph> clause comes immediately after a table name.
Suggest "table name or alias", e.g.

from mytable t tablesample ...


Line 69:       processing a particular set of data files, the proportion of sampled data from the
suggest "selecting a random set of data files" instead of "processing a particular set of
data files"


Line 77:       sampling considers the same set of data files each time. <codeph>REPEATABLE</codeph>
suggest "selects" instead of "considers"
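
To illustrate the wording suggested for lines 69 and 77, here is a minimal Python sketch of file-level sampling. It is not Impala's implementation: the function name `sample_files`, its parameters, and the rule of stopping once a target number of bytes is reached are all assumptions made for illustration. It only shows why "selects a random set of data files" fits, and why a fixed seed selects the same set each time.

```python
import random


def sample_files(file_sizes, percent, seed=None):
    """Hypothetical helper: select a random set of files covering ~percent of bytes.

    file_sizes maps a file name to its size in bytes (assumed positive).
    A fixed seed plays the role of REPEATABLE: the same seed over the same
    files selects the same set every time.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")

    rng = random.Random(seed)      # private generator, so the seed is the only input
    names = sorted(file_sizes)     # fixed starting order, independent of dict order
    rng.shuffle(names)

    target_bytes = sum(file_sizes.values()) * percent / 100.0
    selected, selected_bytes = [], 0
    for name in names:
        if selected_bytes >= target_bytes:
            break
        selected.append(name)
        selected_bytes += file_sizes[name]
    return selected
```

Because whole files are selected, the bytes actually covered can exceed the requested percentage, which is why the proportion of sampled data is approximate rather than exact.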


Line 172:       by itself, because all phases of query execution use less data overall.
This is not necessarily true; it depends on whether the small query optimization kicks in with
LIMIT.


Line 257:       table metadata is not updated by a <codeph>REFRESH</codeph> 
whitespace


-- 
To view, visit http://gerrit.cloudera.org:8080/7680
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jrussell@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Greg Rahn <grahn@cloudera.com>
Gerrit-Reviewer: John Russell <jrussell@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-HasComments: Yes
