impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthew Jacobs (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Date Mon, 10 Oct 2016 20:22:36 GMT
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
......................................................................


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4414/3/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

PS3, Line 350: // Parameters needed for hash distribution
             : struct TDistributeByHashParam {
             :   1: required list<string> columns
             :   2: required i32 num_buckets
             : }
             : 
             : struct TRangeLiteral {
             :   1: optional i64 int_literal
             :   2: optional string string_literal
             : }
             : 
             : struct TRangeLiteralList {
             :   1: required list<TRangeLiteral> values
             : }
             : 
             : // A range distribution is identified by a list of columns and a series of
split rows.
             : struct TDistributeByRangeParam {
             :   1: required list<string> columns
             :   2: optional list<TRangeLiteralList> split_rows;
             : }
             : 
             : // Parameters for the DISTRIBUTE BY clause. The actual distribution is identified
by
             : // the type parameter.
             : struct TDistributeParam {
             :   // Set if type is set to HASH
             :   1: optional TDistributeByHashParam by_hash_param;
             : 
             :   // Set if type is set to RANGE
             :   2: optional TDistributeByRangeParam by_range_param;
             : }
> We store the distribute by params in the KuduTable class which is then seri
I don't see why this needs to be in the TKuduTable though. I think we just need this in the
TCreateTableParams. I wouldn't mind except it's more work to reason about when it's set/not
set in a TKuduTable.


PS3, Line 390: 
             :   // Distribution schemes
             :   4: required list<TDistributeParam> distribute_by
> See my comment above. We do load and serialize the distribute by params in 
I don't see why this needs to be in TKuduTable, only TCreateTableParams.


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/DistributeParam.java
File fe/src/main/java/org/apache/impala/analysis/DistributeParam.java:

PS3, Line 156:       if (splitRows_ == null) {
             :         builder.append("...");
> This is possible because an object of DistributeParam class is created in t
In case #2 why do we even bother creating a DistributeParams? 

The partitioning/distribution info only really makes sense at the time of table creation I
think, right? This goes back to our conversation (in person) about how it'd probably make
sense to separate the partitioning part of the command from CREATE TABLE, and do partitioning
more like how we would for non-Kudu tables, i.e. separate commands. Obviously that hasn't
been worked through and realistically we probably won't get to that in time.

If we don't get to something like that, I think the only reason we'd want #2 (in terms of
loading DistributeParams) is SHOW CREATE TABLE, so this class only needs to support toSql()
in case #2. Is that right? Can you add a comment in the class header or the relevant constructor
where we might get case #2 so it's easier to reason about?


A TODO / JIRA ref about resolving the '...' would be good. It might be good to add a test
case with this output if there isn't one already to make sure we fix it properly and/or it
doesn't change in some other unexpected way.


PS3, Line 191:       if (splitRows_ == null) {
             :         result.setBy_range_param(rangeParam);
             :         return result;
             :       }
> See explanation above.
I guess it's confusing to me because TDistributeParam doesn't seem to be necessary in case
#2.

In case #1 it gets added to a TCreateTableStmt object. In case #2 it gets added to a TKuduTable
object, but it seems unnecessary given that no other backend or even the catalog care about
the distribute params - right? If that's true, I think we could avoid storing this in KuduTable,
then avoid calling this in KuduTable.getTKuduTable(), and then throwing here if splitRows_
is null since I don't think this should be called in this case.


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

PS3, Line 126: return fullyQualifiedTableName_ != null ? fullyQualifiedTableName_ : tableName_;
> We do call this function before tableName_ has been analyzed. One example i
ah ok, can you add a comment?


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java:

PS3, Line 243: List<String> primaryKeysSql, String kuduDistributeByParams,
can you update the comment, especially around which of these should/can be null vs. empty.


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message