impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthew Jacobs (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Date Wed, 05 Oct 2016 21:53:48 GMT
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
......................................................................


Patch Set 3:

(11 comments)

Not even close to done but want to start giving you feedback in pieces so we can parallelize
a bit.

http://gerrit.cloudera.org:8080/#/c/4414/3/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

PS3, Line 344: // Enum listing all possible DISTRIBUTE BY types
             : enum TDistributeType {
             :   HASH,
             :   RANGE,
I don't see this used


PS3, Line 350: // Parameters needed for hash distribution
             : struct TDistributeByHashParam {
             :   1: required list<string> columns
             :   2: required i32 num_buckets
             : }
             : 
             : struct TRangeLiteral {
             :   1: optional i64 int_literal
             :   2: optional string string_literal
             : }
             : 
             : struct TRangeLiteralList {
             :   1: required list<TRangeLiteral> values
             : }
             : 
             : // A range distribution is identified by a list of columns and a series of
split rows.
             : struct TDistributeByRangeParam {
             :   1: required list<string> columns
             :   2: optional list<TRangeLiteralList> split_rows;
             : }
             : 
             : // Parameters for the DISTRIBUTE BY clause. The actual distribution is identified
by
             : // the type parameter.
             : struct TDistributeParam {
             :   // Set if type is set to HASH
             :   1: optional TDistributeByHashParam by_hash_param;
             : 
             :   // Set if type is set to RANGE
             :   2: optional TDistributeByRangeParam by_range_param;
             : }
I see that these are used in the serialized catalog objects, but given that none of this is
stored in the metastore nor do we require it to be stored in the catalog, I don't see why
we bother keeping all this in the thrift table object.

Can we just remove it and not serialize this info? Hopefully we can even split the distribute/partition
parameters out of CREATE TABLE eventually, anyway.


PS3, Line 390: 
             :   // Distribution schemes
             :   4: required list<TDistributeParam> distribute_by
same for this, can we get rid of it?


http://gerrit.cloudera.org:8080/#/c/4414/3/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

PS3, Line 398: CatalogObjects.TDistributeParam
wrt my comments in CatalogObjects, I guess we'd need them here, but I don't see why we bother
storing them in the serialized catalog table object.


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/ColumnDefOptions.java
File fe/src/main/java/org/apache/impala/analysis/ColumnDefOptions.java:

PS3, Line 21:  * Placeholder for the column definition options of a CREATE TABLE statement.
            :  * Contains the list of column definitions and, optionally, the list of column
names
            :  * specified using the PRIMARY KEY keyword.
this feels like a weird abstraction... do we really need this class?


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
File fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java:

PS3, Line 24: class TableDataLayout {
            : 
            :   private final List<ColumnDef> partitionColDefs_;
            :   private final List<DistributeParam> distributeParams_;
Not that we're doing this in this review, but we need to think about how this might change
when we do the next steps for Kudu partitioning.


PS3, Line 49:   List<ColumnDef> getPartitionColumnDefs() { return partitionColDefs_;
}
            :   List<DistributeParam> getDistributeParams() { return distributeParams_;
}
any reason these aren't public?


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

PS3, Line 1: // Copyright 2016 Cloudera Inc.
           : //
           : // Licensed under the Apache License, Version 2.0 (the "License");
           : // you may not use this file except in compliance with the License.
           : // You may obtain a copy of the License at
           : //
           : // http://www.apache.org/licenses/LICENSE-2.0
           : //
           : // Unless required by applicable law or agreed to in writing, software
           : // distributed under the License is distributed on an "AS IS" BASIS,
           : // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
           : // See the License for the specific language governing permissions and
           : // limitations under the License.
this and maybe others need the new license header:

// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements.  See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership.  The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License.  You may obtain a copy of the License at
//
//   http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied.  See the License for the
// specific language governing permissions and limitations
// under the License.


PS3, Line 126: return fullyQualifiedTableName_ != null ? fullyQualifiedTableName_ : tableName_;
Do we need to support calling this before analysis? It'd be nice if we just have 1 possible
result here, e.g. throw if fullyQualifiedTableName_ is null (not yet analyzed) and just return
fullyQualifiedTableName_


PS3, Line 137:   List<DistributeParam> getDistributeParams() {
             :     return dataLayout_.getDistributeParams();
             :   }
1line?


http://gerrit.cloudera.org:8080/#/c/4414/3/infra/python/deps/requirements.txt
File infra/python/deps/requirements.txt:

PS3, Line 83: The kudu-python
            : # version in download_requirements must be kept in sync with this version.
gotta update download_requirements as well


diff --git a/infra/python/deps/download_requirements b/infra/python/deps/download_requirements
index daa5025..d586104 100755
--- a/infra/python/deps/download_requirements
+++ b/infra/python/deps/download_requirements
@@ -29,5 +29,5 @@ PY26="$(./find_py26.py)"
 "$PY26" pip_download.py virtualenv 13.1.0
 # kudu-python is downloaded separately because pip install attempts to execute a
 # setup.py subcommand for kudu-python that can fail even if the download succeeds.
-"$PY26" pip_download.py kudu-python 0.1.1
+"$PY26" pip_download.py kudu-python 0.2.0


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message