impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthew Jacobs (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Date Thu, 06 Oct 2016 01:36:32 GMT
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

Patch Set 3:


next batch... I'm probably about 1/2 way through.
File fe/src/main/java/com/cloudera/impala/analysis/

PS1, Line 275: 
> Actually, I don't find it such a terrible idea. The sooner we figure out in
It's just more code to support and test...
File fe/src/main/java/com/cloudera/impala/analysis/

PS1, Line 138: 
> There is a restriction on the types of primary keys (no bool, float or doub
Thanks, makes sense
File fe/src/main/java/org/apache/impala/analysis/

Line 87:   InsertStmt getInsertStmt() { return insertStmt_; }
why change visibility?
File fe/src/main/java/org/apache/impala/analysis/

PS3, Line 344:   
nit extra space

PS3, Line 344: getTblPrimaryKeyColumnNames(),  null,
Why would there be PK col names?

Can you add a neg test case to

    AnalysisError("create table newkudutbl like kudu '/test-warehouse/schemas/alltypestiny.parquet'")
File fe/src/main/java/org/apache/impala/analysis/

PS3, Line 167:     if (getFileFormat() != THdfsFileFormat.KUDU && KuduTable.KUDU_STORAGE_HANDLER.equals(
             :         getTblProperties().get(KuduTable.KEY_STORAGE_HANDLER))) {
             :       throw new AnalysisException(KUDU_STORAGE_HANDLER_ERROR_MESSAGE);
             :     }
             :     // Avro tables can have empty column defs because they can infer them from
the Avro
             :     // schema. Likewise for external Kudu tables, the schema can be read from
             :     if (getColumnDefs().isEmpty() && getFileFormat() != THdfsFileFormat.AVRO
             :         && getFileFormat() != THdfsFileFormat.KUDU) {
             :       throw new AnalysisException("Table requires at least 1 column");
             :     }
             :     if (getFileFormat() == THdfsFileFormat.KUDU) {
             :       analyzeKuduFormat(analyzer);
             :     } else {
             :       AnalysisUtils.throwIfNotNullOrNotEmpty(getDistributeParams(),
             :           "Only Kudu tables can use DISTRIBUTE BY clause.");
             :       throwIfPrimaryKeysWereSpecified("Only Kudu tables can specify a PRIMARY
             :     }
             :     if (getFileFormat() == THdfsFileFormat.AVRO) {
             :       setColumnDefs(analyzeAvroSchema(analyzer));
             :       if (getColumnDefs().isEmpty()) {
             :         throw new AnalysisException(
             :             "An Avro table requires column definitions or an Avro schema.");
             :       }
             :       AvroSchemaUtils.setFromSerdeComment(getColumnDefs());
             :     }
             :   }
what if we were to move everything Kudu related here into analyzeKuduFormat() which first
checks if it is Kudu. If it is not, it can make sure it _doesn't_ have the distribute or PK
params, then perform the storage handler check, then return. If it is a Kudu table it'd mostly
be the same as it is. Basically we could just always call the fn and it handles Kudu or no-Kudu
checks appropriately. Then we'd get rid of some of the Kudu mess here.

PS3, Line 279:         throw new AnalysisException(String.format("Column '%s' (type: '%s')
cannot " +
             :             "be used as a primary key. Keys must be integer or string types.",
             :             pkColDef.getColName(), pkColDef.getType().toSql()));
I don't see a test case for this error

PS3, Line 277:       ColumnDef pkColDef = getPrimaryKeyColumnDefs().get(i);
             :       if (!KuduUtil.isSupportedKeyType(pkColDef.getType())) {
             :         throw new AnalysisException(String.format("Column '%s' (type: '%s')
cannot " +
             :             "be used as a primary key. Keys must be integer or string types.",
             :             pkColDef.getColName(), pkColDef.getType().toSql()));
             :       }
             :       if (!pkColDef.equals(getColumnDefs().get(i))) {
             :         throw new AnalysisException(String.format("Primary key columns in Kudu
" +
             :             "tables must be declared first and in-order. Key column '%s' is
" +
             :             "out-of-order.", pkColDef.getColName()));
             :       }
If we weren't to do these checks (i.e. that they're supported types and that they come first
in the col defs), do you know how we would handle the Kudu failures?

Related to my prev comment in #1 about handling distribution param checking is just that if
they start relaxing these requirements then we have to change it and users have to upgrade
Impala as well as Kudu. If we can handle Kudu errors on create table nicely, it would help
us cut down on the amount of code we have.

PS3, Line 315:    * - Only primary key column can be used in distribution definitions
             :    * - Distributions may be either hash or ranged (value intervals)
             :    * - Multiple distributions can be used/mixed
             :    * - A column can only be used in one hash distribution
             :    * - A range distribution can only be used once and must be the last definition
             :    *   (both enforced by the grammar).
What Kudu supports for partitioning is a bit tricky when we get into nested schemes, so I'm
a bit concerned that (a) we're matching their supported functionality perfectly, (b) testing
our validation properly, and (c) that we'll necessarily catch any changes that they end up
making down the road. It would be nice if we can get a nice error from them and avoid duplication
of efforts given the complexity. Let's chat about this in person?

PS3, Line 358: throwIfPrimaryKeysWereSpecified
I do prefer to change this to return a bool and have callers throw: bool hasPrimaryKeysSpecified()
File fe/src/main/java/org/apache/impala/analysis/

PS3, Line 156:       if (splitRows_ == null) {
             :         builder.append("...");
can this happen? wouldn't l115 throw?

PS3, Line 191:       if (splitRows_ == null) {
             :         result.setBy_range_param(rangeParam);
             :         return result;
             :       }
this is possible? what does it mean?
File fe/src/main/java/org/apache/impala/catalog/

PS3, Line 259: primarily

PS3, Line 258: Creates a tmp Kudu table. The table is not added to the Kudu storage engine
or the
             :    * HMS.
I wasn't sure what this meant. How about:
Creates a temporary KuduTable object populated with the specified properties but has an invalid
TableId and is not added to...

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Dimitris Tsirogiannis <>
Gerrit-Reviewer: Matthew Jacobs <>
Gerrit-HasComments: Yes

View raw message