impala-reviews mailing list archives

From "Dimitris Tsirogiannis (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Date Wed, 12 Oct 2016 05:19:32 GMT
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

Patch Set 4:

File common/thrift/CatalogObjects.thrift:

Line 53: enum THdfsFileFormat {
> rename
This change would touch many places. Would you mind postponing it for a follow-up patch?
File fe/src/main/cup/sql-parser.cup:

Line 976:   tbl_def_without_col_defs:tbl_def
> a 'create table' without col defs?
This was added for the EXTERNAL Kudu table use case for which no column definitions are specified
since we load the schema from the Kudu table. Added a comment.

Line 980:     RESULT = new CreateTableStmt(tbl_def); 
> trailing

Line 1033: // class doesn't inherit from CreateTableStmt.
> should it?
In my opinion, yes, it should. I actually went down the path of refactoring all the CREATE TABLE*
statements but it ended up being too complex to add on top of this big patch. Simplifying
the CREATE TABLE statements will also allow us to remove some of the weird table option handling
we do in [...]. I will leave a TODO for now.

Line 1065: primary_keys_val ::=
> opt_primary_keys?

Line 1089: tbl_data_layout ::=
> opt_...?

Line 1139:   {: 
> fix spaces and tabs

Line 1370:   KW_PRIMARY key_ident
> what's wrong with KW_KEY?
I don't think we can do that. Don't we use "key" for nested types (map)?
File fe/src/main/java/org/apache/impala/analysis/

Line 30:   static void throwIfNotNullOrNotEmpty(Collection<?> c, String message)
> this is actually 'not null *and* not empty'. you can also phrase that as 'n
Good point. Done
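
The naming point above ('not null *and* not empty') can be sketched as follows. This is only an illustration: the class name `CollectionChecks`, the method names, and the use of `IllegalStateException` are assumptions and are not taken from the patch.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

// Hypothetical helper class; names and exception type are assumptions for illustration.
final class CollectionChecks {
  private CollectionChecks() {}

  // True only when c is non-null AND holds at least one element.
  static boolean isNotEmpty(Collection<?> c) {
    return c != null && !c.isEmpty();
  }

  // Throws when the collection is non-null and non-empty; null and empty both pass.
  static void throwIfNotEmpty(Collection<?> c, String message) {
    if (isNotEmpty(c)) throw new IllegalStateException(message);
  }

  public static void main(String[] args) {
    throwIfNotEmpty(null, "unused");                     // passes: null
    throwIfNotEmpty(Collections.emptyList(), "unused");  // passes: empty
    try {
      throwIfNotEmpty(Arrays.asList("id"), "Columns cannot be specified here.");
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Naming the predicate after the single condition it tests ("not empty", with null treated as empty) avoids the ambiguous "or" in the original method name.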
File fe/src/main/java/org/apache/impala/analysis/

Line 158:     if (fileFormat_ == THdfsFileFormat.KUDU) {
> check at top
File fe/src/main/java/org/apache/impala/analysis/

Line 236:         String.format("PRIMARY KEY must be used instead of the table property '%s'.",
> not good: if you do that on an external table, you get this error message i
I like that idea but it's a bit more complicated for Kudu because different properties are
valid depending on whether it's an external or managed table. Moved the check to the function
below. Let me know if that works ok.

Line 310:       distributeParam.setPKColumnDefMap(pkColumnDefsByName);
> setPkColumn...

Line 315:   private boolean hasPrimaryKeysSpecified() {
> hasPrimaryKeySpecified (there's only one, which can be a composite key)
File fe/src/main/java/org/apache/impala/analysis/

Line 121:           org.apache.impala.catalog.Type colType = colDef.getType();
> does simply Type conflict with something?
Yeah, there is a conflict with the enum Type in this class.

Line 129:           if (colType.isStringType() && !exprType.isStringType()
> this is basically looking for 'assignment compatible', and i'm sure we alre

Line 150:       builder.append(numBuckets_).append(" BUCKETS");
> sprinkle some checkstates in here (on numbuckets and splitrows; or maybe a 
I added checks for numBuckets_. Split rows will go away in a follow-up patch with the new
range partitioning syntax. I left a TODO to add a validate function then.
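
A rough sketch of what a state check on the bucket count could look like before the SQL string is built. The class `HashDistributionSketch` is hypothetical, and the lower bound of two buckets is an assumption borrowed from the Kudu error message quoted further down ("must have at least two hash buckets"); the actual checks in the patch may differ.

```java
import java.util.List;

// Hypothetical stand-in for the distribute-by-hash parameters; not the patch's class.
final class HashDistributionSketch {
  private final List<String> columns_;
  private final int numBuckets_;

  HashDistributionSketch(List<String> columns, int numBuckets) {
    columns_ = columns;
    numBuckets_ = numBuckets;
  }

  String toSql() {
    // State check: analysis is assumed to have rejected invalid bucket counts already.
    if (numBuckets_ < 2) {
      throw new IllegalStateException("Invalid number of buckets: " + numBuckets_);
    }
    StringBuilder builder = new StringBuilder("HASH");
    if (!columns_.isEmpty()) {
      builder.append(" (").append(String.join(", ", columns_)).append(")");
    }
    builder.append(" INTO ").append(numBuckets_).append(" BUCKETS");
    return builder.toString();
  }
}
```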

Line 200:             literal.setString_literal(expr.getStringValue());
> checkstate that you're getting something valid

Line 211:   void setPKColumnDefMap(Map<String, ColumnDef> pkColumnDefByName) {
> setPkC...
File fe/src/main/java/org/apache/impala/analysis/

Line 74:   static class TableDefOptions {
> 'Options' is enough

Line 160:     fullyQualifiedTableName_ = analyzer.getFqTableName(getTblName());
> stick with fq abbreviation?

Line 189:     for (ColumnDef colDef: getPartitionColumnDefs()) {
> this is a bit hard to follow. partition cols aren't defined separately, the
These are the columns specified in a PARTITIONED BY clause (non-Kudu) and they should be analyzed,
no? Sorry, I am not sure I follow your comment.
File fe/src/main/java/org/apache/impala/catalog/

PS4, Line 97: org.apache
> should this have changed?
Hm, sed is not that smart :)
File fe/src/main/java/org/apache/impala/catalog/

PS4, Line 89: com.cloudera
> can you add a ref to IMPALA-4271 ?

Line 111:   // Distribution schemes of this Kudu table. Both rang and hash-based distributions
> range

Line 140:     return msTbl.getParameters().get(KuduTable.KEY_TABLE_NAME);
> i don't think this is worth a function call, it just makes the code harder 

PS4, Line 160: know
> known

PS4, Line 159:   /**
             :    * The number of nodes is not know ahead of time and will be updated during
             :    * in the scan node.
             :    */
             :   public int getNumNodes() { return -1; }
> I don't see this used
Yeah, removed.

PS4, Line 175: numClusteringCols_ = 0;
> not really related to this change, but it's kind of confusing to have numCl
I like Marcel's suggestion, I changed it to be the number of primary key columns.

PS4, Line 175: numClusteringCols_ = 0;
> those should be the primary key cols

PS4, Line 226:     List<FieldSchema> cols = msTable_.getSd().getCols();
             :     cols.clear();
> why do we get cols from getCols() and then clear() it?
cols is a reference to msTable's cols. We clear them here and reload them from the Kudu schema at
L232. Let me know if it's still not clear or if I should add a comment.
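
The aliasing described in this reply can be shown with a small self-contained sketch. `MsTableSketch` and `ReloadColsSketch` are hypothetical stand-ins (column names are plain strings here rather than `FieldSchema` objects); the only point is that mutating the list returned by the getter changes the table's own columns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of a metastore table that exposes its column list by reference.
final class MsTableSketch {
  private final List<String> cols_ = new ArrayList<>();

  // Returns the live list, not a copy, so callers can mutate the table's columns.
  List<String> getCols() { return cols_; }
}

final class ReloadColsSketch {
  // Replaces the table's columns in place with names loaded from another schema.
  static void reloadCols(MsTableSketch msTable, List<String> kuduColNames) {
    List<String> cols = msTable.getCols();  // alias of the table's own list
    cols.clear();                           // drops the stale columns on the table itself
    cols.addAll(kuduColNames);              // repopulates through the same alias
  }
}
```

Because `cols` is never read again locally, the code looks like dead code to a reviewer, which is why a comment at the call site helps.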

PS4, Line 232: cols.add(new FieldSchema(colName, type.toSql().toLowerCase(), null));
> why do we do this? cols isn't used later
See my comment above. I can add a comment if it's still not clear.
File fe/src/main/java/org/apache/impala/catalog/

PS4, Line 460: msTbl.getTableType().equals(TableType.EXTERNAL_TABLE.toString());
> we should probably compare case insensitive to be safe
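
A minimal sketch of the case-insensitive comparison suggested here. The nested enum and the class name `TableTypeSketch` are assumptions that stand in for the metastore's `TableType`; only `String.equalsIgnoreCase` is relied on.

```java
// Hypothetical helper; the enum mirrors only the two table types relevant to the comment.
final class TableTypeSketch {
  enum TableType { MANAGED_TABLE, EXTERNAL_TABLE }

  // Case-insensitive match; equalsIgnoreCase returns false for a null argument,
  // so a missing table type is treated as "not external".
  static boolean isExternalTable(String tableType) {
    return TableType.EXTERNAL_TABLE.toString().equalsIgnoreCase(tableType);
  }
}
```

Putting the constant on the left-hand side also avoids a NullPointerException when the table type is unset.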
File fe/src/main/java/org/apache/impala/service/

PS4, Line 1147: occurrs
> occurs

PS4, Line 1482:       } catch (Exception e) {
              :         try {
              :           // Error creating the table in HMS, drop the managed table from
              :           if (!Table.isExternalTable(newTable)) {
              :             KuduCatalogOpExecutor.dropTable(newTable, false);
              :           }
              :         } catch (Exception logged) {
              :           String kuduTableName = newTable.getParameters().get(KuduTable.KEY_TABLE_NAME);
              :           LOG.error(String.format("Failed to drop Kudu table '%s'", kuduTableName),
              :               logged);
              :           throw new RuntimeException(String.format("Failed to create the table '%s' in " +
              :               " the Metastore and the newly created Kudu table '%s' could not be " +
              :               " dropped. The log contains more information.", newTable.getTableName(),
              :               kuduTableName), e);
              :         }
              :         if (e instanceof AlreadyExistsException && params.if_not_exists) return false;
              :         throw new ImpalaRuntimeException(
              :             String.format(HMS_RPC_ERROR_FORMAT_STR, "createTable"), e);
> it looks like none of this really needs to be inside the synchronized block
File fe/src/main/java/org/apache/impala/service/

PS4, Line 232:     if (!req.is_delta) {
             :       catalog = new ImpaladCatalog(defaultKuduMasterAddrs_);
             :     }
> 1line
File fe/src/main/java/org/apache/impala/service/

PS4, Line 135:       if (!hasRangePartitioning) {
             :         tableOpts.setRangePartitionColumns(Collections.<String>emptyList());
             :       }
> I don't think this is necessary
Unfortunately it is. I spoke to Dan (from the Kudu team) about it. If the user doesn't specify
a range partitioning, Kudu by default creates one with all the primary key columns. So the
distribute params we get from Kudu (and use in the SHOW stmt) are different from the distribute
params that the user specified. I added a comment to clarify this. Let me know if this is

PS4, Line 175: erros
> errors

PS4, Line 192: cols.clear();
> can you indicate in the comment that this doesn't just populate msTbl's col

PS4, Line 206: new KuduClient
> I'm not crazy about this wrapper class thing. It's only used in this file.

PS4, Line 212: is accessible
> exists

PS4, Line 215: validateTblProperties
> how about validateKuduTblExists ?

PS4, Line 224: Error accessing table in Kudu " +
             :           "master '%s'
> This could also print the name. Also to avoid confusing with potential futu
File fe/src/main/java/org/apache/impala/util/

> as I've said I'd vote to remove this, it's only used by 1 class and adds ex
File fe/src/test/java/org/apache/impala/analysis/

Line 1353:         "functional.alltypestiny", "Columns cannot be specified with an external " +
> odd error message. i would expect the 'as select' to be the offending part.
Yeah, you're right. Fixed it.

Line 1720:     AnalyzesOk("create table tab (x int, y int, primary key (X)) " +
> i thought kudu is case-sensitive
We lowercase the pk columns during the analysis. Isn't that ok?
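
The lowercasing mentioned in this reply could look roughly like the sketch below. `PrimaryKeySketch`, its method, and the error text are hypothetical; the sketch only shows primary key names being normalized to lower case and matched against the declared columns, which is why `primary key (X)` can resolve to column `x`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of case-insensitive primary key resolution during analysis.
final class PrimaryKeySketch {
  // Returns the primary key names in lower case, or throws if one is not a declared column.
  static List<String> resolvePrimaryKeys(List<String> columnNames, List<String> pkNames) {
    Set<String> declared = new HashSet<>();
    for (String col : columnNames) declared.add(col.toLowerCase(Locale.ROOT));

    List<String> resolved = new ArrayList<>();
    for (String pk : pkNames) {
      String normalized = pk.toLowerCase(Locale.ROOT);
      if (!declared.contains(normalized)) {
        throw new IllegalArgumentException(
            "PRIMARY KEY column '" + pk + "' does not exist in the table");
      }
      resolved.add(normalized);
    }
    return resolved;
  }
}
```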
File testdata/workloads/functional-query/queries/QueryTest/kudu_create.test:

Line 6: as select * from functional.alltypestiny
> shouldn't this be part of an analyzer test?
It used to be that many of these cases were handled during the analysis. Both MJ and Alex
suggested we avoid performing checks that are already performed in Kudu (e.g. no boolean primary
key columns). Hence, many of these cases are essentially analysis tests that are caught at
runtime. Let me know if you prefer to move these back to the analysis. The only issue with
this would be keeping these checks consistent with Kudu.

Line 30:   distribute by hash (x) into 3 buckets stored as kudu
> same here, and for the other analysis error test cases in this file
See comment above.

Line 32: NonRecoverableException: Key column may not have type of BOOL, FLOAT, or DOUBLE
> why wouldn't this be an analysis exception?
See comment above.

Line 46: NonRecoverableException: Got out-of-order key column: name: "y" type: INT32 is_key: true is_nullable: false cfile_block_size: 0
> inscrutable error message
This comes from Kudu. I agree it is not user friendly. I'll file an issue with the Kudu team to fix this.

Line 53: NonRecoverableException: must have at least two hash buckets
> error message should point out the offending clause
Error message comes from Kudu. That's the drawback of not doing these checks in the analysis.
We don't control the error messages :(

Line 60: NonRecoverableException: hash bucket schema components must not contain columns in
> same here
Same comment as above. I understand this is annoying. The goal is to have the Kudu team fix
these error msgs.
File testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test:

Line 1: ====
> might be a good idea to point out at the top that this test contains test c
Good idea. Done

> analyzer test?
File testdata/workloads/functional-query/queries/QueryTest/kudu_partition_ddl.test:

> analyzer test?


Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Dimitris Tsirogiannis <>
Gerrit-Reviewer: Marcel Kornacker <>
Gerrit-Reviewer: Matthew Jacobs <>
Gerrit-Reviewer: Michael Brown <>
Gerrit-HasComments: Yes
