impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthew Jacobs (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Date Thu, 15 Sep 2016 00:06:52 GMT
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

Patch Set 1:


Ok, here's a bit to get started. Just made my way through most of analysis. Still have a lot
to get through but figured I'll give you the feedback I have while I do a few other things.
Commit Message:

PS1, Line 21: When attempting to create an external
            :   table "foo" in database "bar", Impala will search for a Kudu table
            :   name "" and "bar" (Kudu doesn't have database name spaces
            :   yet.)
I think this sentence is duplicated by #10. Maybe remove this part. This item is about reading
the columns, right? The issue of table name resolution is separate IMO, and covered by #10,
though consider moving them next together :)

PS1, Line 25:  The Kudu table is now required to exist at the time of creation in
            :   Impala.
for external tables? or if this is more general can you explain a bit more?

PS1, Line 31: default value
can it still be overridden?

PS1, Line 39: Eventually Hive should have the needed
            :   functionality and the Kudu delegate (renamed in this patch to KuduCatalogOpExecutor)
            :   can be removed.
maybe, it's not clear if impala would want to keep the ability to do this on its own... Remove?

PS1, Line 54:  If the database is "default" then
            :   the Kudu table name will not include the database.
hm... I wonder if it'd be easier just to keep it standardized and always have the name, i.e.
using default in this case. What do you think? I'll see if I change my mind after reading
more code.

PS1, Line 56: 11) Several improvements in the grammar related to the family
            :   of CREATE TABLE statements.
I don't think this bullet adds much value to the commit message. Everything else is more user
visible, this is just some necessary code cleanup.

PS1, Line 58: 12) Added new tests and modified existing Kudu test to use the new
            :   CREATE TABLE syntax.
I think this can be removed too, it should be implied there are tests and it takes away from
the billion other "features" in this patch :)

PS1, Line 60: source of truth
Does this mean we always hit Kudu for metadata? In the Catalog? In the impalad's Catalog?

PS1, Line 60: insteads

PS1, Line 64: Not included in this commit:
            : - Additional column properties such as nullability and compression
            : encodings.
I don't think you need to include this, we have a separate JIRA for this.
File fe/src/main/cup/sql-parser.cup:

FYI I'm not looking at this very carefully because it was reviewed previously

Let me know if this has changed much since then.

PS1, Line 405:   view_column_def_list, view_column_defs;
separate nonterminals if they dont fit?

PS1, Line 458: key_ident;
does this not need Boolean like those below?
File fe/src/main/java/com/cloudera/impala/analysis/

I'm not in love with the TableDef and TableDefOptions stuff because it feels like that's parser
implementation concerns bleeding into analysis code.

Do you think we need both? Can we fold the latter into TableDef?

If we have to keep both, it'd be helpful to at least group the functions in this file based
on their source (e.g. TableDef, TDOptions, or on this itself) with comments around the groups?
Maybe within those groups they could also be sorted by visibility? There's a lot of stuff
going on so I think it'd be helpful to keep it tidy.

PS1, Line 240: getColumnDefs().add(KuduCatalogOpExecutor.REQUIRED_THRIFT_COLUMN_DEF);
is this needed? I thought there was going to be code that fetches the columns from Kudu and
stores them.

PS1, Line 242:       if (!getTblProperties().containsKey(KuduTable.KEY_TABLE_NAME)) {
             :         getTblProperties().put(KuduTable.KEY_TABLE_NAME,
             :             KuduUtil.getDefaultCreateKuduTableName(getDb(), getTbl()));
             :       }
hm... can we simplify this and only allow internal tables to be created with the name db_name.table_name?
Obviously there could be conflicts and it should fail then. Then it'd be 1 less case to consider.

Also, I wonder if we should prefix the underlying kudu table names with something, e.g. IMPALA_db.table?

PS1, Line 275:         // Kudu's data distribution rules are enforced here. A good reference
             :         // As
a summary:
             :         // * Only primary key column can be used in distribution definitions
             :         // * Distributions may be either hash or ranged (value intervals)
             :         // * Multiple distributions can be used/mixed
             :         //   * A column can only be used in one hash distribution
             :         //   * A range distribution can only be used once and must be the last
             :         //     (both enforced by the grammar).
I don't think we should get in the business of validating the Kudu rules since they should
reject it. Is there a reason we have to do this ourselves as well?

PS1, Line 323: throwIfPrimaryKeysWereSpecified
I think the callers can throw, simpler to have a fn to check: bool hasPrimaryKeysSpecified().
Or maybe this isn't necessary? see my commetn below.

PS1, Line 327: isPrimaryKey
when will we have PKs from the col defs but not in the tableDef_.getPrimaryKeyColumnNames()?

Better yet, it looks like the 2 callers of this fn have slightly different use cases so maybe
it can be scrapped and handled more specifically. Feel free to disagree...

PS1, Line 327:         if (colDef.isPrimaryKey()) {
             :           hasKey = true;
             :         }

PS1, Line 335: throwIfNotPrimaryKeyType
I only see this called once, please inline it
File fe/src/main/java/com/cloudera/impala/analysis/

PS1, Line 32: Represents the end of a CREATE TABLE statement. TableDef represents the beginning
            :  * the statement.
I find this odd. The 'beginning/end of the statement' feels like a parser implementation detail.

I'm not sure why this needs to be a separate class. TableDef seems OK, and it already has
some things that get set after it is constructed (e.g. partitions), so maybe these things
can be moved to TableDef and set via addl functions.
File fe/src/main/java/com/cloudera/impala/analysis/

PS1, Line 207: HdfsFileFormat.KUDU
blegh this looks like an oxymoron. I wonder if we can rename this class to something like
StorageFormat. I'll keep an eye out as I read more code.

PS1, Line 212: keyCols != null
why would this be null?

PS1, Line 217: kuduTableName != null
why would this be null?

PS1, Line 217:       if (kuduTableName != null
             :           && kuduTableName.equals(KuduUtil.getDefaultCreateKuduTableName(
             :               table.getDb().getName(), table.getName()))) {
             :         properties.remove(KuduTable.KEY_TABLE_NAME);
             :       }
might just be cleaner to leave this in and not have to have 2 cases to test
File infra/python/deps/requirements.txt:

PS1, Line 67: # kudu-python==0.2.0
When this changes I think there may be some steps involved in "upgrading" the environment-
have you figured out what that is? We'll need to tell people what command(s) to run after
fetching this change.

PS1, Line 70:   # These should eventually be removed
            :   unittest2 == 1.1.0
            :     linecache2 == 1.0.0
            :     traceback2 == 1.4.0

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Dimitris Tsirogiannis <>
Gerrit-Reviewer: Matthew Jacobs <>
Gerrit-HasComments: Yes

View raw message