impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthew Jacobs (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Date Fri, 16 Sep 2016 16:27:23 GMT
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
......................................................................


Patch Set 1:

(17 comments)

another batch of comments, still a lot of files i haven't touched yet...

http://gerrit.cloudera.org:8080/#/c/4414/1/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

PS1, Line 47:     {"<init>", "(ZILjava/lang/String;IIZLjava/lang/String;)V",
            :       &catalog_ctor_},
1 line


http://gerrit.cloudera.org:8080/#/c/4414/1/be/src/service/frontend.cc
File be/src/service/frontend.cc:

PS1, Line 37: // XXX: This flag doesn't seem to be used anywhere. Maybe remove it?
            : DEFINE_bool(load_catalog_at_startup, false, "if true, load all catalog data
at startup");
agreed. doesn't look like it's used by JniFrontend. Let's remove it.


PS1, Line 63: IPs
IP addresses.


PS1, Line 63:  
no space


http://gerrit.cloudera.org:8080/#/c/4414/1/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

PS1, Line 52: THdfsFileFormat
Maybe not for this patch since it's already huge, but it'd be great to generalize this if
we can. I can think of two improvements:
1) Maybe we should model the storage layer, e.g. have a TStorageEngine, then make this TFileFormat
(perhaps). This is probably a big change.
2) Rename this to be TStorageFormat, which kind of addresses #1 but doesn't separate out storage
engines and file formats.


PS1, Line 61: THdfsCompression
similarly, this seems unnecessarily specific to Hdfs. Not necessarily something to change
now but maybe we can create a follow-up JIRA to clean this up.


http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

PS1, Line 31: ColumnDefOptions
this class doesn't exist, please add the missing file


http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java
File fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java:

PS1, Line 84: BigDecimal
why BigDecimal? Ultimately this has to resolve to some int for kudu's API. (We can check if
it's 32 or 64bit).


PS1, Line 110: <= 1)
is 1 bucket actually not valid?


PS1, Line 138: colType.isStringType() && !exprType.isStringType()
             :               || colType.isIntegerType() && (!exprType.isIntegerType()
             :                   || exprType.getPrecision() > colType.getPrecision())
1. I don't see anything in the Kudu client that explicitly says you can't partition on any
particular types. This code will exclude boolean and floating pt types, which is maybe unnecessary.

2. Esp if we can address #1, is there a cleaner way to make sure the types are valid rather
than enumerating the kinds of types to consider? I'm not sure, but maybe one of the frontend
gurus can think of something.


http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/catalog/HdfsFileFormat.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsFileFormat.java:

PS1, Line 39: HdfsFileFormat
Can you open a JIRA (and leave it in the comment) to refactor this later?


http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java
File fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java:

PS1, Line 79: 
            :   // TODO we should have something like KuduConfig.getDefaultConfig()
do you know what this means / can it be removed now that we're adding default master addrs?


Line 92:   public static final String KEY_DISTRIBUTE_BY = "kudu.distribute_by";
comment


Line 153:    * Load the columns from the schema list
can you add a comment about error handling in this function?


PS1, Line 158:       LOG.error(String.format("Kudu tables must have at least one"
             :           + "key column (had %d), and no more key columns than there are table
columns "
             :           + "(had %d).", keyColumns.size(), schema.size()));
shouldn't this still fail? if not, can you add a comment why this continues?


PS1, Line 184:       LOG.error(String.format("Some key columns were not found in"
             :               + " the set of columns. List of column names: %s, List of key
column names:"
             :               + " %s", Iterables.toString(columnNames), Iterables.toString(keyColumns)));
why do we continue?


PS1, Line 199:     // Get the table metadata from Kudu
             :     if (reuseMetadata) {
I'm confused about this. It's not clear to me from the name 'reuseMetadata' why this means
we should populate the metadata from Kudu. If anything, it sounds like it would be the opposite.

The base class comment just says "If 'reuseMetadata' is true, reuse valid existing metadata.".


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message