impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Date Sat, 15 Oct 2016 00:00:05 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
......................................................................


Patch Set 5:

(41 comments)

Sending out the first wave.

http://gerrit.cloudera.org:8080/#/c/4414/5//COMMIT_MSG
Commit Message:

Line 45:    "KEY" is expected to be common enough that the ident style
Might also want to mention that "key" is used for nested map types, so it's not desirable
to make it a keyword.


Line 51:    on the HMS database and table name. If the database is "default",
Does this last regarding "default" make the code easier or more complicated? What's the motivation
for special casing it, i.e., what's the harm in including the "default" database name?


http://gerrit.cloudera.org:8080/#/c/4414/5/be/src/service/frontend.cc
File be/src/service/frontend.cc:

Line 61: DEFINE_string(kudu_master_addrs, "", "Specifies the default Kudu master(s). The given
"
Can we make this more consistent with out existing options? For example:

-catalog_service_host
-state_store_host


http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
File fe/src/main/java/org/apache/impala/analysis/ColumnDef.java:

Line 176:   static List<String> toColumnNames(Collection<ColumnDef> columnDefs)
{
colDefs


Line 184:   static Map<String, ColumnDef> mapByColumnNames(Collection<ColumnDef>
colDefs) {
comment that colDefs is assumed have no duplicate names, or alternatively deal with that case


http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

Line 99:           "does not support (%s) file format. Supported formats are: (%s)",
the (%s) file format.


Line 100:           createStmt_.getFileFormat().toString().replace("_", ""),
what's with this weird "_" replacement?


Line 101:           "PARQUET, TEXTFILE, KUDU"));
use SUPPORTED_INSERT_FORMATS instead of hardcoding


http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

Line 80:   private void setColumnDefs(List<ColumnDef> columnDefs) {
nit: colDefs everywhere


Line 95:   Map<String, String> getTblProperties() { return tableDef_.getTblProperties();
}
is there any significance to not adding public/private to some of these methods?


Line 192:           "Only Kudu tables can use DISTRIBUTE BY clause.");
the DISTRIBUTE BY clause


http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

Line 43:  * Represents the table parameters in a CREATE TABLE statement.
Let's be more precise and list the clauses that this captures.


http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

Line 97:   // Table name in Kudu storage engine. It may not neccessarily be the same as the
the Kudu storage engine


Line 101:   // 1. For managed tables, 'kuduTableName_' is prefixed with '__IMPALA__<db_name>'
to
Can we make the prefix 'impala::<db_name>' instead?

Better to keep the "impala" part lowercase for consistency.

It might be a minor detail, but keeping the name readable seems important. I'd imagine that
Kudu tables will be accessed from several tools.


Line 154:    * are loaded from HMS. The function also updates the table schema in HMS.
Add a comment about why we also update the HMS. My understanding is that we want to propagate
alterations made to the Kudu table to the HMS.


Line 195:       msClient.alter_table(msTable_.getDbName(), msTable_.getTableName(), msTable_);
This means every invalidate/refresh will do an extra alter to HMS. Are we concerned about
the extra load?


Line 202:    * Loads the schema from the Kudu table including column definitions and primary
key
mention that it loads the schema into this table as well as the HMS table


Line 216:       String colName = ToSqlUtils.getIdentSql(colSchema.getName());
getIdentSql() may add backticks, I don't think we want that here


Line 223:     numClusteringCols_ = primaryKeyColumnNames_.size();
Check that there is at least one PK column


Line 228:     PartitionSchema partitionSchema = kuduTable.getPartitionSchema();
Preconditions.checkState(!colsByPos_.isEmpty()); ?


Line 254:   public static KuduTable createTmpTable(Db db,
createCtasTarget()? Seems more explicit since there really is only one use.


http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 1117:     if (db != null) dropTablesFromKudu(db);
Won't this drop all Kudu tables even if CASCADE is not specified?


Line 1151:     // the metadata must be fetched from the Hive Metastore.
Why do we need to load from HMS? See comments below.


Line 1167:         throw new ImpalaRuntimeException(
This seems like weird behavior. The user issued a cascading drop db, but gets an error because
one of the tables could not be fetched from the HMS. It seems fine to ignore the error because
the user wanted to drop everything anyway.


Line 1172:       if (!KuduTable.isKuduTable(msTable)) continue;
Why do we need this check? Won't Kudu tell us if the table does not exist? Seems fine to ignore
that error.


Line 1438:     StorageDescriptor sd = HiveStorageDescriptorFactory.createSd(
should we consolidate all the SD stuff into one createSd() call?


Line 1453:     if (params.getPartition_columns() != null) {
for my understanding: these are empty for Kudu tables right?


http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java:

Line 41: //import org.apache.hadoop.hive.metastore.api.Table;
?


Line 53:  * such as creating and droping tables from Kudu.
typo: dropping


Line 58:   public final static ColumnDef REQUIRED_THRIFT_COLUMN_DEF;
Isn't it enough to set an empty list or mark the thrift field as set?


Line 70:   static void createTable(org.apache.hadoop.hive.metastore.api.Table msTbl,
createManagedTable()? and then mention that msTbl must be a managed table


Line 85:       kudu.createTable(kuduTableName, schema, tableOpts);
May this throw if the table already exists? If so it might be cleaner to rely on that exception
for checking whether the table already exists. Right now there's a possible race because tableExists()
and createTable() are not atomic (i.e., kudu could still say the table already exists here)


Line 97:       throws ImpalaRuntimeException {
remove throws?


Line 121:       TCreateTableParams params, Schema schema) throws ImpalaRuntimeException {
remove throws?


Line 155:       int r = Integer.parseInt(replication);
catch NumberFormatException?


Line 170:     if (Table.isExternalTable(msTbl)) return;
Can we make this a Precondition and have the caller check it?


Line 178:       if (kudu.tableExists(tableName)) {
same comment about non-atomic exists() + drop() calls


Line 214:         String colName = ToSqlUtils.getIdentSql(colSchema.getName());
I don't think we want getIdentSql() here


Line 239:     } catch (Exception e) {
I think we should be more nuanced in our exception handling here, in particular, also include
the original exception as the cause. For example, if for some reason we cannot connect to
Kudu we will return a "table does not exist" without even including the original cause (presumably
an IOE).


http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/util/KuduUtil.java
File fe/src/main/java/org/apache/impala/util/KuduUtil.java:

Line 198:         metastoreTableName : "__IMPALA__" + metastoreDbName + "." + metastoreTableName;
make the prefix a static constant (and change it to "impalad::" please)


Line 240:       case BOOL: return Type.BOOLEAN;
no DECIMAL?


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Michael Brown <mikeb@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message