Return-Path: X-Original-To: apmail-impala-dev-archive@minotaur.apache.org Delivered-To: apmail-impala-dev-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id E81131905A for ; Tue, 26 Apr 2016 03:14:47 +0000 (UTC) Received: (qmail 12042 invoked by uid 500); 26 Apr 2016 03:14:47 -0000 Delivered-To: apmail-impala-dev-archive@impala.apache.org Received: (qmail 12004 invoked by uid 500); 26 Apr 2016 03:14:47 -0000 Mailing-List: contact dev-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@impala.incubator.apache.org Delivered-To: mailing list dev@impala.incubator.apache.org Received: (qmail 11993 invoked by uid 99); 26 Apr 2016 03:14:47 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 26 Apr 2016 03:14:47 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 2ADE2C1C4A for ; Tue, 26 Apr 2016 03:14:47 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.362 X-Spam-Level: X-Spam-Status: No, score=0.362 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id s2kS8EcbJiHj for ; Tue, 26 Apr 2016 03:14:45 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id 77A935F47F for ; Tue, 26 Apr 2016 03:14:44 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id u3Q3Egqt021691; Tue, 26 Apr 2016 03:14:42 GMT Message-Id: <201604260314.u3Q3Egqt021691@ip-10-146-233-104.ec2.internal> Date: Tue, 26 Apr 2016 03:14:42 +0000 From: "Casey Ching (Code Review)" To: impala-cr@cloudera.com, dev@impala.incubator.apache.org CC: Matthew Jacobs Reply-To: casey@cloudera.com X-Gerrit-MessageType: comment Subject: =?UTF-8?Q?[Impala-CR](cdh5-trunk)_Simplify_creating_external_Kudu_tables_and_add_DROP_DATABASE_CASCADE=0A?= X-Gerrit-Change-Id: Ic141102818b6dad3016181b179a14024d0ff709d X-Gerrit-ChangeURL: X-Gerrit-Commit: 009c9df3e01570a397a90a196635de6beb5fcbd2 In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.10-rc0 Casey Ching has posted comments on this change. Change subject: Simplify creating external Kudu tables and add DROP DATABASE CASCADE ...................................................................... Patch Set 7: (27 comments) http://gerrit.cloudera.org:8080/#/c/2617/7/fe/src/main/java/com/cloudera/impala/analysis/AnalysisUtil.java File fe/src/main/java/com/cloudera/impala/analysis/AnalysisUtil.java: Line 1: package com.cloudera.impala.analysis; > legal header Done http://gerrit.cloudera.org:8080/#/c/2617/7/fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java File fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java: Line 151: getRowFormat > does this need to be public? No, there were also a few others that didn't need that either. Removed "pubic". Line 152: rowFormat_ != null ? rowFormat_ : : fileFormat_ == THdfsFileFormat.TEXT ? RowFormat.DEFAULT_ROW_FORMAT : null; > can you change the first ternary to if/else for readability, and in case it Done Line 224: fileFormat_ != THdfsFileFormat.KUDU && : KuduTable.KUDU_STORAGE_HANDLER.equals( : tblProperties_.get(KuduTable.KEY_STORAGE_HANDLER)) > This allows the Kudu storage handler to be set if the fileFormat_ is KUDU. That's fine here. The check is more about making sure that other file formats don't use the Kudu storage handler, most importantly text. The check when fileFormat_ == KUDU happens later. Line 237: fileFormat_ != THdfsFileFormat.KUDU > Following up on my comment from an earlier revision. If we don't check that Same idea as above. I think of these as Kudu logic leakages. Ideally all this stuff would be in analyzeKudu() but that's not really possible. Instead the leakages are minimized. This check is more about non-Kudu tables. Line 298: throw new AnalysisException(String.format( : "ROW FORMAT cannot be specified for file format %s.", fileFormat_)); > can you check if ff != TEXT sooner to throw this, then same on the indentat Done Line 308: tblProperties_ > Re-analyze still seems broken... I chatted with Alex a bit. I'm going to try setting the default reset() behavior to throw an exception. Apparently only a few stmts need that ability. Putting the burden on all stmts to support reset() when its not really needed seems bad. I'll give that a try in separate patch. Line 331: columnDefs_ > In the case of reanalysis (reset() -> analysis()) we'll add duplicates of t Same as above about reset() not needed. Line 335: keyCols > I think we might want some additional analysis on the keys, e.g., are the k That's part 2, updates for managed tables. There shouldn't be any regression around that here though. http://gerrit.cloudera.org:8080/#/c/2617/7/fe/src/main/java/com/cloudera/impala/catalog/delegates/KuduDdlDelegate.java File fe/src/main/java/com/cloudera/impala/catalog/delegates/KuduDdlDelegate.java: Line 160: Preconditions.checkState( : cols.get(0).getName().equals(REQUIRED_THRIFT_COLUMN_DEF.getColName())); > I didn't think we prevented users from creating an external table with colu It's already there AnalysisUtil.throwIfNotNullOrNotEmpty(columnDefs_, "Columns cannot be specified with an external Kudu table."); in analyzeKudu() Line 164: kuduTable.getSchema().getColumns() > do we know if this comes back in a consistent order? Given we flatten the n Looks like they are in "index order" https://github.com/cloudera/kudu/blob/master/java/kudu-client/src/main/java/org/kududb/Schema.java#L143 Line 194: com.cloudera.impala.catalog. > I think you can remove the namespace as it's clear from l197 what's going o Done Line 204: csb.build(); > nit: just put this directly on l205 and save a line Done Line 213: > nit: space Done http://gerrit.cloudera.org:8080/#/c/2617/7/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java File fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java: Line 1126: s > scenario Done Line 1131: If that's assumption isn't > This doesn't make sense. that's -> that? Done Line 1121: // The use of DdlDelegates combined with the fact that the catalog is disconnected : // from the source of truth, the Hive metastore, leads to some complications here. : // If 'db' is null, that should mean either the database doesn't really exist or the : // database was created outside of Impala and the user didn't run "invalidate : // metadata". If 'db' is not null, there still may be tables in the metastore that : // would require a DdlDelegate. Another scenarios is when the user creates tables : // that reqire a DdlDelegate in Impala, then drops the database in Hive. Ideally the : // user visible result of DROP DATABASE CASCADE would be the same in any case. The : // best solution is probably to move the DdlDelegate functionality into Hive. For : // now Impala will assume that any table not in its cache also doesn't require the : // use of a DdlDelegate. If that's assumption isn't correct, the database will still : // be dropped in the metastore but the underlying data would remain. > this is getting big, how about creating a JIRA to track this Hive/Impala in Filed https://issues.cloudera.org/browse/IMPALA-3424 Line 1132: // be dropped in the metastore but the underlying data would remain. > Can you add that users can be sure to avoid this by calling refresh before Done Line 1159: the user is expected to ALTER TABLE to either set the : // table to UNMANAGED or set the format to one that doesn't require a delegate. > we should track this somehow so we can eventually get it in documentation. Ya I was going to mention this to John. I just started a list. Line 1541: setDistributeParams(distributeBy) > This is only relevant to Kudu, so it'd be nice if we could have the KuduDdl This entire "ddl delegate" thing seems premature considering there is only one ddl delegate. I'd rather just remove it but I don't want to do that now. Line 1540: DdlDelegate ddlDelegate = createDdlDelegate(newTable) : .setDistributeParams(distributeBy); : ddlDelegate.createTable(); > Marcel had said in the sync-up about doing this only when you had a real de It's not clear he really understood how this was used. I'm going to wait on that. Line 1558: ddlDelegate.dropTable(); > This would be affected as well Same http://gerrit.cloudera.org:8080/#/c/2617/7/testdata/workloads/functional-query/queries/QueryTest/kudu-show-create.test File testdata/workloads/functional-query/queries/QueryTest/kudu-show-create.test: Line 10: STORED AS KUDU > what about distribution parameters? That's not related to this change so I'll do it later. Mike already filed a jira for this https://issues.cloudera.org/browse/IMPALA-2634 http://gerrit.cloudera.org:8080/#/c/2617/7/tests/common/kudu_test_suite.py File tests/common/kudu_test_suite.py: Line 52: def get_db_name(cls): : if not cls.__DB_NAME: : cls.__DB_NAME = \ : choice(ascii_lowercase) + "".join(sample(ascii_lowercase + digits, 5)) : return cls.__DB_NAME > what about handling the race? This isn't intended to be thread safe. I can add a comment if you like. http://gerrit.cloudera.org:8080/#/c/2617/7/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: Line 110: a > an external Done Line 145: cursor.execute(""" : CREATE EXTERNAL TABLE %s : STORED AS KUDU : TBLPROPERTIES('kudu.table_name' = '%s')""" % ( : impala_table_name, preferred_kudu_table.name)) > Hm, I don't think we should allow this behavior because there's no way to i I'm not sure I follow, hasn't the user given an explicit name here? Why would there be any confusion? Line 183: cursor > The same db is already the default db for this db, right? Can you add a com Added. I doubt that's the only place that relies on this behavior but I didn't check. It'll be a little strange if this is the only place that mentions this. It'll also be noisy to sprinkle these messages around. -- To view, visit http://gerrit.cloudera.org:8080/2617 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic141102818b6dad3016181b179a14024d0ff709d Gerrit-PatchSet: 7 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Casey Ching Gerrit-Reviewer: Casey Ching Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes