Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 52C78200BA1 for ; Mon, 17 Oct 2016 23:29:56 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 51425160AEC; Mon, 17 Oct 2016 21:29:56 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 71FBA160AE2 for ; Mon, 17 Oct 2016 23:29:55 +0200 (CEST) Received: (qmail 2951 invoked by uid 500); 17 Oct 2016 21:29:53 -0000 Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@impala.incubator.apache.org Received: (qmail 2940 invoked by uid 99); 17 Oct 2016 21:29:53 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 17 Oct 2016 21:29:53 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id C575F1A028B for ; Mon, 17 Oct 2016 21:29:52 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.362 X-Spam-Level: X-Spam-Status: No, score=0.362 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id n3211AXHY05V for ; Mon, 17 Oct 2016 21:29:50 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id AE07C5F368 for ; Mon, 17 Oct 2016 21:29:50 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id u9HLToCN001350; Mon, 17 Oct 2016 21:29:50 GMT Message-Id: <201610172129.u9HLToCN001350@ip-10-146-233-104.ec2.internal> Date: Mon, 17 Oct 2016 21:29:50 +0000 From: "Alex Behm (Code Review)" To: Dimitris Tsirogiannis , impala-cr@cloudera.com, reviews@impala.incubator.apache.org CC: Matthew Jacobs , Marcel Kornacker , Michael Brown Reply-To: alex.behm@cloudera.com X-Gerrit-MessageType: comment Subject: =?UTF-8?Q?=5BImpala-ASF-CR=5D_IMPALA-3719=3A_Simplify_CREATE_TABLE_statements_with_Kudu_tables=0A?= X-Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 X-Gerrit-ChangeURL: X-Gerrit-Commit: a048011effb55824bea10d61237575ca6c234c18 In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.12.2 archived-at: Mon, 17 Oct 2016 21:29:56 -0000 Alex Behm has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables ...................................................................... Patch Set 7: (29 comments) http://gerrit.cloudera.org:8080/#/c/4414/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 1350: // CTAS in an external Kudu table What's the rationale for not allowing this? Line 1716: AnalyzesOk("create table tab (x int, y string, primary key(x, y)) " + Can you add a brief comment explaining the table layout for this one? Line 1717: "distribute by hash(x) into 3 buckets, range(y) split rows " + can we distribute by range, hash? Line 1723: AnalyzesOk("create table tab (a int, b int, c int, d int, primary key (a, b, c))" + What happens if I specify PARTITION BY for a Kudu table? Line 1739: AnalysisError(String.format("create table tab (x int) distribute by hash (x) " + What about specifying the other params in tblproperties like the distribute by and split rows. Line 1748: AnalysisError("create table tab (x int primary key, primary key(x)) stored " + Add negative test where two ColumnDefs are declared as PK. Line 1762: AnalysisError("create table tab (a int, b int, c int, d int, primary key(a, b, c)) " + Do we have tests somewhere for checking which types are supported with Kudu? We should make sure that: * you can create a table with all supported types (and same for the specific clauses like primary key, distributed by, etc.) * you cannot create tables with unsupported types like TIMESTAMP/DECIMAL and nested types (should fail gracefully with "not supported") * or alternatively, if we want to defer the type checks to Kudu (and not bake it into Impala analysis), then we should document that somewhere Line 1772: // No float split keys Even if the column type is float? If so, might want to test that. Line 1810: // DISTRIBUTE BY is required for managed tables. Primary key is also required, do we have a test? Line 1812: "Table partitioning must be specified for managed Kudu tables."); Let's rephrase this as "Table distribution must be specified ..." PS6, Line 1828: AnalysisError("create external table t tblproperties (" + Not that it makes any sense, but hat happens with: create external table t (primary key()) ... Line 1863: public void TestCreateAvroTest() { Add a test with some of the Kudu-specific clauses and STORED AS AVRO Line 2822: public void TestShowFiles() throws AnalysisException { DO we have a TODO/JIRA somewhere to go through all DDL/SHOW commands and make them behave properly for Kudu? http://gerrit.cloudera.org:8080/#/c/4414/6/fe/src/test/java/org/apache/impala/analysis/ParserTest.java File fe/src/test/java/org/apache/impala/analysis/ParserTest.java: Line 1632 Why remove this? It will break my setup :) Line 2235: ParsesOk("CREATE TABLE foo (i INT PRIMARY KEY, PRIMARY KEY(i)) STORED AS KUDU"); Add case like this: CREATE TABLE foo (i INT PRIMARY KEY, j PRIMARY KEY) STORED AS KUDU Line 2541: ParsesOk("CREATE TABLE Foo TBLPROPERTIES ('a'='b', 'c'='d') AS SELECT * from bar"); nit: let's be consistent with how we chop up string literals across lines (space at end of previous line xor beginning of next line) http://gerrit.cloudera.org:8080/#/c/4414/6/fe/src/test/java/org/apache/impala/catalog/HdfsStorageDescriptorTest.java File fe/src/test/java/org/apache/impala/catalog/HdfsStorageDescriptorTest.java: Line 58: "org.apache.impala.hive.serde.ParquetInputFormat", revert http://gerrit.cloudera.org:8080/#/c/4414/6/testdata/bin/generate-schema-statements.py File testdata/bin/generate-schema-statements.py: Line 203: if row_format and file_format == 'text': I think row_format is also valid for sequence files (maybe we're not using it though) http://gerrit.cloudera.org:8080/#/c/4414/6/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: Line 823: distribute by range(id) split rows ((1003), (1007)) stored as kudu; ah so much better http://gerrit.cloudera.org:8080/#/c/4414/6/testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test File testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test: Line 218: create table kudu_table_clone like functional_kudu.alltypes Why can't we check this in analysis? http://gerrit.cloudera.org:8080/#/c/4414/6/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test File testdata/workloads/functional-query/queries/QueryTest/kudu_create.test: Line 3: # This test file contains several cases for what basically amount to analysis errors, Should mention that some of this behavior is intentional and why. Line 11: ImpalaRuntimeException: Type TIMESTAMP is not supported in Kudu Do we get the same message for DECIMAL and complex types? (No need to add another test, just asking whether you've checked). Line 14: create table t primary key (id) distribute by hash (id) into 3 buckets Add test for creating and querying a table that has Impala keywords as the table name and some column names. http://gerrit.cloudera.org:8080/#/c/4414/6/tests/common/kudu_test_suite.py File tests/common/kudu_test_suite.py: Line 69: def get_db_name(cls): Why not use the unique_database fixture? Sure, we don't run in parallel but unique_database seems saner (no need to explain all these framework intricacies) Line 94: name will be used. If 'prepend_db_name' is True, the table name will be prepended what does the db_name parameter do then? http://gerrit.cloudera.org:8080/#/c/4414/6/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: Line 57: def test_insert_update_delete(self, vector, unique_database): Can we keep the .test file name and the python test function consistent? i.e. rename with to test_kudu_crud http://gerrit.cloudera.org:8080/#/c/4414/7/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: Line 192: 'show_create_sql' can be templates that can be used with str.format(). format() In our other SHOW CREATE TABLE tests we also try to run execute the output of SHOW CREATE TABLE Line 213: TBLPROPERTIES ('kudu.master_addresses'='{kudu_addr}')""".format( Why include these TBLPROPERTIES? Line 226: DISTRIBUTE BY HASH (c) INTO 3 BUCKETS, RANGE (c) SPLIT ROWS (...) is the SPLIT ROWS (...) legal syntax? -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes