impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Date Mon, 17 Oct 2016 21:29:50 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
......................................................................


Patch Set 7:

(29 comments)

http://gerrit.cloudera.org:8080/#/c/4414/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 1350:     // CTAS in an external Kudu table
What's the rationale for not allowing this?


Line 1716:     AnalyzesOk("create table tab (x int, y string, primary key(x, y)) " +
Can you add a brief comment explaining the table layout for this one?


Line 1717:         "distribute by hash(x) into 3 buckets, range(y) split rows " +
can we distribute by range, hash?


Line 1723:     AnalyzesOk("create table tab (a int, b int, c int, d int, primary key (a, b,
c))" +
What happens if I specify PARTITION BY for a Kudu table?


Line 1739:     AnalysisError(String.format("create table tab (x int) distribute by hash (x)
" +
What about specifying the other params in tblproperties like the distribute by and split rows.


Line 1748:     AnalysisError("create table tab (x int primary key, primary key(x)) stored
" +
Add negative test where two ColumnDefs are declared as PK.


Line 1762:     AnalysisError("create table tab (a int, b int, c int, d int, primary key(a,
b, c)) " +
Do we have tests somewhere for checking which types are supported with Kudu? We should make
sure that:
* you can create a table with all supported types (and same for the specific clauses like
primary key, distributed by, etc.)
* you cannot create tables with unsupported types like TIMESTAMP/DECIMAL and nested types
(should fail gracefully with "not supported")
* or alternatively, if we want to defer the type checks to Kudu (and not bake it into Impala
analysis), then we should document that somewhere


Line 1772:     // No float split keys
Even if the column type is float? If so, might want to test that.


Line 1810:     // DISTRIBUTE BY is required for managed tables.
Primary key is also required, do we have a test?


Line 1812:         "Table partitioning must be specified for managed Kudu tables.");
Let's rephrase this as "Table distribution must be specified ..."


PS6, Line 1828: AnalysisError("create external table t tblproperties (" +
Not that it makes any sense, but hat happens with:

create external table t (primary key()) ...


Line 1863:   public void TestCreateAvroTest() {
Add a test with some of the Kudu-specific clauses and STORED AS AVRO


Line 2822:   public void TestShowFiles() throws AnalysisException {
DO we have a TODO/JIRA somewhere to go through all DDL/SHOW commands and make them behave
properly for Kudu?


http://gerrit.cloudera.org:8080/#/c/4414/6/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

Line 1632
Why remove this? It will break my setup :)


Line 2235:     ParsesOk("CREATE TABLE foo (i INT PRIMARY KEY, PRIMARY KEY(i)) STORED AS KUDU");
Add case like this:
CREATE TABLE foo (i INT PRIMARY KEY, j PRIMARY KEY) STORED AS KUDU


Line 2541:     ParsesOk("CREATE TABLE Foo TBLPROPERTIES ('a'='b', 'c'='d') AS SELECT * from
bar");
nit: let's be consistent with how we chop up string literals across lines (space at end of
previous line xor beginning of next line)


http://gerrit.cloudera.org:8080/#/c/4414/6/fe/src/test/java/org/apache/impala/catalog/HdfsStorageDescriptorTest.java
File fe/src/test/java/org/apache/impala/catalog/HdfsStorageDescriptorTest.java:

Line 58:         "org.apache.impala.hive.serde.ParquetInputFormat",
revert


http://gerrit.cloudera.org:8080/#/c/4414/6/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

Line 203:   if row_format and file_format == 'text':
I think row_format is also valid for sequence files (maybe we're not using it though)


http://gerrit.cloudera.org:8080/#/c/4414/6/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

Line 823: distribute by range(id) split rows ((1003), (1007)) stored as kudu;
ah so much better


http://gerrit.cloudera.org:8080/#/c/4414/6/testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
File testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test:

Line 218: create table kudu_table_clone like functional_kudu.alltypes
Why can't we check this in analysis?


http://gerrit.cloudera.org:8080/#/c/4414/6/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_create.test:

Line 3: # This test file contains several cases for what basically amount to analysis errors,
Should mention that some of this behavior is intentional and why.


Line 11: ImpalaRuntimeException: Type TIMESTAMP is not supported in Kudu
Do we get the same message for DECIMAL and complex types? (No need to add another test, just
asking whether you've checked).


Line 14: create table t primary key (id) distribute by hash (id) into 3 buckets
Add test for creating and querying a table that has Impala keywords as the table name and
some column names.


http://gerrit.cloudera.org:8080/#/c/4414/6/tests/common/kudu_test_suite.py
File tests/common/kudu_test_suite.py:

Line 69:   def get_db_name(cls):
Why not use the unique_database fixture? Sure, we don't run in parallel but unique_database
seems saner (no need to explain all these framework intricacies)


Line 94:        name will be used. If 'prepend_db_name' is True, the table name will be prepended
what does the db_name parameter do then?


http://gerrit.cloudera.org:8080/#/c/4414/6/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 57:   def test_insert_update_delete(self, vector, unique_database):
Can we keep the .test file name and the python test function consistent? i.e. rename with
to test_kudu_crud


http://gerrit.cloudera.org:8080/#/c/4414/7/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 192:        'show_create_sql' can be templates that can be used with str.format(). format()
In our other SHOW CREATE TABLE tests we also try to run execute the output of SHOW CREATE
TABLE


Line 213:         TBLPROPERTIES ('kudu.master_addresses'='{kudu_addr}')""".format(
Why include these TBLPROPERTIES?


Line 226:         DISTRIBUTE BY HASH (c) INTO 3 BUCKETS, RANGE (c) SPLIT ROWS (...)
is the SPLIT ROWS (...) legal syntax?


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Michael Brown <mikeb@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message