impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dimitris Tsirogiannis (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Date Mon, 03 Apr 2017 20:12:15 GMT
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
......................................................................


Patch Set 5:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/6495/5/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

Line 383:   // any such columns of the source table.
Otherwise? Will the destination table inherit the sort by columns of the source table, if
any?


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java
File fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java:

PS5, Line 64: // Analyze 'sort.by.columns' property.
Comment not applicable. Remove?


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

PS5, Line 127: hasNoClusteredHint_
Similar to the comment on hasShuffleHint, explain the allowed values for hasClusteredHint
and hasNotClusteredHint.


PS5, Line 763:  
"if any"


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java:

PS5, Line 77: return ImmutableList.<Integer>of();
This is an immutable list whereas the next function returns a mutable one. Make it consistent?


PS5, Line 83: If there are any errors, 'error' contains an
            :    * error string. Otherwise, 'error' is left untouched.
remove


PS5, Line 91: colName
nit: rename to sortByColName? since you also have tableCols it may be good to make it explicit
which ones you refer to in the for loop.


Line 92: 
nit: remove empty line


PS5, Line 102: and store
             :       // the corresponding result expr in sortByExprs_.
Not sure I follow this part based on the code below.


Line 120: 
nit: remove empty line


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

PS5, Line 455: the caller must make sure that the value matches any columns he/she adds to
the
             :    * table.
Can't we throw an exception instead?


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

PS5, Line 546: orderingExprs.addAll(insertStmt.getPartitionKeyExprs()
I assume this works for HBase tables...


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

PS5, Line 521: 'sort.by.columns'='id
How about 'ID' (check for case)?


PS5, Line 1642: AnalyzesOk("create table tbl sort by (int_col,id) like functional.alltypes");
Also, add the negative case where sort by cols don't exist in the source table.


PS5, Line 1913: // Kudu table
              :     AnalyzesOk("create table tab (i int, x int primary key) partition by hash(x)
" +
              :         "partitions 8 sort by(i) stored as kudu");
              :     // Column in sortby hint must not be a Kudu primary key column.
              :     AnalysisError("create table tab (i int, x int primary key) partition by
hash(x) " +
              :         "partitions 8 sort by(x) stored as kudu", "SORT BY column list must
not " +
              :         "contain primary key column: 'x'");
              :     AnalysisError("create table tab (i int, x int, y int, primary key (x))
partition " +
              :         "by hash(x) partitions 8 sort by(x) stored as kudu", "SORT BY column
list must " +
              :         "not contain primary key column: 'x'");
              :     AnalysisError("create table tab (i int, x int, y int, primary key(x, y))
partition " +
              :         "by hash(y) partitions 8 sort by (i,x) stored as kudu", "SORT BY column
list " +
              :         "must not contain primary key column: 'x'");
              :   }
For Kudu specific tests they need to be in a function that first calls TestUtils.assumeKuduIsSupported()
(see L1930). Do you have tests for HBase?


http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test:

PS5, Line 162: create table t sort by
Add one with a partitioned table? Also, a CTAS with a more complex select (joins, maybe aggs)


http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
File testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test:

PS5, Line 10: ASC NULLS LAST
Out of curiosity? Can we change the default behavior? If no, are we ok with this restriction?


PS5, Line 39: ---- DISTRIBUTEDPLAN
            : WRITE TO HDFS [test_sort_by.alltypes, OVERWRITE=false, PARTITION-KEYS=(year,month)]
            : |  partitions=24
            : |
            : 01:SORT
            : |  order by: year ASC NULLS LAST, month ASC NULLS LAST, int_col ASC NULLS LAST,
bool_col ASC NULLS LAST
            : |
            : 00:SCAN HDFS [functional.alltypes]
            :    partitions=24/24 files=24 size=478.45KB
Do we need all these DISTRIBUTED_PLANs?


PS5, Line 193: ====
Can you add one or two more complex select statements feeding the insert? i.e. something with
a join that generates more complex plans


http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-query/queries/QueryTest/alter-table.test
File testdata/workloads/functional-query/queries/QueryTest/alter-table.test:

PS5, Line 1066:                  
where are these empty spaces coming from?


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message