impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lars Volker (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Date Tue, 04 Apr 2017 20:22:34 GMT
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
......................................................................


Patch Set 7:

(20 comments)

Thank you for your review! Please see PS7.

http://gerrit.cloudera.org:8080/#/c/6495/5/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

Line 383:   // any such columns of the source table. If unspecified, the destination table
will
> Otherwise? Will the destination table inherit the sort by columns of the so
Yes, it'll inherit them. I expanded the comment. We have a test for this in QueryTest/create-table-like-table.test.


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java
File fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java:

PS5, Line 64: super.analyze(analyzer);
> Comment not applicable. Remove?
Done


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

PS5, Line 127: hasNoClusteredHint_
> Similar to the comment on hasShuffleHint, explain the allowed values for ha
Done


PS5, Line 763:  
> "if any"
Done


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java:

PS5, Line 77: return Lists.newArrayList();
> This is an immutable list whereas the next function returns a mutable one. 
Done


PS5, Line 83: 
            :    */
> remove
Done


PS5, Line 91: sortByC
> nit: rename to sortByColName? since you also have tableCols it may be good 
Done


Line 92:       // Make sure it's not a primary key column or partition column.
> nit: remove empty line
Done


PS5, Line 102: 
             :       for (int j = 0; j < tableCols.size(); ++j) {
> Not sure I follow this part based on the code below.
Done, updated the comment.


Line 120:     return colIdxs;
> nit: remove empty line
Done


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

PS5, Line 455: the caller must make sure that the value matches any columns he/she adds to
the
             :    * table.
> Can't we throw an exception instead?
We could throw one if the number is wrong, but eventually the caller will have to provide
k columns and then set this to some value 0 <= n < k. Should we check that? The comment
was meant to indicate that the caller will be on themselves to use this.


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

PS5, Line 546: orderingExprs.addAll(insertStmt.getPartitionKeyExprs()
> I assume this works for HBase tables...
Insert hints are not allowed for HBase tables and the analysis will fail. From looking at
InsertStmt I think that this would return an empty list of exprs. However I think it could
be more explicit. Should I add an "instanceof" check for HdfsTable here?


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

PS5, Line 521: 'sort.by.columns'='id
> How about 'ID' (check for case)?
Indeed it did not work, I fixed it.


PS5, Line 1642:     "CREATE TABLE LIKE is not supported for Kudu tables");
> Also, add the negative case where sort by cols don't exist in the source ta
Done


PS5, Line 1913: // Partitioned HDFS table
              :     AnalyzesOk("create table functional.new_table (i int) PARTITIONED BY (d
decimal)" +
              :         "SORT BY (i)");
              :     // Column in sortby hint must not be a Hdfs partition column.
              :     AnalysisError("create table functional.new_table (i int) PARTITIONED BY
(d decimal)" +
              :         "SORT BY (d)", "SORT BY column list must not contain partition column:
'd'");
              : 
              :     // Kudu table tests are in TestCreateManagedKuduTable().
              :   }
              : 
              :   @Test
              :   public void TestAlterKuduTable() {
              :     TestUtils.assumeKuduIsSupported();
              :    
> For Kudu specific tests they need to be in a function that first calls Test
I move the tests to TestCreateManagedKuduTable().

We don't support setting sort by columns for HBase tables (I added code to AlterTableSortByStmt
to enforce this). During analysis, we don't parse the table property, either.


http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test:

PS5, Line 162: create table t sort by
> Add one with a partitioned table? Also, a CTAS with a more complex select (
Done


http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
File testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test:

PS5, Line 10: ASC NULLS LAST
> Out of curiosity? Can we change the default behavior? If no, are we ok with
No, we cannot change it. I thought about this, but couldn't see why we'd want to have the
nulls at the other end of the sorted data. Can you think of a case where this would be beneficial?


PS5, Line 39: ---- DISTRIBUTEDPLAN
            : WRITE TO HDFS [test_sort_by.alltypes, OVERWRITE=false, PARTITION-KEYS=(year,month)]
            : |  partitions=24
            : |
            : 01:SORT
            : |  order by: year ASC NULLS LAST, month ASC NULLS LAST, int_col ASC NULLS LAST,
bool_col ASC NULLS LAST
            : |
            : 00:SCAN HDFS [functional.alltypes]
            :    partitions=24/24 files=24 size=478.45KB
> Do we need all these DISTRIBUTED_PLANs?
They make sure that the shuffle/noshuffle hint has been observed, indicated be the presence/absence
of an exchange node in the distributed plan. Should we remove them?


PS5, Line 193: ====
> Can you add one or two more complex select statements feeding the insert? i
Done


http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-query/queries/QueryTest/alter-table.test
File testdata/workloads/functional-query/queries/QueryTest/alter-table.test:

PS5, Line 1066:                  
> where are these empty spaces coming from?
They are part of the output of "describe formatted". Without them the comparison fails. I
could imagine Hive has them and we add them to be compatible, but I'm not sure. Should we
investigate and/or open a JIRA for them?


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message