impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dimitris Tsirogiannis (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Date Mon, 03 Apr 2017 20:12:15 GMT
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause

Patch Set 5:

File common/thrift/JniCatalog.thrift:

Line 383:   // any such columns of the source table.
Otherwise? Will the destination table inherit the sort by columns of the source table, if
File fe/src/main/java/org/apache/impala/analysis/

PS5, Line 64: // Analyze '' property.
Comment not applicable. Remove?
File fe/src/main/java/org/apache/impala/analysis/

PS5, Line 127: hasNoClusteredHint_
Similar to the comment on hasShuffleHint, explain the allowed values for hasClusteredHint
and hasNotClusteredHint.

PS5, Line 763:  
"if any"
File fe/src/main/java/org/apache/impala/analysis/

PS5, Line 77: return ImmutableList.<Integer>of();
This is an immutable list whereas the next function returns a mutable one. Make it consistent?

PS5, Line 83: If there are any errors, 'error' contains an
            :    * error string. Otherwise, 'error' is left untouched.

PS5, Line 91: colName
nit: rename to sortByColName? since you also have tableCols it may be good to make it explicit
which ones you refer to in the for loop.

Line 92: 
nit: remove empty line

PS5, Line 102: and store
             :       // the corresponding result expr in sortByExprs_.
Not sure I follow this part based on the code below.

Line 120: 
nit: remove empty line
File fe/src/main/java/org/apache/impala/catalog/

PS5, Line 455: the caller must make sure that the value matches any columns he/she adds to
             :    * table.
Can't we throw an exception instead?
File fe/src/main/java/org/apache/impala/planner/

PS5, Line 546: orderingExprs.addAll(insertStmt.getPartitionKeyExprs()
I assume this works for HBase tables...
File fe/src/test/java/org/apache/impala/analysis/

PS5, Line 521: ''='id
How about 'ID' (check for case)?

PS5, Line 1642: AnalyzesOk("create table tbl sort by (int_col,id) like functional.alltypes");
Also, add the negative case where sort by cols don't exist in the source table.

PS5, Line 1913: // Kudu table
              :     AnalyzesOk("create table tab (i int, x int primary key) partition by hash(x)
" +
              :         "partitions 8 sort by(i) stored as kudu");
              :     // Column in sortby hint must not be a Kudu primary key column.
              :     AnalysisError("create table tab (i int, x int primary key) partition by
hash(x) " +
              :         "partitions 8 sort by(x) stored as kudu", "SORT BY column list must
not " +
              :         "contain primary key column: 'x'");
              :     AnalysisError("create table tab (i int, x int, y int, primary key (x))
partition " +
              :         "by hash(x) partitions 8 sort by(x) stored as kudu", "SORT BY column
list must " +
              :         "not contain primary key column: 'x'");
              :     AnalysisError("create table tab (i int, x int, y int, primary key(x, y))
partition " +
              :         "by hash(y) partitions 8 sort by (i,x) stored as kudu", "SORT BY column
list " +
              :         "must not contain primary key column: 'x'");
              :   }
For Kudu specific tests they need to be in a function that first calls TestUtils.assumeKuduIsSupported()
(see L1930). Do you have tests for HBase?
File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test:

PS5, Line 162: create table t sort by
Add one with a partitioned table? Also, a CTAS with a more complex select (joins, maybe aggs)
File testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test:

Out of curiosity? Can we change the default behavior? If no, are we ok with this restriction?

            : WRITE TO HDFS [test_sort_by.alltypes, OVERWRITE=false, PARTITION-KEYS=(year,month)]
            : |  partitions=24
            : |
            : 01:SORT
            : |  order by: year ASC NULLS LAST, month ASC NULLS LAST, int_col ASC NULLS LAST,
            : |
            : 00:SCAN HDFS [functional.alltypes]
            :    partitions=24/24 files=24 size=478.45KB
Do we need all these DISTRIBUTED_PLANs?

PS5, Line 193: ====
Can you add one or two more complex select statements feeding the insert? i.e. something with
a join that generates more complex plans
File testdata/workloads/functional-query/queries/QueryTest/alter-table.test:

PS5, Line 1066:                  
where are these empty spaces coming from?

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <>
Gerrit-Reviewer: Dimitris Tsirogiannis <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Marcel Kornacker <>
Gerrit-Reviewer: Thomas Tauber-Marshall <>
Gerrit-HasComments: Yes

View raw message