impala-reviews mailing list archives

From "Alex Behm (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Date Tue, 22 Aug 2017 05:10:16 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables

Patch Set 5:

File fe/src/main/java/org/apache/impala/planner/

Line 219:   public boolean hasPushedConjuncts() {
> Makes sense, but as Matt pointed out earlier in a discussion that having si
Fair point. To me it's not confusing, but I'll defer to MJ. Instead of adding to the conjunct
confusion, I suggest we override getInputCardinality() in every scan node. The ScanNode already
overrides the default implementation in PlanNode, so the change would only continue the specialization.
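
To make the suggestion concrete, here is a minimal sketch of what per-scan-node overrides could look like. The classes, fields, and the base-class rule below are simplified, hypothetical stand-ins and not Impala's actual planner code. Whether pushed-down predicates should change the result at all is the open question raised in this review, so the Kudu override shows only one possible answer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for illustration; not Impala's real classes.
abstract class SketchScanNode {
  protected final List<String> conjuncts = new ArrayList<>(); // evaluated by the planner's engine
  protected long limit = -1;          // -1 means "no limit"
  protected long estimatedRows = -1;  // -1 means "unknown"

  boolean hasLimit() { return limit > -1; }

  // Assumed base rule: a limit bounds the input only if nothing filters rows.
  long getInputCardinality() {
    if (conjuncts.isEmpty() && hasLimit()) return limit;
    return estimatedRows;
  }
}

// Inherits the base rule unchanged.
final class SketchHdfsScanNode extends SketchScanNode { }

final class SketchKuduScanNode extends SketchScanNode {
  // Predicates handed to the storage layer instead of being kept as conjuncts.
  final List<String> pushedPredicates = new ArrayList<>();

  @Override
  long getInputCardinality() {
    // One possible policy (an assumption): pushed predicates still filter rows,
    // so the limit no longer bounds how many rows the storage layer reads.
    if (!pushedPredicates.isEmpty()) return estimatedRows;
    return super.getInputCardinality();
  }
}
```

With this shape, callers only ask each node for getInputCardinality() and never need a separate hasPushedConjuncts() check.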

I understand that we also use this code in the max rows visitor, but I'm not sure the use
makes sense there (see my comment in the other file).
File fe/src/main/java/org/apache/impala/util/

Line 57:               !scan.hasPushedConjuncts()))) {
MJ, does this change even make sense? Suppose we have a query with a limit and only Kudu conjuncts.
From Impala's point of view the input cardinality is still the limit, so why run the scan
on multiple impalads? Will Kudu be faster in scanning if we query it with multiple impalads?
(Similar question for the datasource scan)
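
To illustrate what is at stake in this question, the following is a rough sketch of a threshold check on scan input cardinalities. The class name, method, and the rowsThreshold parameter are hypothetical and do not reproduce the visitor's real logic. The point is only that the value a scan reports as its input cardinality decides whether the plan stays on a single node.

```java
import java.util.List;

// Hypothetical sketch; names and the threshold rule are assumptions, not Impala's code.
final class SmallQuerySketch {
  // Returns true only if every scan's input cardinality is known (>= 0)
  // and does not exceed the given row threshold.
  static boolean canRunOnSingleNode(List<Long> scanInputCardinalities, long rowsThreshold) {
    for (long c : scanInputCardinalities) {
      if (c < 0 || c > rowsThreshold) return false; // unknown or too many rows
    }
    return true;
  }
}
```

Under this sketch, a Kudu scan with a limit of 10 that reports 10 as its input cardinality would qualify for single-node execution, while the same scan reporting an unknown cardinality because of pushed predicates would not.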
File testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test:

Line 132: ---- QUERY
> This code change has no effect on the plan created since for a data source 
I don't think that's true. This is what I get:

[localhost:21000] > explain select * from alltypes_datasource;
Query: explain select * from alltypes_datasource
| Explain String                                                                     |
| Max Per-Host Resource Reservation: Memory=0B                                       |
| Per-Host Resource Estimates: Memory=1.00GB                                         |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| functional.alltypes_datasource                                                     |
|                                                                                    |
| PLAN-ROOT SINK                                                                     |
| |                                                                                  |
| 01:EXCHANGE [UNPARTITIONED]                                                        |
| |                                                                                  |
| 00:SCAN DATA SOURCE [functional.alltypes_datasource]                               |

With the single-node optimization we should not have an exchange.
File tests/query_test/

Line 1010:     """IMPALA-5602: Test that 'small query' optimization is not used if table stats
> Unfortunately FrontendTestBase.addTestTable() can only be used to add hdfs 
I understand. My question is whether you have tried extending addTestTable() to work for this
use case. Extending our FE unit testing capabilities is preferable to adding a one-off EE test.

If the extension is too hard, that's fine, but I'm not yet convinced that it is.


Gerrit-MessageType: comment
Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Bikramjeet Vig <>
Gerrit-Reviewer: Matthew Jacobs <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes
