impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Taras Bobrovytsky (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5036: Parquet count star optimization
Date Thu, 15 Jun 2017 23:40:27 GMT
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-5036: Parquet count star optimization
......................................................................


Patch Set 4:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/6812/4/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 455:     // There are no materialized slots, e.g.
> There are no materialized slots and we are not optimizing count(*)
Done


Line 456:     // "select count(*) from alltypes where int_col > 5" and "select 1 from alltypes".
> The first query is not correct, in that case int_col is materialized. I thi
Done


http://gerrit.cloudera.org:8080/#/c/6812/4/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

Line 171:       if (slot.getColumn() != null && !slot.getStats().hasStats()) return
true;
> Thanks. This was a bug in general, not really specific to your change.
Agreed.


http://gerrit.cloudera.org:8080/#/c/6812/4/testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test
File testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test:

Line 105: from functional_parquet.alltypes
> single line query
Done


Line 150: # The optimization is disabled if the output of the count(*) inline view is being
> Optimization is not applied because the inner count(*) is not materialized.
Done


Line 285: # Optimization is not applied when selecting from an empty table.
> Do we apply the optimization when we reference a non-empty partitioned tabl
We do not. Added a test case for this.


Line 323: # materialized. Not materialized agg exprs are ignored.
> Non-materialized agg exprs are ignored.
Done


http://gerrit.cloudera.org:8080/#/c/6812/4/testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test:

Line 86: # Verify that 0 is returned when we are selecting from an empty table and the optimization
> the optimization is not applied in this case right?
Yes, that's true, but it *could* be applied. Reworded slightly, still not sure if the phrasing
makes sense now.


Line 95: # Verify that 0 is returned when all the partitioned columns are filtered.
> when all partitions are pruned
Done


http://gerrit.cloudera.org:8080/#/c/6812/4/tests/query_test/test_aggregation.py
File tests/query_test/test_aggregation.py:

Line 275:     if (vector.get_value('table_format').file_format != 'text' or
> seems weird, why not
if we set it to parquet then this test will not run as part of core at all. It will only run
as part of exhaustive. Is that what we want?


Line 280:     vector.get_value('exec_option')['batch_size'] = 1
> Doesn't exhaustive run this test with multiple batch sizes already? If so, 
No, I checked, batch size does not vary for this test in exhaustive


-- 
To view, visit http://gerrit.cloudera.org:8080/6812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Zach Amsden <zamsden@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message