impala-reviews mailing list archives

From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5036: Parquet count star optimization
Date Fri, 30 Jun 2017 05:19:43 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-5036: Parquet count star optimization
......................................................................


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6812/5/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

Line 886:         prefix + "8InitZeroIN10impala_udf9BigIntValEEEvPNS2_15FunctionContextEPT_",
> You'd have to replace the uses of the agg slot with the zeroifnull() expres
Taras, I think the rewrite solution becomes more viable if we follow the approach where the
AggInfo is passed directly into the HdfsScanNode. The scan can create an smap with two entries:

count(*) -> sum(num_rows_slot)
slotref -> zeroifnull(slotref)

where the slotref of the second entry is the agg slot from the first-level aggregation corresponding
to count(*).

Once we return from init() of the scan node, we apply the agg optimization smap to all the
AggInfos (local and merge).
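
To make the two-entry smap concrete, here is a minimal sketch of the idea. It models expressions as plain strings and the smap as an ordinary map; the class name, method names, and the slot name "agg_slot_0" are hypothetical stand-ins, not Impala's actual ExprSubstitutionMap or AggregateInfo API. The zeroifnull() wrapper is presumably needed because sum() over zero rows yields NULL whereas count(*) yields 0.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: expressions are modeled as strings,
// and the substitution map as a plain java.util.Map.
class CountStarSmapSketch {
  // Builds the two-entry substitution map described above.
  // 'aggSlot' stands for the first-level agg slot that materializes count(*).
  static Map<String, String> buildSmap(String aggSlot) {
    Map<String, String> smap = new LinkedHashMap<>();
    // Entry 1: count rows by summing the per-row-group row counts.
    smap.put("count(*)", "sum(num_rows_slot)");
    // Entry 2: consumers of the agg slot see 0 instead of NULL on empty input.
    smap.put(aggSlot, "zeroifnull(" + aggSlot + ")");
    return smap;
  }

  // Replaces 'expr' if the smap has an entry for it; otherwise returns it unchanged.
  static String substitute(String expr, Map<String, String> smap) {
    return smap.getOrDefault(expr, expr);
  }

  public static void main(String[] args) {
    Map<String, String> smap = buildSmap("agg_slot_0");
    System.out.println(substitute("count(*)", smap));
    System.out.println(substitute("agg_slot_0", smap));
    System.out.println(substitute("min(id)", smap));
  }
}
```

In the real planner the substitution would be applied to the expr trees of both the local and the merge AggInfo rather than to strings.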


http://gerrit.cloudera.org:8080/#/c/6812/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 109:  *
> This comment doesn't explain the overall optimization. What I'm looking for
Taras, I think Dan is looking for a comment along the lines of what we have in applyParquetCountStartOptimization().
Maybe we can add a condensed version of that somewhere.

Dan, where do you expect this comment? Here? In SingleNodePlanner.createSelectPlan()? Somewhere
else?


PS5, Line 140:  This
             :   // scan does additional analysis in init() to determine whether it is correct to apply
             :   // the optimization.
> Okay. If it doesn't work out, I just think the comments needs to be clarifi
I think this approach will work out. We can use Analyzer.tableRefMap_ to determine how many
table refs are in a query block.
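
A rough sketch of the kind of check I have in mind. The map below is a simplified stand-in for Analyzer.tableRefMap_ (integer ids and string aliases instead of the real key and value types), and the assumption that the optimization should only apply when the block has exactly one table ref is mine; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: a simplified stand-in for the analyzer's
// per-query-block registry of table refs.
class TableRefCountSketch {
  // Assumed eligibility condition: exactly one table ref in the query block.
  static boolean hasSingleTableRef(Map<Integer, String> tableRefMap) {
    return tableRefMap.size() == 1;
  }

  public static void main(String[] args) {
    Map<Integer, String> singleTable = new HashMap<>();
    singleTable.put(0, "t1");

    Map<Integer, String> join = new HashMap<>(singleTable);
    join.put(1, "t2");

    System.out.println(hasSingleTableRef(singleTable));
    System.out.println(hasSingleTableRef(join));
  }
}
```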


-- 
To view, visit http://gerrit.cloudera.org:8080/6812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Zach Amsden <zamsden@cloudera.com>
Gerrit-HasComments: Yes
