impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory
Date Thu, 17 Nov 2016 17:00:53 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

PS4, Line 190: CODEGEN_WRITE_TUPLES_MAX_COLUMNS = 250;
> I agree that it's more actionable if it's based on number of tuples.
This is more about avoiding codegen blow-ups instead of making a cost-based decision (that's
tricky because the codegen cost is non-linear and unpredictable and we'd need to build a cost
model for the scanners factoring in # rows,# columns, predicates, etc).

I don't think there's much point in modelling the cost unless a) there is a substantial benefit
to justify the complexity and b) the cost model is going to be fairly accurate. Otherwise
we end up in an uncomfortable middle ground with a half-baked code model where the code is
more complex but we don't get measurable benefit or have confidence in the cost model's accuracy.

I'd rather not do this from the planner because it's tied up in the implementation details
of codegen for the scanners. The only reason we have a hard threshold is because the backend
codegen logic builds everything in a single function. If that changed we could switch to the
approach we use elsewhere of disabling inlining. It makes more sense to me for the planner
to make a high-level decision about enabling codegen for the node, then the backend implements
that decision (or bails out if it can't do the codegen without blowing up).


http://gerrit.cloudera.org:8080/#/c/4956/4/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

PS4, Line 823: if (!hasGrouping) mergeAggNode.setDisableCodegen(true);
> This isn't exactly what I had in mind. If you look at ExchangeNode::Codegen
I think it's different from the agg node. For the agg node we know based on the plan shape
that the number of input rows is small, so even though we could codegen the agg, it's not
worth it. But as far as I can see the only reason we don't codegen non-merging exchanges in
general is that there's no backend support for it.

There's no reason I see that the planner would always disable codegen for non-merging exchanges
if backend support existed.


-- 
To view, visit http://gerrit.cloudera.org:8080/4956
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message