impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4080, IMPALA-3686: Introduce ExecNode::Codegen()
Date Thu, 06 Oct 2016 21:41:03 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4080, IMPALA-3686: Introduce ExecNode::Codegen()
......................................................................


Patch Set 1:

(16 comments)

The overall cleanup looks good. 

Do we have any rough numbers on how long it takes to codegen a few exprs, since this will
likely enable codegen for exprs where it didn't previously? Would be helpful to understand
the potential performance impact.

http://gerrit.cloudera.org:8080/#/c/4651/1//COMMIT_MSG
Commit Message:

PS1, Line 7:  IMPALA-3686
wrong JIRA number: IMPALA-3638

Also it's not clear to me that this fully resolves the limitations with Expr codegen that
the JIRA was sort-of tracking. The other big caveat was that only subtrees with ScalarFnCall
at the root are codegen'd. 

Maybe we should just file another JIRA for that issue? Potentially it could be worth fixing
it while we're touching the code, since it would be a similar transformation of how Codegen
works in Exprs, e.g. having a Expr::Codegen() function that calls GetCodegendComputeFunction().


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 168: 
Unnecessary blank line.


Line 174:   if (state->codegen_enabled()) {
It would be great if we could avoid duplicating this check in every Codegen() method. Maybe
we should turn the logic around so that we only call Codegen() on each ExecNode if codegen
is enabled. It looks like if it's disabled we should just add the exec option "Codegen Disabled
by Query Option" or something along those lines.

E.g. we could have a ExecNode::CodegenTree() function that walks the tree and either calls
Codegen() on each node or add the exec option.


Line 833:   LlvmCodeGen* codegen = state->codegen();
Maybe we should just pass the codegen object in as an argument?


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

Line 295:   runtime_profile()->AddInfoString(HDFS_SPLIT_STATS_DESC, str.str());
Extra blank line.


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/old-hash-table.cc
File be/src/exec/old-hash-table.cc:

Line 419:   LlvmCodeGen* codegen = state->codegen();
IMO we should be passing in the codegen object rather than the state in most (all?) of these
cases.


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exprs/literal.cc
File be/src/exprs/literal.cc:

Line 378: Status Literal::GetCodegendComputeFn(RuntimeState* state, llvm::Function** fn) {
I feel like this should take codegen as an argument instead of state.


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exprs/scalar-fn-call.cc
File be/src/exprs/scalar-fn-call.cc:

PS1, Line 98: UNLIKELY
UNLIKELY seems unnecessary since this isn't a hot path.


Line 99:       return Status(Substitute("Cannot handle IR UDF '$0': ", fn_.name.function_name,
This seems like something that would break queries that previously worked - should call it
out in the commit message at least. I think running IR UDFs with DISABLE_CODEGEN is a weird
enough corner case that it should be ok to break it.

We should also make sure that we don't break the IR UDF tests on exhaustive in case they run
with that combination of ExecOptions.


Line 105:     DCHECK(NumFixedArgs() <= 8);
How do we avoid hitting this DCHECK? Do we have test coverage for this?


Line 136:     // TODO: don't do this for child exprs
This is an interesting point that we should make sure we fix when we clean up expr codegen.


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/runtime/plan-fragment-executor.cc
File be/src/runtime/plan-fragment-executor.cc:

Line 194:   plan_->Codegen(state);
Should we do this after we prepared the sink, instead of in the middle of the prepare phase?

Also, should we add a Codegen method to the DataSink interface now so that we have that interface
defined too? Or is that a follow-up patch?


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

PS1, Line 257: it has already been called.
No-op if the codegen object has already been created.


PS1, Line 257: codegen_
maybe don't mention internal state. Instead something like "Create a codegen object that can
be accessed via codegen()"


PS1, Line 257:  This is
             :   /// created on first use.
Not sure if the last sentence adds much.


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

Line 89:   // Codegen is disabled unless the expression contains external UDF available only
This seems inconsistent with the behaviour in ScalarFnCall. Shouldn't we fail the query at
this point if codegen is disabled rather than forcing it on?


-- 
To view, visit http://gerrit.cloudera.org:8080/4651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message