impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael Ho (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()
Date Fri, 07 Oct 2016 22:44:06 GMT
Michael Ho has posted comments on this change.

Change subject: IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()
......................................................................


Patch Set 1:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/4651/1//COMMIT_MSG
Commit Message:

PS1, Line 7:  IMPALA-3686
> wrong JIRA number: IMPALA-3638
I am not 100% sure the extent which IMPALA-3638 is trying to cover so I am fine if you think
a new bug is better. Either way, this patch alleviates the issue mentioned in IMPALA-3638
as we will now codegen for ScalarFnCall even if there is no codegen enabled operator in the
fragment.

The caveat you mentioned is due to the fact that we don't have a way to invoke the JIT compiled
function in the interpretation path of Expr except for ScalarFnCall. Once we fix that, then
codegen'ing all the Exprs makes sense. That may also explain the oddness of ScalarFnCall::Prepare()
(i.e. IMPALA-4233)


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 168: 
> Unnecessary blank line.
Done


Line 174:   if (state->codegen_enabled()) {
> It would be great if we could avoid duplicating this check in every Codegen
Don't want to introduce another function per exec node just for that. I updated the patch
to add the string to runtime profile in Prepare() if codegen is disabled. That may fit in
well with what we want down the road if each fragment instance populates its pointers to compiled
functions at Prepare().


Line 833:   LlvmCodeGen* codegen = state->codegen();
> Maybe we should just pass the codegen object in as an argument?
Not so sure yet as we may need to access the Expr stored in RuntimeState (eventually) so I
prefer to keep it as is for now as the code is still in flux.


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

Line 295:   runtime_profile()->AddInfoString(HDFS_SPLIT_STATS_DESC, str.str());
> Extra blank line.
Done


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/old-hash-table.cc
File be/src/exec/old-hash-table.cc:

Line 419:   LlvmCodeGen* codegen = state->codegen();
> IMO we should be passing in the codegen object rather than the state in mos
Not so sure about this yet as we may need RuntimeState for accessing Expr in the future.


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exprs/literal.cc
File be/src/exprs/literal.cc:

Line 378: Status Literal::GetCodegendComputeFn(RuntimeState* state, llvm::Function** fn) {
> I feel like this should take codegen as an argument instead of state.
Yes, this should take codegen as argument. I have converted all GetCodendComputeFn().


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exprs/scalar-fn-call.cc
File be/src/exprs/scalar-fn-call.cc:

PS1, Line 98: UNLIKELY
> UNLIKELY seems unnecessary since this isn't a hot path.
It's part of the Prepare() path which may have impact to the latency of propagation of runtime
filters. That said, the UNLIKELY here is gonna make negligible difference.


Line 99:       return Status(Substitute("Cannot handle IR UDF '$0': ", fn_.name.function_name,
> This seems like something that would break queries that previously worked -
As we discussed offline, this is not a well documented corner case. In some sense, we can
consider this as a bug as this code path doesn't honor the query option "DISABLE_CODEGEN".
The old code will fall back to codegen (even if it's disabled) if we see IR UDF or if the
number of arguments is more than 8. The new patch will simply fail the query.

Also added some tests for this path.


Line 105:     DCHECK(NumFixedArgs() <= 8);
> How do we avoid hitting this DCHECK? Do we have test coverage for this?
Good point. Converted this to a if-check instead. Will add a test to test_udfs.py to make
sure we see appropriate error messages if the native UDF has too many arguments.

I checked the FE built-in library and as far as I can tell, we don't have any built-in UDFs
with more than 8 arguments.

Added a test for this.


Line 136:     // TODO: don't do this for child exprs
> This is an interesting point that we should make sure we fix when we clean 
Not sure what that means ? Does it mean we shouldn't codegen child exprs ? However, will it
be possible to inline all children expr without codegening them too ?


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/runtime/plan-fragment-executor.cc
File be/src/runtime/plan-fragment-executor.cc:

Line 194:   plan_->Codegen(state);
> Should we do this after we prepared the sink, instead of in the middle of t
This is just a temporary step. Eventually, codegen will happen before any plan fragment is
instantiated (i.e. Prepare() or anything is called).

However, it shouldn't hurt to move it to the point after data sink is prepared. Will update
the patch to do so.


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

PS1, Line 257: it has already been called.
> No-op if the codegen object has already been created.
Done


PS1, Line 257: codegen_
> maybe don't mention internal state. Instead something like "Create a codege
Done


PS1, Line 257:  This is
             :   /// created on first use.
> Not sure if the last sentence adds much.
Done


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

Line 89:   // Codegen is disabled unless the expression contains external UDF available only
> This seems inconsistent with the behaviour in ScalarFnCall. Shouldn't we fa
This is to avoid breaking compatibility. Previously, this expression evaluation with IR-only
UDF would have worked. Apparently, the fe-support code just takes any failure as an exception
so it may lead to analysis failure for queries which worked previously.

As we discussed offline, we will honor the query option in query_ctx in this case. So, if
there is any IR UDF, we will enable codegen if the query option says so. We will disable codegen
for all other cases.


-- 
To view, visit http://gerrit.cloudera.org:8080/4651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message