impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael Ho (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Date Thu, 20 Oct 2016 23:22:44 GMT
Michael Ho has posted comments on this change.

Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
......................................................................


Patch Set 4:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/4655/4/be/src/codegen/codegen-anyval.h
File be/src/codegen/codegen-anyval.h:

PS4, Line 242: if (val.is_null) goto null_block;
Can you please elaborate this comment with code snippet like the following ?

if (val.is_null) goto null_block;

non_null_block:
   //default entry point after calling this function

null_block:
....


http://gerrit.cloudera.org:8080/#/c/4655/4/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

PS4, Line 1065: can ignore the NULL bit of its destination value is
              :     // ignored
?


PS4, Line 1067: set
did you mean with NULL bit cleared ?


Line 1599:     RETURN_IF_ERROR(agg_expr->GetCodegendComputeFn(state_, &agg_expr_fn));
DCHECK(agg_expr_fn != NULL);


PS4, Line 1620: src.CodegenBranchIfNull(&builder, ret_block);
It seems that we emit this check for all paths except for the UDA path.  Why is it safe to
make the assumption that 'src' is not NULL for the UDA path ?

Also, it appears that we may consider doing some refactoring by computing 'bool use_uda_interface'
up front and only emit this cmp-and-branch if that's false. Please see comments below about
'use_uda_interface'.


PS4, Line 1628: dst_type.type != TYPE_TIMESTAMP
              :       && !dst_type.IsStringType()
This seems to be reused in multiple places. May be it's worth factoring this expression out.


PS4, Line 1637: dst_type.type != TYPE_TIMESTAMP
              :       && !dst_type.IsStringType()
It may be easier to read if we do the check once up front:

agg_op = evaluator->agg_op();
bool use_uda_interface  = agg_op != COUNT &&
                                          (dst_type.type == TYPE_TIMESTAMP || 
                                           dst_type.IsStringType ||
                                           (dst_type.type == TYPE_DECIMAL &&
                                          agg_op() == AggFnEvaluator::SUM);
if (use_uda_interface) {
  ....
} else {
    switch (agg_op) {

    }
}


PS4, Line 1676: src.CodegenBranchIfNull(&builder, ret_block);
Actually, referring to the comment above again, it seems that we should just do the check
before the if-stmt sequence and emit the cmp-and-branch instructions up front for these cases.

IMHO, I find the new code harder to follow because the basic blocks in the emitted code are
not as explicitly laid out as the old code.


PS4, Line 1685: resulting in
with results stored in


PS4, Line 1725: codegen->GetFunction(symbol);
Should we consider adding "bool clone" as the second argument similar to the other variant
of GetFunction() ?


PS4, Line 1745:     Value* input_lowered_ptr =
              :         codegen->CreateEntryBlockAlloca(builder->GetInsertBlock()->getParent(),
              :             LlvmCodeGen::NamedVariable("input_lowered_ptr",
              :                                             input_vals[i].value()->getType()));
              :     builder->CreateStore(input_vals[i].value(), input_lowered_ptr);
              :     Type* input_unlowered_ptr_type = CodegenAnyVal::GetUnloweredPtrType(
              :         codegen, evaluator->input_expr_ctxs()[i]->root()->type());
              :     Value* input_unlowered_ptr = builder->CreateBitCast(
              :         input_lowered_ptr, input_unlowered_ptr_type, "input_unlowered_ptr");
Do you think it's worth factoring this logic out to a function given it's repeated again from
line 1757-1764 ?


http://gerrit.cloudera.org:8080/#/c/4655/4/be/src/exprs/compound-predicates.cc
File be/src/exprs/compound-predicates.cc:

PS4, Line 68:  
Please consider removing these extra blank spaces too.


http://gerrit.cloudera.org:8080/#/c/4655/4/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

PS4, Line 603:   Value* tuple_bytes = builder->CreateBitCast(tuple, codegen->ptr_type());
             :   Constant* null_byte_offset =
             :       ConstantInt::get(codegen->int_type(), null_indicator_offset_.byte_offset);
             :   Value* null_byte_ptr =
             :       builder->CreateInBoundsGEP(tuple_bytes, null_byte_offset, "null_byte_ptr");
What do you think about factoring this code snippet as a utility function for generating code
to return null_byte_ptr ? This seems to be useful for CodegenIsNull() too.


-- 
To view, visit http://gerrit.cloudera.org:8080/4655
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message