impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael Ho (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Date Tue, 25 Oct 2016 08:52:08 GMT
Michael Ho has posted comments on this change.

Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions

Patch Set 7:

File be/src/codegen/

PS7, Line 459: hat
File be/src/codegen/codegen-anyval.h:

Line 184:   /// lowered type. This *Val should be non-null.The output variable is called 'name'.
nit: space
File be/src/codegen/llvm-codegen.h:

PS7, Line 347: but we don't clone if we can avoid it to
             :   /// reduce compilation time.
Avoid cloning if possible to reduce compilation time.
File be/src/exec/

PS6, Line 1063: if we initialize the destination value to 0 (with the NULL bit set)
> We could do that but it doesn't seem worth it (the interpreted path has eno
Sure. I agree that the interpreted path probably won't benefit much. My concern has more to
do with the explanation of the optimization and the assumption made here:

1. Who is initializing the slot to the proper value ?
2. How are these initialized values different for different aggregate functions ?
3. Will the NULL bits be changed after initialization ? If so, by whom ?
4. And why is it okay to not check the NULL bits after calling the UDA function for certain
aggregate functions ? Is it because they cannot change or is it already handled somewhere
else ?
File be/src/exec/

PS7, Line 1660:  Second,
              :     // the NULL bit of the dst slot can be ignored if the slot is initialised
in the
              :     // right way (see InitAggSlots()). Empirically this optimisation makes
              :     // 5-10% faster.
Please also see comments above.

It's not very clear what it means by "the NULL bit of the dst slot" can be ignored ? Is it
more precise to say their state cannot change after initialization or they will be updated
appropriately by the update function ? It seems different aggregate function (e.g. min vs
avg) has different default values with different conditions for the NULL bit to be set. May
help to document a bit more details.

PS7, Line 1675: dst.SetIsNull(slot_desc->CodegenIsNull(codegen, &builder, agg_tuple_arg));
Why is this not in the old code or was it done somewhere else ?
File be/src/exec/partitioned-aggregation-node.h:

PS7, Line 649: 'dst_val' should contain the previous value of the aggregate
             :   /// function, and 'result_val' is set to the new value after the Update or
             :   /// operation is applied.
Just curious why we separate 'dst_val' and 'result_val'. Doesn't 'result_val' subsume 'dst_val'
File be/src/exprs/

PS7, Line 496:  false
Please feel free to ignore but you can consider setting this second argument to default value
of false. However, keeping it as is may be clearer (and you have gone through the trouble
of updating all call sites - thanks !).

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Michael Ho <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message