impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jim Apple (Code Review)" <>
Subject [Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".
Date Sun, 05 Feb 2017 01:33:23 GMT
Jim Apple has posted comments on this change.

Change subject: Add a build flag for the undefined behavior sanitizer, aka "ubsan".

Patch Set 5: Code-Review+1


Carry Tim's +1.
File be/src/exec/

Line 275:       switch (slot_byte_size) {
> this doesn't seem necessary: we know that:
It is actually undefined to shift int32 by 32.

Added a check outside the switch; nice catch.
File be/src/exec/

PS4, Line 101: 
> what's undefined about that?
left shift of negative values is undefined.
File be/src/exprs/

PS4, Line 161: src.ptr != nullptr)
> dst->ptr equals copy, which we've already checked against NULL at line 154,
File be/src/exprs/expr-value.h:

PS4, Line 72: empty(
> size() != 0
File be/src/exprs/

Line 379:     }
> why don't we need to check overflow here? and then why do we need to check 
We don't need to check it here because the original didn't check it here: the Check on L373
is independent of the produc on L371.

The check on L367 of PS4 should not be there, I think. Removed.
File be/src/exprs/

Line 176: BINARY_OP_NUMERIC_TYPES(Subtract, -, Overflow::UnsignedDifference);
> for these, we eventually probably want to detect overflow and return NULL (
Noted. I'd like to leave that for a future patch.

PS4, Line 177: Overflow::UnsignedProduct
> won't this give the wrong result for negative inputs?
I think that signed two's complement and unsigned multiplications yield the same results on
a product of two N-bit numbers where the result is the low N bits. Concretely, I think the
following do the same thing:

    int32_t mul(int32_t x, int32_t y) { return x * y; }
    uint32_t mul(uint32_t x, uint32_t y) { return x * y; } compiles them to the same thing.
File be/src/exprs/

PS4, Line 819: 
> why remove that? do we sometimes pass null?
Yes: select split_part('', 'a', 1);
    F0129 14:38:29.681943 31628] Check failed: haystack != __null
&& needle != __null

Line 629:   if (strs[0].len) memcpy(ptr, strs[0].ptr, strs[0].len);
> != 0.
If either pointer is nullptr, it's UB.

It would be more standards-compliant to check both pointers. I can do that, rather than checking
just one length, but it would be more source code and likely slower.

What do you think?
File be/src/runtime/

PS4, Line 388: (
> these are extraneous
I think they make the code much more readable.
File be/src/runtime/decimal-value.inline.h:

PS4, Line 83: 
> not your change, but please fix this formatting.

PS4, Line 95:  |= abs(result) >= max_value / mult;
> can't result be negative, in which case this isn't correct.  also note that
See earlier comment about unsigned and signed-two's-complement multiplications being identical
on the low N bits.

Regarding the callers ignoring result if overflow is true: are you suggesting that this multiplication
not happen if overflow is already true?

Line 143:       // TODO: faster to return later? We don't care at all about the perf on the
> this comment seemed to imply that this change will make this code slower. i
There is no BE benchmark for this, as far as I can tell. I can add one, but this patch is
already quite large.

PS4, Line 149: 
> why do we need UnsignedSum() on line 144 case but not here?
File be/src/runtime/multi-precision.h:

Line 111:     scale = Overflow::UnsignedProduct(scale, base);
> given that callers won't actually use result if *overflow is true, do we ne
We still need it. The frustrating thing about UB is that it isn't just undefined results but
undefined behavior. There are cases where compilers will take both branches of a conditional,
for instance.

The change written here seems less invasive than changing 'result' and 'scale'. For instance,
scale can be negative, which influences line 107.
File be/src/runtime/

PS4, Line 975: string_val->ptr != nullptr
> is this actually possible? it's not checked upstream?
Yes; every check I added was because UBSAN found it dynamically.
File be/src/udf/

PS4, Line 493: !result
> len will almost always be > 0 so this condition should go second (if we nee
File be/src/util/

PS4, Line 118: input_len
> != 0 here and elsewhere. but why needed?
added '> 0'. Once we settle the best way to check for nulls in C standard library functions
with undefined semantics, I'll apply those here as needed. See comments earlier in this patch.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Jim Apple <>
Gerrit-Reviewer: Marcel Kornacker <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message