impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jim Apple (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Date Fri, 07 Oct 2016 18:07:29 GMT
Jim Apple has posted comments on this change.

Change subject: IMPALA-4123: Fast bit unpacking

Patch Set 7:


Thanks for your patience
File be/src/benchmarks/

PS7, Line 336: int64_t
const, or just make this the param to the data constructor and use data.size() as the last
argument to the params initializer list.

PS7, Line 343:     suite.AddBenchmark(Substitute("UnpackScalar", bit_width), UnpackBenchmark,
This line at bit_width 32 is not in the output you display in the comment at the top.
File be/src/util/

PS7, Line 35:  a subset of
            : /// 'num_in_values'
I still find this wording confusing - Which is the to buffer, and which the from? What does
it mean to have a subset of an int?

PS7, Line 42: misaligned
When misalignment is 0, what alignment is the memory guaranteed to have? 0 mod 64?

PS7, Line 49: uint32_t

Line 93:     uint32_t* in, uint8_t* packed, int num_in_values, int bit_width, int misalignment)
const T*

PS7, Line 94: int

PS7, Line 99: vector

PS7, Line 102:  sizeof(uint32_t)
Why is this needed?

PS7, Line 132: min(length, 1)
so the only misalignments allowed are 0 and 1? Why so restrictive?
File be/src/util/bit-packing.h:

PS2, Line 57: ast one return
> 8 is the smallest number such that (3 * 8) % 8 == 0. I.e. where the last bi
OK, but you said "you have to choose" 8. A batch size of 32 would still work, right?
File be/src/util/bit-packing.h:

PS7, Line 30: values

PS7, Line 67:  bits of data
You can elide this phrase
File be/src/util/bit-packing.inline.h:

PS4, Line 125:     const uint32_t trailing_bits = in_words[IN_WORD_IDX + 1] % 
> That's true, but there's a downside - if the loads are aligned relative to 
I'm not so sure. Let's consider reading 10 bit ints using 32-bit loads. To unpack 16 of them
with your method, you need to read 160 bits, which means 5 (aligned) loads. 12 of the unpacked
values can be read without using this third (more expensive) branch. In this branch, there
are two more ALU ops, so that's 4*2 = 8 extra ALU ops.

OTOH, using un-aligned loads, you could do only 4 loads and (at bits 0, 24, 48, and 80) and
no more than two ALU ops for each extraction.
File be/src/util/bit-packing.inline.h:

PS7, Line 57: const
constexpr BATCH_SIZE

PS7, Line 69:     in_bytes -= batch_size * (BIT_WIDTH / CHAR_BIT);
so in_bytes does not move if BIT_WIDTH is 7?

PS7, Line 91: should
Should this be "must", since  VALUE_IDX is now a template param? Not that the explanation
is wrong, but now you have forced the caller into efficient behavior with your choice.

PS7, Line 178: const
constexpr MAX_BATCH_SIZE

PS7, Line 179: int

PS7, Line 183:   // Copy into padded temporary buffer to avoid reading past the end of 'in'.
Can this be avoided sometimes? For instance, if BIT_WIDTH is 6 and num_values is 16, can we
just use 'in'?

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <>
Gerrit-Reviewer: Jim Apple <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message