Jim Apple has posted comments on this change.
Change subject: IMPALA4123: Fast bit unpacking
......................................................................
Patch Set 7:
(20 comments)
Thanks for your patience
http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/benchmarks/bitpackingbenchmark.cc
File be/src/benchmarks/bitpackingbenchmark.cc:
PS7, Line 336: int64_t
const, or just make this the param to the data constructor and use data.size() as the last
argument to the params initializer list.
PS7, Line 343: suite.AddBenchmark(Substitute("UnpackScalar", bit_width), UnpackBenchmark,
¶ms);
This line at bit_width 32 is not in the output you display in the comment at the top.
http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bitpackingtest.cc
File be/src/util/bitpackingtest.cc:
PS7, Line 35: a subset of
: /// 'num_in_values'
I still find this wording confusing  Which is the to buffer, and which the from? What does
it mean to have a subset of an int?
PS7, Line 42: misaligned
When misalignment is 0, what alignment is the memory guaranteed to have? 0 mod 64?
PS7, Line 49: uint32_t
const
Line 93: uint32_t* in, uint8_t* packed, int num_in_values, int bit_width, int misalignment)
{
const T*
PS7, Line 94: int
const
PS7, Line 99: vector
std::aligned_storage?
PS7, Line 102: sizeof(uint32_t)
Why is this needed?
PS7, Line 132: min(length, 1)
so the only misalignments allowed are 0 and 1? Why so restrictive?
http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bitpacking.h
File be/src/util/bitpacking.h:
PS2, Line 57: ast one return
> 8 is the smallest number such that (3 * 8) % 8 == 0. I.e. where the last bi
OK, but you said "you have to choose" 8. A batch size of 32 would still work, right?
http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bitpacking.h
File be/src/util/bitpacking.h:
PS7, Line 30: values
'value'
PS7, Line 67: bits of data
You can elide this phrase
http://gerrit.cloudera.org:8080/#/c/4494/4/be/src/util/bitpacking.inline.h
File be/src/util/bitpacking.inline.h:
PS4, Line 125: const uint32_t trailing_bits = in_words[IN_WORD_IDX + 1] %
> That's true, but there's a downside  if the loads are aligned relative to
I'm not so sure. Let's consider reading 10 bit ints using 32bit loads. To unpack 16 of them
with your method, you need to read 160 bits, which means 5 (aligned) loads. 12 of the unpacked
values can be read without using this third (more expensive) branch. In this branch, there
are two more ALU ops, so that's 4*2 = 8 extra ALU ops.
OTOH, using unaligned loads, you could do only 4 loads and (at bits 0, 24, 48, and 80) and
no more than two ALU ops for each extraction.
http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bitpacking.inline.h
File be/src/util/bitpacking.inline.h:
PS7, Line 57: const
constexpr BATCH_SIZE
PS7, Line 69: in_bytes = batch_size * (BIT_WIDTH / CHAR_BIT);
so in_bytes does not move if BIT_WIDTH is 7?
PS7, Line 91: should
Should this be "must", since VALUE_IDX is now a template param? Not that the explanation
is wrong, but now you have forced the caller into efficient behavior with your choice.
PS7, Line 178: const
constexpr MAX_BATCH_SIZE
PS7, Line 179: int
const
PS7, Line 183: // Copy into padded temporary buffer to avoid reading past the end of 'in'.
Can this be avoided sometimes? For instance, if BIT_WIDTH is 6 and num_values is 16, can we
just use 'in'?

To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
GerritMessageType: comment
GerritChangeId: I12db69409483d208cd4c0f41c27a78aeb6cd3622
GerritPatchSet: 7
GerritProject: ImpalaASF
GerritBranch: master
GerritOwner: Tim Armstrong <tarmstrong@cloudera.com>
GerritReviewer: Jim Apple <jbapple@cloudera.com>
GerritReviewer: Tim Armstrong <tarmstrong@cloudera.com>
GerritHasComments: Yes
