impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jim Apple (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Date Fri, 07 Oct 2016 18:07:29 GMT
Jim Apple has posted comments on this change.

Change subject: IMPALA-4123: Fast bit unpacking
......................................................................


Patch Set 7:

(20 comments)

Thanks for your patience

http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/benchmarks/bit-packing-benchmark.cc
File be/src/benchmarks/bit-packing-benchmark.cc:

PS7, Line 336: int64_t
const, or just make this the param to the data constructor and use data.size() as the last
argument to the params initializer list.


PS7, Line 343:     suite.AddBenchmark(Substitute("UnpackScalar", bit_width), UnpackBenchmark,
&params);
This line at bit_width 32 is not in the output you display in the comment at the top.


http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing-test.cc
File be/src/util/bit-packing-test.cc:

PS7, Line 35:  a subset of
            : /// 'num_in_values'
I still find this wording confusing - Which is the to buffer, and which the from? What does
it mean to have a subset of an int?


PS7, Line 42: misaligned
When misalignment is 0, what alignment is the memory guaranteed to have? 0 mod 64?


PS7, Line 49: uint32_t
const


Line 93:     uint32_t* in, uint8_t* packed, int num_in_values, int bit_width, int misalignment)
{
const T*


PS7, Line 94: int
const


PS7, Line 99: vector
std::aligned_storage?


PS7, Line 102:  sizeof(uint32_t)
Why is this needed?


PS7, Line 132: min(length, 1)
so the only misalignments allowed are 0 and 1? Why so restrictive?


http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.h
File be/src/util/bit-packing.h:

PS2, Line 57: ast one return
> 8 is the smallest number such that (3 * 8) % 8 == 0. I.e. where the last bi
OK, but you said "you have to choose" 8. A batch size of 32 would still work, right?


http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing.h
File be/src/util/bit-packing.h:

PS7, Line 30: values
'value'


PS7, Line 67:  bits of data
You can elide this phrase


http://gerrit.cloudera.org:8080/#/c/4494/4/be/src/util/bit-packing.inline.h
File be/src/util/bit-packing.inline.h:

PS4, Line 125:     const uint32_t trailing_bits = in_words[IN_WORD_IDX + 1] % 
> That's true, but there's a downside - if the loads are aligned relative to 
I'm not so sure. Let's consider reading 10 bit ints using 32-bit loads. To unpack 16 of them
with your method, you need to read 160 bits, which means 5 (aligned) loads. 12 of the unpacked
values can be read without using this third (more expensive) branch. In this branch, there
are two more ALU ops, so that's 4*2 = 8 extra ALU ops.

OTOH, using un-aligned loads, you could do only 4 loads and (at bits 0, 24, 48, and 80) and
no more than two ALU ops for each extraction.


http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing.inline.h
File be/src/util/bit-packing.inline.h:

PS7, Line 57: const
constexpr BATCH_SIZE


PS7, Line 69:     in_bytes -= batch_size * (BIT_WIDTH / CHAR_BIT);
so in_bytes does not move if BIT_WIDTH is 7?


PS7, Line 91: should
Should this be "must", since  VALUE_IDX is now a template param? Not that the explanation
is wrong, but now you have forced the caller into efficient behavior with your choice.


PS7, Line 178: const
constexpr MAX_BATCH_SIZE


PS7, Line 179: int
const


PS7, Line 183:   // Copy into padded temporary buffer to avoid reading past the end of 'in'.
Can this be avoided sometimes? For instance, if BIT_WIDTH is 6 and num_values is 16, can we
just use 'in'?


-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbapple@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message