impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding
Date Fri, 20 Oct 2017 17:56:43 GMT
Tim Armstrong has posted comments on this change. ( )

Change subject: IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding

Patch Set 8:

File be/src/util/bit-packing.h:
PS8, Line 67:   static std::pair<const uint8_t*, int64_t> UnpackValues(const uint8_t*
__restrict__ in,
> Could this one be private?
I feel like I created an artificial distinction by making some methods private. I don't see
any reason why most of the private methods shouldn't be public - this isn't a stateful object
so it's not possible to violate any internal invariants by using the private functions.

Maybe I should just make them all public except for NumValuesToUnpack(), which is a helper
function? I did that in the current patchset and improved the documentation for the newly
public functions.
PS8, Line 81:   static std::pair<const uint8_t*, int64_t> UnpackAndDecodeValues(
> Could this one be private?
See above comment.
PS8, Line 86:   /// Unpack exactly 32 values of 'bit_width' from 'in' to 'out'. 'in' must
point to
> Is there a benefit of having those public over making them private and make
See above comment
PS8, Line 114: 'bit_width'
> Is this comment still correct? From looking at the implementation I also th
The bit about 'bit_width' is incorrect, that version of the function no longer exists.

You're right that it's really up to 31 values. Fixed the function name and comment.
File be/src/util/bit-packing.inline.h:
PS8, Line 56:     return UnpackValues<OutType, i>(in, in_bytes, num_values, out);
> nit: I think adding a blank line between the definition of the macro and ca
I restructured it a bit more to separate out the macro definition from the switch. Agree that
it wasn't that readable as it - not sure why I structured it like that in the first place.
File be/src/util/bit-stream-utils.h:
PS8, Line 33: /// TODO: replace uses with the more efficient BatchedBitReader.
> Would that mean replacing the Writer with the BatchedReader?
Oops, no, I meant this comment to apply to the old BitReader when I was partway through this
patch, but I ended up removing it entirely and missed the comment.
File be/src/util/bit-stream-utils.inline.h:
PS8, Line 115:   auto result = BitPacking::UnpackAndDecodeValues(bit_width, buffer_pos_,
> You could use std::tie to unpack the result.
Seems to work, thanks for the tip.
File be/src/util/rle-encoding.h:
PS8, Line 93:   uint32_t NextNumRepeats();
> Could we convert all the uint32_t to int64_t while we're here? There's alre
It looks like the counts could actually be int32_t since that's what they're decoded as. Not
sure why uint32_t was used here. I could make them int64_t but I think there are downsides
to that too - it forces all callers to use wider integer types than necessary.
PS8, Line 105:   /// copying the values to 'values'.
> Can you add a comment on the return value, here and below?
PS8, Line 134: decoded
> nit: decode
PS8, Line 137:   static constexpr int LITERAL_BUFFER_LEN = 32;
> out of curiosity, does this need constexpr instead of const? The latter sho
No it doesn't - I believe they're equivalent for numeric values. I did miss defining the constant
so did that at the bottom of the file.
PS8, Line 145: exhaustive
> nit: exhausted?
PS8, Line 153:   bool FillLiteralBuffer() WARN_UNUSED_RESULT;
> Do we give any guarantees what the caller can do after this returns false? 
I extended the class comment to describe how it should be used, including error handling.
We don't provide any guarantees about the state after it returns false.
PS8, Line 159:   /// Output buffered literals and decrement 'literal_count_'.
> Should we mention that it also increments 'literal_buffer_pos_'?
PS8, Line 494: boundery
> nit: typo
PS8, Line 498:     int num_read = bit_reader_.UnpackBatch(bit_width_, num_to_bypass, values
+ num_consumed);
> nit: long line
PS8, Line 585:     if (UNLIKELY(literal_count_ == 0)) return;
> This line doesn't seem to have any effect? Is it there for perf reasons? If
This was left in by mistake when the code got moved around.
File testdata/data/README:
PS7, Line 121: write
> nit: writer
File testdata/data/bitpacked_def_levels.patch:
PS7, Line 1: 
This won't pass RAT check.

To view, visit
To unsubscribe, visit

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I35de0cf80c86f501c4a39270afc8fb8111552ac6
Gerrit-Change-Number: 8267
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-Comment-Date: Fri, 20 Oct 2017 17:56:43 +0000
Gerrit-HasComments: Yes

  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message