impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding
Date Fri, 20 Oct 2017 17:56:43 GMT
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8267 )

Change subject: IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding
......................................................................


Patch Set 8:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-packing.h
File be/src/util/bit-packing.h:

http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-packing.h@67
PS8, Line 67:   static std::pair<const uint8_t*, int64_t> UnpackValues(const uint8_t*
__restrict__ in,
> Could this one be private?
I feel like I created an artificial distinction by making some methods private. I don't see
any reason why most of the private methods shouldn't be public - this isn't a stateful object
so it's not possible to violate any internal invariants by using the private functions.

Maybe I should just make them all public except for NumValuesToUnpack(), which is a helper
function? I did that in the current patchset and improved the documentation for the newly
public functions.


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-packing.h@81
PS8, Line 81:   static std::pair<const uint8_t*, int64_t> UnpackAndDecodeValues(
> Could this one be private?
See above comment.


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-packing.h@86
PS8, Line 86:   /// Unpack exactly 32 values of 'bit_width' from 'in' to 'out'. 'in' must
point to
> Is there a benefit of having those public over making them private and make
See above comment


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-packing.h@114
PS8, Line 114: 'bit_width'
> Is this comment still correct? From looking at the implementation I also th
The bit about 'bit_width' is incorrect, that version of the function no longer exists.

You're right that it's really up to 31 values. Fixed the function name and comment.


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-packing.inline.h
File be/src/util/bit-packing.inline.h:

http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-packing.inline.h@56
PS8, Line 56:     return UnpackValues<OutType, i>(in, in_bytes, num_values, out);
> nit: I think adding a blank line between the definition of the macro and ca
I restructured it a bit more to separate out the macro definition from the switch. Agree that
it wasn't that readable as it - not sure why I structured it like that in the first place.


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-stream-utils.h
File be/src/util/bit-stream-utils.h:

http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-stream-utils.h@33
PS8, Line 33: /// TODO: replace uses with the more efficient BatchedBitReader.
> Would that mean replacing the Writer with the BatchedReader?
Oops, no, I meant this comment to apply to the old BitReader when I was partway through this
patch, but I ended up removing it entirely and missed the comment.


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-stream-utils.inline.h
File be/src/util/bit-stream-utils.inline.h:

http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/bit-stream-utils.inline.h@115
PS8, Line 115:   auto result = BitPacking::UnpackAndDecodeValues(bit_width, buffer_pos_,
> You could use std::tie to unpack the result.
Seems to work, thanks for the tip.


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h
File be/src/util/rle-encoding.h:

http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@93
PS8, Line 93:   uint32_t NextNumRepeats();
> Could we convert all the uint32_t to int64_t while we're here? There's alre
It looks like the counts could actually be int32_t since that's what they're decoded as. Not
sure why uint32_t was used here. I could make them int64_t but I think there are downsides
to that too - it forces all callers to use wider integer types than necessary.


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@105
PS8, Line 105:   /// copying the values to 'values'.
> Can you add a comment on the return value, here and below?
Done


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@134
PS8, Line 134: decoded
> nit: decode
Done


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@137
PS8, Line 137:   static constexpr int LITERAL_BUFFER_LEN = 32;
> out of curiosity, does this need constexpr instead of const? The latter sho
No it doesn't - I believe they're equivalent for numeric values. I did miss defining the constant
so did that at the bottom of the file.


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@145
PS8, Line 145: exhaustive
> nit: exhausted?
Done


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@153
PS8, Line 153:   bool FillLiteralBuffer() WARN_UNUSED_RESULT;
> Do we give any guarantees what the caller can do after this returns false? 
I extended the class comment to describe how it should be used, including error handling.
We don't provide any guarantees about the state after it returns false.


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@159
PS8, Line 159:   /// Output buffered literals and decrement 'literal_count_'.
> Should we mention that it also increments 'literal_buffer_pos_'?
Done


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@494
PS8, Line 494: boundery
> nit: typo
oops


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@498
PS8, Line 498:     int num_read = bit_reader_.UnpackBatch(bit_width_, num_to_bypass, values
+ num_consumed);
> nit: long line
Done


http://gerrit.cloudera.org:8080/#/c/8267/8/be/src/util/rle-encoding.h@585
PS8, Line 585:     if (UNLIKELY(literal_count_ == 0)) return;
> This line doesn't seem to have any effect? Is it there for perf reasons? If
This was left in by mistake when the code got moved around.


http://gerrit.cloudera.org:8080/#/c/8267/7/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/8267/7/testdata/data/README@121
PS7, Line 121: write
> nit: writer
Done


http://gerrit.cloudera.org:8080/#/c/8267/7/testdata/data/bitpacked_def_levels.patch
File testdata/data/bitpacked_def_levels.patch:

http://gerrit.cloudera.org:8080/#/c/8267/7/testdata/data/bitpacked_def_levels.patch@1
PS7, Line 1: 
This won't pass RAT check.



-- 
To view, visit http://gerrit.cloudera.org:8080/8267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I35de0cf80c86f501c4a39270afc8fb8111552ac6
Gerrit-Change-Number: 8267
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Comment-Date: Fri, 20 Oct 2017 17:56:43 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message