impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-5085: large rows in BufferedTupleStreamV2
Date Wed, 14 Jun 2017 00:53:57 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5085: large rows in BufferedTupleStreamV2

Patch Set 10:

File be/src/runtime/

PS10, Line 120: Exceptions are the page we're currently writing
> why is that still an exception now that we have the saved reservation?
You're right. For some reason I thought I fixed this code but I obviously didn't.

PS10, Line 271: ExpectedPinCount
> Now that the complex logic shifts to reservations, should this become "bool
This seemed to work out a bit cleaner for the validations (i.e. we can just assert the pin
count equals this, rather than asserting is_pinned == NeedPin() && pin_count == 1.

PS10, Line 340: has_write_iterator
> why is that? should comment say: Need reservation if there are no pages cur

Line 409:           - write_page_reservation_to_reclaim)) {
> this can only fail for large pages, right? if so, how about clarifying with
Or for pinned streams. I added the DCHECK.

Line 424:   // free up reservation for the next write page, so do it first.
> then why does this not have to happen before the IncreaseReservationToFit()
It's not necessary now that we factor that into the IncreaseReservationToFit() calculation.
Removed the comment.

PS10, Line 505: deleting or unpinning the previous page
> this seems like it needs updating.

PS10, Line 516: (!pinned_ || pages_.size() == 1) && has_read_write_page()
> hmm, is this not expressible by NeedWriteReservation()?
Done. It's a little clunky but it is probably clearer.

PS10, Line 548: InvalidateReadIterator
> why do that? don't we still need the saved read reservation?
In case there was already a read iterator active. This just resets things to a clear state
then sets up the new iterator from scratch.

Line 566:     read_end_ptr_ = nullptr;
> why doesn't this case need to save reservation?
It's handled in PrepareForReadWrite(). It's unclear whether it's cleaner to handle this here
or in the caller. The current approach is slightly less code.

PS10, Line 713: read_page_ == pages_.end()
> why do we do NextReadPage() if we're already at the end?
pages_.end() is just the general-purpose invalid iterator value. The added case handles moving
to the first page of a read/write stream.

PS10, Line 866: so
              :   // we can just use AllocateRowSlow() to do the work of advancing the page.
> i don't understand why this follows from the first part of the sentence.
I don't think it was very helpful. Shortened the comment.

PS10, Line 893: pinned_
> shouldn't that be expressed in terms of NeedWriteReservation()?
File be/src/runtime/buffered-tuple-stream-v2.h:

PS10, Line 77: In the unpinned mode, only
             : /// pages currently being read and written are pinned and other pages are unpinned
             : /// therefore do not use the client's reservation and can be spilled to disk.
             : ///
> it seems like this sentence should be combined with the next paragraph, sin
Merged the paragraphs.

PS10, Line 89: caller
> client?

Line 278:   /// but std::function always makes a heap allocation.
> explain what this callback should do.

PS10, Line 281: 'fixed_size' and variable
              :   /// length data 'varlen_size'
> update

Line 283:   /// start of the allocated space and returns true. Otherwise returns false.
> does this have the same three outcomes as AddRow?

PS10, Line 284: AllocateRow
> At this point the "Add" and "Allocate" row terminology seems historical. Le
AddRow() still seems reasonable. AddRowCustom(), AddRowUnsafe()?

Line 484:   /// * there is only one pinned page in the stream and it is already pinned for
> this comment is useful, but it'd also be helpful to explain what these thre

PS10, Line 535: and at the byte before 'data_end'.
> what does that mean?
was missing "ending"

PS10, Line 536: updates *data_end
> data_end doesn't have enough indirection. is that suppose to say '*data'?

Line 582:   void InvalidateWriteIterator();
> it's a bit hard to understand how these "invalidate" methods are different 
Folded ResetReadPage() into InvalidateReadIterator(). Updated the ResetWritePage() comment
to reflect the logical position of the iterator after it returns.
File be/src/runtime/buffered-tuple-stream-v2.inline.h:

Line 34: inline bool BufferedTupleStreamV2::AllocateRow(
> why is it that AllocateRow() fast path is inlined but AddRow() is not?
AddRow() calls into the interpreted DeepCopy() anyway on the fast path so there's not really
a benefit, whereas AllocateRow() doesn't do as much work and benefits from inlining write_fn()

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Jim Apple <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message