impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dan Hecht (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5085: large rows in BufferedTupleStreamV2
Date Tue, 13 Jun 2017 01:30:51 GMT
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5085: large rows in BufferedTupleStreamV2
......................................................................


Patch Set 10:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/6638/10/be/src/runtime/buffered-tuple-stream-v2.cc
File be/src/runtime/buffered-tuple-stream-v2.cc:

PS10, Line 120: Exceptions are the page we're currently writing
why is that still an exception now that we have the saved reservation?


PS10, Line 271: ExpectedPinCount
Now that the complex logic shifts to reservations, should this become "bool NeedPin()"? I'm
fine with leaving it alone though.


PS10, Line 340: has_write_iterator
why is that? should comment say: Need reservation if there are no pages current pinned for
reading but we may add a page, or similar?


Line 409:           - write_page_reservation_to_reclaim)) {
this can only fail for large pages, right? if so, how about clarifying with a dcheck?


Line 424:   // free up reservation for the next write page, so do it first.
then why does this not have to happen before the IncreaseReservationToFit() call above?


PS10, Line 505: deleting or unpinning the previous page
this seems like it needs updating.


PS10, Line 516: (!pinned_ || pages_.size() == 1) && has_read_write_page()
hmm, is this not expressible by NeedWriteReservation()?


PS10, Line 548: InvalidateReadIterator
why do that? don't we still need the saved read reservation?


Line 566:     read_end_ptr_ = nullptr;
why doesn't this case need to save reservation?


PS10, Line 713: read_page_ == pages_.end()
why do we do NextReadPage() if we're already at the end?


PS10, Line 866: so
              :   // we can just use AllocateRowSlow() to do the work of advancing the page.
i don't understand why this follows from the first part of the sentence.


PS10, Line 893: pinned_
shouldn't that be expressed in terms of NeedWriteReservation()?


http://gerrit.cloudera.org:8080/#/c/6638/10/be/src/runtime/buffered-tuple-stream-v2.h
File be/src/runtime/buffered-tuple-stream-v2.h:

PS10, Line 77: In the unpinned mode, only
             : /// pages currently being read and written are pinned and other pages are unpinned
and
             : /// therefore do not use the client's reservation and can be spilled to disk.
             : ///
it seems like this sentence should be combined with the next paragraph, since it's no longer
complete.


PS10, Line 89: caller
client?


Line 278:   /// but std::function always makes a heap allocation.
explain what this callback should do.


PS10, Line 281: 'fixed_size' and variable
              :   /// length data 'varlen_size'
update


Line 283:   /// start of the allocated space and returns true. Otherwise returns false.
does this have the same three outcomes as AddRow?


PS10, Line 284: AllocateRow
At this point the "Add" and "Allocate" row terminology seems historical. Let's come up with
better names for these. They both allocate/add a row, the difference is just whether the tuple
row is passed and copied, or whether it's constructed on the fly, right?


Line 484:   /// * there is only one pinned page in the stream and it is already pinned for
reading.
this comment is useful, but it'd also be helpful to explain what these three conditions mean
in english like read_page_reservation_ comment does. i.e. "the possibility that a pin count
will be needed for the write iterator" or something.


PS10, Line 535: and at the byte before 'data_end'.
what does that mean?


PS10, Line 536: updates *data_end
data_end doesn't have enough indirection. is that suppose to say '*data'?


Line 582:   void InvalidateWriteIterator();
it's a bit hard to understand how these "invalidate" methods are different from the "Reset*Page"
methods from the description. in the case of Reset*Page(), what happens to the iterator? is
it not invalidated?


http://gerrit.cloudera.org:8080/#/c/6638/10/be/src/runtime/buffered-tuple-stream-v2.inline.h
File be/src/runtime/buffered-tuple-stream-v2.inline.h:

Line 34: inline bool BufferedTupleStreamV2::AllocateRow(
why is it that AllocateRow() fast path is inlined but AddRow() is not?


-- 
To view, visit http://gerrit.cloudera.org:8080/6638
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbapple-impala@apache.org>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message