impala-reviews mailing list archives

From "Tim Armstrong (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
Date Tue, 07 Mar 2017 21:57:04 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool

Patch Set 8:

File be/src/runtime/

PS8, Line 108: if
> iff, just to be super clear.

PS8, Line 110: read_page_ != pages_.end()
> why is this condition needed? what does it mean to have read_page_ == pages
This means that there's a read iterator active.

read_page_ == pages_.end() && pinned_ means the stream is pinned but there is no read iterator.

I clarified in the comment of read_page_ and restructured this condition so the meanings of
subexpressions were more explicit.
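The distinction above can be sketched with a simplified stand-in for the stream state (names like has_read_iterator() follow the rename discussed below; this is an illustrative mock, not Impala's actual class):

```cpp
#include <cassert>
#include <list>

// Simplified mirror of the stream state discussed above. 'pages_' stands in
// for the page list, 'read_page_' is an iterator into it, and 'pinned_' is
// the whole-stream pin flag. read_page_ == pages_.end() encodes "no read
// iterator active".
struct StreamState {
  std::list<int> pages_;                // stand-in for the page list
  std::list<int>::iterator read_page_;  // == pages_.end() -> no read iterator
  bool pinned_ = false;

  StreamState() : read_page_(pages_.end()) {}

  // Naming the subexpressions makes the combined condition self-documenting.
  bool has_read_iterator() const { return read_page_ != pages_.end(); }
  bool is_pinned_without_reader() const {
    return pinned_ && !has_read_iterator();
  }
};
```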

Line 116:   }
> fine with me to keep, but isn't this the same as line 109?

Line 141:     ss << &*read_page_;
> will it be possible to identify which page in the list printed below this i
Yes, you can match up the addresses. I made this change to reduce the amount of redundancy
in the debug output.

Line 166:   DCHECK(read_page_ == pages_.end());
We don't generally use DCHECK_EQ for null pointer comparison (I did switch this to nullptr
for consistency though).

We also can't use it for iterators since they're not printable.

PS8, Line 232: read_page_ != pages_.end()
> what does this condition mean? why isn't it just: if the stream is pinned f
It means there's an active read iterator. I actually updated the has_read_page() method to
be called has_read_iterator() and used that instead - I think it's clearer.

If there's only a write iterator we don't need to double-pin (because it's not really a read/write stream in that case).

I rewrote things in terms of the presence of read/write iterators instead of the read_write_
mode flag. Now read_write_ mode only directly influences whether PrepareForRead() invalidates
the write iterator.

Line 241:     RETURN_IF_ERROR(buffer_pool_->Pin(buffer_pool_client_, &new_page.handle));
> does this purposely not go through PinPage()?
I was experimenting a bit with different flows here so that related operations were grouped
together. Seems like it was confusing so I switched back to PinPage().

PS8, Line 250: (double_pin ? 2 : 1)
> i guess because of this. maybe move this to be after the CreatePage() call 

Line 348:     *pinned = true;
> why was this needed?
Previously PinStream() was idempotent - I wanted to preserve that behaviour.
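The idempotence property can be sketched as an early-out (hypothetical names; the real PinStream() also pins each page and reports its status through Status/*pinned):

```cpp
#include <cassert>

// Illustrative sketch of an idempotent PinStream(): if the stream is already
// pinned, report success via *pinned without doing any pinning work again.
struct MiniStream {
  bool pinned_ = false;
  int pin_ops_ = 0;  // counts actual pinning work, for illustration only

  bool PinStream(bool* pinned) {
    if (pinned_) {
      *pinned = true;  // idempotent: a second call is a no-op
      return true;
    }
    ++pin_ops_;  // stand-in for pinning every page in the stream
    pinned_ = true;
    *pinned = true;
    return true;
  }
};
```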

PS8, Line 356: We need to double-pin the current write page if we also have a read iterator.
> why? or maybe you mean just that we need to take a pin-count for reading ev
Tried to reword to be clearer.

PS8, Line 358: write_page_ == &page && read_page_ == pages_.end()
> why? i thought we now do take a separate pin count for read and for write?
This essentially implements that idea. Some of the complication is that on a non-read-write
stream where there is only ever one iterator active, we don't need to do the double-pinning
of the current write page.
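The pin-count rule described here can be reduced to a one-liner (illustrative function, not Impala's API): the read and write iterators each hold a pin on the page they sit on, so the current write page carries an extra pin only when a read iterator also exists.

```cpp
// Sketch of the expected pin count for a page of a fully pinned stream:
// double-pin only when the page is both the current write page and a read
// iterator is active (the read/write case). Names are illustrative.
int ExpectedPinCount(bool is_current_write_page, bool has_read_iterator) {
  return (is_current_write_page && has_read_iterator) ? 2 : 1;
}
```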

Line 387:       if (pinned_) UnpinPage(&page);
> do we allow you to call UnpinStream() on an already unpinned stream? why?
I restructured this code to hopefully be clearer. As a bonus, calling UnpinStream() on an
already-unpinned stream is now O(1) instead of O(n).
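The O(1) behaviour comes down to an early-out before the page walk, along these lines (hypothetical names; the real code releases buffer-pool pins rather than counting visits):

```cpp
#include <list>

// Sketch of the early-out that makes UnpinStream() O(1) on an already
// unpinned stream: only walk the page list when there are pins to release.
struct UnpinSketch {
  std::list<int> pages_;
  bool pinned_ = false;
  int pages_visited_ = 0;  // for illustration only

  void UnpinStream() {
    if (!pinned_) return;  // already unpinned: O(1), no list walk
    for (int& page : pages_) {
      (void)page;
      ++pages_visited_;  // stand-in for UnpinPage(&page)
    }
    pinned_ = false;
  }
};
```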

PS8, Line 388: continue
> i think this would be clearer as:
I think the new version is easier to follow.

File be/src/runtime/buffered-tuple-stream-v2.h:

PS8, Line 68: .
> maybe add: "for reading"

PS8, Line 69: access
> read access?
We do update some tuples in place in the agg, so "read access" might be misleading. I changed
this so that it's clearer that it means that pointers to rows can be saved (since we don't
really support random access via the stream interface).

PS8, Line 75: event
> even

PS8, Line 319: bytes_pinned
> i guess should be BytesPinned() since not a simple accessor

PS8, Line 385: including any pages already deleted in 'delete_on_read'
             :   /// mode.
> that's not intuitive. is there a good reason why we do that rather than mak
Currently those kinds of invariants aren't maintained when reading in delete_on_read mode -
that has to be the last pass over the stream. num_rows_ also isn't updated.

I don't see a use case where client code would depend on these values being updated while
the stream is destructively read, so the value of adding additional logic to keep this in
sync would be limited.

I already put some effort into updating the documentation and DCHECKs so that it was clearer
that delete_on_read was destructive and you couldn't re-read a stream afterwards.
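The destructive single-pass nature of delete_on_read can be sketched like this (a toy model with illustrative names; the real stream frees buffer-pool pages rather than std::list nodes):

```cpp
#include <list>
#include <vector>

// Sketch of a delete_on_read pass: each page is freed as soon as it has been
// fully read, so the pass is single-use and the stream cannot be re-read.
std::vector<int> DestructiveRead(std::list<std::vector<int>>* pages) {
  std::vector<int> rows;
  while (!pages->empty()) {
    for (int row : pages->front()) rows.push_back(row);
    pages->pop_front();  // delete the page as soon as it is consumed
  }
  return rows;
}
```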

PS8, Line 392: num_rows_returned_
> rows_returned_? (or rename that member, which might be better).
Updated the comment. I would rename the member in other circumstances but I don't want to
make too many minor API changes since that would increase the amount of code churn and noise
in the patch that switches the ExecNodes over to this.


Gerrit-MessageType: comment
Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes
