impala-reviews mailing list archives

From "Tim Armstrong (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-5085: large rows in BufferedTupleStreamV2
Date Tue, 30 May 2017 18:41:49 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5085: large rows in BufferedTupleStreamV2

Patch Set 3:

File be/src/runtime/

PS5, Line 118: (a tricky edge case).
> how so?

Line 340:   }
> this case occurs because we eagerly create write pages. Did you consider ma
The bigger problem is that we'd then be deferring creation of the page until after PrepareForWrite()
or PrepareForReadWrite(), which would break all the client code that assumes that PrepareFor*()
grabs the memory required to iterate over an unpinned stream.

We could do something like have a ReservationTracker embedded in the stream where we save
the reservation until needed, but it didn't seem like a net simplification to me.

Line 378:       && !buffer_pool_client_->IncreaseReservationToFit(read_page_->len()))
> this comment seems misplaced now, seems better attached to 390.

PS5, Line 380: an the default page length, then unpinning the previous page may
> I don't understand why we do that but then the comment below says "The clie
Yeah, that's true. It would make sense if appending a larger row was "best effort" instead
of being guaranteed to succeed (see my other response in the .h).

PS5, Line 421: Status::OK();
> why is this needed? won't we have at least one page? oh, I guess in the cas
Yeah exactly.
File be/src/runtime/buffered-tuple-stream-v2.h:

PS3, Line 212: max_page_len
> Yeah, I'm more just asking for the motivation. I guess it's a guard rail to
Oh, you mean why not just make a best-effort attempt to allocate large pages in the stream
and depend on the client to ensure enough reservation is available for the maximum page size?
That's easy enough to implement.

My original thinking was that this would be more deterministic - if we encounter a too-large
row then it will deterministically fail. Without a max we might succeed sometimes and fail
other times, depending on how much additional reservation the client grabs at runtime. 

That may not matter too much, since success/failure is not totally deterministic system-wide
regardless: the large rows could be filtered out by runtime filters or limits, and whether
we append probe-side rows to a stream is also non-deterministic because it depends on whether
and when the join spills.

We could change this so that appending a larger row is a "best-effort" thing rather than guaranteed.
The main downside seems to be what you mentioned: there is no guard rail to prevent one stream
from using unlimited memory. I'd expect that to be handled correctly - at some point reading
or writing a stream should fail with an error status if another stream used up too much memory
- but it does enlarge the state space.

PS3, Line 254: stream could not increase the reservation
             :   ///   sufficiently
> By "above" i meant lines 87-88: 
Ah, ok. Yeah, I think this could be made consistent once we decide whether it's best-effort
(success depends on whether the callee could get the reservation, but the caller can ensure
success by reserving enough) or required (the callee will fail on unpinned streams if the
reservation is not already available).


Gerrit-MessageType: comment
Gerrit-Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Jim Apple <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes
