impala-reviews mailing list archives

From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5085: large rows in BufferedTupleStreamV2
Date Wed, 10 May 2017 21:35:40 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5085: large rows in BufferedTupleStreamV2
......................................................................


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6638/3//COMMIT_MSG
Commit Message:

PS3, Line 16: buffers
> buffer bytes? bytes of reservation?
Done


PS3, Line 16: 14
> why is that not 15 (15 pages for writing to the partitions that aren't the 
Done


http://gerrit.cloudera.org:8080/#/c/6638/3/be/src/runtime/buffered-tuple-stream-v2.h
File be/src/runtime/buffered-tuple-stream-v2.h:

PS3, Line 95: other
> aren't there n-1 other streams?
Done


PS3, Line 212: max_page_len
> how will that be determined?
This is just adding the mechanism, so policy is TBD. We'd discussed adding a query option
to specify the maximum row size.


PS3, Line 254: stream could not increase the reservation
             :   ///   sufficiently
> does it try this for large rows, or not? above it says the caller must have
I'm not sure where the "above" is. 

This is always true for the pinned case, because there's no guarantee or requirement that
we can extend a stream with a pinned page.

I tried to clarify further on line 262 the behaviour with a large row and an unpinned stream.


PS3, Line 280: const WriteRowFn& write_fn
> why is this part exposed outside BTS2? i thought this was just to reuse cod
AllocateRow() is still used by PAGG. The motivation for this choice is partly to avoid
duplication, but the AllocateRow() interface also needed to change anyway, because we need
to advance to the next page immediately after the row is written. The old interface assumed
that nothing needed to be done after the caller wrote the row into the stream.

The alternative is to have the code duplication and expose a pair of methods like AllocateRow()/AllocateRowDone(),
with PAGG responsible for calling AllocateRowDone() after it writes the data.


Line 466:   /// hard limit on the maximum size of row that can be stored in the stream.
> motivation for this limit?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6638
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbapple-impala@apache.org>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes
