impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5085: large rows in BufferedTupleStreamV2
Date Tue, 09 May 2017 00:52:05 GMT
Tim Armstrong has uploaded a new patch set (#3).

Change subject: IMPALA-5085: large rows in BufferedTupleStreamV2
......................................................................

IMPALA-5085: large rows in BufferedTupleStreamV2

The stream defaults to pages of default_page_len_. If a row doesn't
fit in that page, it will allocate another page up to max_page_len_
bytes and append a single row to that page, then immediately advance
to the next page. This means that when writing a stream, the large
page only needs to be kept in memory temporarily, which helps with
memory requirements.  E.g. consider a hash join that is repartitioning
1 unpinned stream into 16 unpinned streams. We will need
default_page_len_ * 14 + max_page_len_ * 2 buffers because when
processing a large row we only need one large write buffer at a time.

The large row cases are not as optimised for memory consumption or
performance - queries processing very large numbers of large rows
are an extreme edge case that is likely to hit other performance
bottlenecks first. Pages with large rows can have up to 50%
internal fragmentation. I tried to avoid adding empty pages to the
stream, but it was tricky to avoid in the case of a read/write
stream where the read iterator was already pointing at the write
page - in that case we can end up with an empty page.

To avoid duplicating more logic between AddRow() and AllocateRow()
I restructured things so that AddRowSlow() is implemented in terms
of AllocateRowSlow(). AllocateRow() now takes a function as an
argument to populate the row.

Testing:
* Added tests for the case where 0 rows are added to the stream
* Extend BigRow to exercise the new code.
* Also test large strings and read/write streams.

Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5
---
M be/src/runtime/buffered-tuple-stream-v2-test.cc
M be/src/runtime/buffered-tuple-stream-v2.cc
M be/src/runtime/buffered-tuple-stream-v2.h
M be/src/runtime/buffered-tuple-stream-v2.inline.h
M common/thrift/generate_error_codes.py
5 files changed, 537 insertions(+), 231 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/6638/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6638
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbapple-impala@apache.org>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>

Mime
View raw message