impala-reviews mailing list archives

From "Henry Robinson (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue
Date Thu, 10 Aug 2017 18:54:26 GMT
Henry Robinson has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7646

Change subject: IMPALA-5773: Correctly account for memory used in data stream receiver queue
......................................................................

IMPALA-5773: Correctly account for memory used in data stream receiver queue

DataStreamRecvrs keep one or more queues of received batches to provide
some buffering. Each queue has a fixed capacity in bytes. The estimate
of a new RowBatch's contribution to that queue used the compressed size
of the TRowBatch it would be deserialized from. That is the wrong value,
since the batch is uncompressed after deserialization and so can occupy
more memory than estimated.

* Add RowBatch::Get[Des|S]erializedSize(const TRowBatch&).
* Fix the estimate to use the uncompressed size.
* Add a DataStreamReceiver child profile to the exchange node so that
  the peak memory used by the receiver can be monitored easily.
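
The accounting difference can be illustrated with a minimal, standalone
sketch. It is not the patch itself: SketchRowBatch, its field names, and
RecvQueue are hypothetical stand-ins for TRowBatch and the receiver
queue, chosen only to show why charging the queue for the deserialized
(uncompressed) size matters.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a serialized row batch; field names are
// assumptions made for this sketch, not the real Thrift definition.
struct SketchRowBatch {
  std::string tuple_data;              // payload as sent, possibly compressed
  std::vector<int32_t> tuple_offsets;  // per-tuple offsets into the payload
  bool is_compressed = false;
  int64_t uncompressed_size = 0;       // payload size once decompressed
};

// Bytes the batch occupies in its serialized (on-the-wire) form.
int64_t GetSerializedSize(const SketchRowBatch& b) {
  return static_cast<int64_t>(b.tuple_data.size() +
                              b.tuple_offsets.size() * sizeof(int32_t));
}

// Bytes the batch will occupy after deserialization.
int64_t GetDeserializedSize(const SketchRowBatch& b) {
  int64_t data = b.is_compressed
      ? b.uncompressed_size
      : static_cast<int64_t>(b.tuple_data.size());
  return data +
         static_cast<int64_t>(b.tuple_offsets.size() * sizeof(int32_t));
}

// Queue with a fixed byte capacity, charged by deserialized size.
class RecvQueue {
 public:
  explicit RecvQueue(int64_t capacity) : capacity_(capacity) {}

  // Admits the batch only if its deserialized size fits in the capacity.
  bool TryAdd(const SketchRowBatch& b) {
    int64_t size = GetDeserializedSize(b);
    if (used_ + size > capacity_) return false;
    used_ += size;
    return true;
  }

  // Returns bytes to the queue once a batch has been consumed.
  void Release(int64_t bytes) {
    assert(bytes <= used_);
    used_ -= bytes;
  }

  int64_t bytes_used() const { return used_; }

 private:
  int64_t capacity_;
  int64_t used_ = 0;
};
```

If TryAdd() charged GetSerializedSize() instead, a well-compressed
batch would be admitted at a fraction of its real footprint, and the
queue could hold far more memory than its nominal capacity.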

Confirmed that the following query:

select count(distinct concat(cast(l_comment as char(120)),
                             cast(l_comment as char(120)),
                             cast(l_comment as char(120)),
                             cast(l_comment as char(120)),
                             cast(l_comment as char(120)),
                             cast(l_comment as char(120)))) from lineitem;

succeeds with a mem-limit of 800MB. Before this patch it failed on a
one-node cluster because the data stream receiver buffered more batches
than the memory limit allowed.

Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/data-stream-mgr.cc
M be/src/runtime/data-stream-recvr.cc
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
6 files changed, 31 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/7646/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <henry@cloudera.com>
