From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans
Date Mon, 22 May 2017 19:36:04 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-4923: reduce memory transfer for selective scans
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6949/2/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 1031:     DCHECK_EQ(0, scratch_batch_->total_allocated_bytes());
Where do the decompression buffers get freed?


http://gerrit.cloudera.org:8080/#/c/6949/2/be/src/exec/parquet-scratch-tuple-batch.h
File be/src/exec/parquet-scratch-tuple-batch.h:

PS2, Line 48: MemPool
It's not clear from the variable names and comments where the var-len data goes. That's important
to point out explicitly.


Line 50:   // Pool used to accumulate other memory such as decompression buffers that may be
may be referenced


Line 109:     dst_batch->tuple_data_pool()->AcquireData(&aux_mem_pool, false);
I would have thought that the var-len data like strings or collections can make up the bulk
of memory that needs to be transferred, so why not deep-copy those out as well and avoid this
transfer? What's the rationale behind only avoiding transferring the mem for the fixed-len
portion?


Line 130:     if (num_output_batches > 1) return false;
This new compaction has non-obvious caveats like this one, and I find the flow of memory difficult
to follow now. I wonder if this process could be simplified if we did something along these
lines:
1. Evaluate conjuncts over all tuples in scratch batch. Keep a bitmap of survivors.
2. Decide whether to compact scratch batch or not.
3. Transfer rows to output batch. When AtEnd() of the scratch batch, have a function TransferResources()
or similar to transfer whatever the output batch needs. This may be the original memory or
memory from compaction.
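
The three steps above could be sketched roughly as follows. This is only an illustration of the proposed flow, not Impala code: Tuple, ScratchBatch, OutputBatch, EvalConjuncts(), ShouldCompact(), Compact() and TransferRows() are hypothetical stand-ins, the 0.5 selectivity threshold is an arbitrary assumption, and a plain heap buffer stands in for the MemPools. Only the TransferResources() name comes from the suggestion itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; these are not Impala's classes.
struct Tuple { int64_t value; };

struct ScratchBatch {
  std::unique_ptr<Tuple[]> tuples;  // memory that output rows point into
  size_t num_tuples = 0;
  std::vector<bool> survivors;      // bitmap filled in by step 1
  size_t next = 0;                  // next tuple to hand to an output batch
  bool AtEnd() const { return next >= num_tuples; }
};

struct OutputBatch {
  std::vector<const Tuple*> rows;                  // rows reference scratch memory
  std::vector<std::unique_ptr<Tuple[]>> attached;  // memory the batch now owns
};

// Step 1: evaluate the conjuncts over all tuples and record the survivors.
size_t EvalConjuncts(ScratchBatch* s,
                     const std::function<bool(const Tuple&)>& pred) {
  s->survivors.assign(s->num_tuples, false);
  size_t n = 0;
  for (size_t i = 0; i < s->num_tuples; ++i) {
    if (pred(s->tuples[i])) { s->survivors[i] = true; ++n; }
  }
  return n;
}

// Step 2: decide whether to compact. The threshold is an assumed heuristic.
bool ShouldCompact(size_t num_survivors, size_t num_tuples,
                   double max_selectivity = 0.5) {
  if (num_tuples == 0) return false;
  return static_cast<double>(num_survivors) /
         static_cast<double>(num_tuples) < max_selectivity;
}

// Copies the survivors into a smaller buffer and frees the original one.
// Must run before any row has been transferred.
void Compact(ScratchBatch* s) {
  assert(s->next == 0);
  size_t n = 0;
  for (size_t i = 0; i < s->num_tuples; ++i) {
    if (s->survivors[i]) ++n;
  }
  std::unique_ptr<Tuple[]> compacted(new Tuple[n]);
  size_t dst = 0;
  for (size_t i = 0; i < s->num_tuples; ++i) {
    if (s->survivors[i]) compacted[dst++] = s->tuples[i];
  }
  s->tuples = std::move(compacted);
  s->num_tuples = n;
  s->survivors.assign(n, true);
}

// Hands the output batch whatever memory its rows reference. This may be the
// original buffer or the compacted one; the caller does not need to know.
void TransferResources(ScratchBatch* s, OutputBatch* out) {
  assert(s->AtEnd());
  out->attached.push_back(std::move(s->tuples));
  s->num_tuples = 0;
  s->next = 0;
  s->survivors.clear();
}

// Step 3: move surviving rows into the output batch until it is full or the
// scratch batch is exhausted, then transfer resources once at the end.
// Assumption: if a scratch batch spans several output batches, the consumer
// keeps the memory attached to the last one alive for the earlier ones.
size_t TransferRows(ScratchBatch* s, OutputBatch* out, size_t capacity) {
  size_t added = 0;
  while (!s->AtEnd() && out->rows.size() < capacity) {
    if (s->survivors[s->next]) {
      out->rows.push_back(&s->tuples[s->next]);
      ++added;
    }
    ++s->next;
  }
  if (s->tuples != nullptr && s->AtEnd()) TransferResources(s, out);
  return added;
}
```

In this arrangement the compaction decision is isolated in step 2, and step 3 never has to distinguish between original and compacted memory, which is the simplification the suggestion is after.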

Let's discuss before you make any changes obviously :)


Line 139:     for (int i = dst_batch->num_rows(); i < end_row; ++i) {
Don't we have a CopyRows() for this in RowBatch?


-- 
To view, visit http://gerrit.cloudera.org:8080/6949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-HasComments: Yes
