impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans
Date Tue, 23 May 2017 00:39:56 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4923: reduce memory transfer for selective scans
......................................................................


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6949/3/be/src/exec/parquet-scratch-tuple-batch.h
File be/src/exec/parquet-scratch-tuple-batch.h:

Line 53:   MemPool aux_mem_pool;
> Another thought: How about splitting this pool up into memory that is only 
I started to implement this but realised there were some serious caveats to deep-copying var-len
data that mean I'd rather not bundle it into this patch. It may have some "interesting" effects
on performance and memory consumption. E.g. if a string column is fully or partially dictionary-encoded
it could blow up the memory footprint by duplicating the string values.

The change in this patch is more of an unambiguous win.

If we don't do the compaction, it seems a bit silly to group memory that is essentially managed
in the same way into different MemPools. I don't have a strong objection to it but my intuition
is that it's better not to do it. I extended the comment a bit to give better guidance about
the appropriate usage patterns and when memory can be added to the pools, which may help make
it more comprehensible.


Line 100:   void FinalizeTransfer(RowBatch* dst_batch, int num_to_commit) {
> FinalizeTupleTransfer() is a little clearer because we typically mean resou
Done


Line 102:     ++num_output_batches;
> Wondering if we should make this "non-empty output batches".
Done. Yeah that's a more precise thing to do.


Line 129:     // Compaction is unsafe if the scratch batch was split across multiple output
batches
> Mind moving these two checks into the caller? Checking 'num_output_batches'
I moved the first check into the caller, but the second check is actually unnecessary because
the code below handles that case correctly. I removed it and commented to that effect.


Line 160:   int64_t total_allocated_bytes() {
> const
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message