impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans
Date Mon, 22 May 2017 23:34:24 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-4923: reduce memory transfer for selective scans
......................................................................


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6949/2/be/src/exec/parquet-scratch-tuple-batch.h
File be/src/exec/parquet-scratch-tuple-batch.h:

Line 130:     // because the batch we returned earlier may hold a reference into 'tuple_mem'.
> I don't think this is a significant problem for performance - if the scan i
Thanks for explaining. Makes sense to me.


http://gerrit.cloudera.org:8080/#/c/6949/3/be/src/exec/parquet-scratch-tuple-batch.h
File be/src/exec/parquet-scratch-tuple-batch.h:

Line 53:   MemPool aux_mem_pool;
Another thought: How about splitting this pool up into memory that is only referenced by tuples
in tuple_mem and memory that could be refd by already-returned batches.

For example, the decompression buffers live across multiple scratch/transfer rounds, but as
far as I can tell all var-len memory that is allocated from this pool is only referenced by
tuples in tuple_mem. I'd even be ok with naming the pools 'var_len_pool' and 'decompression_pool'.
That would make it less mysterious what is inside them.

This change is mostly for understandability, but it would also allow us to avoid unnecessarily
transferring var-len data (I know it's only a nice-to-have from a perf perspective).

People reading this code may have the same question as me regarding the var-len data.


Line 100:   void FinalizeTransfer(RowBatch* dst_batch, int num_to_commit) {
FinalizeTupleTransfer() is a little clearer because we typically mean resources in the context
of "transfer"


Line 102:     ++num_output_batches;
Wondering if we should make this "non-empty output batches".


Line 129:     // Compaction is unsafe if the scratch batch was split across multiple output
batches
Mind moving these two checks into the caller? Checking 'num_output_batches' in the same fn
where it is updated makes it more obvious what it is used for.

This will also clarify in which situations TryCompact() can "fail", i.e., only when there
is not enough mem


Line 160:   int64_t total_allocated_bytes() {
const


-- 
To view, visit http://gerrit.cloudera.org:8080/6949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message