impala-reviews mailing list archives

From "Tim Armstrong (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations
Date Tue, 23 May 2017 00:48:19 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5347: Parquet scanner microoptimizations

Patch Set 4:

File be/src/exec/

Line 979: Status HdfsParquetScanner::ResetScratchBatch() {
> Why not move this into ScratchTupleBatch, i.e. pass in the template tuple t
ScratchTupleBatch would then have to call out to HdfsScanNode::InitTuple(). I can do a larger
restructure, e.g. moving InitTuple() into Tuple or similar, if you think that would make things
clearer. I think it's probably an improvement; I just want to check that you agree before
doing it.
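A minimal sketch of the proposed restructure, assuming tuple initialization only needs the tuple's layout (its byte size plus the offset and count of its null-indicator bytes). `TupleLayout`, `ScratchBatch` and this free `InitTuple()` are hypothetical stand-ins, not Impala's actual classes or signatures:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the parts of a tuple descriptor that matter here.
struct TupleLayout {
  int byte_size;          // bytes per tuple
  int null_bytes_offset;  // where the null-indicator bytes start in a tuple
  int num_null_bytes;     // how many null-indicator bytes there are
};

// Layout-only initialization: it depends on no scan node, so a batch class
// could call it directly.
inline void InitTuple(const TupleLayout& layout, const uint8_t* template_tuple,
                      uint8_t* tuple) {
  if (template_tuple != nullptr) {
    // Copy the template tuple (e.g. partition-key values) wholesale.
    std::memcpy(tuple, template_tuple, layout.byte_size);
  } else {
    // Without a template, only the null indicators need clearing.
    std::memset(tuple + layout.null_bytes_offset, 0, layout.num_null_bytes);
  }
}

// Hypothetical stand-in for a ScratchTupleBatch that owns its own reset.
class ScratchBatch {
 public:
  ScratchBatch(const TupleLayout& layout, int capacity)
      : layout_(layout),
        capacity_(capacity),
        buffer_(static_cast<size_t>(layout.byte_size) * capacity) {}

  // Re-initializes every tuple; the caller passes in the template tuple.
  void Reset(const uint8_t* template_tuple) {
    for (int i = 0; i < capacity_; ++i) {
      InitTuple(layout_, template_tuple, TupleAt(i));
    }
  }

  uint8_t* TupleAt(int i) {
    assert(i >= 0 && i < capacity_);
    return buffer_.data() + static_cast<size_t>(i) * layout_.byte_size;
  }

 private:
  TupleLayout layout_;
  int capacity_;
  std::vector<uint8_t> buffer_;
};
```

The design point is that `Reset()` receives the template tuple as an argument instead of reaching back into the scan node.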

Line 983:   if (template_tuple_ == nullptr && tuple_byte_size_ <= CACHE_LINE_SIZE)
> Not sure I completely understand the CACHE_LINE_SIZE check. We are zeroing 
Augmented the comment.

There's a cut-over point beyond which the old code is faster. For example, if the tuple has 1000
slots, it's probably better to zero out the 125 bytes of null indicators row by row than to zero
out all 1024 multi-KB rows.
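One way to sketch the cut-over, assuming a 64-byte cache line and a hypothetical `TupleLayout` standing in for Impala's tuple descriptor: with no template tuple and a tuple no larger than a cache line, one memset over the whole batch replaces the per-row work; otherwise each row is initialized individually. The helper names are illustrative, not the patch's code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed cache-line size; 64 bytes is typical on x86-64.
constexpr int kCacheLineSize = 64;

// Hypothetical stand-in for the relevant parts of a tuple descriptor.
struct TupleLayout {
  int byte_size;          // bytes per tuple
  int null_bytes_offset;  // where the null-indicator bytes start in a tuple
  int num_null_bytes;     // how many null-indicator bytes there are
};

// One null-indicator bit per slot, rounded up to whole bytes.
int NumNullBytes(int num_slots) { return (num_slots + 7) / 8; }

// Bytes written when zeroing every tuple in the batch in full.
int64_t BytesZeroedWholeBatch(const TupleLayout& layout, int capacity) {
  return static_cast<int64_t>(layout.byte_size) * capacity;
}

// Bytes written when clearing only each row's null indicators.
int64_t BytesZeroedPerRow(const TupleLayout& layout, int capacity) {
  return static_cast<int64_t>(layout.num_null_bytes) * capacity;
}

// Same shape as the condition on line 983: use one big memset only when
// there is no template tuple to copy and each tuple fits in a cache line.
bool UseWholeBatchMemset(const uint8_t* template_tuple, const TupleLayout& layout) {
  return template_tuple == nullptr && layout.byte_size <= kCacheLineSize;
}

void ResetBatch(uint8_t* buffer, int capacity, const TupleLayout& layout,
                const uint8_t* template_tuple) {
  assert(buffer != nullptr || capacity == 0);
  if (UseWholeBatchMemset(template_tuple, layout)) {
    // Zeroing whole tuples also clears their null indicators.
    std::memset(buffer, 0, static_cast<size_t>(layout.byte_size) * capacity);
    return;
  }
  for (int i = 0; i < capacity; ++i) {
    uint8_t* tuple = buffer + static_cast<size_t>(i) * layout.byte_size;
    if (template_tuple != nullptr) {
      std::memcpy(tuple, template_tuple, layout.byte_size);
    } else {
      std::memset(tuple + layout.null_bytes_offset, 0, layout.num_null_bytes);
    }
  }
}
```

With 1000 slots, `NumNullBytes(1000)` is 125, so per-row clearing of a 1024-row batch writes 128,000 bytes, whereas zeroing a whole batch of (hypothetically) 4 KB tuples would write 4,194,304 bytes.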

I don't think this optimisation matters much for tuples with more than a handful of slots,
since for those the cost of materialization is high compared to the cost of zeroing memory.


Gerrit-MessageType: comment
Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-Reviewer: anujphadke <>
Gerrit-HasComments: Yes
