impala-reviews mailing list archives

From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations
Date Wed, 24 May 2017 16:50:59 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5347: Parquet scanner microoptimizations
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6950/8/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 991:       InitTuple(template_tuple_, scratch_batch_->GetTuple(i));
> does the extra branch in InitTuple not matter because it's always perfectly
Good point. I think the bigger bottleneck there is probably the branching inside memcpy(),
since byte_size isn't a compile-time constant, but we might as well make this more efficient.
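
To illustrate the byte_size point, here is a minimal sketch (not code from the patch;
CopyRuntimeSize and CopyFixedSize are hypothetical names). With a runtime size the copy
generally goes through the generic memcpy() path, which has to branch on the length. With a
compile-time size, optimising compilers can usually expand the copy into a few fixed-width
moves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Size known only at run time, like a tuple's byte_size: the generic memcpy()
// path has to pick a copy strategy based on the length on every call.
inline void CopyRuntimeSize(std::uint8_t* dst, const std::uint8_t* src,
    std::size_t byte_size) {
  assert(dst != nullptr && src != nullptr);
  std::memcpy(dst, src, byte_size);
}

// Size fixed at compile time (hypothetical alternative): the compiler sees a
// constant length and can typically inline the copy without a length dispatch.
template <std::size_t kByteSize>
inline void CopyFixedSize(std::uint8_t* dst, const std::uint8_t* src) {
  assert(dst != nullptr && src != nullptr);
  std::memcpy(dst, src, kByteSize);
}
```

Both functions produce the same bytes; only the amount of work the compiler can do ahead of
time differs.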

I moved the logic into an InitTupleBuffer() function in HdfsScanner, which uses a more
optimised implementation for each of the three cases.
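
A rough sketch of what a batch initialiser with three cases could look like follows. This is
not the code from the patch: the message does not spell out the three cases, so the split
below (template tuple present, small tuple without a template, large tuple without a
template) is an assumption, and TupleLayout, kCacheLineSize and InitTupleBufferSketch are
hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layout description; not Impala's TupleDescriptor.
struct TupleLayout {
  std::size_t byte_size;          // total bytes per tuple (runtime value)
  std::size_t null_bytes_offset;  // offset of the null-indicator bytes
  std::size_t num_null_bytes;     // number of null-indicator bytes
};

// Assumed threshold for "small" tuples.
constexpr std::size_t kCacheLineSize = 64;

// Initialises num_tuples contiguous tuples starting at tuple_mem. The case is
// chosen once per batch, so each loop body stays free of per-tuple branches.
inline void InitTupleBufferSketch(const std::uint8_t* template_tuple,
    const TupleLayout& layout, std::uint8_t* tuple_mem, std::size_t num_tuples) {
  assert(layout.null_bytes_offset + layout.num_null_bytes <= layout.byte_size);
  if (template_tuple != nullptr) {
    // Assumed case 1: copy the template tuple into every slot.
    for (std::size_t i = 0; i < num_tuples; ++i) {
      std::memcpy(tuple_mem + i * layout.byte_size, template_tuple, layout.byte_size);
    }
  } else if (layout.byte_size <= kCacheLineSize) {
    // Assumed case 2: small tuples, so zero the whole buffer with one memset().
    std::memset(tuple_mem, 0, num_tuples * layout.byte_size);
  } else {
    // Assumed case 3: large tuples, so clear only the null-indicator bytes.
    for (std::size_t i = 0; i < num_tuples; ++i) {
      std::memset(tuple_mem + i * layout.byte_size + layout.null_bytes_offset, 0,
          layout.num_null_bytes);
    }
  }
}
```

Hoisting the case selection out of the per-tuple loop is the design point here: the check
that a per-tuple InitTuple() call would repeat for every row is made once per batch.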

I did some basic profiling of scans of a partitioned table with perf top, and most of the
time appears to be spent inside memcpy() both before and after the change.


-- 
To view, visit http://gerrit.cloudera.org:8080/6950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbapple-impala@apache.org>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: anujphadke <aphadke@cloudera.com>
Gerrit-HasComments: Yes
