impala-reviews mailing list archives

From "Tim Armstrong (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Date Wed, 27 Sep 2017 16:09:00 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

Patch Set 6:

(1 comment)
Commit Message:
PS6, Line 44: There is a significant regression (50% increase in runtime) in
> Just to clarify my point. My hope is that most of the overhead may ultimate
I'm pretty sure it's the memory copying: making a memory allocation should be at least an order
of magnitude cheaper than doing a pass over a data page. I'm unsure whether the difference is due
to the extra instructions executed or to the increase in cache pressure from having two copies of
the data.
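
A rough way to check the allocation-versus-copy claim is a microbenchmark along the following lines. This is only a sketch under stated assumptions: the helper names (ElapsedNs, AvgAllocNs, AvgCopyNs) are hypothetical and not part of Impala, it uses plain malloc/free rather than TCMalloc or the Disk IO Mgr, and the buffer size passed in is just a stand-in for a data page.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

using Clock = std::chrono::steady_clock;

// Nanoseconds elapsed since `start`.
double ElapsedNs(Clock::time_point start) {
  return std::chrono::duration<double, std::nano>(Clock::now() - start).count();
}

// Hypothetical helper: average cost in ns of one malloc/free pair of `bytes`.
double AvgAllocNs(std::size_t bytes, int iters) {
  assert(bytes > 0 && iters > 0);
  Clock::time_point start = Clock::now();
  for (int i = 0; i < iters; ++i) {
    volatile char* p = static_cast<char*>(std::malloc(bytes));
    if (p == nullptr) std::abort();
    p[0] = 1;  // Touch one byte so the allocation is not optimized away.
    std::free(const_cast<char*>(p));
  }
  return ElapsedNs(start) / iters;
}

// Hypothetical helper: average cost in ns of one full copy of `bytes`,
// i.e. one extra pass over a buffer standing in for a data page.
double AvgCopyNs(std::size_t bytes, int iters) {
  assert(bytes > 0 && iters > 0);
  std::vector<char> src(bytes, 'x');
  std::vector<char> dst(bytes);
  volatile char sink = 0;
  Clock::time_point start = Clock::now();
  for (int i = 0; i < iters; ++i) {
    src[0] = static_cast<char>(i);  // Vary the input between iterations.
    std::memcpy(dst.data(), src.data(), bytes);
    sink = dst[bytes - 1];  // Read the result so the copy is kept.
  }
  (void)sink;
  return ElapsedNs(start) / iters;
}
```

Calling both helpers with the same size (for example 64 * 1024 bytes, an arbitrary choice) gives a per-operation comparison. The numbers depend heavily on the allocator, buffer size and cache state, and the sketch does not separate instruction count from cache pressure.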

I haven't measured but I'm also not convinced that the Disk IO Mgr's buffer caching is necessarily
more efficient or scalable than TCMalloc. The IO mgr just has a global lock whereas TCMalloc
has locks per size class plus batching via the thread cache. The buffer pool should be more
scalable for large allocations than either in any case.
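
To make the locking difference concrete, here is a minimal sketch of the two styles of free-list cache. It is an illustration only, not the actual Disk IO Mgr or TCMalloc code: the class names, kNumClasses and kMinSize are all hypothetical, and TCMalloc's thread-cache batching is not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <vector>

constexpr int kNumClasses = 4;              // Hypothetical number of size classes.
constexpr std::size_t kMinSize = 8 * 1024;  // Class i holds kMinSize << i bytes.

inline std::size_t ClassBytes(int cls) { return kMinSize << cls; }

// Reuse a cached buffer if one exists, otherwise allocate a new one.
inline void* PopOrAlloc(std::vector<void*>& free_list, int cls) {
  if (free_list.empty()) return std::malloc(ClassBytes(cls));
  void* buf = free_list.back();
  free_list.pop_back();
  return buf;
}

// Style 1: one mutex guards every size class, so all threads contend on it.
class GlobalLockCache {
 public:
  ~GlobalLockCache() {
    for (auto& list : free_) for (void* b : list) std::free(b);
  }
  void* Get(int cls) {
    std::lock_guard<std::mutex> l(lock_);
    return PopOrAlloc(free_[cls], cls);
  }
  void Put(int cls, void* buf) {
    std::lock_guard<std::mutex> l(lock_);
    free_[cls].push_back(buf);
  }

 private:
  std::mutex lock_;
  std::array<std::vector<void*>, kNumClasses> free_;
};

// Style 2: one mutex per size class, so threads working on different
// classes do not block each other.
class PerClassLockCache {
 public:
  ~PerClassLockCache() {
    for (auto& c : classes_) for (void* b : c.free) std::free(b);
  }
  void* Get(int cls) {
    std::lock_guard<std::mutex> l(classes_[cls].lock);
    return PopOrAlloc(classes_[cls].free, cls);
  }
  void Put(int cls, void* buf) {
    std::lock_guard<std::mutex> l(classes_[cls].lock);
    classes_[cls].free.push_back(buf);
  }

 private:
  struct SizeClass {
    std::mutex lock;
    std::vector<void*> free;
  };
  std::array<SizeClass, kNumClasses> classes_;
};
```

Both classes expose the same Get/Put interface, so the only difference a caller would see is contention: with the global lock, a thread returning a buffer of one size blocks a thread fetching a buffer of any other size.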

The queries that regressed are close to the worst possible case since they don't do any work
aside from materialising the strings and evaluating a conjunct. Plus the data is already present
in the buffer cache.

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-Comment-Date: Wed, 27 Sep 2017 16:09:00 +0000
Gerrit-HasComments: Yes
