impala-reviews mailing list archives

From "Tim Armstrong (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Date Fri, 29 Sep 2017 01:41:25 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

Patch Set 6:

(1 comment)
File be/src/exec/
PS6, Line 1071:       if (PageContainsTupleData(current_page_header_.data_page_header.encoding))
> Playing devil's advocate and really trying to understand the crux of why th
The basic problem is that I want to be able to reason about how many I/O buffers you need
to make progress in reading through a Parquet file. If each column can make progress independently
then it's simple, since you can reason about each column in isolation. With copying out,
you only need one buffer per column to make progress.

If you try to hold onto I/O buffers you end up with dependencies between column readers. E.g.
reader A may have materialised 512 values and hit the end of its buffer. Reader A can't free
that buffer until those rows have been returned, and those rows can't be returned until
all of the column values have been materialised for them. But maybe column B can only
materialise 511 values before hitting the end of its buffer. So then you'd need to output
511 rows, then advance column B, then output 1 more row before you can advance column A.
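The 512/511 situation above boils down to row 511 straddling a buffer boundary in one column but not the other. A toy illustration (hypothetical, not Impala code) of which buffer each row's value lives in, assuming for simplicity that buffers hold a fixed number of values:

```python
# Which buffer holds a given value, for a column whose buffers each hold
# `buffer_values` values. (Illustrative only; real buffer boundaries depend
# on encoded byte sizes, not fixed value counts.)
def buffer_index(value_idx, buffer_values):
    return value_idx // buffer_values

# Column A's buffers hold 512 values, column B's hold 511.
print(buffer_index(511, 512))  # column A: row 511 is still in buffer 0
print(buffer_index(510, 511))  # column B: row 510 is in buffer 0
print(buffer_index(511, 511))  # column B: row 511 is already in buffer 1
# Row 511 can only be returned once B has advanced to buffer 1, but A's
# buffer 0 can't be freed until row 511 is returned -- the dependency chain.
```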

You can get out of that in two ways:
* Advance the different column readers different amounts and make progress on materialising
as many final rows as you can. I think you would need to try to advance all column readers,
then return however many rows you can at that point, then try to inch them forward again.
* Let each column reader advance to the next buffer before freeing the previous one and hope
that all column readers can materialise the requested number of values without running out
of buffers. We'd need to make some assumptions to ensure that we can always materialise batch_size
values given M I/O buffers. I don't think we can make any universally valid assumptions so
we'd probably have to add code to detect when those assumptions are invalidated and add query
options to get out of any problems.
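The first option could be sketched as below (hypothetical, not Impala code; `ToyReader` and the fixed values-per-buffer assumption are simplifications for illustration). Each column materialises as far as its current buffer allows, the scan returns the rows every column has ready, and a buffer is freed and replaced only once every row referencing it has been returned:

```python
class ToyReader:
    """Toy column reader whose buffers each hold a fixed number of values."""
    def __init__(self, buf_values, total):
        self.buf_values = buf_values           # values per I/O buffer (simplification)
        self.total = total                     # total values in the column
        self.buf_end = min(buf_values, total)  # end of the current buffer
        self.materialised = 0

    def materialise(self):
        # Materialise up to the end of the current buffer.
        self.materialised = self.buf_end
        return self.materialised

    def advance_buffer(self, rows_returned):
        # Freeing the old buffer is only safe once every row that references
        # it has been returned.
        assert rows_returned >= self.buf_end
        self.buf_end = min(self.buf_end + self.buf_values, self.total)

def scan_lockstep(readers, total_rows):
    emitted, batches = 0, []
    while emitted < total_rows:
        ready = min(r.materialise() for r in readers)
        if ready > emitted:
            batches.append((emitted, ready))   # rows [emitted, ready) go out
            emitted = ready
        for r in readers:                      # inch forward readers stuck at a boundary
            if r.materialised <= emitted and r.materialised < r.total:
                r.advance_buffer(emitted)
    return batches

# 512-value and 511-value buffers, 1024 rows: output 511 rows, advance B,
# output 1 row, advance A, and so on -- as described above.
print(scan_lockstep([ToyReader(512, 1024), ToyReader(511, 1024)], 1024))
# -> [(0, 511), (511, 512), (512, 1022), (1022, 1024)]
```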

It doesn't seem worth spending complexity on either solution, especially since we'd need extra
test coverage.

Data pages are much smaller (around 64KB) than the 8MB I/O buffers and aren't currently part
of the reservation, so they're less of a problem for now - the memory overhead per column will
be around max(2 * 64KB, 1024 * <avg value size>), which is tolerable.
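Plugging numbers into that estimate (the 8-byte and 256-byte average value sizes below are illustrative assumptions, not measured figures):

```python
# Per-column memory overhead estimate from the paragraph above:
# max(2 * page_size, batch_size * avg_value_size).
def per_column_overhead(page_bytes=64 * 1024, batch_size=1024, avg_value_size=8):
    return max(2 * page_bytes, batch_size * avg_value_size)

print(per_column_overhead())                    # 131072 bytes: two 64KB pages dominate
print(per_column_overhead(avg_value_size=256))  # 262144 bytes: wide values dominate
```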


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-Comment-Date: Fri, 29 Sep 2017 01:41:25 +0000
Gerrit-HasComments: Yes
