impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Date Fri, 29 Sep 2017 00:35:38 GMT
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 )

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8085/6/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/8085/6/be/src/exec/parquet-column-readers.cc@1071
PS6, Line 1071:       if (PageContainsTupleData(current_page_header_.data_page_header.encoding))
{
> (if we pushed predicate evaluation down into the column readers, that might
Playing devil's advocate and really trying to understand the crux of why this new approach
is simpler.

You are right that the stream can free the previous I/O buffer immediately, but you are still
accumulating the same amount of memory in the data_pool_, so the peak memory consumption seems
no different than before.

Overall you might even need one more page than before. We copy and accumulate data pages here,
but only free the data page from the stream in the next call to ReadDataPage(), so for some
period of time one page exists both in the stream and in the data_pool_.

I'm not sure I follow why the the "synchronized release of I/O buffers" is difficult in the
copy-in-transfer approach. The scratch batch already has all interesting I/O buffers attached,
so we can release those after copying the var-len data of survivors. Returned batches will
have no I/O buffers attached.

Completely agree about the perf tradeoffs you mentioned.



-- 
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Comment-Date: Fri, 29 Sep 2017 00:35:38 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message