impala-dev mailing list archives

From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) PREVIEW: Basic column-wise slot materialization in Parquet scanner.
Date Wed, 13 Apr 2016 17:05:46 GMT
Tim Armstrong has posted comments on this change.

Change subject: PREVIEW: Basic column-wise slot materialization in Parquet scanner.
......................................................................


Patch Set 1:

(3 comments)

Is the long-term plan to keep the scratch batch in the row-wise format? It seems like this
should work ok cache-wise (batch should fit in cache, memory access pattern will have gaps
but a regular stride), but having the values densely packed would allow some optimisations
down the road. I suspect it would be slightly faster in the short term but I don't know if
it would have a long term impact.
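
To make the layout question concrete, here is a minimal sketch of the two options. It is not Impala code: ScratchRow, ColumnarBatch and both Sum functions are hypothetical names invented for illustration, and the real scratch batch uses Impala's tuple layout rather than a fixed struct.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical row layout, for illustration only (not Impala's tuple format).
struct ScratchRow {
  int32_t a;
  int64_t b;
  double c;
};

// Row-wise scratch batch: consecutive values of column b are
// sizeof(ScratchRow) bytes apart, so the access pattern has gaps
// but a regular stride.
int64_t SumColumnRowWise(const std::vector<ScratchRow>& batch) {
  int64_t sum = 0;
  for (const ScratchRow& row : batch) sum += row.b;
  return sum;
}

// Densely packed alternative: each column lives in its own contiguous array.
struct ColumnarBatch {
  std::vector<int32_t> a;
  std::vector<int64_t> b;
  std::vector<double> c;
};

// Same computation over the dense layout: values of b are adjacent in memory.
int64_t SumColumnDense(const ColumnarBatch& batch) {
  int64_t sum = 0;
  for (int64_t v : batch.b) sum += v;
  return sum;
}
```

Both functions compute the same result; the only difference is how far apart successive values of the column sit in memory, which is what would matter for the later optimisations.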

http://gerrit.cloudera.org:8080/#/c/2779/1/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 1732:   // and return an output batch with relatively few rows.
The TODO describes the current intended behaviour, so that sounds right. I think sending small
batches up the tree is ok for selective scans.


Line 1737:       // Optimization for scans with selective filters/conjuncts: None of the
Is this factoring in accumulated disk buffers?


Line 1829: ReadValueBatch
Ignoring return value?

I think we need to be careful about propagating errors, since I think it could end badly if
there's a read error and we try to evaluate conjuncts or filters over bogus data.

The existing code avoids this by checking for errors on every row.
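
A rough sketch of the pattern I have in mind, under stated assumptions: the Status struct below is a minimal stand-in rather than Impala's Status class, and ReadValueBatchSketch and ScanBatch are hypothetical functions, not the signatures in this patch. The point is only that the batch read's result is checked before any predicate runs over the scratch data.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Minimal stand-in for a status type (assumption; not Impala's Status class).
struct Status {
  bool ok;
  std::string msg;
  static Status OK() { return {true, ""}; }
  static Status Error(const std::string& m) { return {false, m}; }
};

// Hypothetical batch reader. A negative input value stands in for a
// read or decode error partway through the batch.
Status ReadValueBatchSketch(const std::vector<int32_t>& src,
                            std::vector<int32_t>* out) {
  out->clear();
  for (int32_t v : src) {
    if (v < 0) return Status::Error("corrupt value");
    out->push_back(v);
  }
  return Status::OK();
}

// Reads one batch, then counts values above the threshold. The predicate is
// evaluated only if the whole batch was read successfully.
Status ScanBatch(const std::vector<int32_t>& src, int32_t threshold,
                 int* num_matches) {
  *num_matches = 0;
  std::vector<int32_t> scratch;
  Status status = ReadValueBatchSketch(src, &scratch);
  if (!status.ok) return status;  // never evaluate over partially read data
  for (int32_t v : scratch) {
    if (v > threshold) ++*num_matches;
  }
  return Status::OK();
}
```

With one check per batch instead of one per row, the error path costs a single branch per batch, while a failed read still never reaches conjunct or filter evaluation.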


-- 
To view, visit http://gerrit.cloudera.org:8080/2779
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I72a613fa805c542e39df20588fb25c57b5f139aa
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes
