impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <>
Subject [Impala-CR](cdh5-trunk) IMPALA-2736: Basic column-wise slot materialization in Parquet scanner.
Date Thu, 21 Apr 2016 02:08:56 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-2736: Basic column-wise slot materialization in Parquet scanner.

Patch Set 3:

File be/src/exec/

Line 195:   RowBatch batch;
> The embedded RowBatch is a little funky but this seems like the best soluti

Line 207:   //~ScratchTupleBatch() { pool->FreeAll(); }
> ?

Line 1729: int HdfsParquetScanner::TransferScratchTuples(ScratchTupleBatch* scratch_batch)
> Consider adding __restrict__ annotation. I suspect all of the member variab
Thanks for the suggestions, Tim. I rewrote this function to be more performant (but possibly
less readable). I ran a simple experiment to see the difference between the "basic" loop in
this patch set and the optimized one in the new patch set:

(I hacked the scan node to not output any rows)

select l_linenumber from huge_lineitem

Before: 1.5s

After 1.3s

This is averaged over 10 runs on release with a single impalad and scanner thread, after running
a dummy query to load the table metadata.

Looks like a decent perf improvement, but it should be weighed against the loss in readability
(which is not super bad, imo).

I don't want to linger too long on micro-optimizations just yet, since there are so many macro-optimizations
left that will give us more bang for the buck (in a dev time spent vs perf benefit sense)

Line 1738:     output_row->SetTuple(scan_node_->tuple_idx(), scratch_tuple);
> If this loop is performance-sensitive, consider hoisting this pointer indir
Rewrote this function. See comment above.

Line 1742:     if (!scanner_conjunct_ctxs_->empty() && !ExecNode::EvalConjuncts(
> Not your change, but we should consider just copying the scanner_conjunct_c
I made some local changes here to follow your suggestions, but did not copy the conjuncts_ctxs
into the scanner.

Line 1823:     int num_row_to_commit = TransferScratchTuples(scratch_batch);
> There's an assumption here that the remaining scratch rows will fit in the 
File be/src/util/rle-encoding.h:

Line 249:   // significantly better than UNLIKELY(literal_count_ == 0 && repeat_count_
== 0)
> Hopefully as this progresses we might be able to use a batched interface th
Correct. I'm already working on batch-reading and caching the def/rep levels.

Line 250:   if (repeat_count_ == 0) {
> Would (repeat_count_ + literal_count_) == 0 be better still? Not sure if th
Might be better, no idea. Will try it.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I72a613fa805c542e39df20588fb25c57b5f139aa
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Alex Behm <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Mostafa Mokhtar <>
Gerrit-Reviewer: Skye Wanderman-Milne <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message