impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-3344: Simplify sorter and document/enforce invariants.
Date Wed, 04 May 2016 23:26:54 GMT
Tim Armstrong has uploaded a new patch set (#9).

Change subject: IMPALA-3344: Simplify sorter and document/enforce invariants.
......................................................................

IMPALA-3344: Simplify sorter and document/enforce invariants.

Clarify relationships between classes, clean up the previous mess
where every class was friends with the other so there's an actual
distinction between public and private members. TupleIterator
is now no longer tied to TupleSorter, just Run.

Factor out some functions from large functions.

Document and enforce invariants in many cases.

Simplify and document iterator logic.

Make management of buffers when iterating over output stream more
explicitly correct: either use MarkNeedToReturn() or attach block
to the batch as appropriate. The SortedRunMerger didn't handle
resource transfer correctly, except if all the memory came from
the batch's MemPool. This patch fixes the cases when resources
are attached to the batches, but not the 'need_to_return' case.
Document that SortedRunMerger requires 'deep_copy_input' to be true
if batches can have the 'need_to_return' flag set.

Also use the atomic block exchange operation when moving between
blocks in unpinned runs to prevent pin failures at that point.
I explicitly have avoided changing the hairy block management logic
when allocating buffers for merging, that will need addressing in
a follow-up patch.

Add a SpilledRuns counter so that it's more explicit that spilling
occurred.

Testing:
Added some tests for corner cases with empty and NULL strings.

Performance:
Benchmarking against old code initial revealed some regressions from
changes in inlining. Force inlining the TupleComparator::operator() and
iterator Next()/Prev() functions helped and performance seems similar or
slightly better on the targeted orderby benchmarks.

Change-Id: I9c619e81fd1b8ac50e257172c8bce101a112b52a
---
M be/src/runtime/data-stream-recvr.cc
M be/src/runtime/row-batch.h
M be/src/runtime/sorted-run-merger.cc
M be/src/runtime/sorted-run-merger.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/tuple-row-compare.h
M testdata/workloads/functional-query/queries/QueryTest/single-node-large-sorts.test
M testdata/workloads/functional-query/queries/QueryTest/sort.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M tests/query_test/test_sort.py
11 files changed, 875 insertions(+), 617 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/26/2826/9
-- 
To view, visit http://gerrit.cloudera.org:8080/2826
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9c619e81fd1b8ac50e257172c8bce101a112b52a
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>

Mime
View raw message