impala-dev mailing list archives

From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-3354: bad sorter pivot selection on some inputs
Date Wed, 20 Apr 2016 21:40:36 GMT
Tim Armstrong has uploaded a new patch set (#2).

Change subject: IMPALA-3354: bad sorter pivot selection on some inputs
......................................................................

IMPALA-3354: bad sorter pivot selection on some inputs

Switch to a median of three random tuples, which should be robust
across a wide range of inputs. It may be slightly worse than the
existing pivot selection on inputs where the original algorithm is
close to optimal (e.g. already sorted inputs), but should typically be
better overall.
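
As a rough illustration of the idea (not the code in sorter.cc), the
sketch below picks a pivot as the median of three elements sampled
uniformly at random. It works on a plain std::vector<int> rather than
Impala tuples, and the names MedianOfThree and SelectPivot are
hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Returns the median of three values using at most three comparisons.
inline int MedianOfThree(int a, int b, int c) {
  if (a > b) std::swap(a, b);  // now a <= b
  if (b > c) std::swap(b, c);  // now c is the maximum
  if (a > b) std::swap(a, b);  // now a <= b <= c
  return b;
}

// Hypothetical helper: picks a pivot value for the range v[lo, hi) as the
// median of three elements sampled uniformly at random (with replacement).
// Assumption: the caller owns the random number generator.
inline int SelectPivot(const std::vector<int>& v, std::size_t lo,
                       std::size_t hi, std::mt19937& rng) {
  assert(lo < hi && hi <= v.size());
  std::uniform_int_distribution<std::size_t> dist(lo, hi - 1);
  const int a = v[dist(rng)];
  const int b = v[dist(rng)];
  const int c = v[dist(rng)];
  return MedianOfThree(a, b, c);
}
```

Because the three samples are random, no fixed input ordering can
consistently force a bad pivot, which is the robustness property the
change is after.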

Always recurse on the smaller partition: this prevents stack overflow
even with bad pivot selection.
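
A minimal sketch of this technique, again on a std::vector<int> and
not taken from sorter.cc: recurse only into the smaller partition and
handle the larger one by looping. Each recursive call then covers at
most half of the current range, so stack depth is bounded by O(log n)
regardless of pivot quality. The function name QuickSort and the
middle-element pivot are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sorts v[lo, hi) in place. Pivot choice here is deliberately naive
// (middle element); the point of the sketch is the recursion pattern.
inline void QuickSort(std::vector<int>& v, std::ptrdiff_t lo,
                      std::ptrdiff_t hi) {
  assert(0 <= lo && lo <= hi &&
         hi <= static_cast<std::ptrdiff_t>(v.size()));
  while (hi - lo > 1) {
    const int pivot = v[lo + (hi - lo) / 2];
    // Hoare-style partition: afterwards v[lo..j] <= pivot and
    // v[i..hi) >= pivot, with j < i.
    std::ptrdiff_t i = lo;
    std::ptrdiff_t j = hi - 1;
    while (i <= j) {
      while (v[i] < pivot) ++i;
      while (v[j] > pivot) --j;
      if (i <= j) {
        std::swap(v[i], v[j]);
        ++i;
        --j;
      }
    }
    // Recurse on the smaller side, then continue the loop on the larger
    // side instead of making a second recursive call.
    if (j + 1 - lo < hi - i) {
      QuickSort(v, lo, j + 1);
      lo = i;
    } else {
      QuickSort(v, i, hi);
      hi = j + 1;
    }
  }
}
```

A bad pivot still costs extra comparisons in this scheme, but it can
no longer translate into deep recursion.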

The overhead is minimal - in profiles for small sorts I'm seeing pivot
selection take at most 0.5% of CPU time.

The improved pivot selection gives modest improvements of 2-5% on the
targeted perf ORDER BY benchmarks in a single-node run with TPC-H
scale factor 20.

Change-Id: Iae50112b6deca3d6268e18b6f4daae1af279b452
---
M be/src/runtime/sorter.cc
M tests/query_test/test_sort.py
2 files changed, 109 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/24/2824/2
-- 
To view, visit http://gerrit.cloudera.org:8080/2824
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iae50112b6deca3d6268e18b6f4daae1af279b452
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
