impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Riza Suminto (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] WIP IMPALA-9434: Implement Robin Hood Hash Table.
Date Thu, 02 Apr 2020 14:45:16 GMT
Hello David Rorke, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15511

to look at the new patch set (#10).

Change subject: WIP IMPALA-9434: Implement Robin Hood Hash Table.
......................................................................

WIP IMPALA-9434: Implement Robin Hood Hash Table.

Robin hood hashing reduces the variances of probe lengths by
continually rebalancing elements. This patch is the first step towards
full robin hood hash table implementation by doing bucket rebalancing
after every insert.

If a hash table is configured as a robin hood hash table, the new
element insertion will be buffered in a temporary bucket. This
temporary bucket will then matched against existing bucket elements,
swapped with a "rich" bucket, and continue doing so until it swap with
an empty bucket. The PSL (probe sequence length) invariant is
maintained in robin hood hash table. This allow us to add
short-circuit in the Probe function to immediately returns when it
finds out that the PSL of currently visited bucket is smaller/richer
than the key that is being looked up, indicating that the key does not
exist in the table. Instead of continue probing until next empty
bucket is found, Probe can immediately the return the index of
recently visited richer bucket to caller along with the not found
flag. In turn, the caller use this returned index to specify in which
index the new element should be inserted to.

Testing:
- Add backend test for Robin Hood hash table in hash-table-test.cc
- Pass core tests

Change-Id: I28eeccd7f9ccae39e31972391f971901bcbfe986
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/hash-table-benchmark.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exprs/scalar-expr.h
10 files changed, 777 insertions(+), 84 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/15511/10
-- 
To view, visit http://gerrit.cloudera.org:8080/15511
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I28eeccd7f9ccae39e31972391f971901bcbfe986
Gerrit-Change-Number: 15511
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto <riza.suminto@cloudera.com>
Gerrit-Reviewer: David Rorke <drorke@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.suminto@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message