impala-reviews mailing list archives

From "Bikramjeet Vig (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations
Date Thu, 13 Jul 2017 23:48:16 GMT
Bikramjeet Vig has uploaded a new patch set (#4).

Change subject: IMPALA-5520: TopN node periodically reclaims old allocations
......................................................................

IMPALA-5520: TopN node periodically reclaims old allocations

Currently, TopN retains old string allocations in a tuple pool that is
held longer than necessary, resulting in unnecessary memory usage.
With this commit, the TopN node periodically re-materialises the rows
stored in the priority queue and reclaims the old allocations. This is
done when the number of rows removed from the priority queue is more
than twice N (limit + offset). In addition, a new counter called
"TuplePoolReclamations" is added to the TopN node to track the number
of times the tuple pool is reclaimed.
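
The reclamation rule described above can be illustrated with a small
toy model. This is only a sketch under stated assumptions, not the code
in topn-node.cc: the class TopNSketch, the byte-count stand-in for the
tuple pool, and the choice to check the threshold after every row are
all hypothetical and chosen for illustration.

```python
import heapq


class TopNSketch:
    """Toy TopN operator that keeps the N rows with the smallest keys.

    Hypothetical model: the "tuple pool" is just a running byte count of
    every payload copied into it, not a real memory pool.
    """

    def __init__(self, n, reclaim=True):
        self.n = n                       # N = limit + offset
        self.reclaim = reclaim           # toggle to compare both behaviours
        self.heap = []                   # max-heap via negated keys
        self.pool_bytes = 0              # bytes currently held by the pool
        self.max_pool_bytes = 0          # peak pool size seen so far
        self.rows_removed = 0            # evictions since the last reclaim
        self.tuple_pool_reclamations = 0

    def insert(self, key, payload):
        if len(self.heap) < self.n:
            heapq.heappush(self.heap, (-key, payload))
            self.pool_bytes += len(payload)
        elif key < -self.heap[0][0]:
            # The new row evicts the current worst row. The evicted
            # payload stays in the pool until the next reclamation.
            heapq.heapreplace(self.heap, (-key, payload))
            self.pool_bytes += len(payload)
            self.rows_removed += 1
        self.max_pool_bytes = max(self.max_pool_bytes, self.pool_bytes)
        # Assumption: check per row. The threshold is the one stated in
        # the commit message: removed rows > 2 * N.
        if self.reclaim and self.rows_removed > 2 * self.n:
            self._reclaim_tuple_pool()

    def _reclaim_tuple_pool(self):
        # Re-materialise only the rows still in the queue into a fresh
        # pool, then drop the old pool and everything dead inside it.
        self.pool_bytes = sum(len(payload) for _, payload in self.heap)
        self.rows_removed = 0
        self.tuple_pool_reclamations += 1

    def result_keys(self):
        return sorted(-neg_key for neg_key, _ in self.heap)


def run(reclaim, rows=1000, n=10, payload_size=100):
    """Feed keys in descending order, so every row evicts a queued row."""
    node = TopNSketch(n, reclaim=reclaim)
    for key in range(rows, 0, -1):
        node.insert(key, "x" * payload_size)
    return node


if __name__ == "__main__":
    for flag in (True, False):
        node = run(flag)
        print(flag, node.max_pool_bytes, node.tuple_pool_reclamations)
```

In this model the peak pool size with reclamation is bounded by roughly
3 * N payloads regardless of input size, while without reclamation it
grows with every row that ever entered the queue. Both variants return
the same rows.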

Testing:
Added a test to test_tpch_queries.py that sets a mem_limit low enough
that the query fails without reclamation and passes with it.

Performance:
Query 1 (expected general case):
select * from tpch.lineitem order by l_orderkey desc limit 10;

Query 2 (example worst case: data stored in reverse order before
feeding to the last TopN node):
select * from (select * from tpch.lineitem order by l_orderkey desc
limit 6001215) tb order by l_orderkey limit 10;

                       With Reclaim           Without Reclaim
                   Query 1     Query 2      Query 1     Query 2
MaxTuplePoolMem    3.96 KB     3.43 KB      110.2 MB    708.8 MB
Time (mean)        2s 218ms    6s 391ms     2s 021ms    6s 406ms
Time (stdev)       74.38ms     67.45ms      102.71ms    70.44ms
Reclaims           910         5861         N/A         N/A

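We notice that the peak tuple pool memory is four to five orders of
magnitude lower with reclamation (KB instead of hundreds of MB), while
mean query runtimes stay within about 10% of the runs without
reclamation. Cluster perf testing will be done later.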
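
For reference, the ratios behind that summary can be recomputed from
the table above. The snippet below is only a worked calculation; it
assumes KB and MB are binary units (1024-based), which the table does
not state, and the helper names are made up for illustration.

```python
import math

KB = 1024
MB = 1024 * KB  # assumption: binary units

# (with reclaim, without reclaim), values copied from the table above.
max_tuple_pool_mem = {
    "Query 1": (3.96 * KB, 110.2 * MB),
    "Query 2": (3.43 * KB, 708.8 * MB),
}
mean_time_ms = {
    "Query 1": (2218, 2021),
    "Query 2": (6391, 6406),
}


def memory_ratio(with_reclaim, without_reclaim):
    """How many times smaller the peak pool is with reclamation."""
    return without_reclaim / with_reclaim


def runtime_change_pct(with_reclaim, without_reclaim):
    """Relative change in mean runtime; positive means slower with reclaim."""
    return (with_reclaim - without_reclaim) / without_reclaim * 100


if __name__ == "__main__":
    for query in ("Query 1", "Query 2"):
        ratio = memory_ratio(*max_tuple_pool_mem[query])
        change = runtime_change_pct(*mean_time_ms[query])
        print(f"{query}: pool {ratio:,.0f}x smaller "
              f"(~10^{math.log10(ratio):.1f}), runtime {change:+.1f}%")
```

Note that the Query 1 difference in mean runtime is roughly twice the
reported standard deviations, so it may be worth rechecking in the
cluster perf run.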

Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
---
M be/src/exec/topn-node-ir.cc
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M tests/query_test/test_tpch_queries.py
4 files changed, 124 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/7400/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
