impala-reviews mailing list archives

From "Bikramjeet Vig (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations
Date Tue, 11 Jul 2017 22:27:43 GMT
Bikramjeet Vig has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7400

Change subject: IMPALA-5520: TopN node periodically reclaims old allocations
......................................................................

IMPALA-5520: TopN node periodically reclaims old allocations

Currently the TopN node retains old string allocations in its tuple
pool, which results in excessive memory usage.
With this commit, the TopN node periodically re-materialises the rows
it is using and reclaims the old allocations. This happens whenever the
number of old rows (rows removed from the priority queue it uses) is
more than twice the row limit (N in TopN). In addition, a new counter
called "NumOfTimesTuplePoolReclaimed" is added to the TopN node; it
counts how many times the tuple pool has been reclaimed.
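
The reclamation rule described above can be illustrated with a small
standalone sketch. This is not the code from this change and does not use
Impala's classes: Pool, Row, KeyLess and TopNSketch are hypothetical names,
and the pool is a simplified stand-in that only grows until it is dropped.
The sketch keeps the N smallest keys and assumes the reclaim check runs
after each insert.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-in for a tuple pool: memory is only released
// when the whole pool is destroyed.
class Pool {
 public:
  // Copies s into the pool; the returned pointer stays valid for the
  // lifetime of the pool.
  const std::string* Add(const std::string& s) {
    bytes_ += s.size();
    strings_.push_back(std::make_unique<std::string>(s));
    return strings_.back().get();
  }
  std::size_t bytes() const { return bytes_; }

 private:
  std::vector<std::unique_ptr<std::string>> strings_;
  std::size_t bytes_ = 0;
};

struct Row {
  long key;
  const std::string* payload;  // points into the current pool
};

struct KeyLess {
  bool operator()(const Row& a, const Row& b) const { return a.key < b.key; }
};

// Keeps the `limit` rows with the smallest keys (illustrative only).
class TopNSketch {
 public:
  explicit TopNSketch(std::size_t limit)
      : limit_(limit), pool_(std::make_unique<Pool>()) {}

  void Insert(long key, const std::string& payload) {
    if (limit_ == 0) return;
    if (heap_.size() == limit_) {
      if (key >= heap_.top().key) return;  // not better than the worst row
      heap_.pop();       // the evicted row's payload stays in the pool
      ++rows_removed_;
    }
    heap_.push(Row{key, pool_->Add(payload)});
    peak_bytes_ = std::max(peak_bytes_, pool_->bytes());
    // Rule from the commit message: reclaim once the number of old rows
    // is more than twice the row limit.
    if (rows_removed_ > 2 * limit_) Reclaim();
  }

  std::vector<long> SortedKeys() const {
    auto copy = heap_;
    std::vector<long> keys;
    while (!copy.empty()) {
      keys.push_back(copy.top().key);
      copy.pop();
    }
    std::reverse(keys.begin(), keys.end());  // ascending order
    return keys;
  }

  std::size_t num_reclaims() const { return num_reclaims_; }
  std::size_t pool_bytes() const { return pool_->bytes(); }
  std::size_t peak_pool_bytes() const { return peak_bytes_; }

 private:
  // Re-materialises the live rows into a fresh pool and drops the old one.
  void Reclaim() {
    assert(heap_.size() <= limit_);
    auto fresh = std::make_unique<Pool>();
    std::vector<Row> live;
    while (!heap_.empty()) {
      const Row r = heap_.top();
      heap_.pop();
      live.push_back(Row{r.key, fresh->Add(*r.payload)});
    }
    for (const Row& r : live) heap_.push(r);
    pool_ = std::move(fresh);  // old pool, including dead payloads, is freed
    rows_removed_ = 0;
    ++num_reclaims_;
  }

  std::size_t limit_;
  std::unique_ptr<Pool> pool_;
  std::priority_queue<Row, std::vector<Row>, KeyLess> heap_;
  std::size_t rows_removed_ = 0;
  std::size_t num_reclaims_ = 0;
  std::size_t peak_bytes_ = 0;
};
```

In this sketch, feeding 1000 rows with 100-byte payloads in descending key
order and a limit of 10 keeps the current pool at no more than 3100 bytes,
whereas a pool that is never reclaimed would hold all 100,000 bytes.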

Testing:
Added a test to test_tpch_queries.py that sets a low mem_limit, so
that the query fails if reclamation is not implemented and passes
otherwise.

Performance:
Query 1 (general case):
select * from tpch.lineitem order by l_orderkey desc limit 10;

Query 2 (data sorted in reverse before feeding to the last TopN node):
select * from (select * from tpch.lineitem order by l_orderkey desc
limit 6001215) tb order by l_orderkey limit 10;

                       With Reclaim           Without Reclaim
                   Query 1     Query 2      Query 1     Query 2
MaxTuplePoolMem    3.96 KB     3.43 KB      110.2 MB    708.8 MB
Time (mean)        2s 218ms    6s 391ms     2s 021ms    6s 406ms
Time (stdev)       74.38ms     67.45ms      102.71ms    70.44ms
Reclaims            910         5861          N/A         N/A

The peak tuple pool memory is four to five orders of magnitude lower
with reclamation (for example, 3.96 KB vs. 110.2 MB for Query 1), while
run times remain similar. More extensive perf testing will be done in
the future to identify pathological cases.

Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
---
M be/src/exec/topn-node-ir.cc
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M tests/query_test/test_tpch_queries.py
4 files changed, 77 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/7400/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
