impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthew Jacobs (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations
Date Wed, 12 Jul 2017 00:17:59 GMT
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5520: TopN node periodically reclaims old allocations
......................................................................


Patch Set 1:

(15 comments)

nice!

http://gerrit.cloudera.org:8080/#/c/7400/1//COMMIT_MSG
Commit Message:

PS1, Line 9: tuple
in a tuple pool which is held longer than necessary, resulting in unnecessary memory usage.


PS1, Line 11: With this commit TopN node
commit, the TopN node


PS1, Line 12: its
stored in the priority queue


PS1, Line 12: This is done every time the
            : number of old rows (removed from the priority queue it uses) is more
            : than twice the row limit (N in TopN)
This is done when the number of rows removed from the priority queue is more than twice the
N (limit + offset).


PS1, Line 24: general
expected general case


PS1, Line 27: data
example worst case: data stored...


PS1, Line 39: performance
query runtimes


PS1, Line 39: More extensive perf testing will be
            : done in the future to recognize pathological cases
Not necessarily pathological cases, more that we're gonna do cluster testing later. Just say:
Cluster perf testing will be done later.


http://gerrit.cloudera.org:8080/#/c/7400/1/be/src/exec/topn-node-ir.cc
File be/src/exec/topn-node-ir.cc:

PS1, Line 28: limit_
why isn't this (limit_ + offset_), since that's the size of the PQ?


http://gerrit.cloudera.org:8080/#/c/7400/1/be/src/exec/topn-node.cc
File be/src/exec/topn-node.cc:

PS1, Line 88: NumOfTimesTuplePoolReclaimed
TuplePoolReclamations is more concise


PS1, Line 88:   
indentation


http://gerrit.cloudera.org:8080/#/c/7400/1/be/src/exec/topn-node.h
File be/src/exec/topn-node.h:

PS1, Line 71: Create new tuple pool with only the elements inside the priority queue and release
            :   // memory from the old tuple pool
Re-materialize the elements in the priority queue into a new tuple pool, and release the previous
pool.


PS1, Line 115: poped
popped has two 'p's. Regardless, let's call this rows_to_reclaim_ because num popped sounds
like it wouldn't be reset, i.e. just a running counter.


http://gerrit.cloudera.org:8080/#/c/7400/1/tests/query_test/test_tpch_queries.py
File tests/query_test/test_tpch_queries.py:

Line 67:     return 'tpch'
newline after this


PS1, Line 75:  \
nit: remove space to be consistent


-- 
To view, visit http://gerrit.cloudera.org:8080/7400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message