impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] PREVIEW: IMPALA-4678: port backend exec to use buffer pool
Date Fri, 31 Mar 2017 18:23:04 GMT
Tim Armstrong has uploaded a new patch set (#7).

Change subject: PREVIEW: IMPALA-4678: port backend exec to use buffer pool
......................................................................

PREVIEW: IMPALA-4678: port backend exec to use buffer pool

Always create global BufferPool at startup using 80% of memory and
limit reservations to  80% of query memory (same as BufferedBlockMgr).

Each ExecNode has to declare its memory requirements at Prepare() time.

Convert HashTable to use the new BufferPool via a Suballocator.

Make PAGG memory consumption more efficient (avoid wasting buffers):
* Allow preaggs to execute with 0 reservation - if streams and hash tables
  cannot be allocated, it will pass through rows.
* Halve the buffer requirement for spilling aggs - avoid allocating
  buffers for aggregated and unaggregated streams simultaneously.

Convert Sorter to use BufferPool.

TODO in this patch:
 * some of the DCHECKS may be too aggressive. With the current memory
   transfer model, operators that accumulate batches, i.e. NLJ, can
   "steal" reservation. We need a test to reproduce this problem. We
   can probably fix by having NLJ copy if it sees an attached buffer.
 * Consider renaming buffer_pool_page_size, e.g. to spillable_page_size

TODO in follow-up patches:
* Rename BufferedTupleStreamV2 to BufferedTupleStream
* Remove the old hash join and aggregation nodes

Testing:
* Updated tests to reflect new memory requirements
* TODO: recalibrate limits in test_mem_usage_scaling
* TODO: more tests to exercise new code paths

Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/partitioned-hash-join-node.inline.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/runtime/CMakeLists.txt
D be/src/runtime/buffered-block-mgr-test.cc
D be/src/runtime/buffered-block-mgr.cc
D be/src/runtime/buffered-block-mgr.h
D be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream-v2.cc
M be/src/runtime/buffered-tuple-stream-v2.h
D be/src/runtime/buffered-tuple-stream.cc
D be/src/runtime/buffered-tuple-stream.h
D be/src/runtime/buffered-tuple-stream.inline.h
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/plan-fragment-executor.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/runtime/test-env.cc
M be/src/runtime/test-env.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter.h
M be/src/util/static-asserts.cc
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/generate_error_codes.py
M testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters_phj.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M tests/query_test/test_mem_usage_scaling.py
M tests/query_test/test_sort.py
R tests/query_test/test_spilling.py
M tests/stress/concurrent_select.py
60 files changed, 1,541 insertions(+), 7,538 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/5801/7
-- 
To view, visit http://gerrit.cloudera.org:8080/5801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>

Mime
View raw message