impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5844: use a MemPool for expr local allocations
Date Fri, 22 Sep 2017 17:24:36 GMT
Hello Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8025

to look at the new patch set (#7).

Change subject: IMPALA-5844: use a MemPool for expr local allocations
......................................................................

IMPALA-5844: use a MemPool for expr local allocations

Local allocations in expressions have the following properties:
* They are usually small allocations
* They can be made frequently (e.g. every function call)
* They are owned and managed by the Impala runtime
* They are freed in bulk at various points in query execution.

A MemPool (i.e. bump allocator) is the right mechanism to manage
allocations with the above properties. Before this patch
FunctionContext's used a FreePool + vector of allocations to emulate the
above behaviour. This patch switches to using a MemPool to bring these
allocations in line with the rest of the codebase.

The steps required to do this conversion.
* Use a MemPool for FunctionContext local allocations.
* Identify appropriate MemPools for all of the local allocations from
  function contexts so that the memory lifetime is correct.
* Various cleanup and documentation of existing MemPools.
* Replaces calls to FreeLocalAllocations() with calls to
  MemPool::Clear()

More involved surgery was required in a few places:
* Made the Sorter own its comparator, exprs and MemPool.
* Remove FunctionContextImpl::ReallocateLocal() and just have
  StringFunctions::Replace() do the doubling itself to avoid
  the need for a special interface. Worst-case this doubles
  the memory requirements for Replace() since n / 2 + n / 4
  + n / 8 + .... bytes of memory could be wasted instead of recycled
  for an n-byte output string.
* Provide a way redirect agg fn Serialize()/Finalize() allocations
  to come directly from the output RowBatch's MemPool. This is
  also potentially applicable to other places where we currently
  copy out strings from local allocations, e.g.
  AnalyticEvalNode::AddResultTuple() and Tuple::MaterializeExprs().
* --stress_free_pool_alloc was changed to instead intercept at the
  FunctionContext layer so that it retains the old behaviour even
  though allocations do not all come from FreePools.

Testing:
* ran exhaustive and ASAN

Change-Id: I4ba5a7542ed90a49a4b5586c040b5985a7d45b61
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/common/global-flags.cc
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/plan-root-sink.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M be/src/exec/union-node.cc
M be/src/exec/unnest-node.cc
M be/src/exprs/agg-fn-evaluator.cc
M be/src/exprs/agg-fn-evaluator.h
M be/src/exprs/case-expr.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/hive-udf-call.cc
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/string-functions-ir.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/descriptors.cc
D be/src/runtime/free-pool.cc
M be/src/runtime/free-pool.h
M be/src/runtime/sorted-run-merger.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/runtime/tuple.cc
M be/src/service/fe-support.cc
M be/src/udf/udf-internal.h
M be/src/udf/udf-test-harness.cc
M be/src/udf/udf.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M tests/custom_cluster/test_alloc_fail.py
72 files changed, 621 insertions(+), 741 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/8025/7
-- 
To view, visit http://gerrit.cloudera.org:8080/8025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4ba5a7542ed90a49a4b5586c040b5985a7d45b61
Gerrit-Change-Number: 8025
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message