impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Date Tue, 08 Nov 2016 23:12:59 GMT
Hello Internal Jenkins, Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4655

to look at the new patch set (#15).

Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
......................................................................

IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions

This change enables codegen for all builtin aggregate functions,
e.g. timestamp functions and group_concat.

There are several parts to the change:
* Adding support for generic UDAs. Previous the codegen code did not
  handle multiple input arguments or NULL return values.
* Defaulting to using the UDA interface when there is not a special
  codegen path (we have implementations of all builtin aggregate
  functions for the interpreted path).
* Remove all the logic to disable codegen for the special cases that now
  are supported.

Also fix the generation of code to get/set NULL bits since I needed
to add functionality there anyway.

Testing:
Add tests that check that codegen was enabled for builtin aggregate
functions. Also fix some gaps in the preexisting tests.

Also add tests for UDAs that check input/output nulls are handled
correctly, in anticipation of enabling codegen for arbitrary UDAs.
The tests are run with both codegen enabled and disabled. To avoid
flaky tests, we switch the UDF tests to use "unique_database".

Perf:
Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1,
since my original approach regressed it ~5%. In the end the problem was
to do with the ordering of loads/stores to the slot and null bit in the
generated code: the previous version of the code exploited some
properties of the particular aggregate function. I ended up replicating
this behaviour to avoid regressing perf.

Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/codegen-anyval.cc
M be/src/codegen/codegen-anyval.h
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/aggregation-node.cc
M be/src/exec/exec-node.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-table.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/old-hash-table.cc
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/text-converter.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/case-expr.cc
M be/src/exprs/compound-predicates.cc
M be/src/exprs/expr-codegen-test.cc
M be/src/exprs/expr.cc
M be/src/exprs/literal.cc
M be/src/exprs/null-literal.cc
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/slot-ref.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/tuple.cc
M be/src/runtime/types.h
M be/src/testutil/test-udas.cc
M be/src/testutil/test-udfs.cc
M be/src/util/tuple-row-compare.cc
M testdata/workloads/functional-query/queries/QueryTest/java-udf.test
M testdata/workloads/functional-query/queries/QueryTest/libs_with_same_filenames.test
M testdata/workloads/functional-query/queries/QueryTest/load-java-udfs.test
M testdata/workloads/functional-query/queries/QueryTest/uda-mem-limit.test
M testdata/workloads/functional-query/queries/QueryTest/uda.test
A testdata/workloads/functional-query/queries/QueryTest/udf-codegen-required.test
M testdata/workloads/functional-query/queries/QueryTest/udf-errors.test
M testdata/workloads/functional-query/queries/QueryTest/udf-mem-limit.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
M tests/common/test_result_verifier.py
M tests/query_test/test_aggregation.py
M tests/query_test/test_udfs.py
47 files changed, 1,039 insertions(+), 802 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/4655/15
-- 
To view, visit http://gerrit.cloudera.org:8080/4655
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>

Mime
View raw message