impala-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From tarmstr...@apache.org
Subject [2/3] incubator-impala git commit: IMPALA-4302, IMPALA-2379: constant expr arg fixes
Date Tue, 08 Nov 2016 04:10:09 GMT
IMPALA-4302,IMPALA-2379: constant expr arg fixes

This patch fixes two issues around handling of constant expr args.
The patches are combined because they touch some of the same code
and depend on some of the same memory management cleanup.

First, it fixes IMPALA-2379, where constant expr args were not visible
to UDAFs. The issue is that the input exprs need to be opened before
calling the UDAF Init() function.

Second, it avoids overhead from repeated evaluation of constant
arguments for ScalarFnCall expressions on both the codegen'd and
interpreted paths. A common example is an IN predicate with a
long list of constant values.

The interpreted path was inefficient because it always evaluated all
children expressions. Instead in this patch constant args are
evaluated once and cached. The memory management of the AnyVal*
objects was somewhat nebulous - adjusted it so that they're allocated
from ExprContext::mem_pool_, which has the correct lifetime.

The codegen'd path was inefficient only with varargs - with fixed
arguments the LLVM optimiser is able to infer after inlining that
the expressions are constant and remove all evaluation. However,
for varargs it stores the vararg values into a heap-allocated buffer.
The LLVM optimiser is unable to remove these stores because they
have a side-effect that is visible to code outside the function.

The codegen'd path is improved by evaluating varargs into an automatic
buffer that can be optimised out. We also make a small related change
to bake the string constants into the codegen'd code.

Testing:
Ran exhaustive build.

Added regression test for IMPALA-2379 and MemPool test for aligned
allocation. Added a test for in predicates with constant strings.

Perf:
Added a targeted query that demonstrates the improvement. Also manually
validated the non-codegend perf. Also ran TPC-H and targeted perf
queries locally - didn't see any significant changes.

+--------------------+-------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| Workload           | Query                         | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
+--------------------+-------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| TARGETED-PERF(_20) | primitive_filter_in_predicate | parquet / none / none | 1.19   | 9.82        | I -87.85%  |   3.82%   |   0.71%        | 1           | 10    |
+--------------------+-------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+

(I) Improvement: TARGETED-PERF(_20) primitive_filter_in_predicate [parquet / none / none] (9.82s -> 1.19s [-87.85%])
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+--------+-----------+
| Operator     | % of Query | Avg      | Base Avg | Delta(Avg) | StdDev(%) | Max      | Base Max | Delta(Max) | #Hosts | #Rows  | Est #Rows |
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+--------+-----------+
| 01:AGGREGATE | 14.39%     | 155.88ms | 214.61ms | -27.37%    |   2.68%   | 163.38ms | 227.53ms | -28.19%    | 1      | 1      | 1         |
| 00:SCAN HDFS | 85.60%     | 927.46ms | 9.43s    | -90.16%    |   4.49%   | 1.01s    | 9.50s    | -89.42%    | 1      | 13.77K | 14.05K    |
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+--------+-----------+

Change-Id: I45c3ed8c9d7a61e94a9b9d6c316e8a53d9ff6c24
Reviewed-on: http://gerrit.cloudera.org:8080/4838
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/10fa472f
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/10fa472f
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/10fa472f

Branch: refs/heads/master
Commit: 10fa472fa6aa036be02748ae54daed1722449c68
Parents: e3483c4
Author: Tim Armstrong <tarmstrong@cloudera.com>
Authored: Wed Oct 26 10:55:23 2016 -0700
Committer: Internal Jenkins <cloudera-hudson@gerrit.cloudera.org>
Committed: Tue Nov 8 02:44:51 2016 +0000

----------------------------------------------------------------------
 be/src/benchmarks/in-predicate-benchmark.cc     |   5 +-
 be/src/codegen/gen_ir_descriptions.py           |   4 +-
 be/src/codegen/llvm-codegen.cc                  |  31 ++-
 be/src/codegen/llvm-codegen.h                   |  14 +-
 be/src/exec/partitioned-aggregation-node.cc     |  28 +--
 be/src/exprs/agg-fn-evaluator.cc                |  29 ++-
 be/src/exprs/aggregate-functions-ir.cc          |   6 +-
 be/src/exprs/anyval-util.cc                     |  38 ++--
 be/src/exprs/anyval-util.h                      |  22 +-
 be/src/exprs/case-expr.cc                       |  15 +-
 be/src/exprs/expr-context.h                     |  11 +-
 be/src/exprs/expr.cc                            |  89 ++++----
 be/src/exprs/expr.h                             |  20 +-
 be/src/exprs/hive-udf-call.cc                   |  18 +-
 be/src/exprs/in-predicate.h                     |  35 ++-
 be/src/exprs/literal.cc                         |   5 +-
 be/src/exprs/scalar-fn-call.cc                  | 222 ++++++++++---------
 be/src/exprs/scalar-fn-call.h                   |  24 +-
 be/src/exprs/utility-functions-ir.cc            |   1 +
 be/src/runtime/free-pool-test.cc                |  38 ++--
 be/src/runtime/free-pool.h                      |   7 +-
 be/src/runtime/mem-pool-test.cc                 |  37 +++-
 be/src/runtime/mem-pool.cc                      |   6 +-
 be/src/runtime/mem-pool.h                       |  61 +++--
 be/src/testutil/test-udas.cc                    |  13 ++
 be/src/udf/udf-internal.h                       |  63 ++++--
 be/src/udf/udf-test-harness.cc                  |   2 +-
 be/src/udf/udf.cc                               |  26 ++-
 be/src/udf/udf.h                                |  15 +-
 be/src/util/bit-util.h                          |   2 +-
 .../queries/QueryTest/exprs.test                |  29 +++
 .../functional-query/queries/QueryTest/uda.test |   7 +
 .../queries/primitive_filter_in_predicate.test  |  10 +
 tests/query_test/test_udfs.py                   |   5 +
 34 files changed, 578 insertions(+), 360 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/benchmarks/in-predicate-benchmark.cc
----------------------------------------------------------------------
diff --git a/be/src/benchmarks/in-predicate-benchmark.cc b/be/src/benchmarks/in-predicate-benchmark.cc
index 1a95f7a..6547b93 100644
--- a/be/src/benchmarks/in-predicate-benchmark.cc
+++ b/be/src/benchmarks/in-predicate-benchmark.cc
@@ -173,6 +173,7 @@
 using namespace impala;
 using namespace impala_udf;
 using namespace strings;
+using std::move;
 
 namespace impala {
 
@@ -223,8 +224,8 @@ class InPredicateBenchmark {
 
     vector<AnyVal*> constant_args;
     constant_args.push_back(NULL);
-    for (AnyVal* p: data.anyval_ptrs) constant_args.push_back(p);
-    UdfTestHarness::SetConstantArgs(ctx, constant_args);
+    for (AnyVal* p : data.anyval_ptrs) constant_args.push_back(p);
+    UdfTestHarness::SetConstantArgs(ctx, move(constant_args));
 
     InPredicate::SetLookupPrepare<T, SetType>(ctx, FunctionContext::FRAGMENT_LOCAL);
     data.state = *reinterpret_cast<InPredicate::SetLookupState<SetType>*>(

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/codegen/gen_ir_descriptions.py
----------------------------------------------------------------------
diff --git a/be/src/codegen/gen_ir_descriptions.py b/be/src/codegen/gen_ir_descriptions.py
index 2b08509..ddcd1db 100755
--- a/be/src/codegen/gen_ir_descriptions.py
+++ b/be/src/codegen/gen_ir_descriptions.py
@@ -200,9 +200,9 @@ ir_functions = [
   ["TOPN_NODE_INSERT_BATCH",
    "_ZN6impala8TopNNode11InsertBatchEPNS_8RowBatchE"],
   ["MEMPOOL_ALLOCATE",
-   "_ZN6impala7MemPool8AllocateILb0EEEPhl"],
+   "_ZN6impala7MemPool8AllocateILb0EEEPhli"],
   ["MEMPOOL_CHECKED_ALLOCATE",
-   "_ZN6impala7MemPool8AllocateILb1EEEPhl"]
+   "_ZN6impala7MemPool8AllocateILb1EEEPhli"]
 ]
 
 enums_preamble = '\

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/codegen/llvm-codegen.cc
----------------------------------------------------------------------
diff --git a/be/src/codegen/llvm-codegen.cc b/be/src/codegen/llvm-codegen.cc
index f6c2e89..f629d35 100644
--- a/be/src/codegen/llvm-codegen.cc
+++ b/be/src/codegen/llvm-codegen.cc
@@ -588,9 +588,19 @@ Value* LlvmCodeGen::GetIntConstant(int num_bytes, uint64_t low_bits, uint64_t hi
   return ConstantInt::get(context(), APInt(8 * num_bytes, vals));
 }
 
+Value* LlvmCodeGen::GetStringConstant(LlvmBuilder* builder, char* data, int len) {
+  // Create a global string with private linkage.
+  Constant* const_string =
+      ConstantDataArray::getString(context(), StringRef(data, len), false);
+  GlobalVariable* gv = new GlobalVariable(
+      *module_, const_string->getType(), true, GlobalValue::PrivateLinkage, const_string);
+  // Get a pointer to the first element of the string.
+  return builder->CreateConstInBoundsGEP2_32(NULL, gv, 0, 0, "");
+}
+
 AllocaInst* LlvmCodeGen::CreateEntryBlockAlloca(Function* f, const NamedVariable& var) {
   IRBuilder<> tmp(&f->getEntryBlock(), f->getEntryBlock().begin());
-  AllocaInst* alloca = tmp.CreateAlloca(var.type, 0, var.name.c_str());
+  AllocaInst* alloca = tmp.CreateAlloca(var.type, NULL, var.name.c_str());
   if (var.type == GetType(CodegenAnyVal::LLVM_DECIMALVAL_NAME)) {
     // Generated functions may manipulate DecimalVal arguments via SIMD instructions such
     // as 'movaps' that require 16-byte memory alignment. LLVM uses 8-byte alignment by
@@ -600,10 +610,20 @@ AllocaInst* LlvmCodeGen::CreateEntryBlockAlloca(Function* f, const NamedVariable
   return alloca;
 }
 
+AllocaInst* LlvmCodeGen::CreateEntryBlockAlloca(
+    const LlvmBuilder& builder, Type* type, const char* name) {
+  return CreateEntryBlockAlloca(
+      builder.GetInsertBlock()->getParent(), NamedVariable(name, type));
+}
+
 AllocaInst* LlvmCodeGen::CreateEntryBlockAlloca(const LlvmBuilder& builder, Type* type,
-                                                const char* name) {
-  return CreateEntryBlockAlloca(builder.GetInsertBlock()->getParent(),
-                                NamedVariable(name, type));
+    int num_entries, int alignment, const char* name) {
+  Function* fn = builder.GetInsertBlock()->getParent();
+  IRBuilder<> tmp(&fn->getEntryBlock(), fn->getEntryBlock().begin());
+  AllocaInst* alloca =
+      tmp.CreateAlloca(type, GetIntConstant(TYPE_INT, num_entries), name);
+  alloca->setAlignment(alignment);
+  return alloca;
 }
 
 void LlvmCodeGen::CreateIfElseBlocks(Function* fn, const string& if_name,
@@ -1237,7 +1257,8 @@ Value* LlvmCodeGen::CodegenAllocate(LlvmBuilder* builder, MemPool* pool, Value*
   Function* allocate_fn = GetFunction(IRFunction::MEMPOOL_ALLOCATE, false);
   PointerType* pool_type = GetPtrType(MemPool::LLVM_CLASS_NAME);
   Value* pool_val = CastPtrToLlvmPtr(pool_type, pool);
-  Value* fn_args[] = { pool_val, size };
+  Value* alignment = GetIntConstant(TYPE_INT, MemPool::DEFAULT_ALIGNMENT);
+  Value* fn_args[] = {pool_val, size, alignment};
   return builder->CreateCall(allocate_fn, fn_args, name);
 }
 

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/codegen/llvm-codegen.h
----------------------------------------------------------------------
diff --git a/be/src/codegen/llvm-codegen.h b/be/src/codegen/llvm-codegen.h
index 602dc5d..095f939 100644
--- a/be/src/codegen/llvm-codegen.h
+++ b/be/src/codegen/llvm-codegen.h
@@ -359,11 +359,16 @@ class LlvmCodeGen {
   /// struct allocated.  The allocated variable is scoped to the function.
   //
   /// This should always be used instead of calling LlvmBuilder::CreateAlloca directly.
-  /// LLVM doesn't optimize alloca's occuring in the middle of functions very well (e.g, an
-  /// alloca may end up in a loop, potentially blowing the stack).
+  /// LLVM doesn't optimize alloca's occurring in the middle of functions very well (e.g,
+  /// an alloca may end up in a loop, potentially blowing the stack).
   llvm::AllocaInst* CreateEntryBlockAlloca(llvm::Function* f, const NamedVariable& var);
+  llvm::AllocaInst* CreateEntryBlockAlloca(
+      const LlvmBuilder& builder, llvm::Type* type, const char* name = "");
+
+  /// Same as above, except allocates an array of 'type' with 'num_entries' entries
+  /// and alignment 'alignment'.
   llvm::AllocaInst* CreateEntryBlockAlloca(const LlvmBuilder& builder, llvm::Type* type,
-                                           const char* name = "");
+      int num_entries, int alignment, const char* name = "");
 
   /// Utility to create two blocks in 'fn' for if/else codegen.  if_block and else_block
   /// are return parameters.  insert_before is optional and if set, the two blocks
@@ -388,6 +393,9 @@ class LlvmCodeGen {
   /// can generate constant up to 128-bit wide. 'byte_size' must be power of 2.
   llvm::Value* GetIntConstant(int byte_size, uint64_t low_bits, uint64_t high_bits);
 
+  /// Initialise a constant global string and returns an i8* pointer to it.
+  llvm::Value* GetStringConstant(LlvmBuilder* builder, char* data, int len);
+
   /// Returns true/false constants (bool type)
   llvm::Value* true_value() { return true_value_; }
   llvm::Value* false_value() { return false_value_; }

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/exec/partitioned-aggregation-node.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/partitioned-aggregation-node.cc b/be/src/exec/partitioned-aggregation-node.cc
index 6e587a5..17fbb6c 100644
--- a/be/src/exec/partitioned-aggregation-node.cc
+++ b/be/src/exec/partitioned-aggregation-node.cc
@@ -253,14 +253,7 @@ Status PartitionedAggregationNode::Prepare(RuntimeState* state) {
     needs_serialize_ |= aggregate_evaluators_[i]->SupportsSerialize();
   }
 
-  if (grouping_expr_ctxs_.empty()) {
-    // Create single output tuple; we need to output something even if our input is empty.
-    singleton_output_tuple_ =
-        ConstructSingletonOutputTuple(agg_fn_ctxs_, mem_pool_.get());
-    // Check for failures during AggFnEvaluator::Init().
-    RETURN_IF_ERROR(state_->GetQueryStatus());
-    singleton_output_tuple_returned_ = false;
-  } else {
+  if (!grouping_expr_ctxs_.empty()) {
     RETURN_IF_ERROR(HashTableCtx::Create(state, build_expr_ctxs_, grouping_expr_ctxs_,
         true, vector<bool>(build_expr_ctxs_.size(), true), state->fragment_hash_seed(),
         MAX_PARTITION_DEPTH, 1, mem_tracker(), &ht_ctx_));
@@ -314,6 +307,16 @@ Status PartitionedAggregationNode::Open(RuntimeState* state) {
     RETURN_IF_ERROR(aggregate_evaluators_[i]->Open(state, agg_fn_ctxs_[i]));
   }
 
+  if (grouping_expr_ctxs_.empty()) {
+    // Create the single output tuple for this non-grouping agg. This must happen after
+    // opening the aggregate evaluators.
+    singleton_output_tuple_ =
+        ConstructSingletonOutputTuple(agg_fn_ctxs_, mem_pool_.get());
+    // Check for failures during AggFnEvaluator::Init().
+    RETURN_IF_ERROR(state_->GetQueryStatus());
+    singleton_output_tuple_returned_ = false;
+  }
+
   RETURN_IF_ERROR(children_[0]->Open(state));
 
   // Streaming preaggregations do all processing in GetNext().
@@ -673,14 +676,7 @@ void PartitionedAggregationNode::CleanupHashTbl(
 
 Status PartitionedAggregationNode::Reset(RuntimeState* state) {
   DCHECK(!is_streaming_preagg_) << "Cannot reset preaggregation";
-  if (grouping_expr_ctxs_.empty()) {
-    // Re-create the single output tuple for this non-grouping agg.
-    singleton_output_tuple_ =
-        ConstructSingletonOutputTuple(agg_fn_ctxs_, mem_pool_.get());
-    // Check for failures during AggFnEvaluator::Init().
-    RETURN_IF_ERROR(state_->GetQueryStatus());
-    singleton_output_tuple_returned_ = false;
-  } else {
+  if (!grouping_expr_ctxs_.empty()) {
     child_eos_ = false;
     partition_eos_ = false;
     // Reset the HT and the partitions for this grouping agg.

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/exprs/agg-fn-evaluator.cc
----------------------------------------------------------------------
diff --git a/be/src/exprs/agg-fn-evaluator.cc b/be/src/exprs/agg-fn-evaluator.cc
index ade7c10..4c6a993 100644
--- a/be/src/exprs/agg-fn-evaluator.cc
+++ b/be/src/exprs/agg-fn-evaluator.cc
@@ -39,6 +39,7 @@
 using namespace impala;
 using namespace impala_udf;
 using namespace llvm;
+using std::move;
 
 // typedef for builtin aggregate functions. Unfortunately, these type defs don't
 // really work since the actual builtin is implemented not in terms of the base
@@ -144,13 +145,19 @@ Status AggFnEvaluator::Prepare(RuntimeState* state, const RowDescriptor& desc,
   RETURN_IF_ERROR(
       Expr::Prepare(input_expr_ctxs_, state, desc, agg_fn_pool->mem_tracker()));
 
-  ObjectPool* obj_pool = state->obj_pool();
   for (int i = 0; i < input_expr_ctxs_.size(); ++i) {
-    staging_input_vals_.push_back(
-        CreateAnyVal(obj_pool, input_expr_ctxs_[i]->root()->type()));
+    AnyVal* staging_input_val;
+    RETURN_IF_ERROR(
+        AllocateAnyVal(state, agg_fn_pool, input_expr_ctxs_[i]->root()->type(),
+            "Could not allocate aggregate expression input value", &staging_input_val));
+    staging_input_vals_.push_back(staging_input_val);
   }
-  staging_intermediate_val_ = CreateAnyVal(obj_pool, intermediate_type());
-  staging_merge_input_val_ = CreateAnyVal(obj_pool, intermediate_type());
+  RETURN_IF_ERROR(AllocateAnyVal(state, agg_fn_pool, intermediate_type(),
+      "Could not allocate aggregate expression intermediate value",
+      &staging_intermediate_val_));
+  RETURN_IF_ERROR(AllocateAnyVal(state, agg_fn_pool, intermediate_type(),
+      "Could not allocate aggregate expression merge input value",
+      &staging_merge_input_val_));
 
   if (is_merge_) {
     DCHECK_EQ(staging_input_vals_.size(), 1) << "Merge should only have 1 input.";
@@ -223,9 +230,12 @@ Status AggFnEvaluator::Open(RuntimeState* state, FunctionContext* agg_fn_ctx) {
   // on them).
   vector<AnyVal*> constant_args(input_expr_ctxs_.size());
   for (int i = 0; i < input_expr_ctxs_.size(); ++i) {
-    constant_args[i] = input_expr_ctxs_[i]->root()->GetConstVal(input_expr_ctxs_[i]);
+    ExprContext* input_ctx = input_expr_ctxs_[i];
+    AnyVal* const_val;
+    RETURN_IF_ERROR(input_ctx->root()->GetConstVal(state, input_ctx, &const_val));
+    constant_args[i] = const_val;
   }
-  agg_fn_ctx->impl()->SetConstantArgs(constant_args);
+  agg_fn_ctx->impl()->SetConstantArgs(move(constant_args));
   return Status::OK();
 }
 
@@ -301,8 +311,7 @@ inline void AggFnEvaluator::SetDstSlot(FunctionContext* ctx, const AnyVal* src,
           // This code seems to trip up clang causing it to generate code that crashes.
           // Be careful when modifying this. See IMPALA-959 for more details.
           // I suspect an issue with xmm registers not reading from aligned memory.
-          memcpy(slot, &reinterpret_cast<const DecimalVal*>(src)->val4,
-              dst_slot_desc->type().GetByteSize());
+          memcpy(slot, &reinterpret_cast<const DecimalVal*>(src)->val16, 16);
 #else
           DCHECK(false) << "Not implemented.";
 #endif
@@ -318,6 +327,8 @@ inline void AggFnEvaluator::SetDstSlot(FunctionContext* ctx, const AnyVal* src,
 // This function would be replaced in codegen.
 void AggFnEvaluator::Init(FunctionContext* agg_fn_ctx, Tuple* dst) {
   DCHECK(init_fn_ != NULL);
+  for (ExprContext* ctx : input_expr_ctxs_) DCHECK(ctx->opened());
+
   if (intermediate_type().type == TYPE_CHAR) {
     // For type char, we want to initialize the staging_intermediate_val_ with
     // a pointer into the tuple (the UDA should not be allocating it).

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/exprs/aggregate-functions-ir.cc
----------------------------------------------------------------------
diff --git a/be/src/exprs/aggregate-functions-ir.cc b/be/src/exprs/aggregate-functions-ir.cc
index fe19829..c07d1db 100644
--- a/be/src/exprs/aggregate-functions-ir.cc
+++ b/be/src/exprs/aggregate-functions-ir.cc
@@ -186,7 +186,7 @@ void AggregateFunctions::InitZero(FunctionContext*, T* dst) {
 template<>
 void AggregateFunctions::InitZero(FunctionContext*, DecimalVal* dst) {
   dst->is_null = false;
-  dst->val16 = 0;
+  dst->val16 = 0; // Also initializes val4 and val8 to 0.
 }
 
 template <typename T>
@@ -1026,6 +1026,8 @@ bool SampleValLess(const ReservoirSample<StringVal>& i,
 template <>
 bool SampleValLess(const ReservoirSample<DecimalVal>& i,
     const ReservoirSample<DecimalVal>& j) {
+  // Also handles val4 and val8 - the DecimalVal memory layout ensures the least
+  // significant bits overlap in memory.
   return i.val.val16 < j.val.val16;
 }
 
@@ -1091,6 +1093,8 @@ void PrintSample(const ReservoirSample<StringVal>& v, ostream* os) {
 
 template <>
 void PrintSample(const ReservoirSample<DecimalVal>& v, ostream* os) {
+  // Also handles val4 and val8 - the DecimalVal memory layout ensures the least
+  // significant bits overlap in memory.
   *os << v.val.val16;
 }
 

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/exprs/anyval-util.cc
----------------------------------------------------------------------
diff --git a/be/src/exprs/anyval-util.cc b/be/src/exprs/anyval-util.cc
index 1e3d409..e4d9337 100644
--- a/be/src/exprs/anyval-util.cc
+++ b/be/src/exprs/anyval-util.cc
@@ -19,6 +19,9 @@
 #include "codegen/llvm-codegen.h"
 
 #include "common/object-pool.h"
+#include "gutil/strings/substitute.h"
+#include "runtime/mem-pool.h"
+#include "runtime/mem-tracker.h"
 
 #include "common/names.h"
 
@@ -26,30 +29,19 @@ using namespace impala_udf;
 
 namespace impala {
 
-AnyVal* CreateAnyVal(ObjectPool* pool, const ColumnType& type) {
-  return pool->Add(CreateAnyVal(type));
-}
-
-AnyVal* CreateAnyVal(const ColumnType& type) {
-  switch(type.type) {
-    case TYPE_NULL: return new AnyVal;
-    case TYPE_BOOLEAN: return new BooleanVal;
-    case TYPE_TINYINT: return new TinyIntVal;
-    case TYPE_SMALLINT: return new SmallIntVal;
-    case TYPE_INT: return new IntVal;
-    case TYPE_BIGINT: return new BigIntVal;
-    case TYPE_FLOAT: return new FloatVal;
-    case TYPE_DOUBLE: return new DoubleVal;
-    case TYPE_STRING:
-    case TYPE_VARCHAR:
-    case TYPE_CHAR:
-      return new StringVal;
-    case TYPE_TIMESTAMP: return new TimestampVal;
-    case TYPE_DECIMAL: return new DecimalVal;
-    default:
-      DCHECK(false) << "Unsupported type: " << type;
-      return NULL;
+Status AllocateAnyVal(RuntimeState* state, MemPool* pool, const ColumnType& type,
+    const std::string& mem_limit_exceeded_msg, AnyVal** result) {
+  int anyval_size = AnyValUtil::AnyValSize(type);
+  // Ensure the allocation is sufficiently aligned (e.g. DecimalVal has a 16-byte
+  // alignment requirement).
+  int alignment = BitUtil::RoundUpToPowerOfTwo(anyval_size);
+  *result = reinterpret_cast<AnyVal*>(pool->TryAllocateAligned(anyval_size, alignment));
+  if (*result == NULL) {
+    return pool->mem_tracker()->MemLimitExceeded(
+        state, mem_limit_exceeded_msg, anyval_size);
   }
+  memset(*result, 0, anyval_size);
+  return Status::OK();
 }
 
 FunctionContext::TypeDesc AnyValUtil::ColumnTypeToTypeDesc(const ColumnType& type) {

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/exprs/anyval-util.h
----------------------------------------------------------------------
diff --git a/be/src/exprs/anyval-util.h b/be/src/exprs/anyval-util.h
index 624c79e..b84187a 100644
--- a/be/src/exprs/anyval-util.h
+++ b/be/src/exprs/anyval-util.h
@@ -15,7 +15,6 @@
 // specific language governing permissions and limitations
 // under the License.
 
-
 #ifndef IMPALA_EXPRS_ANYVAL_UTIL_H
 #define IMPALA_EXPRS_ANYVAL_UTIL_H
 
@@ -271,7 +270,7 @@ class AnyValUtil {
             return;
 #if __BYTE_ORDER == __LITTLE_ENDIAN
           case 16:
-            memcpy(&reinterpret_cast<DecimalVal*>(dst)->val4, slot, type.GetByteSize());
+            memcpy(&reinterpret_cast<DecimalVal*>(dst)->val16, slot, 16);
 #else
             DCHECK(false) << "Not implemented.";
 #endif
@@ -286,19 +285,20 @@ class AnyValUtil {
 
  private:
   /// Implementations of Equals().
-  template<typename T>
+  template <typename T>
   static inline bool EqualsInternal(const T& x, const T& y);
-  static inline bool DecimalEquals(int precision, const DecimalVal& x,
-      const DecimalVal& y);
+  static inline bool DecimalEquals(
+      int precision, const DecimalVal& x, const DecimalVal& y);
 };
 
-/// Creates the corresponding AnyVal subclass for type. The object is added to the pool.
-impala_udf::AnyVal* CreateAnyVal(ObjectPool* pool, const ColumnType& type);
-
-/// Creates the corresponding AnyVal subclass for type. The object is owned by the caller.
-impala_udf::AnyVal* CreateAnyVal(const ColumnType& type);
+/// Allocates an AnyVal subclass of 'type' from 'pool'. The AnyVal's memory is
+/// initialized to all 0's. Returns a MemLimitExceeded() error with message
+/// 'mem_limit_exceeded_msg' if the allocation cannot be made because of a memory
+/// limit.
+Status AllocateAnyVal(RuntimeState* state, MemPool* pool, const ColumnType& type,
+    const std::string& mem_limit_exceeded_msg, AnyVal** result);
 
-template<typename T>
+template <typename T>
 inline bool AnyValUtil::EqualsInternal(const T& x, const T& y) {
   DCHECK(!x.is_null);
   DCHECK(!y.is_null);

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/exprs/case-expr.cc
----------------------------------------------------------------------
diff --git a/be/src/exprs/case-expr.cc b/be/src/exprs/case-expr.cc
index 2262c22..f139d64 100644
--- a/be/src/exprs/case-expr.cc
+++ b/be/src/exprs/case-expr.cc
@@ -63,13 +63,14 @@ Status CaseExpr::Open(RuntimeState* state, ExprContext* ctx,
     return fn_ctx->impl()->state()->GetQueryStatus();
   }
   fn_ctx->SetFunctionState(FunctionContext::THREAD_LOCAL, case_state);
-  if (has_case_expr_) {
-    case_state->case_val = CreateAnyVal(state->obj_pool(), children_[0]->type());
-    case_state->when_val = CreateAnyVal(state->obj_pool(), children_[1]->type());
-  } else {
-    case_state->case_val = CreateAnyVal(state->obj_pool(), TYPE_BOOLEAN);
-    case_state->when_val = CreateAnyVal(state->obj_pool(), children_[0]->type());
-  }
+
+  const ColumnType& case_val_type = has_case_expr_ ? children_[0]->type() : TYPE_BOOLEAN;
+  RETURN_IF_ERROR(AllocateAnyVal(state, ctx->pool_.get(), case_val_type,
+      "Could not allocate expression value", &case_state->case_val));
+  const ColumnType& when_val_type =
+      has_case_expr_ ? children_[1]->type() : children_[0]->type();
+  RETURN_IF_ERROR(AllocateAnyVal(state, ctx->pool_.get(), when_val_type,
+      "Could not allocate expression value", &case_state->when_val));
   return Status::OK();
 }
 

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/exprs/expr-context.h
----------------------------------------------------------------------
diff --git a/be/src/exprs/expr-context.h b/be/src/exprs/expr-context.h
index 09de6a5..39abc5f 100644
--- a/be/src/exprs/expr-context.h
+++ b/be/src/exprs/expr-context.h
@@ -20,10 +20,11 @@
 
 #include <boost/scoped_ptr.hpp>
 
+#include "common/object-pool.h"
 #include "common/status.h"
 #include "exprs/expr-value.h"
-#include "udf/udf.h"
 #include "udf/udf-internal.h" // for CollectionVal
+#include "udf/udf.h"
 
 using namespace impala_udf;
 
@@ -61,7 +62,9 @@ class ExprContext {
   /// originals but have their own MemPool and thread-local state. Clone() should be used
   /// to create an ExprContext for each execution thread that needs to evaluate
   /// 'root'. Note that clones are already opened. '*new_context' must be initialized by
-  /// the caller to NULL.
+  /// the caller to NULL. The cloned ExprContext cannot be used after the original
+  /// ExprContext is destroyed because it may reference fragment-local state from the
+  /// original.
   Status Clone(RuntimeState* state, ExprContext** new_context);
 
   /// Closes all FunctionContexts. Must be called on every ExprContext, including clones.
@@ -111,6 +114,7 @@ class ExprContext {
   }
 
   Expr* root() const { return root_; }
+  bool opened() const { return opened_; }
   bool closed() const { return closed_; }
   bool is_clone() const { return is_clone_; }
 
@@ -143,7 +147,8 @@ class ExprContext {
 
  private:
   friend class Expr;
-  /// Users of private GetValue()
+  /// Users of private GetValue() or 'pool_'.
+  friend class CaseExpr;
   friend class HiveUdfCall;
   friend class ScalarFnCall;
 

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/exprs/expr.cc
----------------------------------------------------------------------
diff --git a/be/src/exprs/expr.cc b/be/src/exprs/expr.cc
index 8d21830..0141bae 100644
--- a/be/src/exprs/expr.cc
+++ b/be/src/exprs/expr.cc
@@ -29,10 +29,8 @@
 #include "codegen/llvm-codegen.h"
 #include "common/object-pool.h"
 #include "common/status.h"
-#include "exprs/anyval-util.h"
-#include "exprs/expr.h"
-#include "exprs/expr-context.h"
 #include "exprs/aggregate-functions.h"
+#include "exprs/anyval-util.h"
 #include "exprs/bit-byte-functions.h"
 #include "exprs/case-expr.h"
 #include "exprs/cast-functions.h"
@@ -40,6 +38,8 @@
 #include "exprs/conditional-functions.h"
 #include "exprs/decimal-functions.h"
 #include "exprs/decimal-operators.h"
+#include "exprs/expr-context.h"
+#include "exprs/expr.h"
 #include "exprs/hive-udf-call.h"
 #include "exprs/in-predicate.h"
 #include "exprs/is-not-empty-predicate.h"
@@ -56,15 +56,16 @@
 #include "exprs/tuple-is-null-predicate.h"
 #include "exprs/udf-builtins.h"
 #include "exprs/utility-functions.h"
-#include "gen-cpp/Exprs_types.h"
 #include "gen-cpp/Data_types.h"
+#include "gen-cpp/Exprs_types.h"
 #include "runtime/lib-cache.h"
-#include "runtime/runtime-state.h"
+#include "runtime/mem-tracker.h"
 #include "runtime/raw-value.h"
-#include "runtime/tuple.h"
+#include "runtime/runtime-state.h"
 #include "runtime/tuple-row.h"
-#include "udf/udf.h"
+#include "runtime/tuple.h"
 #include "udf/udf-internal.h"
+#include "udf/udf.h"
 
 #include "gen-cpp/Exprs_types.h"
 #include "gen-cpp/ImpalaService_types.h"
@@ -526,63 +527,69 @@ void Expr::InitBuiltinsDummy() {
   UtilityFunctions::Pid(NULL);
 }
 
-AnyVal* Expr::GetConstVal(ExprContext* context) {
+Status Expr::GetConstVal(RuntimeState* state, ExprContext* context, AnyVal** const_val) {
   DCHECK(context->opened_);
-  if (!IsConstant()) return NULL;
-  if (constant_val_.get() != NULL) return constant_val_.get();
+  if (!IsConstant()) {
+    *const_val = NULL;
+    return Status::OK();
+  }
 
+  RETURN_IF_ERROR(AllocateAnyVal(state, context->pool_.get(), type_,
+      "Could not allocate constant expression value", const_val));
   switch (type_.type) {
-    case TYPE_BOOLEAN: {
-      constant_val_.reset(new BooleanVal(GetBooleanVal(context, NULL)));
+    case TYPE_BOOLEAN:
+      *reinterpret_cast<BooleanVal*>(*const_val) = GetBooleanVal(context, NULL);
       break;
-    }
-    case TYPE_TINYINT: {
-      constant_val_.reset(new TinyIntVal(GetTinyIntVal(context, NULL)));
+    case TYPE_TINYINT:
+      *reinterpret_cast<TinyIntVal*>(*const_val) = GetTinyIntVal(context, NULL);
       break;
-    }
-    case TYPE_SMALLINT: {
-      constant_val_.reset(new SmallIntVal(GetSmallIntVal(context, NULL)));
+    case TYPE_SMALLINT:
+      *reinterpret_cast<SmallIntVal*>(*const_val) = GetSmallIntVal(context, NULL);
       break;
-    }
-    case TYPE_INT: {
-      constant_val_.reset(new IntVal(GetIntVal(context, NULL)));
+    case TYPE_INT:
+      *reinterpret_cast<IntVal*>(*const_val) = GetIntVal(context, NULL);
       break;
-    }
-    case TYPE_BIGINT: {
-      constant_val_.reset(new BigIntVal(GetBigIntVal(context, NULL)));
+    case TYPE_BIGINT:
+      *reinterpret_cast<BigIntVal*>(*const_val) = GetBigIntVal(context, NULL);
       break;
-    }
-    case TYPE_FLOAT: {
-      constant_val_.reset(new FloatVal(GetFloatVal(context, NULL)));
+    case TYPE_FLOAT:
+      *reinterpret_cast<FloatVal*>(*const_val) = GetFloatVal(context, NULL);
       break;
-    }
-    case TYPE_DOUBLE: {
-      constant_val_.reset(new DoubleVal(GetDoubleVal(context, NULL)));
+    case TYPE_DOUBLE:
+      *reinterpret_cast<DoubleVal*>(*const_val) = GetDoubleVal(context, NULL);
       break;
-    }
     case TYPE_STRING:
     case TYPE_CHAR:
     case TYPE_VARCHAR: {
-      constant_val_.reset(new StringVal(GetStringVal(context, NULL)));
+      StringVal* sv = reinterpret_cast<StringVal*>(*const_val);
+      *sv = GetStringVal(context, NULL);
+      if (sv->len > 0) {
+        // Make sure the memory is owned by 'context'.
+        uint8_t* ptr_copy = context->pool_->TryAllocate(sv->len);
+        if (ptr_copy == NULL) {
+          return context->pool_->mem_tracker()->MemLimitExceeded(
+              state, "Could not allocate constant string value", sv->len);
+        }
+        memcpy(ptr_copy, sv->ptr, sv->len);
+        sv->ptr = ptr_copy;
+      }
       break;
     }
-    case TYPE_TIMESTAMP: {
-      constant_val_.reset(new TimestampVal(GetTimestampVal(context, NULL)));
+    case TYPE_TIMESTAMP:
+      *reinterpret_cast<TimestampVal*>(*const_val) = GetTimestampVal(context, NULL);
       break;
-    }
-    case TYPE_DECIMAL: {
-      constant_val_.reset(new DecimalVal(GetDecimalVal(context, NULL)));
+    case TYPE_DECIMAL:
+      *reinterpret_cast<DecimalVal*>(*const_val) = GetDecimalVal(context, NULL);
       break;
-    }
     default:
       DCHECK(false) << "Type not implemented: " << type();
   }
-  DCHECK(constant_val_.get() != NULL);
-  return constant_val_.get();
+  // Errors may have been set during the GetConstVal() call.
+  return GetFnContextError(context);
 }
 
 int Expr::GetConstantInt(const FunctionContext::TypeDesc& return_type,
-      const std::vector<FunctionContext::TypeDesc>& arg_types, ExprConstant c, int i) {
+    const std::vector<FunctionContext::TypeDesc>& arg_types, ExprConstant c, int i) {
   switch (c) {
     case RETURN_TYPE_SIZE:
       DCHECK_EQ(i, -1);

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/exprs/expr.h
----------------------------------------------------------------------
diff --git a/be/src/exprs/expr.h b/be/src/exprs/expr.h
index 99d463d..552fee9 100644
--- a/be/src/exprs/expr.h
+++ b/be/src/exprs/expr.h
@@ -88,16 +88,17 @@
 #ifndef IMPALA_EXPRS_EXPR_H
 #define IMPALA_EXPRS_EXPR_H
 
-#include <boost/scoped_ptr.hpp>
+#include <memory>
 #include <string>
 #include <vector>
+#include <boost/scoped_ptr.hpp>
 
 #include "common/global-types.h"
 #include "common/status.h"
 #include "impala-ir/impala-ir-functions.h"
 #include "runtime/types.h"
-#include "udf/udf.h"
 #include "udf/udf-internal.h" // for CollectionVal
+#include "udf/udf.h"
 
 using namespace impala_udf;
 
@@ -241,10 +242,13 @@ class Expr {
   virtual Status GetCodegendComputeFn(LlvmCodeGen* codegen, llvm::Function** fn) = 0;
 
   /// If this expr is constant, evaluates the expr with no input row argument and returns
-  /// the output. Returns NULL if the argument is not constant. The returned AnyVal* is
-  /// owned by this expr. This should only be called after Open() has been called on this
-  /// expr.
-  virtual AnyVal* GetConstVal(ExprContext* context);
+  /// the result in 'const_val'. Sets 'const_val' to NULL if the argument is not constant.
+  /// The returned AnyVal and associated varlen data is owned by 'context'. This should
+  /// only be called after Open() has been called on this expr. Returns an error if there
+  /// was an error evaluating the expression or if memory could not be allocated for the
+  /// expression result.
+  virtual Status GetConstVal(
+      RuntimeState* state, ExprContext* context, AnyVal** const_val);
 
   virtual std::string DebugString() const;
   static std::string DebugString(const std::vector<Expr*>& exprs);
@@ -370,10 +374,6 @@ class Expr {
   /// Cached codegened compute function. Exprs should set this in GetCodegendComputeFn().
   llvm::Function* ir_compute_fn_;
 
-  /// If this expr is constant, this will store and cache the value generated by
-  /// GetConstVal().
-  boost::scoped_ptr<AnyVal> constant_val_;
-
   /// Helper function that calls ctx->Register(), sets fn_context_index_, and returns the
   /// registered FunctionContext.
   FunctionContext* RegisterFunctionContext(

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/exprs/hive-udf-call.cc
----------------------------------------------------------------------
diff --git a/be/src/exprs/hive-udf-call.cc b/be/src/exprs/hive-udf-call.cc
index 453f876..7d82431 100644
--- a/be/src/exprs/hive-udf-call.cc
+++ b/be/src/exprs/hive-udf-call.cc
@@ -54,6 +54,8 @@ struct JniContext {
   uint8_t output_null_value;
   bool warning_logged;
 
+  /// AnyVal to evaluate the expression into. Only used as temporary storage during
+  /// expression evaluation.
   AnyVal* output_anyval;
 
   JniContext()
@@ -62,13 +64,10 @@ struct JniContext {
       input_nulls_buffer(NULL),
       output_value_buffer(NULL),
       warning_logged(false),
-      output_anyval(NULL) {
-  }
+      output_anyval(NULL) {}
 };
 
-HiveUdfCall::HiveUdfCall(const TExprNode& node)
-  : Expr(node),
-    input_buffer_size_(0) {
+HiveUdfCall::HiveUdfCall(const TExprNode& node) : Expr(node), input_buffer_size_(0) {
   DCHECK_EQ(node.node_type, TExprNodeType::FUNCTION_CALL);
   DCHECK_EQ(node.fn.binary_type, TFunctionBinaryType::JAVA);
   DCHECK(executor_cl_ != NULL) << "Init() was not called!";
@@ -232,8 +231,8 @@ Status HiveUdfCall::Open(RuntimeState* state, ExprContext* ctx,
   RETURN_ERROR_IF_EXC(env);
   RETURN_IF_ERROR(JniUtil::LocalToGlobalRef(env, jni_ctx->executor, &jni_ctx->executor));
 
-  jni_ctx->output_anyval = CreateAnyVal(type_);
-
+  RETURN_IF_ERROR(AllocateAnyVal(state, ctx->pool_.get(), type_,
+      "Could not allocate JNI output value", &jni_ctx->output_anyval));
   return Status::OK();
 }
 
@@ -264,10 +263,7 @@ void HiveUdfCall::Close(RuntimeState* state, ExprContext* ctx,
         delete[] jni_ctx->output_value_buffer;
         jni_ctx->output_value_buffer = NULL;
       }
-      if (jni_ctx->output_anyval != NULL) {
-        delete jni_ctx->output_anyval;
-        jni_ctx->output_anyval = NULL;
-      }
+      jni_ctx->output_anyval = NULL;
       delete jni_ctx;
     } else {
       DCHECK(!ctx->opened_);

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/exprs/in-predicate.h
----------------------------------------------------------------------
diff --git a/be/src/exprs/in-predicate.h b/be/src/exprs/in-predicate.h
index fd53a42..52d2015 100644
--- a/be/src/exprs/in-predicate.h
+++ b/be/src/exprs/in-predicate.h
@@ -28,33 +28,32 @@
 
 namespace impala {
 
-/// Predicate for evaluating expressions of the form "val [NOT] IN (x1, x2, x3...)".
-//
+/// Utilities for evaluating expressions of the form "val [NOT] IN (x1, x2, x3...)".
+/// In predicates are implemented using ScalarFnCall and the UDF interface.
+///
 /// There are two strategies for evaluating the IN predicate:
 //
-/// 1) SET_LOOKUP: This strategy is for when all the values in the IN list are constant. In
-///    the prepare function, we create a set of the constant values from the IN list, and
-///    use this set to lookup a given 'val'.
+/// 1) SET_LOOKUP: This strategy is for when all the values in the IN list are constant.
+///    In the prepare function, we create a set of the constant values from the IN list,
+///    and use this set to lookup a given 'val'.
 //
-/// 2) ITERATE: This is the fallback strategy for when their are non-constant IN list
+/// 2) ITERATE: This is the fallback strategy for when there are non-constant IN list
 ///    values, or very few values in the IN list. We simply iterate through every
 ///    expression and compare it to val. This strategy has no prepare function.
 //
-/// The FE chooses which strategy we should use by choosing the appropriate function (e.g.,
-/// InIterate() or InSetLookup()). If it chooses SET_LOOKUP, it also sets the appropriate
-/// SetLookupPrepare and SetLookupClose functions.
-//
-/// TODO: the set lookup logic is not yet implemented for TimestampVals or DecimalVals
-class InPredicate : public Predicate {
+/// The FE chooses which strategy we should use by choosing the appropriate function
+/// (e.g., InIterate() or InSetLookup()). If it chooses SET_LOOKUP, it also sets the
+/// appropriate SetLookupPrepare and SetLookupClose functions.
+class InPredicate {
  public:
   /// Functions for every type
-  static impala_udf::BooleanVal InIterate(
-      impala_udf::FunctionContext* context, const impala_udf::BooleanVal& val,
-      int num_args, const impala_udf::BooleanVal* args);
+  static impala_udf::BooleanVal InIterate(impala_udf::FunctionContext* context,
+      const impala_udf::BooleanVal& val, int num_args,
+      const impala_udf::BooleanVal* args);
 
-  static impala_udf::BooleanVal NotInIterate(
-      impala_udf::FunctionContext* context, const impala_udf::BooleanVal& val,
-      int num_args, const impala_udf::BooleanVal* args);
+  static impala_udf::BooleanVal NotInIterate(impala_udf::FunctionContext* context,
+      const impala_udf::BooleanVal& val, int num_args,
+      const impala_udf::BooleanVal* args);
 
   static void SetLookupPrepare_boolean(impala_udf::FunctionContext* ctx,
       impala_udf::FunctionContext::FunctionStateScope scope);

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/exprs/literal.cc
----------------------------------------------------------------------
diff --git a/be/src/exprs/literal.cc b/be/src/exprs/literal.cc
index caf6c14..03bdd50 100644
--- a/be/src/exprs/literal.cc
+++ b/be/src/exprs/literal.cc
@@ -21,9 +21,9 @@
 
 #include "codegen/codegen-anyval.h"
 #include "codegen/llvm-codegen.h"
+#include "gen-cpp/Exprs_types.h"
 #include "runtime/decimal-value.inline.h"
 #include "runtime/runtime-state.h"
-#include "gen-cpp/Exprs_types.h"
 
 #include "common/names.h"
 
@@ -414,7 +414,8 @@ Status Literal::GetCodegendComputeFn(LlvmCodeGen* codegen, llvm::Function** fn)
     case TYPE_VARCHAR:
     case TYPE_CHAR:
       v.SetLen(builder.getInt32(value_.string_val.len));
-      v.SetPtr(codegen->CastPtrToLlvmPtr(codegen->ptr_type(), value_.string_val.ptr));
+      v.SetPtr(codegen->GetStringConstant(
+          &builder, value_.string_val.ptr, value_.string_val.len));
       break;
     case TYPE_DECIMAL:
       switch (type().GetByteSize()) {

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/exprs/scalar-fn-call.cc
----------------------------------------------------------------------
diff --git a/be/src/exprs/scalar-fn-call.cc b/be/src/exprs/scalar-fn-call.cc
index 17cbd64..3a7ffb1 100644
--- a/be/src/exprs/scalar-fn-call.cc
+++ b/be/src/exprs/scalar-fn-call.cc
@@ -45,14 +45,22 @@
 using namespace impala;
 using namespace impala_udf;
 using namespace strings;
+using llvm::ArrayType;
+using llvm::BasicBlock;
+using llvm::Function;
+using llvm::GlobalVariable;
+using llvm::PointerType;
+using llvm::Type;
+using llvm::Value;
+using std::move;
+using std::pair;
 
 // Maximum number of arguments the interpretation path supports.
 #define MAX_INTERP_ARGS 20
 
 ScalarFnCall::ScalarFnCall(const TExprNode& node)
   : Expr(node),
-    vararg_start_idx_(node.__isset.vararg_start_idx ?
-        node.vararg_start_idx : -1),
+    vararg_start_idx_(node.__isset.vararg_start_idx ? node.vararg_start_idx : -1),
     scalar_fn_wrapper_(NULL),
     prepare_fn_(NULL),
     close_fn_(NULL),
@@ -84,17 +92,8 @@ Status ScalarFnCall::Prepare(RuntimeState* state, const RowDescriptor& desc,
     has_char_arg_or_result |= children_[i]->type_.type == TYPE_CHAR;
   }
 
-  // Compute buffer size for varargs
-  int varargs_buffer_size = 0;
-  if (vararg_start_idx_ != -1) {
-    DCHECK_GT(GetNumChildren(), vararg_start_idx_);
-    for (int i = vararg_start_idx_; i < GetNumChildren(); ++i) {
-      varargs_buffer_size += AnyValUtil::AnyValSize(children_[i]->type());
-    }
-  }
-
-  fn_context_index_ = context->Register(state, return_type, arg_types,
-      varargs_buffer_size);
+  fn_context_index_ =
+      context->Register(state, return_type, arg_types, ComputeVarArgsBufferSize());
 
   // Use the interpreted path and call the builtin without codegen if:
   // 1. codegen is disabled or
@@ -149,7 +148,7 @@ Status ScalarFnCall::Prepare(RuntimeState* state, const RowDescriptor& desc,
       RETURN_IF_ERROR(codegen->LinkModule(local_path));
     }
 
-    llvm::Function* ir_udf_wrapper;
+    Function* ir_udf_wrapper;
     RETURN_IF_ERROR(GetCodegendComputeFn(codegen, &ir_udf_wrapper));
     // TODO: don't do this for child exprs
     codegen->AddFunctionToJit(ir_udf_wrapper, &scalar_fn_wrapper_);
@@ -177,23 +176,52 @@ Status ScalarFnCall::Open(RuntimeState* state, ExprContext* ctx,
     // We're in the interpreted path (i.e. no JIT). Populate our FunctionContext's
     // staging_input_vals, which will be reused across calls to scalar_fn_.
     DCHECK(scalar_fn_wrapper_ == NULL);
-    ObjectPool* obj_pool = state->obj_pool();
     vector<AnyVal*>* input_vals = fn_ctx->impl()->staging_input_vals();
     for (int i = 0; i < NumFixedArgs(); ++i) {
-      input_vals->push_back(CreateAnyVal(obj_pool, children_[i]->type()));
+      AnyVal* input_val;
+      RETURN_IF_ERROR(AllocateAnyVal(state, ctx->pool_.get(), children_[i]->type(),
+          "Could not allocate expression value", &input_val));
+      input_vals->push_back(input_val);
     }
   }
 
-  // Only evaluate constant arguments once per fragment
+  // Only evaluate constant arguments at the top level of function contexts.
+  // If 'ctx' was cloned, the constant values were copied from the parent.
   if (scope == FunctionContext::FRAGMENT_LOCAL) {
     vector<AnyVal*> constant_args;
     for (int i = 0; i < children_.size(); ++i) {
-      constant_args.push_back(children_[i]->GetConstVal(ctx));
-      // Check if any errors were set during the GetConstVal() call
-      Status child_status = children_[i]->GetFnContextError(ctx);
-      if (!child_status.ok()) return child_status;
+      AnyVal* const_val;
+      RETURN_IF_ERROR(children_[i]->GetConstVal(state, ctx, &const_val));
+      constant_args.push_back(const_val);
+    }
+    fn_ctx->impl()->SetConstantArgs(move(constant_args));
+  }
+
+  if (scalar_fn_ != NULL) {
+    // Now we have the constant values, cache them so that the interpreted path can
+    // call the UDF without reevaluating the arguments. 'staging_input_vals' and
+    // 'varargs_buffer' in the FunctionContext are used to pass fixed and variable-length
+    // arguments respectively. 'non_constant_args()' in the FunctionContext will contain
+    // pointers to the remaining (non-constant) children that are evaluated for every row.
+    vector<pair<Expr*, AnyVal*>> non_constant_args;
+    uint8_t* varargs_buffer = fn_ctx->impl()->varargs_buffer();
+    for (int i = 0; i < children_.size(); ++i) {
+      AnyVal* input_arg;
+      int arg_bytes = AnyValUtil::AnyValSize(children_[i]->type());
+      if (i < NumFixedArgs()) {
+        input_arg = (*fn_ctx->impl()->staging_input_vals())[i];
+      } else {
+        input_arg = reinterpret_cast<AnyVal*>(varargs_buffer);
+        varargs_buffer += arg_bytes;
+      }
+      const AnyVal* constant_arg = fn_ctx->impl()->constant_args()[i];
+      if (constant_arg == NULL) {
+        non_constant_args.emplace_back(children_[i], input_arg);
+      } else {
+        memcpy(input_arg, constant_arg, arg_bytes);
+      }
     }
-    fn_ctx->impl()->SetConstantArgs(constant_args);
+    fn_ctx->impl()->SetNonConstantArgs(move(non_constant_args));
   }
 
   if (prepare_fn_ != NULL) {
@@ -264,7 +292,7 @@ bool ScalarFnCall::IsConstant() const {
 //        i32 4,
 //        i64* inttoptr (i64 89111072 to i64*))
 //   ret { i8, double } %result
-Status ScalarFnCall::GetCodegendComputeFn(LlvmCodeGen* codegen, llvm::Function** fn) {
+Status ScalarFnCall::GetCodegendComputeFn(LlvmCodeGen* codegen, Function** fn) {
   if (ir_compute_fn_ != NULL) {
     *fn = ir_compute_fn_;
     return Status::OK();
@@ -279,62 +307,56 @@ Status ScalarFnCall::GetCodegendComputeFn(LlvmCodeGen* codegen, llvm::Function**
     }
   }
 
-  llvm::Function* udf;
+  Function* udf;
   RETURN_IF_ERROR(GetUdf(codegen, &udf));
 
   // Create wrapper that computes args and calls UDF
   stringstream fn_name;
   fn_name << udf->getName().str() << "Wrapper";
 
-  llvm::Value* args[2];
+  Value* args[2];
   *fn = CreateIrFunctionPrototype(codegen, fn_name.str(), &args);
-  llvm::Value* expr_ctx = args[0];
-  llvm::Value* row = args[1];
-  llvm::BasicBlock* block = llvm::BasicBlock::Create(codegen->context(), "entry", *fn);
+  Value* expr_ctx = args[0];
+  Value* row = args[1];
+  BasicBlock* block = BasicBlock::Create(codegen->context(), "entry", *fn);
   LlvmCodeGen::LlvmBuilder builder(block);
 
   // Populate UDF arguments
-  vector<llvm::Value*> udf_args;
+  vector<Value*> udf_args;
 
   // First argument is always FunctionContext*.
   // Index into our registered offset in the ExprContext.
-  llvm::Value* expr_ctx_gep = builder.CreateStructGEP(NULL, expr_ctx, 1, "expr_ctx_gep");
-  llvm::Value* fn_ctxs_base = builder.CreateLoad(expr_ctx_gep, "fn_ctxs_base");
+  Value* expr_ctx_gep = builder.CreateStructGEP(NULL, expr_ctx, 1, "expr_ctx_gep");
+  Value* fn_ctxs_base = builder.CreateLoad(expr_ctx_gep, "fn_ctxs_base");
   // Use GEP to add our index to the base pointer
-  llvm::Value* fn_ctx_ptr =
+  Value* fn_ctx_ptr =
       builder.CreateConstGEP1_32(fn_ctxs_base, fn_context_index_, "fn_ctx_ptr");
-  llvm::Value* fn_ctx = builder.CreateLoad(fn_ctx_ptr, "fn_ctx");
+  Value* fn_ctx = builder.CreateLoad(fn_ctx_ptr, "fn_ctx");
   udf_args.push_back(fn_ctx);
 
-  // Get IR i8* pointer to varargs buffer from FunctionContext* argument
-  // (if there are varargs)
-  llvm::Value* varargs_buffer = NULL;
+  // Allocate a varargs array. The array's entry type is the appropriate AnyVal subclass.
+  // E.g. if the vararg type is STRING, and the function is called with 10 arguments, we
+  // allocate a StringVal[10] array. We allocate the buffer with Alloca so that LLVM can
+  // optimise out the buffer once the function call is inlined.
+  Value* varargs_buffer = NULL;
   if (vararg_start_idx_ != -1) {
-    // FunctionContextImpl is first field of FunctionContext
-    // fn_ctx_impl_ptr has type FunctionContextImpl**
-    llvm::Value* fn_ctx_impl_ptr = builder.CreateStructGEP(NULL, fn_ctx, 0, "fn_ctx_impl_ptr");
-    llvm::Value* fn_ctx_impl = builder.CreateLoad(fn_ctx_impl_ptr, "fn_ctx_impl");
-    // varargs_buffer is first field of FunctionContextImpl
-    // varargs_buffer_ptr has type i8**
-    llvm::Value* varargs_buffer_ptr = builder.CreateStructGEP(NULL, fn_ctx_impl, 0,
-        "varargs_buffer");
-    varargs_buffer = builder.CreateLoad(varargs_buffer_ptr);
+    Type* unlowered_varargs_type =
+        CodegenAnyVal::GetUnloweredType(codegen, VarArgsType());
+    varargs_buffer = codegen->CreateEntryBlockAlloca(builder, unlowered_varargs_type,
+        NumVarArgs(), FunctionContextImpl::VARARGS_BUFFER_ALIGNMENT, "varargs_buffer");
   }
-  // Tracks where to write the next vararg to
-  int varargs_buffer_offset = 0;
 
   // Call children to populate remaining arguments
   for (int i = 0; i < GetNumChildren(); ++i) {
-    llvm::Function* child_fn = NULL;
-    vector<llvm::Value*> child_fn_args;
+    Function* child_fn = NULL;
+    vector<Value*> child_fn_args;
     // Set 'child_fn' to the codegen'd function, sets child_fn = NULL if codegen fails
     children_[i]->GetCodegendComputeFn(codegen, &child_fn);
-
     if (child_fn == NULL) {
       // Set 'child_fn' to the interpreted function
       child_fn = GetStaticGetValWrapper(children_[i]->type(), codegen);
       // First argument to interpreted function is children_[i]
-      llvm::Type* expr_ptr_type = codegen->GetPtrType(Expr::LLVM_CLASS_NAME);
+      Type* expr_ptr_type = codegen->GetPtrType(Expr::LLVM_CLASS_NAME);
       child_fn_args.push_back(codegen->CastPtrToLlvmPtr(expr_ptr_type, children_[i]));
     }
     child_fn_args.push_back(expr_ctx);
@@ -342,33 +364,21 @@ Status ScalarFnCall::GetCodegendComputeFn(LlvmCodeGen* codegen, llvm::Function**
 
     // Call 'child_fn', adding the result to either 'udf_args' or 'varargs_buffer'
     DCHECK(child_fn != NULL);
-    llvm::Type* arg_type = CodegenAnyVal::GetUnloweredType(codegen, children_[i]->type());
-    llvm::Value* arg_val_ptr;
-    if (vararg_start_idx_ == -1 || i < vararg_start_idx_) {
-      // Either no varargs or arguments before varargs begin. Allocate space to store
-      // 'child_fn's result so we can pass the pointer to the UDF.
+    Type* arg_type = CodegenAnyVal::GetUnloweredType(codegen, children_[i]->type());
+    Value* arg_val_ptr;
+    if (i < NumFixedArgs()) {
+      // Allocate space to store 'child_fn's result so we can pass the pointer to the UDF.
       arg_val_ptr = codegen->CreateEntryBlockAlloca(builder, arg_type, "arg_val_ptr");
-
-      if (children_[i]->type().type == TYPE_DECIMAL) {
-        // UDFs may manipulate DecimalVal arguments via SIMD instructions such as 'movaps'
-        // that require 16-byte memory alignment. LLVM uses 8-byte alignment by default,
-        // so explicitly set the alignment for DecimalVals.
-        llvm::cast<llvm::AllocaInst>(arg_val_ptr)->setAlignment(16);
-      }
       udf_args.push_back(arg_val_ptr);
-     } else {
-      // Store the result of 'child_fn' in varargs_buffer + varargs_buffer_offset
-      arg_val_ptr =
-          builder.CreateConstGEP1_32(varargs_buffer, varargs_buffer_offset, "arg_val_ptr");
-      varargs_buffer_offset += AnyValUtil::AnyValSize(children_[i]->type());
-      // Cast arg_val_ptr from i8* to AnyVal pointer type
+    } else {
+      // Store the result of 'child_fn' in varargs_buffer[i].
       arg_val_ptr =
-          builder.CreateBitCast(arg_val_ptr, arg_type->getPointerTo(), "arg_val_ptr");
+          builder.CreateConstGEP1_32(varargs_buffer, i - NumFixedArgs(), "arg_val_ptr");
     }
     DCHECK_EQ(arg_val_ptr->getType(), arg_type->getPointerTo());
     // The result of the call must be stored in a lowered AnyVal
-    llvm::Value* lowered_arg_val_ptr = builder.CreateBitCast(
-        arg_val_ptr, CodegenAnyVal::GetLoweredPtrType(codegen, children_[i]->type()),
+    Value* lowered_arg_val_ptr = builder.CreateBitCast(arg_val_ptr,
+        CodegenAnyVal::GetLoweredPtrType(codegen, children_[i]->type()),
         "lowered_arg_val_ptr");
     CodegenAnyVal::CreateCall(
         codegen, &builder, child_fn, child_fn_args, "arg_val", lowered_arg_val_ptr);
@@ -379,16 +389,15 @@ Status ScalarFnCall::GetCodegendComputeFn(LlvmCodeGen* codegen, llvm::Function**
     DCHECK_EQ(udf_args.size(), vararg_start_idx_ + 1);
     DCHECK_GE(GetNumChildren(), 1);
     // Add the number of varargs
-    udf_args.push_back(codegen->GetIntConstant(
-        TYPE_INT, GetNumChildren() - vararg_start_idx_));
+    udf_args.push_back(codegen->GetIntConstant(TYPE_INT, NumVarArgs()));
     // Add all the accumulated vararg inputs as one input argument.
-    llvm::PointerType* vararg_type = codegen->GetPtrType(
-        CodegenAnyVal::GetUnloweredType(codegen, children_.back()->type()));
+    PointerType* vararg_type =
+        codegen->GetPtrType(CodegenAnyVal::GetUnloweredType(codegen, VarArgsType()));
     udf_args.push_back(builder.CreateBitCast(varargs_buffer, vararg_type, "varargs"));
   }
 
   // Call UDF
-  llvm::Value* result_val =
+  Value* result_val =
       CodegenAnyVal::CreateCall(codegen, &builder, udf, udf_args, "result");
   builder.CreateRet(result_val);
 
@@ -401,7 +410,7 @@ Status ScalarFnCall::GetCodegendComputeFn(LlvmCodeGen* codegen, llvm::Function**
   return Status::OK();
 }
 
-Status ScalarFnCall::GetUdf(LlvmCodeGen* codegen, llvm::Function** udf) {
+Status ScalarFnCall::GetUdf(LlvmCodeGen* codegen, Function** udf) {
   // from_utc_timestamp() and to_utc_timestamp() have inline ASM that cannot be JIT'd.
   // TimestampFunctions::AddSub() contains a try/catch which doesn't work in JIT'd
   // code. Always use the interpreted version of these functions.
@@ -428,16 +437,16 @@ Status ScalarFnCall::GetUdf(LlvmCodeGen* codegen, llvm::Function** udf) {
     // Per the x64 ABI, DecimalVals are returned via a DecmialVal* output argument.
     // So, the return type is void.
     bool is_decimal = type().type == TYPE_DECIMAL;
-    llvm::Type* return_type = is_decimal ? codegen->void_type() :
-        CodegenAnyVal::GetLoweredType(codegen, type());
+    Type* return_type = is_decimal ? codegen->void_type() :
+                                     CodegenAnyVal::GetLoweredType(codegen, type());
 
-    // Convert UDF function pointer to llvm::Function*. Start by creating a function
+    // Convert UDF function pointer to Function*. Start by creating a function
     // prototype for it.
     LlvmCodeGen::FnPrototype prototype(codegen, fn_.scalar_fn.symbol, return_type);
 
     if (is_decimal) {
       // Per the x64 ABI, DecimalVals are returned via a DecmialVal* output argument
-      llvm::Type* output_type =
+      Type* output_type =
           codegen->GetPtrType(CodegenAnyVal::GetUnloweredType(codegen, type()));
       prototype.AddArgument("output", output_type);
     }
@@ -450,19 +459,19 @@ Status ScalarFnCall::GetUdf(LlvmCodeGen* codegen, llvm::Function** udf) {
     for (int i = 0; i < NumFixedArgs(); ++i) {
       stringstream arg_name;
       arg_name << "fixed_arg_" << i;
-      llvm::Type* arg_type = codegen->GetPtrType(
+      Type* arg_type = codegen->GetPtrType(
           CodegenAnyVal::GetUnloweredType(codegen, children_[i]->type()));
       prototype.AddArgument(arg_name.str(), arg_type);
     }
     // The varargs for the UDF function if there is any.
-    if (vararg_start_idx_ >= 0) {
-      llvm::Type* vararg_type = CodegenAnyVal::GetUnloweredPtrType(
+    if (NumVarArgs() > 0) {
+      Type* vararg_type = CodegenAnyVal::GetUnloweredPtrType(
           codegen, children_[vararg_start_idx_]->type());
       prototype.AddArgument("num_var_arg", codegen->GetType(TYPE_INT));
       prototype.AddArgument("var_arg", vararg_type);
     }
 
-    // Create a llvm::Function* with the generated type. This is only a function
+    // Create a Function* with the generated type. This is only a function
     // declaration, not a definition, since we do not create any basic blocks or
     // instructions in it.
     *udf = prototype.GeneratePrototype(NULL, NULL, false);
@@ -518,11 +527,11 @@ Status ScalarFnCall::GetFunction(RuntimeState* state, const string& symbol, void
     DCHECK_EQ(fn_.binary_type, TFunctionBinaryType::IR);
     LlvmCodeGen* codegen = state->codegen();
     DCHECK(codegen != NULL);
-    llvm::Function* ir_fn = codegen->GetFunction(symbol);
+    Function* ir_fn = codegen->GetFunction(symbol);
     if (ir_fn == NULL) {
       stringstream ss;
-      ss << "Unable to locate function " << symbol
-         << " from LLVM module " << fn_.hdfs_location;
+      ss << "Unable to locate function " << symbol << " from LLVM module "
+         << fn_.hdfs_location;
       return Status(ss.str());
     }
     codegen->AddFunctionToJit(ir_fn, fn);
@@ -530,21 +539,12 @@ Status ScalarFnCall::GetFunction(RuntimeState* state, const string& symbol, void
   }
 }
 
-void ScalarFnCall::EvaluateChildren(ExprContext* context, const TupleRow* row,
-                                    vector<AnyVal*>* input_vals) {
-  DCHECK_EQ(input_vals->size(), NumFixedArgs());
+void ScalarFnCall::EvaluateNonConstantChildren(
+    ExprContext* context, const TupleRow* row) {
   FunctionContext* fn_ctx = context->fn_context(fn_context_index_);
-  uint8_t* varargs_buffer = fn_ctx->impl()->varargs_buffer();
-  for (int i = 0; i < children_.size(); ++i) {
-    void* src_slot = context->GetValue(children_[i], row);
-    AnyVal* dst_val;
-    if (vararg_start_idx_ == -1 || i < vararg_start_idx_) {
-      dst_val = (*input_vals)[i];
-    } else {
-      dst_val = reinterpret_cast<AnyVal*>(varargs_buffer);
-      varargs_buffer += AnyValUtil::AnyValSize(children_[i]->type());
-    }
-    AnyValUtil::SetAnyVal(src_slot, children_[i]->type(), dst_val);
+  for (pair<Expr*, AnyVal*> child : fn_ctx->impl()->non_constant_args()) {
+    void* val = context->GetValue(child.first, row);
+    AnyValUtil::SetAnyVal(val, child.first->type(), child.second);
   }
 }
 
@@ -553,14 +553,13 @@ RETURN_TYPE ScalarFnCall::InterpretEval(ExprContext* context, const TupleRow* ro
   DCHECK(scalar_fn_ != NULL) << DebugString();
   FunctionContext* fn_ctx = context->fn_context(fn_context_index_);
   vector<AnyVal*>* input_vals = fn_ctx->impl()->staging_input_vals();
-  EvaluateChildren(context, row, input_vals);
+  EvaluateNonConstantChildren(context, row);
 
   if (vararg_start_idx_ == -1) {
     switch (children_.size()) {
-
 #define ARG_DECL_ONE(z, n, data) BOOST_PP_COMMA_IF(n) const AnyVal&
 #define ARG_DECL_LIST(n) \
-    FunctionContext* BOOST_PP_COMMA_IF(n) BOOST_PP_REPEAT(n, ARG_DECL_ONE, unused)
+  FunctionContext* BOOST_PP_COMMA_IF(n) BOOST_PP_REPEAT(n, ARG_DECL_ONE, unused)
 #define ARG_ONE(z, n, data) BOOST_PP_COMMA_IF(n) *(*input_vals)[n]
 #define ARG_LIST(n) fn_ctx BOOST_PP_COMMA_IF(n) BOOST_PP_REPEAT(n, ARG_ONE, unused)
 
@@ -706,8 +705,15 @@ DecimalVal ScalarFnCall::GetDecimalVal(ExprContext* context, const TupleRow* row
 
 string ScalarFnCall::DebugString() const {
   stringstream out;
-  out << "ScalarFnCall(udf_type=" << fn_.binary_type
-      << " location=" << fn_.hdfs_location
+  out << "ScalarFnCall(udf_type=" << fn_.binary_type << " location=" << fn_.hdfs_location
       << " symbol_name=" << fn_.scalar_fn.symbol << Expr::DebugString() << ")";
   return out.str();
 }
+
+int ScalarFnCall::ComputeVarArgsBufferSize() const {
+  for (int i = NumFixedArgs(); i < children_.size(); ++i) {
+    // All varargs should have same type.
+    DCHECK_EQ(children_[i]->type(), VarArgsType());
+  }
+  return NumVarArgs() == 0 ? 0 : NumVarArgs() * AnyValUtil::AnyValSize(VarArgsType());
+}

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/exprs/scalar-fn-call.h
----------------------------------------------------------------------
diff --git a/be/src/exprs/scalar-fn-call.h b/be/src/exprs/scalar-fn-call.h
index e48b9df..85191d5 100644
--- a/be/src/exprs/scalar-fn-call.h
+++ b/be/src/exprs/scalar-fn-call.h
@@ -83,6 +83,11 @@ class ScalarFnCall: public Expr {
   /// If this function does not have varargs, it is set to -1.
   int vararg_start_idx_;
 
+  /// Vector of all non-constant children expressions that need to be evaluated for
+  /// each input row. The first element of each pair is the child expression and the
+  /// second element in the value it must be evaluated into.
+  std::vector<std::pair<Expr*, impala_udf::AnyVal*>> non_constant_children_;
+
   /// Function pointer to the JIT'd function produced by GetCodegendComputeFn().
   /// Has signature *Val (ExprContext*, const TupleRow*), and calls the scalar
   /// function with signature like *Val (FunctionContext*, const *Val& arg1, ...)
@@ -106,6 +111,13 @@ class ScalarFnCall: public Expr {
     return vararg_start_idx_ >= 0 ? vararg_start_idx_ : children_.size();
   }
 
+  int NumVarArgs() const { return children_.size() - NumFixedArgs(); }
+
+  const ColumnType& VarArgsType() const {
+    DCHECK_GE(NumVarArgs(), 1);
+    return children_.back()->type();
+  }
+
   /// Loads the native or IR function from HDFS and puts the result in *udf.
   Status GetUdf(LlvmCodeGen* codegen, llvm::Function** udf);
 
@@ -114,16 +126,16 @@ class ScalarFnCall: public Expr {
   /// has been JIT'd (i.e. after Prepare() has completed).
   Status GetFunction(RuntimeState* state, const std::string& symbol, void** fn);
 
-  /// Evaluates the children exprs and stores the results in input_vals. Used in the
-  /// interpreted path.
-  void EvaluateChildren(ExprContext* context, const TupleRow* row,
-                        std::vector<impala_udf::AnyVal*>* input_vals);
+  /// Evaluates the non-constant children exprs. Used in the interpreted path.
+  void EvaluateNonConstantChildren(ExprContext* context, const TupleRow* row);
 
   /// Function to call scalar_fn_. Used in the interpreted path.
-  template<typename RETURN_TYPE>
+  template <typename RETURN_TYPE>
   RETURN_TYPE InterpretEval(ExprContext* context, const TupleRow* row);
-};
 
+  /// Computes the size of the varargs buffer in bytes (0 bytes if no varargs).
+  int ComputeVarArgsBufferSize() const;
+};
 }
 
 #endif

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/exprs/utility-functions-ir.cc
----------------------------------------------------------------------
diff --git a/be/src/exprs/utility-functions-ir.cc b/be/src/exprs/utility-functions-ir.cc
index ba4a371..263c4d4 100644
--- a/be/src/exprs/utility-functions-ir.cc
+++ b/be/src/exprs/utility-functions-ir.cc
@@ -57,6 +57,7 @@ BigIntVal UtilityFunctions::FnvHashDecimal(FunctionContext* ctx,
   if (input_val.is_null) return BigIntVal::null();
   const FunctionContext::TypeDesc& input_type = *ctx->GetArgType(0);
   int byte_size = ColumnType::GetDecimalByteSize(input_type.precision);
+  // val4, val8 and val16 all start at the same memory address.
   return BigIntVal(HashUtil::FnvHash64(&input_val.val16, byte_size, HashUtil::FNV_SEED));
 }
 

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/runtime/free-pool-test.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/free-pool-test.cc b/be/src/runtime/free-pool-test.cc
index f4490f0..63153c9 100644
--- a/be/src/runtime/free-pool-test.cc
+++ b/be/src/runtime/free-pool-test.cc
@@ -64,44 +64,52 @@ TEST(FreePoolTest, Basic) {
   pool.Free(p2);
   pool.Free(p3);
 
-  // We know have 2 1 byte allocations. Make a two byte allocation.
+  // We know have 2 1 byte allocations, which were rounded up to 8 bytes. Make an 8
+  // byte allocation, which can reuse one of the existing ones.
   uint8_t* p4 = pool.Allocate(2);
   memset(p4, 2, 123);
-  EXPECT_EQ(mem_pool.total_allocated_bytes(), 48);
-  EXPECT_TRUE(p4 != p1);
-  EXPECT_TRUE(p4 != p2);
-  EXPECT_TRUE(p4 != p3);
+  EXPECT_EQ(mem_pool.total_allocated_bytes(), 32);
+  EXPECT_TRUE(p4 == p1 || p4 == p2 || p4 == p3);
   pool.Free(p4);
 
+  // Make a 9 byte allocation, which requires a new allocation.
+  uint8_t* p5 = pool.Allocate(9);
+  memset(p5, 9, 123);
+  EXPECT_EQ(mem_pool.total_allocated_bytes(), 56);
+  pool.Free(p5);
+  EXPECT_TRUE(p5 != p1);
+  EXPECT_TRUE(p5 != p2);
+  EXPECT_TRUE(p5 != p3);
+
   // Everything's freed. Try grabbing the ones that have been allocated.
   p1 = pool.Allocate(1);
   *p1 = 123;
   p2 = pool.Allocate(1);
   *p2 = 123;
-  p3 = pool.Allocate(2);
-  memset(p3, 123, 2);
-  EXPECT_EQ(mem_pool.total_allocated_bytes(), 48);
+  p5 = pool.Allocate(9);
+  memset(p5, 123, 9);
+  EXPECT_EQ(mem_pool.total_allocated_bytes(), 56);
 
   // Make another 1 byte allocation.
   p4 = pool.Allocate(1);
   *p4 = 1;
-  EXPECT_EQ(mem_pool.total_allocated_bytes(), 64);
+  EXPECT_EQ(mem_pool.total_allocated_bytes(), 72);
 
   mem_pool.FreeAll();
 
   // Try making allocations larger than 1GB.
-  uint8_t* p5 = pool.Allocate(1LL << 32);
-  EXPECT_TRUE(p5 != NULL);
+  uint8_t* p6 = pool.Allocate(1LL << 32);
+  EXPECT_TRUE(p6 != NULL);
   for (int64_t i = 0; i < (1LL << 32); i += (1 << 29)) {
-    *(p5 + i) = i;
+    *(p6 + i) = i;
   }
   EXPECT_EQ(mem_pool.total_allocated_bytes(), (1LL << 32) + 8);
 
   // Test zero-byte allocation.
-  p5 = pool.Allocate(0);
-  EXPECT_TRUE(p5 != NULL);
+  p6 = pool.Allocate(0);
+  EXPECT_TRUE(p6 != NULL);
   EXPECT_EQ(mem_pool.total_allocated_bytes(), (1LL << 32) + 8);
-  pool.Free(p5);
+  pool.Free(p6);
 
   mem_pool.FreeAll();
 }

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/runtime/free-pool.h
----------------------------------------------------------------------
diff --git a/be/src/runtime/free-pool.h b/be/src/runtime/free-pool.h
index ca22c48..746b649 100644
--- a/be/src/runtime/free-pool.h
+++ b/be/src/runtime/free-pool.h
@@ -35,8 +35,8 @@ namespace impala {
 
 /// Implementation of a free pool to recycle allocations. The pool is broken
 /// up into 64 lists, one for each power of 2. Each allocation is rounded up
-/// to the next power of 2. When the allocation is freed, it is added to the
-/// corresponding free list.
+/// to the next power of 2 (or 8 bytes, whichever is larger). When the
+/// allocation is freed, it is added to the corresponding free list.
 /// Each allocation has an 8 byte header that immediately precedes the actual
 /// allocation. If the allocation is owned by the user, the header contains
 /// the ptr to the list that it should be added to on Free().
@@ -70,6 +70,9 @@ class FreePool {
     if (UNLIKELY(size == 0)) return mem_pool_->EmptyAllocPtr();
     ++net_allocations_;
     if (FLAGS_disable_mem_pools) return reinterpret_cast<uint8_t*>(malloc(size));
+    /// MemPool allocations are 8-byte aligned, so making allocations < 8 bytes
+    /// doesn't save memory and eliminates opportunities to recycle allocations.
+    size = std::max<int64_t>(8, size);
     int free_list_idx = Bits::Log2Ceiling64(size);
     DCHECK_LT(free_list_idx, NUM_LISTS);
     FreeListNode* allocation = lists_[free_list_idx].next;

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/runtime/mem-pool-test.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/mem-pool-test.cc b/be/src/runtime/mem-pool-test.cc
index 604923c..70fb141 100644
--- a/be/src/runtime/mem-pool-test.cc
+++ b/be/src/runtime/mem-pool-test.cc
@@ -49,15 +49,22 @@ TEST(MemPoolTest, Basic) {
   MemPool p2(&tracker);
   MemPool p3(&tracker);
 
+  uint8_t* ptr = NULL;
+
   for (int iter = 0; iter < 2; ++iter) {
-    // allocate a total of 24K in 32-byte pieces (for which we only request 25 bytes)
+    // Allocate 768 25 + 7 byte allocations.
     for (int i = 0; i < 768; ++i) {
-      // pads to 32 bytes
-      p.Allocate(25);
+      ptr = p.Allocate(25);
+      EXPECT_TRUE(ptr != NULL);
+      EXPECT_EQ(0, reinterpret_cast<uintptr_t>(ptr) % MemPool::DEFAULT_ALIGNMENT)
+          << "Allocation should be aligned";
+
+      // Allocate the padding to get 32 bytes
+      ptr = p.TryAllocateAligned(7, 1);
+      EXPECT_TRUE(ptr != NULL);
     }
-    // we handed back 24K
+    // We allocated 24K from 28K of chunks (4, 8, 16)
     EXPECT_EQ(24 * 1024, p.total_allocated_bytes());
-    // .. and allocated 28K of chunks (4, 8, 16)
     EXPECT_EQ(28 * 1024, p.GetTotalChunkSizes());
 
     // we're passing on the first two chunks, containing 12K of data; we're left with
@@ -133,7 +140,7 @@ TEST(MemPoolTest, Basic) {
   }
 
   // Test zero byte allocation.
-  uint8_t* ptr = p.Allocate(0);
+  ptr = p.Allocate(0);
   EXPECT_TRUE(ptr != NULL);
   EXPECT_EQ(0, p.GetTotalChunkSizes());
 }
@@ -487,6 +494,24 @@ TEST(MemPoolTest, ReturnAllocationThenFailedAllocation) {
   src.FreeAll();
 }
 
+TEST(MemPoolTest, TryAllocateAligned) {
+  MemTracker tracker(-1);
+  MemPool pool(&tracker);
+  int alignment = 1;
+  int64_t size = 1;
+  constexpr int NUM_ALLOCATIONS = 100000;
+  constexpr int MAX_ALLOCATION_SIZE = 113;
+
+  for (int i = 0; i < NUM_ALLOCATIONS; ++i) {
+    uint8_t* ptr = pool.TryAllocateAligned(size, alignment);
+    ASSERT_TRUE(ptr != NULL);
+    ASSERT_EQ(0, reinterpret_cast<uintptr_t>(ptr) % alignment);
+    alignment = alignment == sizeof(std::max_align_t) ? 1 : alignment * 2;
+    size = (size + 1) % MAX_ALLOCATION_SIZE;
+  }
+
+  pool.FreeAll();
+}
 }
 
 IMPALA_TEST_MAIN();

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/runtime/mem-pool.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/mem-pool.cc b/be/src/runtime/mem-pool.cc
index 185d8a3..c95953a 100644
--- a/be/src/runtime/mem-pool.cc
+++ b/be/src/runtime/mem-pool.cc
@@ -36,6 +36,7 @@ const int MemPool::INITIAL_CHUNK_SIZE;
 const int MemPool::MAX_CHUNK_SIZE;
 
 const char* MemPool::LLVM_CLASS_NAME = "class.impala::MemPool";
+const int MemPool::DEFAULT_ALIGNMENT;
 uint32_t MemPool::zero_length_region_ = MEM_POOL_POISON;
 
 MemPool::MemPool(MemTracker* mem_tracker)
@@ -101,8 +102,6 @@ void MemPool::FreeAll() {
 }
 
 bool MemPool::FindChunk(int64_t min_size, bool check_limits) noexcept {
-  DCHECK(current_chunk_idx_ == -1 || chunks_[current_chunk_idx_].size <
-      chunks_[current_chunk_idx_].allocated_bytes + min_size);
   // Try to allocate from a free chunk. We may have free chunks after the current chunk
   // if Clear() was called. The current chunk may be free if ReturnPartialAllocation()
   // was called. The first free chunk (if there is one) can therefore be either the
@@ -133,7 +132,8 @@ bool MemPool::FindChunk(int64_t min_size, bool check_limits) noexcept {
 
   if (FLAGS_disable_mem_pools) {
     // Disable pooling by sizing the chunk to fit only this allocation.
-    chunk_size = min_size;
+    // Make sure the alignment guarantees are respected.
+    chunk_size = std::max<int64_t>(min_size, sizeof(std::max_align_t));
   } else {
     DCHECK_GE(next_chunk_size_, INITIAL_CHUNK_SIZE);
     chunk_size = max<int64_t>(min_size, next_chunk_size_);

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/runtime/mem-pool.h
----------------------------------------------------------------------
diff --git a/be/src/runtime/mem-pool.h b/be/src/runtime/mem-pool.h
index d8c1e61..8980d22 100644
--- a/be/src/runtime/mem-pool.h
+++ b/be/src/runtime/mem-pool.h
@@ -20,9 +20,11 @@
 #define IMPALA_RUNTIME_MEM_POOL_H
 
 #include <stdio.h>
+
 #include <algorithm>
-#include <vector>
+#include <cstddef>
 #include <string>
+#include <vector>
 
 #include "common/logging.h"
 #include "util/bit-util.h"
@@ -93,11 +95,11 @@ class MemPool {
   /// from the registered limits.
   ~MemPool();
 
-  /// Allocates 8-byte aligned section of memory of 'size' bytes at the end
+  /// Allocates a section of memory of 'size' bytes with DEFAULT_ALIGNMENT at the end
   /// of the the current chunk. Creates a new chunk if there aren't any chunks
   /// with enough capacity.
   uint8_t* Allocate(int64_t size) noexcept {
-    return Allocate<false>(size);
+    return Allocate<false>(size, DEFAULT_ALIGNMENT);
   }
 
   /// Same as Allocate() except the mem limit is checked before the allocation and
@@ -105,7 +107,16 @@ class MemPool {
   /// The caller must handle the NULL case. This should be used for allocations
   /// where the size can be very big to bound the amount by which we exceed mem limits.
   uint8_t* TryAllocate(int64_t size) noexcept {
-    return Allocate<true>(size);
+    return Allocate<true>(size, DEFAULT_ALIGNMENT);
+  }
+
+  /// Same as TryAllocate() except a non-default alignment can be specified. It
+  /// should be a power-of-two in [1, sizeof(std::max_align_t)].
+  uint8_t* TryAllocateAligned(int64_t size, int alignment) noexcept {
+    DCHECK_GE(alignment, 1);
+    DCHECK_LE(alignment, sizeof(std::max_align_t));
+    DCHECK_EQ(BitUtil::RoundUpToPowerOfTwo(alignment), alignment);
+    return Allocate<true>(size, alignment);
   }
 
   /// Returns 'byte_size' to the current chunk back to the mem pool. This can
@@ -151,6 +162,8 @@ class MemPool {
   /// For C++/IR interop, we need to be able to look up types by name.
   static const char* LLVM_CLASS_NAME;
 
+  static const int DEFAULT_ALIGNMENT = 8;
+
  private:
   friend class MemPoolTest;
   static const int INITIAL_CHUNK_SIZE = 4 * 1024;
@@ -224,22 +237,37 @@ class MemPool {
   }
 
   template <bool CHECK_LIMIT_FIRST>
-  uint8_t* Allocate(int64_t size) noexcept {
+  uint8_t* Allocate(int64_t size, int alignment) noexcept {
     DCHECK_GE(size, 0);
-    if (UNLIKELY(size == 0)) return reinterpret_cast<uint8_t *>(&zero_length_region_);
+    if (UNLIKELY(size == 0)) return reinterpret_cast<uint8_t*>(&zero_length_region_);
+
+    bool fits_in_chunk = false;
+    if (current_chunk_idx_ != -1) {
+      int64_t aligned_allocated_bytes = BitUtil::RoundUpToPowerOf2(
+          chunks_[current_chunk_idx_].allocated_bytes, alignment);
+      if (aligned_allocated_bytes + size <= chunks_[current_chunk_idx_].size) {
+        // Ensure the requested alignment is respected.
+        total_allocated_bytes_ +=
+            aligned_allocated_bytes - chunks_[current_chunk_idx_].allocated_bytes;
+        chunks_[current_chunk_idx_].allocated_bytes = aligned_allocated_bytes;
+        fits_in_chunk = true;
+      }
+    }
 
-    int64_t num_bytes = BitUtil::RoundUp(size, 8);
-    if (current_chunk_idx_ == -1
-        || num_bytes + chunks_[current_chunk_idx_].allocated_bytes
-          > chunks_[current_chunk_idx_].size) {
+    if (!fits_in_chunk) {
       // If we couldn't allocate a new chunk, return NULL.
-      if (UNLIKELY(!FindChunk(num_bytes, CHECK_LIMIT_FIRST))) return NULL;
+      // malloc() guarantees alignment of min(std::max_align_t, min_chunk_size),
+      // so we do not need to do anything additional to guarantee alignment.
+      static_assert(
+          INITIAL_CHUNK_SIZE >= sizeof(std::max_align_t), "Min chunk size too low");
+      if (UNLIKELY(!FindChunk(size, CHECK_LIMIT_FIRST))) return NULL;
     }
+
     ChunkInfo& info = chunks_[current_chunk_idx_];
     uint8_t* result = info.data + info.allocated_bytes;
-    DCHECK_LE(info.allocated_bytes + num_bytes, info.size);
-    info.allocated_bytes += num_bytes;
-    total_allocated_bytes_ += num_bytes;
+    DCHECK_LE(info.allocated_bytes + size, info.size);
+    info.allocated_bytes += size;
+    total_allocated_bytes_ += size;
     DCHECK_LE(current_chunk_idx_, chunks_.size() - 1);
     peak_allocated_bytes_ = std::max(total_allocated_bytes_, peak_allocated_bytes_);
     return result;
@@ -247,9 +275,8 @@ class MemPool {
 };
 
 // Stamp out templated implementations here so they're included in IR module
-template uint8_t* MemPool::Allocate<false>(int64_t size) noexcept;
-template uint8_t* MemPool::Allocate<true>(int64_t size) noexcept;
-
+template uint8_t* MemPool::Allocate<false>(int64_t size, int alignment) noexcept;
+template uint8_t* MemPool::Allocate<true>(int64_t size, int alignment) noexcept;
 }
 
 #endif

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/testutil/test-udas.cc
----------------------------------------------------------------------
diff --git a/be/src/testutil/test-udas.cc b/be/src/testutil/test-udas.cc
index dc295b5..569853e 100644
--- a/be/src/testutil/test-udas.cc
+++ b/be/src/testutil/test-udas.cc
@@ -117,3 +117,16 @@ const DoubleVal TruncSumSerialize(FunctionContext* context, const DoubleVal& tot
 BigIntVal TruncSumFinalize(FunctionContext* context, const DoubleVal& total) {
   return BigIntVal(static_cast<int64_t>(total.val));
 }
+
+// Defines aggregate function for testing constant argument handling. The UDA returns
+// true if its second argument is constant for all calls to Update().
+void ArgIsConstInit(FunctionContext* context, BooleanVal* is_const) {
+  *is_const = BooleanVal(context->IsArgConstant(1));
+}
+
+void ArgIsConstUpdate(FunctionContext* context, const IntVal& val,
+    const IntVal& const_arg, BooleanVal* is_const) {}
+
+void ArgIsConstMerge(FunctionContext* context, const BooleanVal& src, BooleanVal* dst) {
+  dst->val |= src.val;
+}

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/udf/udf-internal.h
----------------------------------------------------------------------
diff --git a/be/src/udf/udf-internal.h b/be/src/udf/udf-internal.h
index eca62ab..2fd7415 100644
--- a/be/src/udf/udf-internal.h
+++ b/be/src/udf/udf-internal.h
@@ -15,15 +15,15 @@
 // specific language governing permissions and limitations
 // under the License.
 
-
 #ifndef IMPALA_UDF_UDF_INTERNAL_H
 #define IMPALA_UDF_UDF_INTERNAL_H
 
-#include <boost/cstdint.hpp>
+#include <string.h>
 #include <map>
 #include <string>
-#include <string.h>
+#include <utility>
 #include <vector>
+#include <boost/cstdint.hpp>
 
 /// Be very careful when adding Impala includes in this file. We don't want to pull
 /// in unnecessary dependencies for the development libs.
@@ -31,14 +31,15 @@
 
 namespace impala {
 
-#define RETURN_IF_NULL(ctx, ptr) \
-  do { \
-    if (UNLIKELY(ptr == NULL)) { \
+#define RETURN_IF_NULL(ctx, ptr)                            \
+  do {                                                      \
+    if (UNLIKELY(ptr == NULL)) {                            \
       DCHECK(!ctx->impl()->state()->GetQueryStatus().ok()); \
-      return; \
-    } \
+      return;                                               \
+    }                                                       \
   } while (false)
 
+class Expr;
 class FreePool;
 class MemPool;
 class RuntimeState;
@@ -71,7 +72,8 @@ class FunctionContextImpl {
 
   /// Returns a new FunctionContext with the same constant args, fragment-local state, and
   /// debug flag as this FunctionContext. The caller is responsible for calling delete on
-  /// it.
+  /// it. The cloned FunctionContext cannot be used after the original FunctionContext is
+  /// destroyed because it may reference fragment-local state from the original.
   impala_udf::FunctionContext* Clone(MemPool* pool);
 
   /// Allocates a buffer of 'byte_size' with "local" memory management.
@@ -90,8 +92,21 @@ class FunctionContextImpl {
   /// Returns true if there are any allocations returned by AllocateLocal().
   bool HasLocalAllocations() const { return !local_allocations_.empty(); }
 
-  /// Sets constant_args_. The AnyVal* values are owned by the caller.
-  void SetConstantArgs(const std::vector<impala_udf::AnyVal*>& constant_args);
+  /// Sets the constant arg list. The vector should contain one entry per argument,
+  /// with a non-NULL entry if the argument is constant. The AnyVal* values are
+  /// owned by the caller and must be allocated from the ExprContext's MemPool.
+  void SetConstantArgs(std::vector<impala_udf::AnyVal*>&& constant_args);
+
+  /// Sets the non-constant args. Contains one entry per non-constant argument. All
+  /// pointers should be non-NULL. The Expr* and AnyVal* values are owned by the caller.
+  /// The AnyVal* values must be allocated from the ExprContext's MemPool.
+  void SetNonConstantArgs(
+      std::vector<std::pair<Expr*, impala_udf::AnyVal*>>&& non_constant_args);
+
+  const std::vector<impala_udf::AnyVal*>& constant_args() const { return constant_args_; }
+  const std::vector<std::pair<Expr*, impala_udf::AnyVal*>>& non_constant_args() const {
+    return non_constant_args_;
+  }
 
   uint8_t* varargs_buffer() { return varargs_buffer_; }
 
@@ -111,6 +126,9 @@ class FunctionContextImpl {
     return arg_types_;
   }
 
+  // UDFs may manipulate DecimalVal arguments via SIMD instructions such as 'movaps'
+  // that require 16-byte memory alignment.
+  static const int VARARGS_BUFFER_ALIGNMENT = 16;
   static const char* LLVM_FUNCTIONCONTEXT_NAME;
 
   RuntimeState* state() { return state_; }
@@ -127,10 +145,8 @@ class FunctionContextImpl {
   bool CheckAllocResult(const char* fn_name, uint8_t* buf, int64_t byte_size);
 
   /// Preallocated buffer for storing varargs (if the function has any). Allocated and
-  /// owned by this object, but populated by an Expr function.
-  //
-  /// This is the first field in the class so it's easy to access in codegen'd functions.
-  /// Don't move it or add fields above unless you know what you're doing.
+  /// owned by this object, but populated by an Expr function. The buffer is interpreted
+  /// as an array of the appropriate AnyVal subclass.
   uint8_t* varargs_buffer_;
   int varargs_buffer_size_;
 
@@ -186,13 +202,20 @@ class FunctionContextImpl {
 
   /// Contains an AnyVal* for each argument of the function. If the AnyVal* is NULL,
   /// indicates that the corresponding argument is non-constant. Otherwise contains the
-  /// value of the argument.
+  /// value of the argument. The AnyVal* objects and associated data are owned by the
+  /// ExprContext provided when opening the FRAGMENT_LOCAL expression contexts.
   std::vector<impala_udf::AnyVal*> constant_args_;
 
-  /// Used by ScalarFnCall to store the arguments when running without codegen. Allows us
-  /// to pass AnyVal* arguments to the scalar function directly, rather than codegening a
-  /// call that passes the correct AnyVal subclass pointer type. Note that this is only
-  /// used for non-variadic arguments; varargs are always stored in varargs_buffer_.
+  /// Vector of all non-constant children expressions that need to be evaluated for
+  /// each input row. The first element of each pair is the child expression and the
+  /// second element is the value it must be evaluated into.
+  std::vector<std::pair<Expr*, impala_udf::AnyVal*>> non_constant_args_;
+
+  /// Used by ScalarFnCall to temporarily store arguments for a UDF when running without
+  /// codegen. Allows us to pass AnyVal* arguments to the scalar function directly,
+  /// rather than codegening a call that passes the correct AnyVal subclass pointer type.
+  /// Note that this is only used for non-variadic arguments; varargs are always stored
+  /// in varargs_buffer_.
   std::vector<impala_udf::AnyVal*> staging_input_vals_;
 
   /// Indicates whether this context has been closed. Used for verification/debugging.

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/udf/udf-test-harness.cc
----------------------------------------------------------------------
diff --git a/be/src/udf/udf-test-harness.cc b/be/src/udf/udf-test-harness.cc
index cfefdf7..1c3737d 100644
--- a/be/src/udf/udf-test-harness.cc
+++ b/be/src/udf/udf-test-harness.cc
@@ -37,7 +37,7 @@ void UdfTestHarness::SetConstantArgs(
     context->SetError("SetConstantArgs() called on non-test FunctionContext");
     return;
   }
-  context->impl()->SetConstantArgs(constant_args);
+  context->impl()->SetConstantArgs(vector<AnyVal*>(constant_args));
 }
 
 void UdfTestHarness::CloseContext(FunctionContext* context) {

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/10fa472f/be/src/udf/udf.cc
----------------------------------------------------------------------
diff --git a/be/src/udf/udf.cc b/be/src/udf/udf.cc
index bfdbccc..cd29f6f 100644
--- a/be/src/udf/udf.cc
+++ b/be/src/udf/udf.cc
@@ -109,12 +109,17 @@ class RuntimeState {
 
 using namespace impala;
 using namespace impala_udf;
+using std::pair;
 
+const int FunctionContextImpl::VARARGS_BUFFER_ALIGNMENT;
 const char* FunctionContextImpl::LLVM_FUNCTIONCONTEXT_NAME =
     "class.impala_udf::FunctionContext";
 
 static const int MAX_WARNINGS = 1000;
 
+static_assert(__BYTE_ORDER == __LITTLE_ENDIAN,
+    "DecimalVal memory layout assumes little-endianness");
+
 FunctionContext* FunctionContextImpl::CreateContext(RuntimeState* state, MemPool* pool,
     const FunctionContext::TypeDesc& return_type,
     const vector<FunctionContext::TypeDesc>& arg_types,
@@ -138,14 +143,11 @@ FunctionContext* FunctionContextImpl::CreateContext(RuntimeState* state, MemPool
   ctx->impl_->intermediate_type_ = intermediate_type;
   ctx->impl_->return_type_ = return_type;
   ctx->impl_->arg_types_ = arg_types;
-  // UDFs may manipulate DecimalVal arguments via SIMD instructions such as 'movaps'
-  // that require 16-byte memory alignment.
-  ctx->impl_->varargs_buffer_ =
-      reinterpret_cast<uint8_t*>(aligned_malloc(varargs_buffer_size, 16));
+  ctx->impl_->varargs_buffer_ = reinterpret_cast<uint8_t*>(
+      aligned_malloc(varargs_buffer_size, VARARGS_BUFFER_ALIGNMENT));
   ctx->impl_->varargs_buffer_size_ = varargs_buffer_size;
   ctx->impl_->debug_ = debug;
-  VLOG_ROW << "Created FunctionContext: " << ctx
-           << " with pool " << ctx->impl_->pool_;
+  VLOG_ROW << "Created FunctionContext: " << ctx << " with pool " << ctx->impl_->pool_;
   return ctx;
 }
 
@@ -450,18 +452,22 @@ void FunctionContextImpl::FreeLocalAllocations() noexcept {
   local_allocations_.clear();
 }
 
-void FunctionContextImpl::SetConstantArgs(const vector<AnyVal*>& constant_args) {
+void FunctionContextImpl::SetConstantArgs(vector<AnyVal*>&& constant_args) {
   constant_args_ = constant_args;
 }
 
+void FunctionContextImpl::SetNonConstantArgs(
+    vector<pair<Expr*, AnyVal*>>&& non_constant_args) {
+  non_constant_args_ = non_constant_args;
+}
+
 // Note: this function crashes LLVM's JIT in expr-test if it's xcompiled. Do not move to
 // expr-ir.cc. This could probably use further investigation.
-StringVal::StringVal(FunctionContext* context, int len) noexcept
-  : len(len), ptr(NULL) {
+StringVal::StringVal(FunctionContext* context, int len) noexcept : len(len), ptr(NULL) {
   if (UNLIKELY(len > StringVal::MAX_LENGTH)) {
     std::cout << "MAX_LENGTH, Trying to allocate " << len;
     context->SetError("String length larger than allowed limit of "
-        "1 GB character data.");
+                      "1 GB character data.");
     len = 0;
     is_null = true;
   } else {



Mime
View raw message