impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Marcel Kornacker (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Date Sat, 03 Jun 2017 23:53:04 GMT
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext

Patch Set 17:

File be/src/exec/

Line 79:   DCHECK_EQ(intermediate_tuple_desc_->slots().size(), output_tuple_desc_->slots().size());
move to c'tor

Line 124:   DCHECK_EQ(agg_fns_.size(), agg_fn_evaluators_.size());
nice, the separation is making the code a lot crisper.
File be/src/exec/aggregation-node.h:

Line 53: class AggregationNode : public ExecNode {
it's sad that this still needs to be maintained. please apply my comments to the partitioined-*.{cc,h}
as well, where it makes sense

Line 80:   /// FunctionContexts objects are stored in ObjectPool of RuntimeState.
so what is agg_fn_pool_ for then?

Line 98:   boost::scoped_ptr<RowDescriptor> intermediate_row_desc_;
any reason this can't go in state->obj_pool()?

Line 150:   /// Cross-compiled accessor for &agg_fn_evaluators_[0]. Used by the codegen'ed
"for" is more idiomatic.

are you going to do a rename of "evaluator[s]" suffixes to "evl[s]" or "eval[s]" at the end?

Line 165:   /// slot_idx is the idx into aggregate_exprs_ (does not include grouping exprs).
update comment
File be/src/exec/

Line 892:   AggFn::Close(analytic_fns_);
having an expr::close() and having to call that here is a bit counterintuitive (and manual,
since we need to do that for every expr). would be nice to move that to a utility function
in runtimestate (track the cache entries in some ancillary structure and then free them all
at once instead of piecemeal). that would reduce the likelihood of forgetting this somewhere.

not necessary to do it as part of this patch, but a todo would be nice.
File be/src/exec/analytic-eval-node.h:

Line 210:   ScalarExpr* partition_by_eq_expr_;
initialize inline

Line 215:   ScalarExpr* order_by_eq_expr_;

Line 216:   ScalarExprEvaluator* order_by_eq_expr_evaluator_;
File be/src/exec/

Line 100:   RETURN_IF_ERROR((*sink)->Init(state, thrift_sink, thrift_output_exprs));
why not just call create() directly here?
File be/src/exec/data-sink.h:

Line 87:   static Status Create(ObjectPool* pool, RuntimeState* state,
state already contains a pool

Line 123:   boost::scoped_ptr<MemPool> expr_mem_pool_;
do we really need a separate mempool for that?

Line 131:   virtual Status Init(RuntimeState* state, const TDataSink& tsink,
move in/out param to back
File be/src/exec/

Line 131: Status ExchangeNode::QueryMaintenance(RuntimeState* state) {
why did we not need this before?
File be/src/exec/

Line 460:     if (!EvalConjunct(evaluators[i], row)) return false;
can't you just inline the call here?
File be/src/exec/exec-node.h:

Line 149:   static bool EvalConjunct(ScalarExprEvaluator* evaluator, TupleRow* row);

a predicate is a conjunct if it's part of a conjunctive clause, ie, a predicate on its own
can't be a conjunct.

Line 151:   /// Evaluate the conjuncts in 'evaluators' over 'row'.
add convenience function for a vector?

Line 263:   /// object pool.

Line 290:   boost::scoped_ptr<MemPool> expr_mem_pool_;
what mem pool did we use for that before? in general, we want to get away from having mempools
all over the place
File be/src/exec/filter-context.h:

Line 102:   Status CloneFrom(ObjectPool* pool, RuntimeState* state, MemPool* mem_pool,
state contains pool

Line 103:       const FilterContext& from);
move const args to front

Line 114:   /// Codegen Eval() by codegen'ing the expression evaluations and replacing the
update comment

Line 117:   static Status CodegenEval(LlvmCodeGen* codegen, ScalarExpr* filter_expr,
does filter_expr get updated in this function?
File be/src/exec/hash-table.h:

Line 129:   void FreeBuildLocalAllocations();
File be/src/exec/

Line 249:     DCHECK_EQ(conjunct_evaluators_.size(), conjuncts_.size());
this would benefit from a convenience function for a vector
File be/src/exec/hbase-table-writer.h:

Line 96:   /// The evaluators are owned by sink which owns this table writer.
"the sink"
File be/src/exec/hdfs-avro-table-writer.h:

Line 68:                       const std::vector<ScalarExprEvaluator*>& output_expr_evaluators);
do these come from the sink? you're already passing in the sink.
File be/src/exec/hdfs-parquet-table-writer.h:

Line 57:       const std::vector<ScalarExprEvaluator*>& output_expr_evaluators);
get those out of sink
File be/src/exec/hdfs-scan-node-base.h:

Line 387:   /// the per-scanner ScannerContext. Correspond with exprs in 'filter_exprs_'.
"Corresponds to"
File be/src/exec/

Line 122:   // Setup the scan node's dictionary filtering conjuncts map.
set up
File be/src/exec/hdfs-table-sink.h:

Line 166:   friend class DataSink;
or make everything that's private in datasink protected?

Line 168:   virtual Status Init(RuntimeState* state, const TDataSink& tsink,
param order

Line 202:       const std::vector<ScalarExprEvaluator*>& evaluators, std::string*
param order
File be/src/exec/

Line 431:   if (ExecNode::EvalConjuncts(&conjunct_evaluators_[0], conjunct_evaluators_.size(),
row)) {
uniformly switch to vector<>.data() instead of &vector<>[0], the latter is
undefined for empty vectors. jim is in the process of fixing all existing instances up.

Line 482:     if (ExecNode::EvalConjuncts(&conjunct_evaluators_[0], conjuncts_.size(),
row)) {

Line 675:   ht_ctx_.reset();

we want to stop tearing down control structures

Line 1394: // void UpdateSlot(FunctionContext* agg_fn_ctx, ScalarExprEvaluator* agg_expr_evaluator,
isn't this for the input expr?
File be/src/exec/partitioned-aggregation-node.h:

Line 448:   Tuple* ConstructIntermediateTuple(std::vector<AggFnEvaluator*>& agg_fn_evaluators,
why non-const?

Line 453:   Tuple* ConstructIntermediateTuple(std::vector<AggFnEvaluator*>& agg_fn_evaluators,
same here

Line 479:   void UpdateTuple(AggFnEvaluator** agg_fn_evaluators, Tuple* tuple, TupleRow* row,
the signatures don't conform to our standard. leave todo in class header

Line 491:   Tuple* GetOutputTuple(vector<AggFnEvaluator*>& agg_fn_evaluators,
missing std; why non-const?

there are several more non-const vector<>& here

Line 527:   /// Accessor for the expression contexts of an AggFnEvaluator. Returns an array
update comment

also, what is i?

Line 641:   /// Assumes is_merge = false;
update comment to describe params
File be/src/exec/

Line 97:   ScalarExprEvaluator* const* conjunct_evaluators = &conjunct_evaluators_[0];
File be/src/exprs/agg-fn-evaluator.h:

Line 80:   /// TODO: Move the evaluation of constant input arguments to AggFn setup.
once that's done, can open() go away?
File be/src/runtime/descriptors.h:

Line 266:   const std::vector<ScalarExprEvaluator*>& partition_key_value_evaluators()
const {
value_evaluators feels redundant. partition_key_evls?

Line 290:   /// This evaluator is safe to be shared by all fragment instances as all expressions

Line 497:   static Status CreatePartKeyExprs(ObjectPool* pool, const HdfsTableDescriptor*
move pool to back
why const* instead of const&, can this really be null?
File be/src/runtime/

Line 187:   RETURN_IF_ERROR(DataSink::Create(obj_pool(), runtime_state_, fragment_ctx_,
runtime_state_ already has an obj_pool, remove first param
File be/src/runtime/

Line 276:   Status status = DescriptorTbl::Create(
single line?
File be/src/runtime/sorter.h:

Line 92:   /// 'sort_materialize_exprs' are the slot exprs used to materialize the tuples
to be
i find this harder to follow than with the original name. these are the exprs that materialize
the individual slots of the sort tuple, no? sort_tuple_input_exprs or something like that,
analogous to agg?

Line 96:   Sorter(TupleRowComparator& compare_less_than,
if you need non-const, then switch to *
File be/src/service/

Line 77: // 'type' indicates the type of the expression value.
it feels like this should live somewhere else (in /runtime/raw-value?)

Line 80:   if (value == nullptr) return;
dcheck col_val != null

Line 246:     exprs[i]->Close();
evl->root()->Close() and then iterate over collection
File be/src/util/

Line 51:   opened_ = false;
meaning you can call open() again, which in turns re-creates the evaluators? is that intentional?
or should the create() call move into an init() function?
File be/src/util/tuple-row-compare.h:

Line 99:         (*codegend_compare_fn_)(&ordering_expr_evaluators_lhs_[0],

Line 135:   bool opened_;
initialize inline, that way you don't need to change the c'tor if you ever move this around

Line 146:   std::vector<bool> is_asc_;
make const, while you're here

Line 147:   std::vector<int8_t> nulls_first_;

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Marcel Kornacker <>
Gerrit-Reviewer: Michael Ho <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message