impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael Ho (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Date Mon, 05 Jun 2017 20:30:53 GMT
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
......................................................................


Patch Set 16:

(61 comments)

http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 79:   DCHECK_EQ(intermediate_tuple_desc_->slots().size(), output_tuple_desc_->slots().size());
> move to c'tor
Done


Line 124:       &agg_fn_evaluators_));
> nice, the separation is making the code a lot crisper.
Done


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 92:         new SlotRef(desc) : new SlotRef(desc, TYPE_BOOLEAN);
> Tab?
Done


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/aggregation-node.h
File be/src/exec/aggregation-node.h:

Line 53: class AggregationNode : public ExecNode {
> it's sad that this still needs to be maintained. please apply my comments t
Done


Line 80:   /// FunctionContexts objects are stored in ObjectPool of RuntimeState.
> so what is agg_fn_pool_ for then?
Stale comment. It's the backing MemPools of 'agg_fn_evaluators_'.


Line 98:   boost::scoped_ptr<RowDescriptor> intermediate_row_desc_;
> any reason this can't go in state->obj_pool()?
Done


Line 150:   /// Cross-compiled accessor for &agg_fn_evaluators_[0]. Used by the codegen'ed
code.
> "for agg_fn_evaluators_.data()" is more idiomatic.
Done


Line 165:   /// slot_idx is the idx into aggregate_exprs_ (does not include grouping exprs).
> update comment
Done


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

Line 892:   AggFn::Close(analytic_fns_);
> having an expr::close() and having to call that here is a bit counterintuit
Expr::Close() will be called in PlanNode::Close() in the future. This is done here as a transitionary
step.

Also, I don't see Expr or evaluators any different from other data structures in the exec
node. You can extend the argument to any structures you have in an exec node and argue that
you can add them to a container which automatically call Close() on them. I don't see the
need to create a special case for Expr.


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/analytic-eval-node.h
File be/src/exec/analytic-eval-node.h:

Line 210:   ScalarExpr* partition_by_eq_expr_;
> initialize inline
Done


Line 215:   ScalarExpr* order_by_eq_expr_;
> same
Done


Line 216:   ScalarExprEvaluator* order_by_eq_expr_evaluator_;
> same
Done


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/data-sink.cc
File be/src/exec/data-sink.cc:

Line 100:   RETURN_IF_ERROR((*sink)->Init(state, thrift_sink, thrift_output_exprs));
> why not just call create() directly here?
Not all data_sink have output_exprs. Some of them have partition_exprs only.


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/data-sink.h
File be/src/exec/data-sink.h:

Line 123:   boost::scoped_ptr<MemPool> expr_mem_pool_;
> do we really need a separate mempool for that?
What other MemPool can we use then ?


Line 131:   virtual Status Init(RuntimeState* state, const TDataSink& tsink,
> move in/out param to back
Done


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exchange-node.cc
File be/src/exec/exchange-node.cc:

Line 131: Status ExchangeNode::QueryMaintenance(RuntimeState* state) {
> why did we not need this before?
This was not needed before as the exchange node owns the evaluators before and was added to
vector of evaluators to free periodically. The evaluator is not stored in the tuple row comparator.


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exec-node.cc
File be/src/exec/exec-node.cc:

Line 460:     if (!EvalConjunct(evaluators[i], row)) return false;
> can't you just inline the call here?
Not sure I understand the comment. EvalConjunct() is already inlined.


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

Line 263:   /// object pool.
> why?
evaluators are execution related stuff so they are stored in the exec nodes' obj_pool (similar
to other stuff which live in the obj_pool_).

exprs are fragment level entity so it makes sense for it to live in fragment level's obj_pool.


Line 290:   boost::scoped_ptr<MemPool> expr_mem_pool_;
> what mem pool did we use for that before? in general, we want to get away f
We were passing the expr_mem_tracker directly into the ExprContext so there is one MemPool
per ExprContext before. Now all exprs in an exec node share the expr_mem_pool of the exec
node.


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/filter-context.h
File be/src/exec/filter-context.h:

Line 102:   Status CloneFrom(ObjectPool* pool, RuntimeState* state, MemPool* mem_pool,
> state contains pool
Caller can pass in pool with shorter life span than the entire query (e.g. scanner thread).


Line 103:       const FilterContext& from);
> move const args to front
Done


Line 114:   /// Codegen Eval() by codegen'ing the expression evaluations and replacing the
type
> update comment
Done


Line 117:   static Status CodegenEval(LlvmCodeGen* codegen, ScalarExpr* filter_expr,
> does filter_expr get updated in this function?
Yes. Its field ir_compute_fn_ may be populated.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

Line 129:   void FreeBuildLocalAllocations();
> FreeLocal<>Allocations()?
"LocalAllocations" is a specific terminology we refer to at other places in our code base
so not a good idea to split it up.

One can look at the names as BuildSide's / ProbeSide's LocalAllocations().


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hbase-scan-node.cc
File be/src/exec/hbase-scan-node.cc:

Line 249:     DCHECK_EQ(conjunct_evaluators_.size(), conjuncts_.size());
> this would benefit from a convenience function for a vector
You probably still need to pass the expr and its evaluator as argument so there's not much
saving.


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hbase-table-writer.h
File be/src/exec/hbase-table-writer.h:

Line 96:   /// The evaluators are owned by sink which owns this table writer.
> "the sink"
Done


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hdfs-avro-table-writer.h
File be/src/exec/hdfs-avro-table-writer.h:

Line 68:                       const std::vector<ScalarExprEvaluator*>& output_expr_evaluators);
> do these come from the sink? you're already passing in the sink.
Done


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hdfs-parquet-table-writer.h
File be/src/exec/hdfs-parquet-table-writer.h:

Line 57:       const std::vector<ScalarExprEvaluator*>& output_expr_evaluators);
> get those out of sink
Done


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 387:   /// the per-scanner ScannerContext. Correspond with exprs in 'filter_exprs_'.
> "Corresponds to"
Done


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

Line 122:   // Setup the scan node's dictionary filtering conjuncts map.
> set up
Done


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hdfs-table-sink.h
File be/src/exec/hdfs-table-sink.h:

Line 166:   friend class DataSink;
> or make everything that's private in datasink protected?
friend class removed. Not needed.


Line 168:   virtual Status Init(RuntimeState* state, const TDataSink& tsink,
> param order
Done


Line 202:       const std::vector<ScalarExprEvaluator*>& evaluators, std::string*
key);
> param order
The first two params are constant params. Not sure what you meant.


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 431:   if (ExecNode::EvalConjuncts(&conjunct_evaluators_[0], conjunct_evaluators_.size(),
row)) {
> uniformly switch to vector<>.data() instead of &vector<>[0], the latter
is 
Done


Line 482:     if (ExecNode::EvalConjuncts(&conjunct_evaluators_[0], conjuncts_.size(),
row)) {
> same
Done


Line 675:   ht_ctx_.reset();
> why?
To remove any possible reference to it. We already called ht_ctx_->Close(state) the line
before.


Line 1394: // void UpdateSlot(FunctionContext* agg_fn_ctx, ScalarExprEvaluator* agg_expr_evaluator,
> isn't this for the input expr?
Stale comment. Comment updated.


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/partitioned-aggregation-node.h
File be/src/exec/partitioned-aggregation-node.h:

Line 448:   Tuple* ConstructIntermediateTuple(std::vector<AggFnEvaluator*>& agg_fn_evaluators,
> why non-const?
Done


Line 453:   Tuple* ConstructIntermediateTuple(std::vector<AggFnEvaluator*>& agg_fn_evaluators,
> same here
Done


Line 479:   void UpdateTuple(AggFnEvaluator** agg_fn_evaluators, Tuple* tuple, TupleRow* row,
> the signatures don't conform to our standard. leave todo in class header
In what way ?


Line 491:   Tuple* GetOutputTuple(vector<AggFnEvaluator*>& agg_fn_evaluators,
> missing std; why non-const?
Done


Line 527:   /// Accessor for the expression contexts of an AggFnEvaluator. Returns an array
of
> update comment
Stale declaration. Removed.


Line 641:   /// Assumes is_merge = false;
> update comment to describe params
Done


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/select-node.cc
File be/src/exec/select-node.cc:

Line 97:   ScalarExprEvaluator* const* conjunct_evaluators = &conjunct_evaluators_[0];
> .data()
Done


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exprs/agg-fn-evaluator.h
File be/src/exprs/agg-fn-evaluator.h:

Line 80:   /// TODO: Move the evaluation of constant input arguments to AggFn setup.
> once that's done, can open() go away?
Not really as we still have to open the evaluator for the input expressions which involve
allocating the function context etc


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exprs/scalar-expr.cc
File be/src/exprs/scalar-expr.cc:

Line 96: }
> I was even thinking it might be less error-prone if, on failure, this close
I believe the caller has to call ExecNode::Close() anyway to free other resources so I don't
see this being more error-prone than other existing control structures in ExecNode.


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

Line 266:   const std::vector<ScalarExprEvaluator*>& partition_key_value_evaluators()
const {
> value_evaluators feels redundant. partition_key_evls?
The function name seems fine to me. I renamed it partition_key_value_evals().


Line 290:   /// This evaluator is safe to be shared by all fragment instances as all expressions
> plural
Done


Line 497:   static Status CreatePartKeyExprs(ObjectPool* pool, const HdfsTableDescriptor*
hdfs_tbl)
> move pool to back
Done


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

Line 187:   RETURN_IF_ERROR(DataSink::Create(obj_pool(), runtime_state_, fragment_ctx_,
> runtime_state_ already has an obj_pool, remove first param
Done


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

Line 276:   Status status = DescriptorTbl::Create(
> single line?
Done


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

Line 92:   /// 'sort_materialize_exprs' are the slot exprs used to materialize the tuples
to be
> i find this harder to follow than with the original name. these are the exp
Renamed to sort_tuple_exprs


Line 96:   Sorter(TupleRowComparator& compare_less_than,
> if you need non-const, then switch to *
Done


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

Line 77: // 'type' indicates the type of the expression value.
> it feels like this should live somewhere else (in /runtime/raw-value?)
This is a static function used only in FE requests. Don't see why it needs to be moved.


Line 80:   if (value == nullptr) return;
> dcheck col_val != null
Done


Line 246:     exprs[i]->Close();
> evl->root()->Close() and then iterate over collection
This would require a const cast which I try to avoid as much as possible. eval->root()
return const ScalarExpr&


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

Line 51:   opened_ = false;
> meaning you can call open() again, which in turns re-creates the evaluators
The problem is that the ordering_expr_evaluators_rhs_ is cloned from ordering_expr_evaluators_lhs_
which cannot be done after Open() is called anyway. I suppose it's an okay exception to call
ScalarExprEvaluator::Open() in Init() once and not call it again when the exec node is re-opened.
We currently call ScalarExprEvaluator::Open() on the re-open of an Exec node in a subplan
but I don't really see the point of it as the initialization is skipped after the first time
anyway.

I renamed Open() to Init() and removed opened_ and updated the callers to call Init() once.


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/util/tuple-row-compare.h
File be/src/util/tuple-row-compare.h:

Line 99:         (*codegend_compare_fn_)(&ordering_expr_evaluators_lhs_[0],
> .data()
Done


Line 135:   bool opened_;
> initialize inline, that way you don't need to change the c'tor if you ever 
Done


Line 146:   std::vector<bool> is_asc_;
> make const, while you're here
Done


Line 147:   std::vector<int8_t> nulls_first_;
> same
This one cannot be constant due to the way it's initialized in the constructor.


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message