impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Marcel Kornacker (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Date Sat, 03 Jun 2017 23:53:04 GMT
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
......................................................................


Patch Set 17:

(62 comments)

http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 79:   DCHECK_EQ(intermediate_tuple_desc_->slots().size(), output_tuple_desc_->slots().size());
move to c'tor


Line 124:   DCHECK_EQ(agg_fns_.size(), agg_fn_evaluators_.size());
nice, the separation is making the code a lot crisper.


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/aggregation-node.h
File be/src/exec/aggregation-node.h:

Line 53: class AggregationNode : public ExecNode {
it's sad that this still needs to be maintained. please apply my comments to the partitioined-*.{cc,h}
as well, where it makes sense


Line 80:   /// FunctionContexts objects are stored in ObjectPool of RuntimeState.
so what is agg_fn_pool_ for then?


Line 98:   boost::scoped_ptr<RowDescriptor> intermediate_row_desc_;
any reason this can't go in state->obj_pool()?


Line 150:   /// Cross-compiled accessor for &agg_fn_evaluators_[0]. Used by the codegen'ed
code.
"for agg_fn_evaluators_.data()" is more idiomatic.

are you going to do a rename of "evaluator[s]" suffixes to "evl[s]" or "eval[s]" at the end?


Line 165:   /// slot_idx is the idx into aggregate_exprs_ (does not include grouping exprs).
update comment


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

Line 892:   AggFn::Close(analytic_fns_);
having an expr::close() and having to call that here is a bit counterintuitive (and manual,
since we need to do that for every expr). would be nice to move that to a utility function
in runtimestate (track the cache entries in some ancillary structure and then free them all
at once instead of piecemeal). that would reduce the likelihood of forgetting this somewhere.

not necessary to do it as part of this patch, but a todo would be nice.


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/analytic-eval-node.h
File be/src/exec/analytic-eval-node.h:

Line 210:   ScalarExpr* partition_by_eq_expr_;
initialize inline


Line 215:   ScalarExpr* order_by_eq_expr_;
same


Line 216:   ScalarExprEvaluator* order_by_eq_expr_evaluator_;
same


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/data-sink.cc
File be/src/exec/data-sink.cc:

Line 100:   RETURN_IF_ERROR((*sink)->Init(state, thrift_sink, thrift_output_exprs));
why not just call create() directly here?


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/data-sink.h
File be/src/exec/data-sink.h:

Line 87:   static Status Create(ObjectPool* pool, RuntimeState* state,
state already contains a pool


Line 123:   boost::scoped_ptr<MemPool> expr_mem_pool_;
do we really need a separate mempool for that?


Line 131:   virtual Status Init(RuntimeState* state, const TDataSink& tsink,
move in/out param to back


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exchange-node.cc
File be/src/exec/exchange-node.cc:

Line 131: Status ExchangeNode::QueryMaintenance(RuntimeState* state) {
why did we not need this before?


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exec-node.cc
File be/src/exec/exec-node.cc:

Line 460:     if (!EvalConjunct(evaluators[i], row)) return false;
can't you just inline the call here?


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

Line 149:   static bool EvalConjunct(ScalarExprEvaluator* evaluator, TupleRow* row);
evalpredicate

a predicate is a conjunct if it's part of a conjunctive clause, ie, a predicate on its own
can't be a conjunct.


Line 151:   /// Evaluate the conjuncts in 'evaluators' over 'row'.
add convenience function for a vector?


Line 263:   /// object pool.
why?


Line 290:   boost::scoped_ptr<MemPool> expr_mem_pool_;
what mem pool did we use for that before? in general, we want to get away from having mempools
all over the place


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/filter-context.h
File be/src/exec/filter-context.h:

Line 102:   Status CloneFrom(ObjectPool* pool, RuntimeState* state, MemPool* mem_pool,
state contains pool


Line 103:       const FilterContext& from);
move const args to front


Line 114:   /// Codegen Eval() by codegen'ing the expression evaluations and replacing the
type
update comment


Line 117:   static Status CodegenEval(LlvmCodeGen* codegen, ScalarExpr* filter_expr,
does filter_expr get updated in this function?


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

Line 129:   void FreeBuildLocalAllocations();
FreeLocal<>Allocations()?


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hbase-scan-node.cc
File be/src/exec/hbase-scan-node.cc:

Line 249:     DCHECK_EQ(conjunct_evaluators_.size(), conjuncts_.size());
this would benefit from a convenience function for a vector


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hbase-table-writer.h
File be/src/exec/hbase-table-writer.h:

Line 96:   /// The evaluators are owned by sink which owns this table writer.
"the sink"


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hdfs-avro-table-writer.h
File be/src/exec/hdfs-avro-table-writer.h:

Line 68:                       const std::vector<ScalarExprEvaluator*>& output_expr_evaluators);
do these come from the sink? you're already passing in the sink.


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hdfs-parquet-table-writer.h
File be/src/exec/hdfs-parquet-table-writer.h:

Line 57:       const std::vector<ScalarExprEvaluator*>& output_expr_evaluators);
get those out of sink


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 387:   /// the per-scanner ScannerContext. Correspond with exprs in 'filter_exprs_'.
"Corresponds to"


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

Line 122:   // Setup the scan node's dictionary filtering conjuncts map.
set up


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hdfs-table-sink.h
File be/src/exec/hdfs-table-sink.h:

Line 166:   friend class DataSink;
or make everything that's private in datasink protected?


Line 168:   virtual Status Init(RuntimeState* state, const TDataSink& tsink,
param order


Line 202:       const std::vector<ScalarExprEvaluator*>& evaluators, std::string*
key);
param order


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 431:   if (ExecNode::EvalConjuncts(&conjunct_evaluators_[0], conjunct_evaluators_.size(),
row)) {
uniformly switch to vector<>.data() instead of &vector<>[0], the latter is
undefined for empty vectors. jim is in the process of fixing all existing instances up.


Line 482:     if (ExecNode::EvalConjuncts(&conjunct_evaluators_[0], conjuncts_.size(),
row)) {
same


Line 675:   ht_ctx_.reset();
why?

we want to stop tearing down control structures


Line 1394: // void UpdateSlot(FunctionContext* agg_fn_ctx, ScalarExprEvaluator* agg_expr_evaluator,
isn't this for the input expr?


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/partitioned-aggregation-node.h
File be/src/exec/partitioned-aggregation-node.h:

Line 448:   Tuple* ConstructIntermediateTuple(std::vector<AggFnEvaluator*>& agg_fn_evaluators,
why non-const?


Line 453:   Tuple* ConstructIntermediateTuple(std::vector<AggFnEvaluator*>& agg_fn_evaluators,
same here


Line 479:   void UpdateTuple(AggFnEvaluator** agg_fn_evaluators, Tuple* tuple, TupleRow* row,
the signatures don't conform to our standard. leave todo in class header


Line 491:   Tuple* GetOutputTuple(vector<AggFnEvaluator*>& agg_fn_evaluators,
missing std; why non-const?

there are several more non-const vector<>& here


Line 527:   /// Accessor for the expression contexts of an AggFnEvaluator. Returns an array
of
update comment

also, what is i?


Line 641:   /// Assumes is_merge = false;
update comment to describe params


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/select-node.cc
File be/src/exec/select-node.cc:

Line 97:   ScalarExprEvaluator* const* conjunct_evaluators = &conjunct_evaluators_[0];
.data()


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exprs/agg-fn-evaluator.h
File be/src/exprs/agg-fn-evaluator.h:

Line 80:   /// TODO: Move the evaluation of constant input arguments to AggFn setup.
once that's done, can open() go away?


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

Line 266:   const std::vector<ScalarExprEvaluator*>& partition_key_value_evaluators()
const {
value_evaluators feels redundant. partition_key_evls?


Line 290:   /// This evaluator is safe to be shared by all fragment instances as all expressions
plural


Line 497:   static Status CreatePartKeyExprs(ObjectPool* pool, const HdfsTableDescriptor*
hdfs_tbl)
move pool to back
why const* instead of const&, can this really be null?


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

Line 187:   RETURN_IF_ERROR(DataSink::Create(obj_pool(), runtime_state_, fragment_ctx_,
runtime_state_ already has an obj_pool, remove first param


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

Line 276:   Status status = DescriptorTbl::Create(
single line?


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

Line 92:   /// 'sort_materialize_exprs' are the slot exprs used to materialize the tuples
to be
i find this harder to follow than with the original name. these are the exprs that materialize
the individual slots of the sort tuple, no? sort_tuple_input_exprs or something like that,
analogous to agg?


Line 96:   Sorter(TupleRowComparator& compare_less_than,
if you need non-const, then switch to *


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

Line 77: // 'type' indicates the type of the expression value.
it feels like this should live somewhere else (in /runtime/raw-value?)


Line 80:   if (value == nullptr) return;
dcheck col_val != null


Line 246:     exprs[i]->Close();
evl->root()->Close() and then iterate over collection


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

Line 51:   opened_ = false;
meaning you can call open() again, which in turns re-creates the evaluators? is that intentional?
or should the create() call move into an init() function?


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/util/tuple-row-compare.h
File be/src/util/tuple-row-compare.h:

Line 99:         (*codegend_compare_fn_)(&ordering_expr_evaluators_lhs_[0],
.data()


Line 135:   bool opened_;
initialize inline, that way you don't need to change the c'tor if you ever move this around


Line 146:   std::vector<bool> is_asc_;
make const, while you're here


Line 147:   std::vector<int8_t> nulls_first_;
same


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message