impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael Ho (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Date Wed, 24 May 2017 01:39:18 GMT
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext

Patch Set 14:

File be/src/exec/

Line 93:     RETURN_IF_ERROR(build_expr->Init(*(intermediate_row_desc_.get()), state));
> build_expr is leaked if this returns.
Good catch. Fixed here and everywhere.
File be/src/exec/

> This all works out a lot cleaner - nice!

Line 124:     RETURN_IF_ERROR(AggFn::Create(analytic_node.analytic_functions[i],
> I think there's a potential resource leak if the expr Init() function fails

Line 166:   fn_pool_.reset(new MemPool(expr_mem_tracker()));;
> extra semicolon
File be/src/exec/

Line 181:   // TODO: codegen table sink
> I'm not sure what this means. Is it something to be fixed in this patch? Ot
I think this is IMPALA-4356.
File be/src/exec/data-sink.h:

Line 126:   /// Not used in DataStreamSender and PHJBuilder.
> Or NljBuilder. Maybe easier to just say that they're not used in all subcla
File be/src/exec/

PS14, Line 451: inline
> Is the inline needed here?
The inline hint is for inlining into EvalConjuncts().
File be/src/exec/exec-node.h:

Line 261:   /// Conjuncts and their evaluators in this node.
> Maybe mention who owns the pointers?
File be/src/exec/

PS14, Line 86: expr_evaluator->root()
> Is the plan to eventually pass around the Expr itself and avoid going via t
Really depends on the context. I imagine most cases we will use Expr directly. In some cases
when Expr is not directly available we may still go through the evaluator.
File be/src/exec/filter-context.h:

Line 120:   FilterContext() { }
> nit: can just remove this and rely on the default constructor
File be/src/exec/hash-table.h:

Line 115:   /// computing the size of a scratch row.
> Maybe mention what 'pool' is for

Line 415:   Status Init(ObjectPool* pool, RuntimeState* state, int num_build_tuples);
> What is pool used for?
Comments added.

Line 513:   MemPool* mem_pool_;
> Mention that it's not owned?
I suppose that's implied. Comments added.
File be/src/exec/

PS14, Line 294: 	  
> tab?

Line 733:     vector<ScalarExprEvaluator*>& output_expr_evaluators)
> Can't this be a const& still? The vector isn't being modified is it?
File be/src/exec/

Line 26: #include "exprs/scalar-expr.h"
> Maybe not needed?
File be/src/exec/

PS14, Line 122:  DCHECK(conjuncts_map_[entry.first].empty());
Use find() instead.

Line 147:     // TODO: Move this to Prepare()
> Is this TODO in this patch, or does it depend on another change?
I am thinking of moving it when PlanNode is introduced.

Line 165:         new RowDescriptor(min_max_tuple_desc_, /* is_nullable */ false));
> This is a little inconsistent - above on line 121 we create the RowDescript
Good point. I think heap allocated may be safer in case the expression keeps a reference to
File be/src/exec/hdfs-scan-node-base.h:

Line 146:   const std::vector<ScalarExprEvaluator*> min_max_conjunct_evaluators() const
> Shouldn't this return a const&, rather than copying the vector?
Yes, right. Looks like the existing code is broken.

Line 393:   /// the per-scanner ScannerContext..
> Add something like "These correspond to the exprs in 'filter_exprs_'"
File be/src/exec/

Line 362:   MemPool filter_mem_pool(expr_mem_tracker());
> This is a little subtle - could do with a comment.

Line 388:             for (auto& ctx: filter_ctxs) {
> Maybe use a goto and an exit block at the bottom of the function to enforce
File be/src/exec/kudu-scanner.h:

Line 95:   boost::scoped_ptr<MemPool> expr_mem_pool_;
> Put it in 'obj_pool_'?
Appears to be more consistent with the pattern of hdfs-scanner to leave it as a scoped_ptr.
File be/src/exec/

Line 164:     pool_->Add(build_expr);
> Maybe stick with the usual idiom of wrapping pool_->Add() around the alloca

Line 694:   AggFnEvaluator::ShallowClone(parent->pool_, agg_fn_pool.get(),
> This should go into 'partition_pool_'. Otherwise if we have an Agg in a sub

Line 1574:   // TODO: consider moving the following codegen logic to AggFn.
> This would make sense (not in this patch though).
File be/src/exec/

Line 109:     // TODO: Move to Prepare().
> Is this to do in this patch or are you planning to do it as part of a futur
In a future change.
File be/src/exec/partitioned-hash-join-builder.h:

Line 243:   /// Init() function inherited from DataSink. Overridden to be a no-op for now.
> This is a bit of a red flag to me - I think we should structure things so t
Init() is being called from DataSink::Create() for most DataSink types. Only the blocking-join
build sides are the exceptions right now as they aren't true data sinks of fragments yet.
Once they become true data sinks, they will also be initialized in DataSink::Create(). It
just happens that the function name "Init" conflicts with the existing Init() function for
PhjBuilder for now and I don't see a easy way to merge them. The original Init() function
is now renamed to InitExprsAndFilters(). Also added a TODO to merge them once PhjBuilder becomes
a true data sink.

Line 381:   /// List of filters to build.
> Mention that there's a correspondence with 'filter_exprs_'?
File be/src/exec/

PS14, Line 95: nd
> and
File be/src/exprs/

Line 84: AggFnEvaluator::~AggFnEvaluator() {
> Can we add a DCHECK() that it was closed?

PS14, Line 87: inline
> probably not needed

PS14, Line 91: inline
> probably not needed
File be/src/exprs/agg-fn-evaluator.h:

Line 250
> Consider leaving these as inline functions in the header - AnalyticEvalNode

Line 72:   /// Convenience functions for creating evaluators for multiple aggregate functions.
> Mention that the caller is responsible for closing the created evaluators r

Line 174:   /// True if this evaluator is created from a Clone() call.
> ShallowClone()

PS14, Line 175: is_clone_
> Maybe is_shallow_clone_ to be more precise - although there is only one typ
Keep as-is.
File be/src/exprs/

Line 87:   RETURN_IF_ERROR(LibCache::instance()->GetSoFunctionPtr(fn_.hdfs_location,
> Nit: I don't think the remainder of this function needs all the vertical wh
File be/src/exprs/agg-fn.h:

PS14, Line 52: state
> value?. "state" could be taken to mean that it initialises the evaluator or

Line 81:   /// TODO: The aggregation node has custom codegen paths for a few of the builtins.
> Consider removing the TODO - it's a bit vague and unclear what if anything 
It corresponds to the TODO in the CodegenUpdateSlot() in PartitionedAggregationNode.
File be/src/exprs/expr.h:

Line 53: /// --- Expr overview
> Nice overview

Line 81:   static Status CreateTree(const TExpr& texpr, ObjectPool* pool, Expr* root);
> I think this should be protected since it's only called from ScalarExpr and

Line 94:   bool is_constant() const { return is_constant_; }
> I think is_constant() should maybe only be defined in the ScalarExpr sub-hi

Line 100:   /// Simple debug string that provides no expr subclass-specific information.
> Comment seems stale since there's no implementation.
File be/src/exprs/hive-udf-call.h:

Line 28: using namespace impala_udf;
> I guess some of these 'using namespace' instances were pre-existing - still
Switched to "using impala_udf::*" hidden inside impala namespace. Apparently, it's not legal
to have random using declaration inside a class declaration.
File be/src/exprs/literal.h:

Line 28: using namespace impala_udf;
> See other comment about using namespace in headers.
File be/src/exprs/

Line 154:   // Owner of mem_pool_ needs to call FreeAll() on it.
> We don't really know what the owner wants to do. Maybe something like "Memo

Line 436:   // TODO: is there a better way to do this?
> Consider removing TODO - I don't think we're likely to fix it.
File be/src/exprs/scalar-expr-evaluator.h:

Line 29: using namespace impala_udf;
> I'm not sure if this works, but maybe if we just move this "using" into the
Switch to "using impala_udf::*" inside impala namespace.

PS14, Line 43: types
> type

Line 122:   void* GetValue(const ScalarExpr& e, const TupleRow* row);
> Maybe GetSubExprValue()? I don't feel strongly but might make the intent cl
This is used internally by the other GetValue() so it's not strictly used for sub-expression
evaluation only.

Line 144:   Status GetError(int start_idx = 0, int end_idx = -1) const WARN_UNUSED_RESULT;
> Arguments maybe need explanation.

Line 147:   void PrintValue(const TupleRow* row, std::string* str);
> Does this evaluate the expr?
Yes. Comments updated.

Line 153:   /// The last two are helper functions.
> Not sure which "last two" refers to.
Comment fixed.

Line 157:   /// Frees all local allocations made by fn_ctxs_. This can be called when result
> Thinking out loud about future work: I think to fix memory transfer I'll ne
Yes, it'd work out but not sure the difference in overhead for not reusing the free pool buffer.
It's relying on the underlying buffer pool to recycle the buffers, right ? Is that something
supported in the new buffer pool ?

PS14, Line 158:  last two
> Same here - maybe out of sync with the code?

Line 183:   /// Users of private GetValue() or 'pool_'.
> It might make sense to move the methods that the "friend class" is mean to 

PS14, Line 204: the evaluation
> maybe something like "Stores the result of the last evaluation of the the r
Comment fixed. We may use this for sub-expression evaluation too.
File be/src/exprs/

PS14, Line 398: subclassed
> I think this should be "overridden"
File be/src/exprs/

Line 164:   bool is_interpreted = scalar_fn_wrapper_ == nullptr;
> Thank you! This is much clearer.
File be/src/exprs/slot-ref.h:

Line 24: using namespace impala_udf;
> I think generally we try to avoid using entire namespaces in the headers. M
File be/src/runtime/data-stream-sender.h:

Line 49: /// TODO: Move this to the equivalent of PlanNode class for DataSink.
> Is there a JIRA for this?
This is essentially the next step for IMPALA-4192. The comment is not quite right. It's fixed
in the new patch.
File be/src/runtime/

Line 486:       DCHECK(partition_expr->IsLiteral());
> Can you add a comment in CatalogObjects.thrift that these are always litera

Line 491:     // literals Partition exprs are not used in the codegen case. Don't codegen
> Missing .
File be/src/runtime/runtime-filter.h:

Line 93:   const TRuntimeFilterDesc& filter_desc_;
> Comment what owns the filter?
File be/src/runtime/sorter.h:

Line 105:   Status Init(ObjectPool* pool, RuntimeState* state, MemPool* mem_pool);
> Comment what the arguments are for? The names could maybe be more descripti
File be/src/service/

PS14, Line 194: mem_pool
> expr_mem_pool?
File be/src/udf/udf.h:

Line 111:     /// TODO: Move FRAGMENT_LOCAL states to query_state for multi-threading.
> Do we have a JIRA trackign this?
File be/src/util/tuple-row-compare.h:

Line 82:   Status Open(ObjectPool* pool, RuntimeState* state, MemPool* mem_pool);
> Similar to my comment in Sorter - it's unclear what these different argumen

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Marcel Kornacker <>
Gerrit-Reviewer: Michael Ho <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message