impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael Ho (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Date Wed, 24 May 2017 01:39:18 GMT
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
......................................................................


Patch Set 14:

(70 comments)

http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 93:     RETURN_IF_ERROR(build_expr->Init(*(intermediate_row_desc_.get()), state));
> build_expr is leaked if this returns.
Good catch. Fixed here and everywhere.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

> This all works out a lot cleaner - nice!
Done


Line 124:     RETURN_IF_ERROR(AggFn::Create(analytic_node.analytic_functions[i],
> I think there's a potential resource leak if the expr Init() function fails
Done


Line 166:   fn_pool_.reset(new MemPool(expr_mem_tracker()));;
> extra semicolon
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/data-sink.cc
File be/src/exec/data-sink.cc:

Line 181:   // TODO: codegen table sink
> I'm not sure what this means. Is it something to be fixed in this patch? Ot
I think this is IMPALA-4356.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/data-sink.h
File be/src/exec/data-sink.h:

Line 126:   /// Not used in DataStreamSender and PHJBuilder.
> Or NljBuilder. Maybe easier to just say that they're not used in all subcla
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/exec-node.cc
File be/src/exec/exec-node.cc:

PS14, Line 451: inline
> Is the inline needed here?
The inline hint is for inlining into EvalConjuncts().


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

Line 261:   /// Conjuncts and their evaluators in this node.
> Maybe mention who owns the pointers?
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

PS14, Line 86: expr_evaluator->root()
> Is the plan to eventually pass around the Expr itself and avoid going via t
Really depends on the context. I imagine most cases we will use Expr directly. In some cases
when Expr is not directly available we may still go through the evaluator.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/filter-context.h
File be/src/exec/filter-context.h:

Line 120:   FilterContext() { }
> nit: can just remove this and rely on the default constructor
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

Line 115:   /// computing the size of a scratch row.
> Maybe mention what 'pool' is for
Done


Line 415:   Status Init(ObjectPool* pool, RuntimeState* state, int num_build_tuples);
> What is pool used for?
Comments added.


Line 513:   MemPool* mem_pool_;
> Mention that it's not owned?
I suppose that's implied. Comments added.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

PS14, Line 294: 	  
> tab?
Done


Line 733:     vector<ScalarExprEvaluator*>& output_expr_evaluators)
> Can't this be a const& still? The vector isn't being modified is it?
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-rcfile-scanner.cc
File be/src/exec/hdfs-rcfile-scanner.cc:

Line 26: #include "exprs/scalar-expr.h"
> Maybe not needed?
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

PS14, Line 122:  DCHECK(conjuncts_map_[entry.first].empty());
Use find() instead.


Line 147:     // TODO: Move this to Prepare()
> Is this TODO in this patch, or does it depend on another change?
I am thinking of moving it when PlanNode is introduced.


Line 165:         new RowDescriptor(min_max_tuple_desc_, /* is_nullable */ false));
> This is a little inconsistent - above on line 121 we create the RowDescript
Good point. I think heap allocated may be safer in case the expression keeps a reference to
it.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 146:   const std::vector<ScalarExprEvaluator*> min_max_conjunct_evaluators() const
{
> Shouldn't this return a const&, rather than copying the vector?
Yes, right. Looks like the existing code is broken.


Line 393:   /// the per-scanner ScannerContext..
> Add something like "These correspond to the exprs in 'filter_exprs_'"
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

Line 362:   MemPool filter_mem_pool(expr_mem_tracker());
> This is a little subtle - could do with a comment.
Done


Line 388:             for (auto& ctx: filter_ctxs) {
> Maybe use a goto and an exit block at the bottom of the function to enforce
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/kudu-scanner.h
File be/src/exec/kudu-scanner.h:

Line 95:   boost::scoped_ptr<MemPool> expr_mem_pool_;
> Put it in 'obj_pool_'?
Appears to be more consistent with the pattern of hdfs-scanner to leave it as a scoped_ptr.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 164:     pool_->Add(build_expr);
> Maybe stick with the usual idiom of wrapping pool_->Add() around the alloca
Done


Line 694:   AggFnEvaluator::ShallowClone(parent->pool_, agg_fn_pool.get(),
> This should go into 'partition_pool_'. Otherwise if we have an Agg in a sub
Done


Line 1574:   // TODO: consider moving the following codegen logic to AggFn.
> This would make sense (not in this patch though).
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

Line 109:     // TODO: Move to Prepare().
> Is this to do in this patch or are you planning to do it as part of a futur
In a future change.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/partitioned-hash-join-builder.h
File be/src/exec/partitioned-hash-join-builder.h:

Line 243:   /// Init() function inherited from DataSink. Overridden to be a no-op for now.
> This is a bit of a red flag to me - I think we should structure things so t
Init() is being called from DataSink::Create() for most DataSink types. Only the blocking-join
build sides are the exceptions right now as they aren't true data sinks of fragments yet.
Once they become true data sinks, they will also be initialized in DataSink::Create(). It
just happens that the function name "Init" conflicts with the existing Init() function for
PhjBuilder for now and I don't see a easy way to merge them. The original Init() function
is now renamed to InitExprsAndFilters(). Also added a TODO to merge them once PhjBuilder becomes
a true data sink.


Line 381:   /// List of filters to build.
> Mention that there's a correspondence with 'filter_exprs_'?
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

PS14, Line 95: nd
> and
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/agg-fn-evaluator.cc
File be/src/exprs/agg-fn-evaluator.cc:

Line 84: AggFnEvaluator::~AggFnEvaluator() {
> Can we add a DCHECK() that it was closed?
Done


PS14, Line 87: inline
> probably not needed
Done


PS14, Line 91: inline
> probably not needed
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/agg-fn-evaluator.h
File be/src/exprs/agg-fn-evaluator.h:

Line 250
> Consider leaving these as inline functions in the header - AnalyticEvalNode
Done


Line 72:   /// Convenience functions for creating evaluators for multiple aggregate functions.
> Mention that the caller is responsible for closing the created evaluators r
Done


Line 174:   /// True if this evaluator is created from a Clone() call.
> ShallowClone()
Done


PS14, Line 175: is_clone_
> Maybe is_shallow_clone_ to be more precise - although there is only one typ
Keep as-is.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/agg-fn.cc
File be/src/exprs/agg-fn.cc:

Line 87:   RETURN_IF_ERROR(LibCache::instance()->GetSoFunctionPtr(fn_.hdfs_location,
> Nit: I don't think the remainder of this function needs all the vertical wh
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/agg-fn.h
File be/src/exprs/agg-fn.h:

PS14, Line 52: state
> value?. "state" could be taken to mean that it initialises the evaluator or
Done


Line 81:   /// TODO: The aggregation node has custom codegen paths for a few of the builtins.
> Consider removing the TODO - it's a bit vague and unclear what if anything 
It corresponds to the TODO in the CodegenUpdateSlot() in PartitionedAggregationNode.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/expr.h
File be/src/exprs/expr.h:

Line 53: /// --- Expr overview
> Nice overview
Done


Line 81:   static Status CreateTree(const TExpr& texpr, ObjectPool* pool, Expr* root);
> I think this should be protected since it's only called from ScalarExpr and
Done


Line 94:   bool is_constant() const { return is_constant_; }
> I think is_constant() should maybe only be defined in the ScalarExpr sub-hi
Done


Line 100:   /// Simple debug string that provides no expr subclass-specific information.
> Comment seems stale since there's no implementation.
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/hive-udf-call.h
File be/src/exprs/hive-udf-call.h:

Line 28: using namespace impala_udf;
> I guess some of these 'using namespace' instances were pre-existing - still
Switched to "using impala_udf::*" hidden inside impala namespace. Apparently, it's not legal
to have random using declaration inside a class declaration.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/literal.h
File be/src/exprs/literal.h:

Line 28: using namespace impala_udf;
> See other comment about using namespace in headers.
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/scalar-expr-evaluator.cc
File be/src/exprs/scalar-expr-evaluator.cc:

Line 154:   // Owner of mem_pool_ needs to call FreeAll() on it.
> We don't really know what the owner wants to do. Maybe something like "Memo
Done


Line 436:   // TODO: is there a better way to do this?
> Consider removing TODO - I don't think we're likely to fix it.
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/scalar-expr-evaluator.h
File be/src/exprs/scalar-expr-evaluator.h:

Line 29: using namespace impala_udf;
> I'm not sure if this works, but maybe if we just move this "using" into the
Switch to "using impala_udf::*" inside impala namespace.


PS14, Line 43: types
> type
Done


Line 122:   void* GetValue(const ScalarExpr& e, const TupleRow* row);
> Maybe GetSubExprValue()? I don't feel strongly but might make the intent cl
This is used internally by the other GetValue() so it's not strictly used for sub-expression
evaluation only.


Line 144:   Status GetError(int start_idx = 0, int end_idx = -1) const WARN_UNUSED_RESULT;
> Arguments maybe need explanation.
Done


Line 147:   void PrintValue(const TupleRow* row, std::string* str);
> Does this evaluate the expr?
Yes. Comments updated.


Line 153:   /// The last two are helper functions.
> Not sure which "last two" refers to.
Comment fixed.


Line 157:   /// Frees all local allocations made by fn_ctxs_. This can be called when result
> Thinking out loud about future work: I think to fix memory transfer I'll ne
Yes, it'd work out but not sure the difference in overhead for not reusing the free pool buffer.
It's relying on the underlying buffer pool to recycle the buffers, right ? Is that something
supported in the new buffer pool ?


PS14, Line 158:  last two
> Same here - maybe out of sync with the code?
Done


Line 183:   /// Users of private GetValue() or 'pool_'.
> It might make sense to move the methods that the "friend class" is mean to 
Done


PS14, Line 204: the evaluation
> maybe something like "Stores the result of the last evaluation of the the r
Comment fixed. We may use this for sub-expression evaluation too.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/scalar-expr.cc
File be/src/exprs/scalar-expr.cc:

PS14, Line 398: subclassed
> I think this should be "overridden"
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/scalar-fn-call.cc
File be/src/exprs/scalar-fn-call.cc:

Line 164:   bool is_interpreted = scalar_fn_wrapper_ == nullptr;
> Thank you! This is much clearer.
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/slot-ref.h
File be/src/exprs/slot-ref.h:

Line 24: using namespace impala_udf;
> I think generally we try to avoid using entire namespaces in the headers. M
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/runtime/data-stream-sender.h
File be/src/runtime/data-stream-sender.h:

Line 49: /// TODO: Move this to the equivalent of PlanNode class for DataSink.
> Is there a JIRA for this?
This is essentially the next step for IMPALA-4192. The comment is not quite right. It's fixed
in the new patch.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

Line 486:       DCHECK(partition_expr->IsLiteral());
> Can you add a comment in CatalogObjects.thrift that these are always litera
Done


Line 491:     // literals Partition exprs are not used in the codegen case. Don't codegen
> Missing .
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/runtime/runtime-filter.h
File be/src/runtime/runtime-filter.h:

Line 93:   const TRuntimeFilterDesc& filter_desc_;
> Comment what owns the filter?
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

Line 105:   Status Init(ObjectPool* pool, RuntimeState* state, MemPool* mem_pool);
> Comment what the arguments are for? The names could maybe be more descripti
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

PS14, Line 194: mem_pool
> expr_mem_pool?
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/udf/udf.h
File be/src/udf/udf.h:

Line 111:     /// TODO: Move FRAGMENT_LOCAL states to query_state for multi-threading.
> Do we have a JIRA trackign this?
IMPALA-5356.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/util/tuple-row-compare.h
File be/src/util/tuple-row-compare.h:

Line 82:   Status Open(ObjectPool* pool, RuntimeState* state, MemPool* mem_pool);
> Similar to my comment in Sorter - it's unclear what these different argumen
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message