impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory
Date Thu, 17 Nov 2016 01:03:54 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory
......................................................................


Patch Set 4:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

Line 952:     RETURN_IF_ERROR(OptimizeModule());
> Is there a reason why we don't call DestroyIRModule() when OptimizeModule()
No reason, it's inconsistent. I don't think it's really necessary to destroy the IR module
on the error path, since it will be cleaned up shortly anyway, so I removed it on those paths
to simplify things.


PS4, Line 986:  memory_manager_->bytes_allocated()),
             :         memory_manager_->bytes_allocated()
> The code may be easier to read if this factored into a single variable inst
Done


Line 1039:   if (!mem_tracker_->TryConsume(estimated_memory)) {
> I wonder if it's better to skip optimization if we cannot reserve the memor
Compiling the code can also be pretty CPU/memory intensive so it probably isn't even safe
to do that. I think the codegen memory will only be greater than the runtime memory requirement
in some pretty extreme cases. E.g. in the queries I looked at the memory calculation was at
most a few MB per fragment, and usually less.


PS4, Line 1041:         Substitute("Codegen failed to reserve '$0' bytes for optimization",
              :                                               estimated_memory),
              :         estimated_memory);
> These line wraps look a bit weird to me but not sure if it's a side effect 
I agree they look weird but clang-format really wants to format it like this.


http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

PS4, Line 708: optimiser
> nit: optimizer
We generally use a mix of American and commonwealth English spellings in comments. I guess
here we're inconsistent internally so I just changed it.


PS4, Line 709:  512 bytes
> Mind documenting how this is derived so we can update it accordingly in cas
Elaborated a bit. It's very approximate (and will become very inaccurate once functions get
large).


http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/codegen/mcjit-mem-mgr.h
File be/src/codegen/mcjit-mem-mgr.h:

PS4, Line 36: memory
> This is for tracking memory consumed by the compiled machine code, right ?
Reworded to clarify.


http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

PS4, Line 187:  false
> true ?
Done


http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exec/hash-table.cc
File be/src/exec/hash-table.cc:

PS4, Line 762:     if (ctxs.size() > Expr::CODEGEN_INLINE_EXPRS_THRESHOLD) {
             :       // Avoid bloating function by inlining too many exprs into it.
             :       expr_fn->addFnAttr(llvm::Attribute::NoInline);
             :     }
> I wonder if this can be more generalized by putting it in LlvmCodegen::Fina
Not a bad idea, but doesn't address the main scenario this patch helps with, which is when
many small functions are inlined into a large function.


http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

PS4, Line 190: CODEGEN_WRITE_TUPLES_MAX_COLUMNS = 250;
> This seems a bit adhoc to me. May be better to track the size of IR functio
I'd just be picking an arbitrary threshold in that case too, since I don't really have a good
way to accurately estimate the cost. The advantage of doing it this way is that it's easier
to understand why it was disabled and gives a user an actionable way to work around it.


PS4, Line 190:  const
> constexpr
There isn't a semantic difference between static const and static constexpr when it's an integer
literal. http://en.cppreference.com/w/cpp/language/constant_expression


http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS4, Line 1911: 
              : 
> Is it necessary to remove it ? For comparison, please see FirstValUpdate().
I added it back in. It shouldn't be necessary to instantiate it, but this makes it more consistent.


http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exprs/expr.h
File be/src/exprs/expr.h:

PS4, Line 311: CODEGEN_INLINE_EXPRS_THRESHOLD
> Wouldn't instruction count be a more precise estimation ?
It's a pretty arbitrary threshold anyway (afaik optimisation time is only loosely correlated
with instruction count) so I like that this is a little simpler and easier to understand.


PS4, Line 311: const
> constexpr ?
I don't feel strongly but I don't think there's any advantage to constexpr when it's a simple
literal.


http://gerrit.cloudera.org:8080/#/c/4956/4/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

Line 494:   24: required bool disable_codegen
> please move up, into the common/non-union part of this struct (field 8?) an
Done


http://gerrit.cloudera.org:8080/#/c/4956/4/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

PS4, Line 823: if (!hasGrouping) mergeAggNode.setDisableCodegen(true);
> Sure. This seems to set things up so the planner can selectively disable co
I disable it for this particular exchange. This is a proactive thing only right? Since we
don't actually codegen non-merging exchanges.


-- 
To view, visit http://gerrit.cloudera.org:8080/4956
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message