calcite-dev mailing list archives

From Danny Chan <>
Subject Re: [DISCUSS] Proposal to add API to force rules matching specific rels
Date Thu, 31 Oct 2019 09:21:32 GMT
Thanks, Vladimir, for bringing up this discussion!

Did you notice that there is a JIRA issue about this problem? [1] There is also a
discussion about how to propagate the traits [2]


Danny Chan
On Oct 31, 2019 at 3:56 PM +0800, Vladimir Ozerov <>, wrote:
> Hi colleagues,
> I would like to discuss with the community the possibility of adding a new
> public method to VolcanoPlanner that will forcefully re-trigger the rules
> for a specific rel. This is a follow-up of a discussion started in the
> other thread [1].
> **Problem statement**
> When converting between conventions during optimization, VolcanoPlanner
> prefers a top-down approach, so that nodes are converted starting from the
> root. But in some cases, an intermediate node must be converted after its
> children. This is especially true for distributed SQL engines, which rely
> on distribution traits during the optimization process: it is not possible
> to efficiently choose a proper physical implementation of a parent node
> unless the physical representation of a child node is known.
> It seems that presently VolcanoPlanner cannot address such cases by
> default. Consider that we have two nodes and associated rules which convert
> them to physical counterparts:
> [LogicalParent <- LogicalChild]
> The parent should be converted after the child. When the child is
> converted, the physical node is created:
> [LogicalParent <- {LogicalChild, PhysicalChild}]
> In order to finish the optimization process, we need to convert the parent.
> But parent rules are not fired, because PhysicalChild has traits
> incompatible with LogicalParent.
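[Editorial note: the mismatch above can be illustrated with a toy model. This is NOT Calcite code; the class, enum, and method names are hypothetical. It only shows the core point: a rule's operand matches a rel by convention, so once the child becomes PHYSICAL, a parent rule whose operand expects LOGICAL no longer fires.]

```java
// Toy sketch, not Calcite itself. Assumption: a conversion rule for the
// parent declares a child operand that expects the LOGICAL convention.
public class RuleMatchSketch {
    enum Convention { LOGICAL, PHYSICAL }

    // Hypothetical parent rule: matches only when the child rel carries
    // the convention its operand expects.
    static boolean parentRuleMatches(Convention childConvention) {
        return childConvention == Convention.LOGICAL;
    }

    public static void main(String[] args) {
        // The rule fires while the child is logical...
        System.out.println(parentRuleMatches(Convention.LOGICAL));  // true
        // ...but not for the newly created PhysicalChild, so the parent
        // is never converted to a physical node.
        System.out.println(parentRuleMatches(Convention.PHYSICAL)); // false
    }
}
```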
> Presently the problem could be solved in two ways:
> 1) Always produce conversions when going top-down. In this case, I first
> create a physical parent, then a physical child. The problem is that the
> created parent is not optimal, because the child's distribution was not known
> at the time of creation. So the full flow would be: create a bad parent,
> create a good child, create a good parent.
> 1.1) [LogicalParent <- LogicalChild]
> 1.2) [{LogicalParent, PhysicalParentBad} <- LogicalChild]
> 1.3) [{LogicalParent, PhysicalParentBad} <- {LogicalChild, PhysicalChild}]
> 1.4) [{LogicalParent, PhysicalParentBad, PhysicalParentGood} <-
> {LogicalChild, PhysicalChild}]
> What is worse, the creation of a non-optimal parent will trigger rules on
> its own parent, which in turn may create a non-optimal grandparent node, etc.
> 2) Make sure that your convention returns true for
> Convention.canConvertConvention. In this case, every time a new rel is
> added to a RelSet, its traitset will be compared to traitsets of all other
> rels in the set. For every pair of traitsets, we may ask the engine to create
> a relevant AbstractConverter. The net effect is that a "physical-to-logical"
> converter is created, which re-triggers the rule on the logical parent
> since their conventions are compatible:
> 2.1) [LogicalParent <- LogicalChild]
> 2.2) [LogicalParent <- {LogicalChild, PhysicalChild}]
> 2.3) [LogicalParent <- {LogicalChild, PhysicalChild,
> PhysicalToLogicalConverter}]
> 2.4) [{LogicalParent, PhysicalParent} <- {LogicalChild, PhysicalChild,
> PhysicalToLogicalConverter}]
> The problem with that approach is that it is too coarse-grained since we
> operate on traitsets rather than rels. As a result, additional memory and
> CPU pressure are introduced because usually too many converters are
> created, which triggers the rules over and over again.
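[Editorial note: the coarse-grained cost described above can be quantified with a small sketch. Assumption, following the email's description: one candidate converter may be requested per ordered pair of distinct traitsets in a RelSet, so the candidate count grows quadratically with the number of traitsets.]

```java
import java.util.List;

public class ConverterSketch {
    // One candidate AbstractConverter per ordered pair of distinct
    // traitsets in the set: n * (n - 1) candidates for n traitsets.
    static int convertersFor(List<String> traitSets) {
        int n = traitSets.size();
        return n * (n - 1);
    }

    public static void main(String[] args) {
        // Two traitsets (logical, physical): 2 candidate converters.
        System.out.println(convertersFor(List.of("LOGICAL", "PHYSICAL")));         // 2
        // Add one more physical distribution and the count jumps to 6.
        System.out.println(convertersFor(List.of("L", "P.hash", "P.broadcast"))); // 6
    }
}
```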
> **Affected products**
> At the moment two distributed engines are being developed for Hazelcast and
> Apache Ignite. Both require bottom-up optimization and currently rely on
> the second workaround.
> Another example is Apache Drill. I do not know whether its community is
> concerned with the issue, but it also uses bottom-up optimization for many
> rules and employs both the aforementioned workarounds. As a result, I guess
> that Drill's optimizer also creates too many rels during optimization and
> suffers from a huge search space. Please correct me if I am wrong.
> **Proposal**
> The key problem is that there is no way to re-fire rules on a specific
> node. The original problem could be solved if it were possible
> to re-fire rules on a LogicalParent without creating any additional rels.
> That would lead to a clear optimization flow:
> 2.1) [LogicalParent <- LogicalChild]
> 2.2) [LogicalParent <- {LogicalChild, PhysicalChild}]
> 2.3) [{LogicalParent, PhysicalParent} <- {LogicalChild, PhysicalChild}]
> We can add the following method to VolcanoPlanner (RelOptPlanner?)
> interface:
> void fireRules(RelNode rel)
> This method will fire the rules on the passed node in a deferred mode, as if
> it were a new node just added to the optimizer. This would require slight
> changes to the RuleQueue.addMatch method, and possibly some other places.
> Usage example:
> class PhysicalChildRule extends RelOptRule {
>   void onMatch(RelOptRuleCall call) {
>     LogicalChild logicalRel = call.rel(0);
>     PhysicalChild physicalRel = ...;
>
>     call.transformTo(physicalRel);
>     optimizer.fireRules(logicalRel);
>   }
> }
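[Editorial note: a minimal sketch of the "deferred mode" the proposal mentions, under stated assumptions. This is not Calcite's RuleQueue; the class and its String-based queue are hypothetical stand-ins showing the intended behavior: fireRules(rel) enqueues matches for that one rel instead of firing them immediately, mimicking how a newly registered node is handled.]

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class DeferredFireSketch {
    // Pending rule matches, popped by the planner's main loop.
    private final Deque<String> ruleQueue = new ArrayDeque<>();

    // Hypothetical API: re-queue matches for this rel only, without
    // creating any additional rels.
    public void fireRules(String rel) {
        ruleQueue.add("match:" + rel);
    }

    public String popMatch() {
        return ruleQueue.poll(); // null when nothing is pending
    }

    public static void main(String[] args) {
        DeferredFireSketch planner = new DeferredFireSketch();
        planner.fireRules("LogicalParent");
        System.out.println(planner.popMatch()); // match:LogicalParent
    }
}
```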
> What does the community think about such a method? Does it make any sense
> to you? If not, are you aware of any other ways to organize bottom-up
> optimization with VolcanoPlanner without the creation of additional rels?
> If the community is OK in general, I can try to create a PR with a
> prototype.
> Would appreciate your feedback.
> Regards,
> Vladimir.
> [1]
