flink-issues mailing list archives

From fhueske <...@git.apache.org>
Subject [GitHub] flink pull request: [FLINK-3609] [tableAPI] Reorganize selection o...
Date Wed, 16 Mar 2016 13:10:02 GMT
Github user fhueske commented on a diff in the pull request:

    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/plan/rules/FlinkRuleSets.scala
    @@ -29,50 +29,45 @@ object FlinkRuleSets {
       val DATASET_OPT_RULES: RuleSet = RuleSets.ofList(
    -    // filter rules
    +    // push a filter into a join
    +    // push filter into the children of a join
    -    FilterMergeRule.INSTANCE,
    -    FilterAggregateTransposeRule.INSTANCE,
    +    // push filter through an aggregation
    +    FlinkFilterAggregateTransposeRule.INSTANCE,
    -    // push and merge projection rules
    +    // aggregation and projection rules
    -    ProjectMergeRule.INSTANCE,
    +    AggregateProjectPullUpConstantsRule.INSTANCE,
    +    // push a projection past a filter or vice versa
    -    AggregateProjectPullUpConstantsRule.INSTANCE,
    -    JoinPushExpressionsRule.INSTANCE,
    +    // push a projection to the children of a join
    +    // remove identity project
    +    // reorder sort and projection
    -    // merge and push unions rules
    -    // TODO: Add a rule to enforce binary unions
    +    // join rules
    +    JoinPushExpressionsRule.INSTANCE,
    +    // remove union with only a single child
    -    FlinkJoinUnionTransposeRule.LEFT_UNION,
    -    FlinkJoinUnionTransposeRule.RIGHT_UNION,
    --- End diff --
    These rules would push a join below a union. This means that the join is executed on each
input of the union and the join results are unioned afterwards. The additional joins are likely
to increase resource consumption. For example, if the unioned inputs are fed into the probe side
of a join, the build side would need to be replicated for each of the new joins.
    Unless we have very good statistics and a more fine-grained cost model, I would not try
to apply this optimization and would rather trust the user's query (at least for the Table API).
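To make the cost argument concrete, here is a minimal Scala sketch on a hypothetical toy plan model. `Plan`, `Scan`, `Union`, `Join`, `transpose`, `eval` and `buildTables` are illustrative names, not Flink or Calcite API. It assumes a hash join whose right input is the build side and shows that pushing the join below a two-input union on the probe side yields the same rows but builds a hash table over the build side twice instead of once.

```scala
// Hypothetical toy plan model (NOT Flink's or Calcite's API).
sealed trait Plan
final case class Scan(name: String, rows: Seq[(Int, String)]) extends Plan
final case class Union(inputs: Seq[Plan]) extends Plan
// Hash join on the Int key: `build` is hashed, `probe` is streamed against it.
final case class Join(probe: Plan, build: Plan) extends Plan

object JoinUnionTranspose {
  // Join(Union(a, b, ...), c)  ==>  Union(Join(a, c), Join(b, c), ...)
  // i.e. the join is pushed below a union that feeds its probe side.
  def transpose(p: Plan): Plan = p match {
    case Join(Union(ins), build) => Union(ins.map(in => Join(in, build)))
    case other                   => other
  }

  // Number of hash tables an execution constructs: one per join,
  // each over a full copy of that join's build input.
  def buildTables(p: Plan): Int = p match {
    case Scan(_, _)         => 0
    case Union(ins)         => ins.map(buildTables).sum
    case Join(probe, build) => 1 + buildTables(probe) + buildTables(build)
  }

  // Naive evaluator; join output rows carry "probeValue|buildValue".
  def eval(p: Plan): Seq[(Int, String)] = p match {
    case Scan(_, rows) => rows
    case Union(ins)    => ins.flatMap(eval)
    case Join(probe, build) =>
      // Build phase: hash the build side by key.
      val table: Map[Int, Seq[String]] =
        eval(build).groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2) }
      // Probe phase: look up every probe row in the hash table.
      eval(probe).flatMap { case (k, v) =>
        table.getOrElse(k, Seq.empty).map(b => (k, s"$v|$b"))
      }
  }

  def main(args: Array[String]): Unit = {
    val a = Scan("a", Seq(1 -> "a1", 2 -> "a2"))
    val b = Scan("b", Seq(2 -> "b2", 3 -> "b3"))
    val c = Scan("c", Seq(1 -> "c1", 2 -> "c2", 3 -> "c3"))
    val original  = Join(Union(Seq(a, b)), c)
    val rewritten = transpose(original)
    // Same result (as a multiset) ...
    assert(eval(original).sorted == eval(rewritten).sorted)
    // ... but the build side `c` is now hashed once per union input.
    println(s"build tables before: ${buildTables(original)}, after: ${buildTables(rewritten)}")
  }
}
```

Running `main` should print `build tables before: 1, after: 2`. With a union of n inputs the rewrite needs n copies of the build side, which is exactly the extra resource consumption described above; whether that pays off (for example, because each smaller join can be parallelized or pruned independently) is what statistics and a finer-grained cost model would have to decide.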
