flink-issues mailing list archives

From "godfrey he (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (FLINK-5859) support partition pruning on Table API & SQL
Date Mon, 27 Feb 2017 08:31:45 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885098#comment-15885098 ]

godfrey he edited comment on FLINK-5859 at 2/27/17 8:31 AM:
------------------------------------------------------------

Hi [~fhueske], thanks for your advice.

IMO, rules such as {{PushProjectIntoBatchTableSourceScanRule}}, {{PushFilterIntoBatchTableSourceScanRule}},
{{PartitionPruningRule}} (maybe we integrate it into {{PushFilterIntoBatchTableSourceScanRule}})
and so on need to be applied only once and do not actually need a cost model. Rules such as
{{FilterCalcMergeRule}}, {{FilterJoinRule}}, {{DataSetCalcRule}} and so on
do not need a real cost; a dummy cost is enough. Rules such as {{LoptOptimizeJoinRule}}, {{JoinToMultiJoinRule}}
and so on need to be applied with a real cost. So we want to break the optimization phase down
into 3 phases later. The whole optimization then includes 5 steps:

# decorrelate the query
# normalize the logical plan with the HEP planner
# optimize the logical plan with the Volcano planner and a dummy cost (rules include {{FilterCalcMergeRule}},
{{FilterJoinRule}}, {{DataSetCalcRule}} and so on)
# optimize the physical plan with the HEP planner (rules include {{PushProjectIntoBatchTableSourceScanRule}},
{{PushFilterIntoBatchTableSourceScanRule}} and so on)
# optimize the physical plan with the Volcano planner and a real cost (rules include {{LoptOptimizeJoinRule}},
{{JoinToMultiJoinRule}} and so on)
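
The five steps above can be sketched as an ordered list of phases, each tagged with the planner and cost model it would run on. This is only an illustration of the proposed split, not Flink or Calcite code: {{PhasedOptimizerSketch}}, {{Phase}} and the {{Planner}} enum are hypothetical names, the planner for the decorrelation step is assumed to be "none", and the rule list for the normalization step is left empty because the comment does not name any.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are Flink or Calcite classes.
final class PhasedOptimizerSketch {

    /** Planner and cost model used by a phase (illustrative names only). */
    enum Planner { NONE, HEP, VOLCANO_DUMMY_COST, VOLCANO_REAL_COST }

    /** One step of the pipeline: a label, the planner it runs on, and its rule names. */
    static final class Phase {
        final String name;
        final Planner planner;
        final List<String> rules;

        Phase(String name, Planner planner, List<String> rules) {
            this.name = name;
            this.planner = planner;
            this.rules = rules;
        }
    }

    /** The five steps from the comment, in the order they would run. */
    static List<Phase> phases() {
        List<Phase> phases = new ArrayList<>();
        // Assumption: decorrelation is not tied to a planner here.
        phases.add(new Phase("decorrelate query", Planner.NONE, List.of()));
        // The comment names no rules for normalization, so the list stays empty.
        phases.add(new Phase("normalize logical plan", Planner.HEP, List.of()));
        phases.add(new Phase("optimize logical plan", Planner.VOLCANO_DUMMY_COST,
                List.of("FilterCalcMergeRule", "FilterJoinRule", "DataSetCalcRule")));
        phases.add(new Phase("push down into table source", Planner.HEP,
                List.of("PushProjectIntoBatchTableSourceScanRule",
                        "PushFilterIntoBatchTableSourceScanRule")));
        phases.add(new Phase("cost-based join optimization", Planner.VOLCANO_REAL_COST,
                List.of("LoptOptimizeJoinRule", "JoinToMultiJoinRule")));
        return phases;
    }

    /** Walks the phases in order and records which planner each one would use. */
    static List<String> describe() {
        List<String> trace = new ArrayList<>();
        for (Phase p : phases()) {
            trace.add(p.name + " -> " + p.planner);
        }
        return trace;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```

Keeping each phase as plain data makes it easy to see that only the last phase depends on real cost estimates, while the push-down rules run once in a HEP phase.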

With that split, each optimization phase keeps its complexity as small as possible, which
should also address your concern.

Looking forward to your advice, thanks.



> support partition pruning on Table API & SQL
> --------------------------------------------
>
>                 Key: FLINK-5859
>                 URL: https://issues.apache.org/jira/browse/FLINK-5859
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: godfrey he
>            Assignee: godfrey he
>
> Many data sources are partitionable storage, e.g. HDFS, Druid, and many queries only
> need to read a small subset of the total data. We can use partition information to prune or
> skip over files irrelevant to the user’s queries. Both query optimization time and execution
> time can be reduced significantly, especially for a large partitioned table.
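
As a rough illustration of the idea in the issue description, the sketch below keeps only the partitions whose partition-column values satisfy a filter, so the remaining files are never read. It is a minimal, hypothetical example: {{PartitionPruningSketch}}, the {{prune}} method and the {{dt}} partition column are assumptions made for illustration and are not part of Flink's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: not Flink's PartitionPruningRule, just the underlying idea.
final class PartitionPruningSketch {

    /**
     * Returns the partitions whose key/value spec passes the filter.
     * Each partition is modelled as a map from partition column to value.
     */
    static List<Map<String, String>> prune(
            List<Map<String, String>> partitions,
            Predicate<Map<String, String>> partitionFilter) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> partition : partitions) {
            if (partitionFilter.test(partition)) {
                kept.add(partition);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed layout: one partition per day, keyed by a "dt" column.
        List<Map<String, String>> partitions = List.of(
                Map.of("dt", "2017-02-26"),
                Map.of("dt", "2017-02-27"));
        // A filter such as WHERE dt = '2017-02-27' only needs the second partition.
        System.out.println(prune(partitions, p -> "2017-02-27".equals(p.get("dt"))));
    }
}
```

Only predicates that refer to partition columns can be evaluated this way; predicates on other columns still have to be applied to the rows that are read.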



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
