flink-issues mailing list archives

From "Fabian Hueske (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5859) support partition pruning on Table API & SQL
Date Thu, 23 Feb 2017 10:31:44 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15880231#comment-15880231 ]

Fabian Hueske commented on FLINK-5859:

Hi [~ykt836],

My main motivation to treat partition pruning as filter push-down is to keep the complexity
of the optimizer as small as possible.

You are right, the effort to determine whether a filter can be applied or not depends on the
format of the source. However, I don't think that this necessarily means that partition pruning
must be handled as a special case. In the end it depends on the TableSource how it determines
which predicates apply and which don't. A partitionable table source would not need to scan
all metadata. 

I see your point about the effort and complexity to implement a partitionable TableSource.

What do you think of the following approach?
We implement a {{PartitionableTableSource}} as an abstract class that implements the {{FilterableTableSource}}
interface. {{PartitionableTableSource}} would have abstract methods to list the partitioned
fields (and maybe some more). Based on that information {{PartitionableTableSource}} implements
{{FilterableTableSource.setPredicate()}} and {{FilterableTableSource.getPredicate()}}, i.e.,
the {{PartitionableTableSource}} automatically extracts the right filter expressions and returns
everything it cannot deal with based on the provided partitioned fields.
TableSources which only support filter push-down via partition pruning extend {{PartitionableTableSource}}
and only have to specify the partition columns; they do not have to deal with {{setPredicate()}}.
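
To make the proposal more concrete, here is a rough sketch in plain Java. It is only an illustration under assumptions: {{Predicate}} is a placeholder for a single filter conjunct (not Flink's expression type), the {{FilterableTableSource}} signatures are simplified guesses (here {{setPredicate()}} returns what the source cannot handle and {{getPredicate()}} returns what it accepted), and {{getPartitionFields()}} and {{DailyLogSource}} are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for one conjunct of a filter, e.g. dt = '2017-02-23'.
// This is NOT Flink's expression type.
final class Predicate {
    final String field;
    final String op;
    final Object value;

    Predicate(String field, String op, Object value) {
        this.field = field;
        this.op = op;
        this.value = value;
    }
}

// Assumed, simplified shape of the interface; the real signatures may differ.
interface FilterableTableSource {
    // Offers the conjuncts to the source and returns those it cannot apply.
    List<Predicate> setPredicate(List<Predicate> conjuncts);

    // Returns the conjuncts the source has accepted.
    List<Predicate> getPredicate();
}

abstract class PartitionableTableSource implements FilterableTableSource {
    private List<Predicate> accepted = Collections.emptyList();

    // The only method a concrete source has to provide (hypothetical name).
    protected abstract Set<String> getPartitionFields();

    @Override
    public final List<Predicate> setPredicate(List<Predicate> conjuncts) {
        Set<String> partitionFields = getPartitionFields();
        List<Predicate> keep = new ArrayList<>();
        List<Predicate> rest = new ArrayList<>();
        for (Predicate p : conjuncts) {
            // Accept a conjunct only if it refers to a partition column.
            if (partitionFields.contains(p.field)) {
                keep.add(p);
            } else {
                rest.add(p);
            }
        }
        accepted = keep;
        return rest;
    }

    @Override
    public final List<Predicate> getPredicate() {
        return Collections.unmodifiableList(accepted);
    }
}

// Hypothetical example source, partitioned by date and region.
class DailyLogSource extends PartitionableTableSource {
    @Override
    protected Set<String> getPartitionFields() {
        return new HashSet<>(Arrays.asList("dt", "region"));
    }
}
```

With this shape, a source like {{DailyLogSource}} only names its partition columns. A filter on {{dt}} would be accepted by the source for pruning, while a filter on a non-partition column such as {{userId}} would be handed back to be evaluated after the scan.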

This solution would keep all partition pruning related logic out of the optimizer and table

What do you think?

> support partition pruning on Table API & SQL
> --------------------------------------------
>                 Key: FLINK-5859
>                 URL: https://issues.apache.org/jira/browse/FLINK-5859
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: godfrey he
>            Assignee: godfrey he
> Many data sources are partitionable storage systems, e.g., HDFS and Druid, and many queries only
need to read a small subset of the total data. We can use partition information to prune or
skip over files irrelevant to the user’s queries. Both query optimization time and execution
time can be reduced significantly, especially for a large partitioned table.

This message was sent by Atlassian JIRA
