flink-issues mailing list archives

From "Kurt Young (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (FLINK-5185) Decouple BatchTableSourceScan with TableSourceTable
Date Thu, 01 Dec 2016 02:45:59 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710644#comment-15710644 ]

Kurt Young edited comment on FLINK-5185 at 12/1/16 2:45 AM:
------------------------------------------------------------

Hi [~fhueske], thanks for the reply.

Actually, this issue reflects a fundamental question we need to answer: "Who decides the RowType
when you scan the table?" or, put differently, "Who is the schema authority?"

Your suggestion implies that the answer is {{BatchTableSourceScan}}. But I think
the answer should be {{TableSource}}, which is responsible for all the real work. Letting {{BatchTableSourceScan}}
hold the optional parameters for projection or filter will work for now, but once
we begin to introduce more complex {{TableSource}}s, which may even let you push part of the query
down, {{BatchTableSourceScan}} will no longer be up to this job.

So I propose to make {{TableSource}} the RowType authority: RowType information can be obtained
from {{TableSource}} and from nowhere else. Likewise, only {{TableSource}} decides how to
react to the provided projection columns, the filter condition, or even the original query. {{BatchTableSourceScan}}
should pass this information to the {{TableSource}} and simply take the new RowType
from the new {{TableSource}} that results. After all, the RowType is the only thing {{BatchTableSourceScan}}
cares about, and the only thing it should care about.
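
To make the proposal concrete, here is a minimal sketch of the "source is the schema authority" idea. All names in it ({{SketchTableSource}}, {{CsvLikeSource}}, {{SketchScan}}) are hypothetical stand-ins invented for illustration; they are not Flink's or Calcite's real classes, and the row type is reduced to a plain list of field names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-in for a table source (not Flink's real interface).
interface SketchTableSource {
    // The source alone decides which fields a scan produces.
    List<String> rowType();

    // The source decides how to react to a projection and returns a NEW source.
    SketchTableSource projectFields(int[] fieldIndexes);
}

// Hypothetical source whose row type is just a list of field names.
final class CsvLikeSource implements SketchTableSource {
    private final List<String> fields;

    CsvLikeSource(List<String> fields) {
        this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
    }

    @Override
    public List<String> rowType() {
        return fields;
    }

    @Override
    public SketchTableSource projectFields(int[] fieldIndexes) {
        List<String> projected = new ArrayList<>();
        for (int index : fieldIndexes) {
            projected.add(fields.get(index));
        }
        return new CsvLikeSource(projected);
    }
}

// Hypothetical scan node: it keeps no projection or schema state of its own.
final class SketchScan {
    private final SketchTableSource source;

    SketchScan(SketchTableSource source) {
        this.source = source;
    }

    // The row type is always derived from the source, never stored in the scan.
    List<String> rowType() {
        return source.rowType();
    }

    // Pushing a projection: hand it to the source, then wrap the new source in a new scan.
    SketchScan withProjection(int[] fieldIndexes) {
        return new SketchScan(source.projectFields(fieldIndexes));
    }
}
```

In this sketch the scan never computes a schema itself; after a projection it only learns the new row type by asking the new source, and the original scan is left unchanged.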


> Decouple BatchTableSourceScan with TableSourceTable
> ---------------------------------------------------
>
>                 Key: FLINK-5185
>                 URL: https://issues.apache.org/jira/browse/FLINK-5185
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>    Affects Versions: 1.2.0
>            Reporter: Kurt Young
>            Assignee: zhangjing
>            Priority: Minor
>
> As the components' relationships show in this design doc:
> https://docs.google.com/document/d/1PBnEbOcFHlEF1qGGAUgJvINdEXzzFTIRElgvs4-Tdeo/
> We found it annoying that {{BatchTableSourceScan}} directly holds a {{TableSourceTable}}
> and reaches the {{TableSource}} through it. That is fine as long as the relationship is immutable, but when we
> want to change the {{TableSource}} while applying optimizations, it causes conflicts
> and confusion.
> There is only one way to change the {{TableSource}}: create a new {{TableSourceTable}}
> to hold the new {{TableSource}}, and create a new {{BatchTableSourceScan}} pointing to the
> newly created {{TableSourceTable}}. The annoying part is that the {{RelOptTable}} inherited from
> the super class {{TableScan}} still holds the connection to the original {{TableSourceTable}}
> and {{TableSource}}. This makes it unclear which one the {{Scan}} should rely
> on, and what the difference between these tables is.
> Besides, {{TableSourceTable}} is not very useful in {{BatchTableSourceScan}}: the only
> thing the {{Scan}} cares about is the {{RowType}} it returns, and this is, and should be, decided by
> {{TableSource}}. So we can let {{BatchTableSourceScan}} hold the {{TableSource}} directly instead
> of holding the {{TableSourceTable}}. If some of the original information is needed, the table can be found through
> {{RelOptTable}}.
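
The ambiguity described in the issue can be illustrated with a small sketch. Everything here is hypothetical and heavily simplified: {{DemoBaseScan}} only plays the role of the {{TableScan}} super class with its {{RelOptTable}}-like reference, {{DemoSourceTable}} plays the role of {{TableSourceTable}}, and none of these are actual Flink or Calcite classes.

```java
// Hypothetical stand-in for a table source, identified only by a description.
final class DemoSource {
    final String description;

    DemoSource(String description) {
        this.description = description;
    }
}

// Plays the role of TableSourceTable: a wrapper that holds a source.
final class DemoSourceTable {
    final DemoSource source;

    DemoSourceTable(DemoSource source) {
        this.source = source;
    }
}

// Plays the role of the TableScan super class and its reference to the original table.
class DemoBaseScan {
    final DemoSourceTable catalogTable;

    DemoBaseScan(DemoSourceTable catalogTable) {
        this.catalogTable = catalogTable;
    }
}

// Coupled design: the scan holds a second table wrapper next to the inherited one.
final class CoupledScan extends DemoBaseScan {
    final DemoSourceTable currentTable;

    CoupledScan(DemoSourceTable catalogTable, DemoSourceTable currentTable) {
        super(catalogTable);
        this.currentTable = currentTable;
    }

    // Changing the source forces a new wrapper; the inherited reference stays on the old one.
    CoupledScan withSource(DemoSource newSource) {
        return new CoupledScan(catalogTable, new DemoSourceTable(newSource));
    }
}

// Decoupled design: the scan holds the source itself; the original table stays reachable.
final class DecoupledScan extends DemoBaseScan {
    final DemoSource source;

    DecoupledScan(DemoSourceTable catalogTable, DemoSource source) {
        super(catalogTable);
        this.source = source;
    }

    DecoupledScan withSource(DemoSource newSource) {
        return new DecoupledScan(catalogTable, newSource);
    }
}
```

After {{withSource}} is called, a {{CoupledScan}} ends up with two table wrappers that disagree about the source, whereas a {{DecoupledScan}} has exactly one current source and still keeps the original table available through the inherited reference.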



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
