flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3738) Refactor TableEnvironment and TranslationContext
Date Thu, 14 Apr 2016 14:47:25 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241296#comment-15241296 ]

ASF GitHub Bot commented on FLINK-3738:

Github user fhueske commented on the pull request:

    I thought about that as well, but decided against an implicit conversion without a TableEnvironment.
    Otherwise, users might try to join or union tables that belong to different implicitly
created TableEnvironments, or to register an implicitly converted table in a TableEnvironment.
This does not work due to Calcite internals. I think it is better to make the Table/TableEnvironment
relationship explicit; it is also easier for users to understand if an exception is thrown for
the above reason.

> Refactor TableEnvironment and TranslationContext
> ------------------------------------------------
>                 Key: FLINK-3738
>                 URL: https://issues.apache.org/jira/browse/FLINK-3738
>             Project: Flink
>          Issue Type: Task
>          Components: Table API
>            Reporter: Fabian Hueske
>            Assignee: Fabian Hueske
> Currently the Table API uses a static object called {{TranslationContext}} which holds
the Calcite table catalog and a Calcite planner instance. Whenever a {{DataSet}} or {{DataStream}}
is converted into a {{Table}} or registered as a {{Table}} on the {{TableEnvironment}}, a
new entry is added to the catalog. The first time a {{Table}} is added, a planner instance
is created. The planner is used to optimize the query (defined by one or more Table API operations
and/or one or more SQL queries) when a {{Table}} is converted into a {{DataSet}} or {{DataStream}}.
Since a planner may only be used to optimize a single program, the choice of a single static
object is problematic.
> I propose to refactor the {{TableEnvironment}} to take over the responsibility of holding
the catalog and the planner instance. 
> - A {{TableEnvironment}} holds a catalog of registered tables and a single planner instance.
> - A {{TableEnvironment}} will only allow translating a single {{Table}} (possibly composed
of several Table API operations and SQL queries) into a {{DataSet}} or {{DataStream}}.
> - A {{TableEnvironment}} is bound to an {{ExecutionEnvironment}} or a {{StreamExecutionEnvironment}}.
This is necessary to create data sources or source functions to read external tables or streams.
> - {{DataSet}} and {{DataStream}} need a reference to a {{TableEnvironment}} to be converted
into a {{Table}}. This will prohibit implicit casts as currently supported for the DataSet
Scala API.
> - A {{Table}} needs a reference to the {{TableEnvironment}} it is bound to. Only tables
from the same {{TableEnvironment}} can be processed together.
> - The {{TranslationContext}} will be completely removed.
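
The ownership rules proposed above can be sketched in plain Java. This is only an illustration under stated assumptions, not Flink code: SketchTableEnv, SketchTable, fromDataSet, register, translate and union are hypothetical names, the catalog is a simple map, and the single-use planner is modelled by a boolean flag. It shows a table that can only be created through an environment, and an exception when tables from different environments are combined or registered elsewhere.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a TableEnvironment: each instance owns its own catalog.
final class SketchTableEnv {
    private final Map<String, SketchTable> catalog = new HashMap<>();
    // Assumption: models "one planner, one translation" per environment.
    private boolean translated = false;

    // Explicit conversion: a table is always created through an environment.
    SketchTable fromDataSet(String name) {
        SketchTable table = new SketchTable(this, name);
        catalog.put(name, table);
        return table;
    }

    // Registering a table that is bound to another environment is rejected.
    void register(String name, SketchTable table) {
        table.requireSameEnv(this);
        catalog.put(name, table);
    }

    boolean isRegistered(String name) {
        return catalog.containsKey(name);
    }

    // Only a single table may be translated per environment in this sketch.
    String translate(SketchTable table) {
        table.requireSameEnv(this);
        if (translated) {
            throw new IllegalStateException("This environment already translated a table");
        }
        translated = true;
        return "plan(" + table.name() + ")";
    }
}

// Hypothetical stand-in for a Table: keeps a reference to the environment it is bound to.
final class SketchTable {
    private final SketchTableEnv env;
    private final String name;

    SketchTable(SketchTableEnv env, String name) {
        this.env = env;
        this.name = name;
    }

    String name() {
        return name;
    }

    // Combining tables is only allowed within the same environment.
    SketchTable union(SketchTable other) {
        other.requireSameEnv(env);
        return new SketchTable(env, "(" + name + " UNION " + other.name + ")");
    }

    void requireSameEnv(SketchTableEnv expected) {
        if (env != expected) {
            throw new IllegalArgumentException("Tables are bound to different TableEnvironments");
        }
    }
}
```

In this sketch, calling union on two tables from the same SketchTableEnv succeeds, while mixing tables from two environments fails immediately with an exception that names the cause, which is the behaviour argued for in the comment above.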

This message was sent by Atlassian JIRA