flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3226) Translate optimized logical Table API plans into physical plans representing DataSet programs
Date Fri, 12 Feb 2016 09:33:18 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144312#comment-15144312 ]

ASF GitHub Bot commented on FLINK-3226:
---------------------------------------

Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/1624#issuecomment-183251312
  
    Hi, I thought about using POJOs as native types within Table/SQL operators. IMO, the gains are very small compared to the added code complexity. Given a POJO input, we can preserve the input type for only a few operations, such as Filter. For most other operations, we need to generate a new output type (Tuple or Row). I am a bit skeptical about adding a lot of codeGen code with special cases for POJOs (such as the field index mapping) that would be used very seldom. Moreover, POJO field accesses (for operations and serialization) go through reflection and are not very efficient, so even for those few cases where POJOs can be used, the performance gain is unclear.
    
    I do not question native type support in general. Tuples and primitives should definitely be supported, but I don't think we need to support POJOs within Table/SQL operators. Instead, I would convert POJO DataSets into Row tables during the table scan. Most of the code in this PR can be reused to implement a codeGen'd converter Map function (a hand-written sketch of that conversion follows below).
    
    What do you think @twalthr?
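
For illustration, here is a minimal hand-written sketch of the scan-time conversion described above, assuming a recent Flink DataSet API. The POJO type `Person` and all names in it are hypothetical and not taken from the PR; the PR would generate an equivalent Map function instead of writing it by hand.

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.types.Row;

public class PojoToRowScan {

    // Hypothetical POJO input type (illustrative only).
    public static class Person {
        public String name;
        public int age;
        public Person() {}
        public Person(String name, int age) { this.name = name; this.age = age; }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Person> people =
                env.fromElements(new Person("Alice", 42), new Person("Bob", 27));

        // Convert every POJO into a Row once, at scan time. All downstream
        // operators then use positional Row field accesses instead of
        // reflective POJO field accesses.
        DataSet<Row> rows = people
                .map(new MapFunction<Person, Row>() {
                    @Override
                    public Row map(Person p) {
                        Row row = new Row(2); // arity = number of POJO fields
                        row.setField(0, p.name);
                        row.setField(1, p.age);
                        return row;
                    }
                })
                // Declare the Row field types explicitly; they cannot be
                // inferred from the generic MapFunction signature.
                .returns(Types.ROW(Types.STRING, Types.INT));

        rows.print();
    }
}
```

Paying the reflection cost once per record at the scan, rather than in every operator and serialization step, is the design point the comment argues for.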



> Translate optimized logical Table API plans into physical plans representing DataSet programs
> ---------------------------------------------------------------------------------------------
>
>                 Key: FLINK-3226
>                 URL: https://issues.apache.org/jira/browse/FLINK-3226
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API
>            Reporter: Fabian Hueske
>            Assignee: Chengxiang Li
>
> This issue is about translating an (optimized) logical Table API (see FLINK-3225) query plan into a physical plan. The physical plan is a 1-to-1 representation of the DataSet program that will be executed. This means:
> - Each Flink RelNode refers to exactly one Flink DataSet or DataStream operator.
> - All (join and grouping) keys of Flink operators are correctly specified.
> - The expressions which are to be executed in user-code are identified.
> - All fields are referenced with their physical execution-time index.
> - Flink type information is available.
> - Optional: add physical execution hints for joins.
> The translation should be the final part of Calcite's optimization process.
> For this task we need to:
> - implement a set of Flink DataSet RelNodes. Each RelNode corresponds to one Flink DataSet operator (Map, Reduce, Join, ...). The RelNodes must hold all relevant operator information (keys, user-code expression, strategy hints, parallelism).
> - implement rules to translate optimized Calcite RelNodes into Flink RelNodes. We start with a straightforward mapping and later add rules that merge several relational operators into a single Flink operator, e.g., merge a join followed by a filter. Timo implemented some rules for the first SQL implementation which can be used as a starting point (a rough sketch of such a rule follows after this list).
> - integrate the translation rules into the Calcite optimization process.
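
To make the rule shape concrete, here is a hedged sketch of what one such translation rule could look like using Calcite's ConverterRule. The names DataSetFilter, DataSetFilterRule, and the DATASET convention are illustrative assumptions, not the names used in the actual Flink code base.

```java
import org.apache.calcite.plan.Convention;
import org.apache.calcite.plan.RelOptCluster;
import org.apache.calcite.plan.RelTraitSet;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.convert.ConverterRule;
import org.apache.calcite.rel.core.Filter;
import org.apache.calcite.rel.logical.LogicalFilter;
import org.apache.calcite.rex.RexNode;

public class DataSetRules {

    // Hypothetical calling convention marking RelNodes as executable
    // by the Flink DataSet runtime.
    public static final Convention DATASET =
            new Convention.Impl("DATASET", RelNode.class);

    // One Flink RelNode per DataSet operator: this one represents a
    // filter that translates into exactly one DataSet filter operator
    // and holds the predicate expression for later code generation.
    public static class DataSetFilter extends Filter {
        public DataSetFilter(RelOptCluster cluster, RelTraitSet traits,
                             RelNode input, RexNode condition) {
            super(cluster, traits, input, condition);
        }

        @Override
        public Filter copy(RelTraitSet traitSet, RelNode input, RexNode condition) {
            return new DataSetFilter(getCluster(), traitSet, input, condition);
        }
    }

    // Straightforward 1-to-1 translation rule: rewrite a LogicalFilter in
    // Convention.NONE into a DataSetFilter in the DATASET convention.
    public static class DataSetFilterRule extends ConverterRule {
        public DataSetFilterRule() {
            super(LogicalFilter.class, Convention.NONE, DATASET, "DataSetFilterRule");
        }

        @Override
        public RelNode convert(RelNode rel) {
            LogicalFilter filter = (LogicalFilter) rel;
            // Request the input in the DATASET convention as well, so the
            // whole subtree becomes a chain of DataSet RelNodes.
            RelNode input = convert(
                    filter.getInput(),
                    filter.getInput().getTraitSet().replace(DATASET));
            return new DataSetFilter(
                    filter.getCluster(),
                    filter.getTraitSet().replace(DATASET),
                    input,
                    filter.getCondition());
        }
    }
}
```

Registered with the planner, rules of this shape would run as the final part of Calcite's optimization process, as the issue describes; the later merge rules (e.g., join followed by filter) would then match on the resulting DataSet RelNodes.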



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
