flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-7337) Refactor handling of time indicator attributes
Date Sun, 13 Aug 2017 21:47:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125053#comment-16125053 ]

ASF GitHub Bot commented on FLINK-7337:

Github user fhueske commented on a diff in the pull request:

    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/api/StreamTableEnvironment.scala
    @@ -667,30 +719,62 @@ abstract class StreamTableEnvironment(
         // get CRow plan
         val plan: DataStream[CRow] = translateToCRow(logicalPlan, queryConfig)
    +    val rowtimeFields = logicalType
    +      .getFieldList.asScala
    +      .filter(f => FlinkTypeFactory.isRowtimeIndicatorType(f.getType))
    +    // convert the input type for the conversion mapper
    +    // the input will be changed in the OutputRowtimeProcessFunction later
    +    val convType = if (rowtimeFields.size > 1) {
    +      throw new TableException(
    --- End diff --
    I think we should avoid implicit defaults, such as using the timestamp attribute of the
leftmost table (the leftmost table might not have a time indicator attribute, and join order
optimization could change the order of tables), as well as special cases for queries like `SELECT *`.
    When a `Table` is converted into a `DataStream`, it is likely that the resulting stream
is further processed by logic that cannot be expressed in SQL / Table API. If a `Table` has
multiple timestamp attributes, IMO the user should be forced to choose the `StreamRecord`
timestamp, because the semantics of any subsequent time-based operations will depend on it.
I see two ways to do that:
    - ensure that only one attribute is a time indicator by casting the others to `TIMESTAMP`
    - let the user specify which field should be used as the timestamp via an additional parameter
of the `toAppendStream` and `toRetractStream` methods.
    We could also do both.
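    The two options above can be illustrated with a minimal, Flink-free sketch. Everything in it is a hypothetical model for illustration only: `Field`, `FieldType`, `TimestampChoice`, `castOthers`, and `resolve` are not Flink API, and the sketch makes no claim about how the pull request implements either option.

```scala
// Hypothetical, Flink-free model of the two options; none of these names are Flink API.
sealed trait FieldType
case object RowtimeIndicator extends FieldType // a rowtime (event-time) indicator attribute
case object PlainTimestamp extends FieldType   // a regular TIMESTAMP, e.g. after a cast
case object Other extends FieldType

final case class Field(name: String, tpe: FieldType)

object TimestampChoice {
  /** Option 1: turn every rowtime indicator except `keep` into a plain TIMESTAMP. */
  def castOthers(schema: Seq[Field], keep: String): Seq[Field] =
    schema.map {
      case Field(n, RowtimeIndicator) if n != keep => Field(n, PlainTimestamp)
      case f => f
    }

  /** Option 2: resolve the field used as record timestamp, optionally from an
    * explicit user choice. An ambiguous schema without a choice is rejected. */
  def resolve(
      schema: Seq[Field],
      explicit: Option[String] = None): Either[String, Option[String]] = {
    val rowtimes = schema.collect { case Field(n, RowtimeIndicator) => n }
    explicit match {
      case Some(n) if rowtimes.contains(n) => Right(Some(n))
      case Some(n) => Left(s"'$n' is not a rowtime attribute")
      case None if rowtimes.size > 1 =>
        Left(s"Ambiguous rowtime attributes: ${rowtimes.mkString(", ")}")
      case None => Right(rowtimes.headOption)
    }
  }
}
```

    In this model, a schema with two rowtime attributes is rejected unless the caller either names one explicitly or first casts the other away, which mirrors the idea that the user is forced to make the choice.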
    I agree with @wuchong that we do not need this restriction when we emit a `Table` to a

> Refactor handling of time indicator attributes
> ----------------------------------------------
>                 Key: FLINK-7337
>                 URL: https://issues.apache.org/jira/browse/FLINK-7337
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>    Affects Versions: 1.4.0
>            Reporter: Fabian Hueske
>            Assignee: Fabian Hueske
> After a [discussion on the dev mailing list|https://lists.apache.org/thread.html/735d55f9022df8ff73566a9f1553e14be94f8443986ad46559b35869@%3Cdev.flink.apache.org%3E]
> I propose the following changes to the current handling of time indicator attributes:
> * Remove the separation of logical and physical row type.
> ** Hold the event-time timestamp as regular Long field in Row
> ** Represent the processing-time indicator type as a null-valued field in Row (1 bit
> * Remove materialization of event-time timestamps because the timestamp is already accessible in Row.
> * Add {{ProcessFunction}} to set timestamp into the timestamp field of a {{StreamRecord}}.
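
As a rough illustration of the proposal above, the sketch below models a row that carries its event-time timestamp as a regular Long field and a step that copies that field into the record timestamp. `RowModel`, `RecordModel`, and `RowtimeCopy` are hypothetical stand-ins, not Flink's `Row`, `StreamRecord`, or `ProcessFunction`. Treating a null field as "no timestamp" is an assumption of this sketch, not something the issue specifies.

```scala
// Hypothetical stand-ins; Flink's real Row and StreamRecord classes are not used here.
final case class RowModel(fields: Vector[Any])
final case class RecordModel(value: RowModel, timestamp: Option[Long])

object RowtimeCopy {
  /** Copies the Long rowtime field at `rowtimeIdx` into the record timestamp.
    * Assumption: a null field (the proposed encoding of the processing-time
    * indicator) yields a record without a timestamp. */
  def toRecord(row: RowModel, rowtimeIdx: Int): RecordModel =
    row.fields(rowtimeIdx) match {
      case ts: Long => RecordModel(row, Some(ts))
      case null => RecordModel(row, None)
      case other =>
        throw new IllegalArgumentException(
          s"Field $rowtimeIdx is not a Long timestamp: $other")
    }
}
```

Because the timestamp is an ordinary field in this model, no separate materialization step is needed before it can be read.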

