flink-issues mailing list archives

From "Fabian Hueske (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-7548) Support watermark generation for TableSource
Date Mon, 25 Sep 2017 21:49:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16179836#comment-16179836 ]

Fabian Hueske commented on FLINK-7548:

Thanks for your thoughts [~xccui].
I added a few comments on your suggestions and questions.

Thanks, Fabian

bq. Considering that the data type should be preserved, doing that internally may require
extra logic. For consistency, I wonder whether it's possible to encapsulate the time
in a new Rowtime<T> type. It would expose two methods: getTime(): Long for logical-level
use and getValue(): T for physical-level use.

In fact, {{Long}} and {{Timestamp}} have the same internal representation, namely {{Long}}.
The issue is rather the type that is exposed to SQL or the Table API: we would need a new
TimeIndicator type that exposes a timestamp as {{Long}}.
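
For illustration only: the shared representation can be checked in plain Java, since {{java.sql.Timestamp}} wraps milliseconds since the epoch, so a {{long}} rowtime converts to a {{Timestamp}} and back without loss at millisecond precision. The {{Rowtime<T>}} class below is a hypothetical sketch of the wrapper suggested in the quote above ({{getTime()}} for the logical level, {{getValue()}} for the physical level); it is not an existing Flink class, and the demo class name is made up.

```java
import java.sql.Timestamp;
import java.util.function.ToLongFunction;

// Hypothetical wrapper from the quoted suggestion; not part of Flink.
final class Rowtime<T> {
    private final T value;
    private final ToLongFunction<T> toMillis;

    Rowtime(T value, ToLongFunction<T> toMillis) {
        this.value = value;
        this.toMillis = toMillis;
    }

    // Logical level: the time as epoch milliseconds.
    long getTime() {
        return toMillis.applyAsLong(value);
    }

    // Physical level: the original value with its type preserved.
    T getValue() {
        return value;
    }
}

class RowtimeDemo {
    public static void main(String[] args) {
        long millis = 1506376140000L; // 2017-09-25T21:49:00Z
        Timestamp ts = new Timestamp(millis);

        // Same internal representation: milliseconds since the epoch.
        System.out.println(ts.getTime() == millis); // true

        Rowtime<Timestamp> fromTimestamp = new Rowtime<>(ts, Timestamp::getTime);
        Rowtime<Long> fromLong = new Rowtime<>(millis, Long::longValue);
        System.out.println(fromTimestamp.getTime() == fromLong.getTime()); // true
    }
}
```

Note that this only demonstrates the runtime side; the open question in the comment is which type SQL and the Table API would expose.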

bq. Besides, I think watermark generation should not be bound to rowtime extraction.
Compared with implementing both in a single scan operator (not sure if I understood correctly),
I'd prefer to generate watermarks in separate operators. That should be more flexible.

Timestamp extraction and watermark generation would not be tied together. First, we would
compute timestamps (only necessary if we don't use an existing field). The next step would
generate watermarks. However, both operations would happen in the logical scan operator, because
a single logical operator can be translated into multiple DataStream operations.
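
A minimal plain-Java simulation of the two steps, assuming records are hypothetical "id,epochMillis" strings and a bounded out-of-orderness strategy (watermark = highest timestamp seen minus a fixed delay). It does not use Flink APIs; it only shows that timestamp computation and watermark generation are independent pieces that one scan can chain.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToLongFunction;

// Plain-Java simulation; no Flink APIs involved.
final class ScanSketch {

    // Step 2: periodic watermark generator with bounded out-of-orderness.
    static final class BoundedOutOfOrderness {
        private final long maxDelayMillis;
        private long maxSeen = Long.MIN_VALUE;

        BoundedOutOfOrderness(long maxDelayMillis) {
            this.maxDelayMillis = maxDelayMillis;
        }

        void observe(long timestamp) {
            maxSeen = Math.max(maxSeen, timestamp);
        }

        // Polled periodically; stays at Long.MIN_VALUE until a timestamp was seen.
        long currentWatermark() {
            return maxSeen == Long.MIN_VALUE ? Long.MIN_VALUE : maxSeen - maxDelayMillis;
        }
    }

    // One "scan": step 1 computes each timestamp, step 2 feeds it to the
    // generator. Returns the watermark that would be emitted after the batch.
    static <T> long scan(List<T> records, ToLongFunction<T> timestampOf,
                         BoundedOutOfOrderness generator) {
        for (T record : records) {
            long ts = timestampOf.applyAsLong(record); // step 1: compute timestamp
            generator.observe(ts);                     // step 2: update watermark state
        }
        return generator.currentWatermark();
    }

    public static void main(String[] args) {
        // Hypothetical input: timestamps arrive out of order.
        List<String> rows = Arrays.asList("a,1000", "b,3000", "c,2000");
        BoundedOutOfOrderness generator = new BoundedOutOfOrderness(500);
        long watermark = scan(rows, r -> Long.parseLong(r.split(",")[1]), generator);
        System.out.println(watermark); // 2500
    }
}
```

Swapping the timestamp function (e.g. reading an existing field) leaves the watermark strategy untouched, and vice versa, even though both run inside the same logical operator.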

bq. I am thinking of a new out-of-order generation strategy that is bounded by a number of
records. Do you think it would be useful in real applications?

How would this strategy work? IMO, a built-in strategy should address a concrete use case
that is common enough to justify a built-in primitive.

bq. I still feel that machine time is not compatible with rowtime watermark generation.
Shall we consider getting rid of it?

Machine time (assuming that you are referring to processing time here) does not use watermarks.
Watermarks are only used for event-time processing.

> Support watermark generation for TableSource
> --------------------------------------------
>                 Key: FLINK-7548
>                 URL: https://issues.apache.org/jira/browse/FLINK-7548
>             Project: Flink
>          Issue Type: Bug
>          Components: Table API & SQL
>            Reporter: Jark Wu
> As discussed in FLINK-7446, the TableSource currently only supports defining a rowtime
field, but does not support extracting watermarks from the rowtime field. We can provide a new
interface called {{DefinedWatermark}}, which has two methods: {{getRowtimeAttribute}} (which can
only be an existing field) and {{getWatermarkGenerator}}. {{DefinedRowtimeAttribute}}
will be marked as deprecated.
> How to support periodic and punctuated watermarks, and which built-in strategies to provide,
needs further discussion.

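A minimal Java sketch of the {{DefinedWatermark}} interface proposed in the description above. Only {{DefinedWatermark}}, {{getRowtimeAttribute}} and {{getWatermarkGenerator}} come from the issue; the generator interfaces, the {{AscendingTimestamps}} strategy and {{ClickTableSource}} are hypothetical names chosen to illustrate the periodic vs. punctuated split, loosely modeled on the DataStream API's periodic and punctuated watermark assigners. This is not a released Flink API.

```java
// Hypothetical sketch of the FLINK-7548 proposal; not a released Flink API.
interface WatermarkGenerator<T> { }

// Periodic: the runtime polls for the current watermark at a fixed interval.
interface PeriodicWatermarkGenerator<T> extends WatermarkGenerator<T> {
    void onRecord(T record, long timestamp);
    long getCurrentWatermark();
}

// Punctuated: each record may carry a watermark; null means "no watermark".
interface PunctuatedWatermarkGenerator<T> extends WatermarkGenerator<T> {
    Long checkAndGetWatermark(T record, long timestamp);
}

interface DefinedWatermark<T> {
    // Name of an existing field that holds the rowtime.
    String getRowtimeAttribute();
    WatermarkGenerator<T> getWatermarkGenerator();
}

// Example built-in-style strategy: timestamps arrive in ascending order,
// so the watermark trails the highest timestamp by 1 ms.
final class AscendingTimestamps<T> implements PeriodicWatermarkGenerator<T> {
    private long maxSeen = Long.MIN_VALUE;

    public void onRecord(T record, long timestamp) {
        maxSeen = Math.max(maxSeen, timestamp);
    }

    public long getCurrentWatermark() {
        return maxSeen == Long.MIN_VALUE ? Long.MIN_VALUE : maxSeen - 1;
    }
}

// Hypothetical table source whose rowtime is the existing field "ts".
final class ClickTableSource implements DefinedWatermark<Long> {
    public String getRowtimeAttribute() {
        return "ts";
    }

    public WatermarkGenerator<Long> getWatermarkGenerator() {
        return new AscendingTimestamps<>();
    }
}

class DefinedWatermarkDemo {
    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        ClickTableSource source = new ClickTableSource();
        WatermarkGenerator<Long> gen = source.getWatermarkGenerator();
        if (gen instanceof PeriodicWatermarkGenerator) {
            PeriodicWatermarkGenerator<Long> periodic = (PeriodicWatermarkGenerator<Long>) gen;
            for (long ts : new long[] {1000L, 2000L, 3000L}) {
                periodic.onRecord(ts, ts); // record is just its timestamp here
            }
            // prints "ts -> 2999"
            System.out.println(source.getRowtimeAttribute() + " -> " + periodic.getCurrentWatermark());
        }
    }
}
```

Keeping periodic and punctuated generators as separate sub-interfaces lets the planner decide how to drive a generator (timer-based polling vs. per-record checks) from its type alone.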
This message was sent by Atlassian JIRA
