flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zach Cox (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3375) Allow Watermark Generation in the Kafka Source
Date Wed, 17 Feb 2016 13:46:18 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150515#comment-15150515

Zach Cox commented on FLINK-3375:

If there is any way to make this available before 1.1 that would be awesome. I know I will
need it - our Flink job needs to process four input Kafka topics, each with 12 partitions,
but parallelism=12 will be way more than we need in the Flink job. I would be more than happy
to help out with it in any way.

> Allow Watermark Generation in the Kafka Source
> ----------------------------------------------
>                 Key: FLINK-3375
>                 URL: https://issues.apache.org/jira/browse/FLINK-3375
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kafka Connector
>    Affects Versions: 1.0.0
>            Reporter: Stephan Ewen
>             Fix For: 1.0.0
> It is a common case that event timestamps are ascending inside one Kafka Partition. Ascending
timestamps are easy for users, because they are handles by ascending timestamp extraction.
> If the Kafka source has multiple partitions per source task, then the records become
out of order before timestamps can be extracted and watermarks can be generated.
> If we make the FlinkKafkaConsumer an event time source function, it can generate watermarks
itself. It would internally implement the same logic as the regular operators that merge streams,
keeping track of event time progress per partition and generating watermarks based on the
current guaranteed event time progress.

This message was sent by Atlassian JIRA

View raw message