flink-issues mailing list archives

From "Shikhar Bhushan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3375) Allow Watermark Generation in the Kafka Source
Date Tue, 09 Feb 2016 14:42:18 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139005#comment-15139005 ]

Shikhar Bhushan commented on FLINK-3375:
----------------------------------------

This would be super useful for me. I currently have to use a parallelism of 30 because
there are 30 partitions, even though parallelism=1 would suffice and would work more
efficiently.

It would be great if the existing {{TimestampExtractor}} interface could be supported, or any
other interface that lets the watermark be determined in some way other than simply ascending.
In my case, the timestamps on a partition should be mostly ascending, but the messages are
produced from different machines, so I need to account for small inconsistencies in their system
clocks. I am currently using this extractor: https://gist.github.com/shikhar/2d9306e2ebd8ca89728c
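
To illustrate what "mostly ascending with small clock inconsistencies" could mean for watermarking, here is a minimal sketch in plain Java. It is not the contents of the gist linked above and it does not use any Flink interface; the class name {{BoundedSkewWatermark}} and the parameter {{maxSkewMillis}} are hypothetical. The idea is to let the watermark trail the highest timestamp seen so far by a fixed skew allowance.

```java
/**
 * Hypothetical helper, not a Flink API: tracks the highest timestamp seen
 * so far and keeps the watermark a fixed allowance behind it, so elements
 * that are late by at most maxSkewMillis are not behind the watermark.
 */
final class BoundedSkewWatermark {
    private final long maxSkewMillis;
    private long maxTimestampSeen = Long.MIN_VALUE;

    BoundedSkewWatermark(long maxSkewMillis) {
        if (maxSkewMillis < 0) {
            throw new IllegalArgumentException("maxSkewMillis must be >= 0");
        }
        this.maxSkewMillis = maxSkewMillis;
    }

    /** Records an element's timestamp and returns it unchanged. */
    long observe(long timestampMillis) {
        maxTimestampSeen = Math.max(maxTimestampSeen, timestampMillis);
        return timestampMillis;
    }

    /** Highest timestamp seen minus the skew allowance; never decreases. */
    long currentWatermark() {
        // Guard against underflow before any (or only very small) timestamps were seen.
        if (maxTimestampSeen < Long.MIN_VALUE + maxSkewMillis) {
            return Long.MIN_VALUE;
        }
        return maxTimestampSeen - maxSkewMillis;
    }
}
```

The allowance is a trade-off: a larger value tolerates more clock skew between producers but delays event time progress by the same amount.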

> Allow Watermark Generation in the Kafka Source
> ----------------------------------------------
>
>                 Key: FLINK-3375
>                 URL: https://issues.apache.org/jira/browse/FLINK-3375
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kafka Connector
>    Affects Versions: 1.0.0
>            Reporter: Stephan Ewen
>             Fix For: 1.0.0
>
>
> It is a common case that event timestamps are ascending inside one Kafka Partition. Ascending
> timestamps are easy for users, because they are handled by ascending timestamp extraction.
> If the Kafka source has multiple partitions per source task, then the records become
> out of order before timestamps can be extracted and watermarks can be generated.
> If we make the FlinkKafkaConsumer an event time source function, it can generate watermarks
> itself. It would internally implement the same logic as the regular operators that merge streams,
> keeping track of event time progress per partition and generating watermarks based on the
> current guaranteed event time progress.
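
The per-partition tracking described in the issue can be sketched roughly as follows. This is plain Java under stated assumptions, not the FlinkKafkaConsumer implementation: the class {{PartitionWatermarkTracker}} and its methods are hypothetical, partitions are identified by a zero-based index, and timestamps are assumed to be ascending within each partition. The guaranteed event time progress is taken as the minimum over the partitions' latest timestamps.

```java
import java.util.Arrays;
import java.util.OptionalLong;

/**
 * Hypothetical sketch, not Flink code: remembers the latest timestamp per
 * partition and derives the watermark as the minimum across partitions.
 */
final class PartitionWatermarkTracker {
    private final long[] maxTimestampPerPartition;
    private long lastEmittedWatermark = Long.MIN_VALUE;

    PartitionWatermarkTracker(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be > 0");
        }
        maxTimestampPerPartition = new long[numPartitions];
        Arrays.fill(maxTimestampPerPartition, Long.MIN_VALUE);
    }

    /**
     * Records a timestamp for one partition. Returns the new watermark if
     * the combined progress advanced, or an empty value otherwise.
     */
    OptionalLong onRecord(int partition, long timestamp) {
        maxTimestampPerPartition[partition] =
                Math.max(maxTimestampPerPartition[partition], timestamp);
        long candidate = currentWatermark();
        if (candidate > lastEmittedWatermark) {
            lastEmittedWatermark = candidate;
            return OptionalLong.of(candidate);
        }
        return OptionalLong.empty();
    }

    /** Minimum of the per-partition progress values. */
    long currentWatermark() {
        long min = Long.MAX_VALUE;
        for (long ts : maxTimestampPerPartition) {
            min = Math.min(min, ts);
        }
        return min;
    }
}
```

In this sketch a partition that has not produced any record yet holds the watermark at its initial value, so no watermark is emitted until every partition has been seen at least once.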



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
