flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-4616) Kafka consumer doesn't store last emmited watermarks per partition in state
Date Thu, 26 Jan 2017 06:21:24 GMT

    [ https://issues.apache.org/jira/browse/FLINK-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839317#comment-15839317 ]

ASF GitHub Bot commented on FLINK-4616:
---------------------------------------

Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3031#discussion_r97935696
  
    --- Diff: flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/internals/AbstractFetcher.java ---
    @@ -175,34 +176,115 @@ protected AbstractFetcher(
     	// ------------------------------------------------------------------------
     
     	/**
    -	 * Takes a snapshot of the partition offsets.
    +	 * Takes a snapshot of the partition offsets and watermarks.
     	 * 
     	 * <p>Important: This method mus be called under the checkpoint lock.
     	 * 
    -	 * @return A map from partition to current offset.
    +	 * @return A map from partition to current offset and watermark.
     	 */
    -	public HashMap<KafkaTopicPartition, Long> snapshotCurrentState() {
    +	public HashMap<KafkaTopicPartition, Tuple2<Long, Long>> snapshotCurrentState() {
     		// this method assumes that the checkpoint lock is held
     		assert Thread.holdsLock(checkpointLock);
     
    -		HashMap<KafkaTopicPartition, Long> state = new HashMap<>(allPartitions.length);
    -		for (KafkaTopicPartitionState<?> partition : subscribedPartitions()) {
    -			state.put(partition.getKafkaTopicPartition(), partition.getOffset());
    +		HashMap<KafkaTopicPartition, Tuple2<Long, Long>> state = new HashMap<>(allPartitions.length);
    +
    +		switch (timestampWatermarkMode) {
    +
    +			case NO_TIMESTAMPS_WATERMARKS: {
    +
    +				for (KafkaTopicPartitionState<KPH> partition : allPartitions) {
    +					state.put(partition.getKafkaTopicPartition(), Tuple2.of(partition.getOffset(), Long.MIN_VALUE));
    +				}
    +
    +				return state;
    +			}
    +
    +			case PERIODIC_WATERMARKS: {
    +				KafkaTopicPartitionStateWithPeriodicWatermarks<T, KPH> [] partitions =
    +					(KafkaTopicPartitionStateWithPeriodicWatermarks<T, KPH> []) allPartitions;
    +
    +				for (KafkaTopicPartitionStateWithPeriodicWatermarks<T, KPH> partition : partitions) {
    +					state.put(partition.getKafkaTopicPartition(), Tuple2.of(partition.getOffset(), partition.getCurrentWatermarkTimestamp()));
    +				}
    +
    +				return state;
    +			}
    +
    +			case PUNCTUATED_WATERMARKS: {
    +				KafkaTopicPartitionStateWithPunctuatedWatermarks<T, KPH> [] partitions =
    +					(KafkaTopicPartitionStateWithPunctuatedWatermarks<T, KPH> []) allPartitions;
    +
    +				for (KafkaTopicPartitionStateWithPunctuatedWatermarks<T, KPH> partition : partitions) {
    +					state.put(partition.getKafkaTopicPartition(), Tuple2.of(partition.getOffset(), partition.getCurrentPartitionWatermark()));
    +				}
    +
    +				return state;
    +			}
    +
    +			default:
    +				// cannot happen, add this as a guard for the future
    +				throw new RuntimeException();
    --- End diff --
    
    Would be good to have a reason message here.
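
A minimal sketch of what such a reason message could look like. The class name, the `describeSnapshot` helper, and the integer mode constants are hypothetical stand-ins for illustration, not the fetcher's actual fields; the only point is that the default branch names the unexpected value instead of throwing a bare `RuntimeException`.

```java
// Hypothetical sketch: the class, constants and method below are illustrative
// and are not taken from AbstractFetcher.
final class SnapshotGuardSketch {

    // Assumed integer mode constants, mirroring the case labels in the diff.
    static final int NO_TIMESTAMPS_WATERMARKS = 0;
    static final int PERIODIC_WATERMARKS = 1;
    static final int PUNCTUATED_WATERMARKS = 2;

    /** Returns a short description of what a snapshot would contain for the given mode. */
    static String describeSnapshot(int timestampWatermarkMode) {
        switch (timestampWatermarkMode) {
            case NO_TIMESTAMPS_WATERMARKS:
                return "offsets with Long.MIN_VALUE as watermark";
            case PERIODIC_WATERMARKS:
                return "offsets with periodic watermarks";
            case PUNCTUATED_WATERMARKS:
                return "offsets with punctuated watermarks";
            default:
                // Guard for future modes: say what went wrong and include the offending value.
                throw new IllegalStateException(
                    "Unknown timestamp/watermark mode: " + timestampWatermarkMode);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeSnapshot(PERIODIC_WATERMARKS));
        try {
            describeSnapshot(42);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a message like this, a stack trace from the guard already tells the reader which mode value was unexpected.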


> Kafka consumer doesn't store last emmited watermarks per partition in state
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-4616
>                 URL: https://issues.apache.org/jira/browse/FLINK-4616
>             Project: Flink
>          Issue Type: Bug
>          Components: Kafka Connector
>    Affects Versions: 1.1.1
>            Reporter: Yuri Makhno
>            Assignee: Roman Maier
>
> The Kafka consumer stores only Kafka offsets in state and doesn't store the last emitted watermarks.
> This may lead to wrong state when a checkpoint is restored.
> Let's say our watermark is (timestamp - 10). With the following message queue, results
> will differ between a checkpoint restore and normal processing:
> A(ts = 30)
> B(ts = 35)
> ------ checkpoint goes here
> C(ts=15) -- this one should be filtered by next time window
> D(ts=60)
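
The scenario above can be replayed with a small standalone simulation. This is a simplified sketch, not Flink code: the class and method names are hypothetical, the watermark is assumed to be the maximum seen timestamp minus 10, and a record is treated as late when its timestamp is not greater than the current watermark (a stand-in for the window-based filtering the issue refers to).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the issue's example; not part of the Kafka connector.
final class WatermarkRestoreSketch {

    // Assumed lag from the issue description: watermark = timestamp - 10.
    static final long LAG = 10L;

    /** Advances the watermark over the given timestamps and returns the final value. */
    static long watermarkAfter(long initialWatermark, long[] timestamps) {
        long watermark = initialWatermark;
        for (long ts : timestamps) {
            watermark = Math.max(watermark, ts - LAG);
        }
        return watermark;
    }

    /** Returns the timestamps that are not late relative to the running watermark. */
    static List<Long> accepted(long initialWatermark, long[] timestamps) {
        long watermark = initialWatermark;
        List<Long> result = new ArrayList<>();
        for (long ts : timestamps) {
            if (ts > watermark) {
                result.add(ts); // not late: passes through
            }
            watermark = Math.max(watermark, ts - LAG);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] beforeCheckpoint = {30L, 35L}; // A, B
        long[] afterCheckpoint = {15L, 60L};  // C, D

        // Watermark reached by the time the checkpoint is taken.
        long atCheckpoint = watermarkAfter(Long.MIN_VALUE, beforeCheckpoint);

        // Normal processing, or a restore that also restores the watermark.
        System.out.println("with watermark:    " + accepted(atCheckpoint, afterCheckpoint));

        // Restore from offsets only: the watermark starts over at Long.MIN_VALUE.
        System.out.println("without watermark: " + accepted(Long.MIN_VALUE, afterCheckpoint));
    }
}
```

Under these assumptions, C is dropped when the watermark survives the checkpoint but passes through when only offsets are restored, which is the divergence the issue describes.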



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
