flink-issues mailing list archives

From "Sanjar Akhmedov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3188) Deletes in Kafka source should be passed on to KeyedDeserializationSchema
Date Tue, 22 Dec 2015 08:49:46 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067760#comment-15067760
] 

Sanjar Akhmedov commented on FLINK-3188:
----------------------------------------

Deletes (messages with a key and a null value) are consumed as normal messages. The only special
thing about them is that Kafka can be configured to clean up deletes to free up some space. Exposing
deletes is part of the Kafka contract:
bq. All delete markers for deleted records will be seen provided the reader reaches the head
of the log in a time period less than the topic's delete.retention.ms setting (the default
is 24 hours).
All official consumer APIs, including the new one, expose null values to the client.
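
The retention window for delete markers mentioned above is a per-topic Kafka setting. As a rough illustration (the topic name, ZooKeeper address, and the use of kafka-topics.sh are placeholders for whatever your deployment uses), it could be adjusted like this:

```shell
# Illustrative only: extend delete-marker retention to 48 hours (in ms)
# on a hypothetical topic "events", via a hypothetical ZooKeeper address.
kafka-topics.sh --zookeeper localhost:2181 --alter --topic events \
  --config delete.retention.ms=172800000
```

A reader that falls more than this window behind the head of a compacted log may miss delete markers, which is why the contract above is phrased as a time bound.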

> Deletes in Kafka source should be passed on to KeyedDeserializationSchema
> -------------------------------------------------------------------------
>
>                 Key: FLINK-3188
>                 URL: https://issues.apache.org/jira/browse/FLINK-3188
>             Project: Flink
>          Issue Type: Bug
>          Components: Kafka Connector
>    Affects Versions: 1.0.0
>            Reporter: Sebastian Klemke
>            Assignee: Robert Metzger
>         Attachments: kafka-deletions.patch
>
>
> When keys are deleted in the Kafka queue, they show up as keys with a null payload. Currently
> in Flink 1.0-SNAPSHOT, these deletions are silently skipped without increasing the current offset.
> This leads to two problems:
> 1. When a fetch window contains only deletions, the LegacyFetcher gets stuck.
> 2. For KeyedDeserializationSchemas, it would make sense to pass deletions to the deserializer,
> so that it can decide to wrap deleted keys as a deletion command. This is also more consistent
> with the semantics of keys in Kafka queues: when compaction is activated, only the latest
> message with a given key needs to be kept by Kafka.
> We propose the attached patch as a workaround for both issues.
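
The behavior proposed in point 2 can be sketched roughly as follows. This is a minimal, stdlib-only sketch, not the actual Flink KeyedDeserializationSchema interface: the class name TombstoneAwareDeserializer, the Command type, and the two-argument deserialize signature are all illustrative simplifications.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical sketch: map a Kafka tombstone (key present, value null)
// to an explicit delete command instead of silently skipping it.
public class TombstoneAwareDeserializer {

    // Simplified stand-in for the deserialized element type.
    public static final class Command {
        public final String key;
        public final Optional<String> value; // empty => deletion

        public Command(String key, Optional<String> value) {
            this.key = key;
            this.value = value;
        }

        public boolean isDelete() {
            return !value.isPresent();
        }
    }

    // A null message payload marks a Kafka delete; wrap it as a delete
    // command so downstream operators can act on it.
    public static Command deserialize(byte[] keyBytes, byte[] messageBytes) {
        String key = new String(keyBytes, StandardCharsets.UTF_8);
        if (messageBytes == null) {
            return new Command(key, Optional.empty()); // tombstone -> deletion
        }
        return new Command(key,
                Optional.of(new String(messageBytes, StandardCharsets.UTF_8)));
    }
}
```

The key point is that the null payload reaches the deserializer at all, so the schema (not the fetcher) decides how deletions are represented, and the offset still advances past the tombstone.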



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
