flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gyula Fora (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-6006) Kafka Consumer can lose state if queried partition list is incomplete on restore
Date Thu, 23 Mar 2017 13:32:41 GMT

    [ https://issues.apache.org/jira/browse/FLINK-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15938297#comment-15938297

Gyula Fora commented on FLINK-6006:

I tried restarting the consumer when I was sure that the brokers have completely recovered
and that still didnt help.

> Kafka Consumer can lose state if queried partition list is incomplete on restore
> --------------------------------------------------------------------------------
>                 Key: FLINK-6006
>                 URL: https://issues.apache.org/jira/browse/FLINK-6006
>             Project: Flink
>          Issue Type: Bug
>          Components: Kafka Connector, Streaming Connectors
>            Reporter: Tzu-Li (Gordon) Tai
>            Assignee: Tzu-Li (Gordon) Tai
>            Priority: Blocker
>             Fix For: 1.1.5, 1.2.1
> In 1.1.x and 1.2.x, the FlinkKafkaConsumer performs partition list querying on restore.
Then, only restored state of partitions that exists in the queried list is used to initialize
the fetcher's state holders.
> If in any case the returned partition list is incomplete (i.e. missing partitions that
existed before, perhaps due to temporary ZK / broker downtime), then the state of the missing
partitions is dropped and cannot be recovered anymore.
> In 1.3-SNAPSHOT, this is fixed by changes in FLINK-4280, so only 1.1 and 1.2 is affected.
> We can backport some of the behavioural changes there to 1.1 and 1.2. Generally, we should
not depend on the current partition list in Kafka when restoring, but just restore all previous
state into the fetcher's state holders. 
> This would therefore also require some checking on how the consumer threads / Kafka clients
behave when its assigned partitions cannot be reached.

This message was sent by Atlassian JIRA

View raw message