flink-issues mailing list archives

From "Stefan Richter (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-8086) FlinkKafkaProducer011 can permanently fail in recovery
Date Wed, 15 Nov 2017 14:51:00 GMT
Stefan Richter created FLINK-8086:

             Summary: FlinkKafkaProducer011 can permanently fail in recovery
                 Key: FLINK-8086
                 URL: https://issues.apache.org/jira/browse/FLINK-8086
             Project: Flink
          Issue Type: Bug
          Components: Kafka Connector
    Affects Versions: 1.4.0, 1.5.0
            Reporter: Stefan Richter
            Priority: Blocker

A chaos monkey test in a cluster environment can permanently bring down our FlinkKafkaProducer011.

Typically, after a small number of randomly killed TaskManagers (TMs), the data generator job can no longer
recover from a checkpoint because of the following exception:

org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an operation with
an old epoch. Either there is a newer producer with the same transactionalId, or the producer's
transaction has been expired by the broker.
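For context on the exception: Kafka fences transactional producers with an epoch kept per transactionalId. The following is a deliberately simplified, hypothetical model of the two cases the message names. A newer producer registering the same transactionalId bumps the epoch. A transaction that the broker expires (after `transaction.timeout.ms`) also leaves the old producer with a stale epoch. The class and method names (`EpochFencingSketch`, `Coordinator`, `accepts`, `expireTransaction`) are invented for illustration. They are not Kafka or Flink APIs, and this is not Kafka's coordinator implementation.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of per-transactionalId epoch fencing.
 * Not Kafka's actual transaction coordinator code.
 */
public class EpochFencingSketch {

    static class Coordinator {
        // transactionalId -> latest epoch handed out by this (model) coordinator
        private final Map<String, Integer> epochs = new HashMap<>();

        /** Models initTransactions(): first call yields epoch 0, later calls bump it by one. */
        int initTransactions(String transactionalId) {
            return epochs.merge(transactionalId, 0, (old, ignored) -> old + 1);
        }

        /** Models the broker expiring an open transaction, which in this model bumps the epoch. */
        void expireTransaction(String transactionalId) {
            epochs.computeIfPresent(transactionalId, (id, epoch) -> epoch + 1);
        }

        /** A request is accepted only if it carries the current epoch; otherwise the producer is fenced. */
        boolean accepts(String transactionalId, int producerEpoch) {
            Integer current = epochs.get(transactionalId);
            return current != null && current == producerEpoch;
        }
    }

    public static void main(String[] args) {
        Coordinator coordinator = new Coordinator();
        int oldEpoch = coordinator.initTransactions("sink-0"); // original producer
        int newEpoch = coordinator.initTransactions("sink-0"); // e.g. a restarted task reusing the id

        System.out.println(coordinator.accepts("sink-0", newEpoch)); // true
        System.out.println(coordinator.accepts("sink-0", oldEpoch)); // false: fenced by the newer producer

        coordinator.expireTransaction("sink-0");
        System.out.println(coordinator.accepts("sink-0", newEpoch)); // false: transaction expired
    }
}
```

In this model, the permanent failure described here corresponds to a recovering producer that keeps presenting a stale epoch for a transactionalId.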

The problem is reproducible: it happened in every one of my runs after the chaos monkey killed
a couple of TMs.

This message was sent by Atlassian JIRA
