kafka-dev mailing list archives

From "Fabien LD (JIRA)" <j...@apache.org>
Subject [jira] [Created] (KAFKA-6915) MirrorMaker: avoid duplicates when source cluster is unreachable for more than session.timeout.ms
Date Fri, 18 May 2018 05:27:00 GMT
Fabien LD created KAFKA-6915:
--------------------------------

             Summary: MirrorMaker: avoid duplicates when source cluster is unreachable for
more than session.timeout.ms
                 Key: KAFKA-6915
                 URL: https://issues.apache.org/jira/browse/KAFKA-6915
             Project: Kafka
          Issue Type: Improvement
    Affects Versions: 1.1.0
            Reporter: Fabien LD


According to the documentation (see [https://kafka.apache.org/11/documentation.html#semantics]),
exactly-once delivery can be achieved by storing the offsets in the same store as the produced data:
{quote}
When writing to an external system, the limitation is in the need to coordinate the consumer's
position with what is actually stored as output. The classic way of achieving this would be
to introduce a two-phase commit between the storage of the consumer position and the storage
of the consumers output. But this can be handled more simply and generally by letting the
consumer store its offset in the same place as its output
{quote}

Indeed, with the current implementation, where the consumer stores its offsets in the source cluster,
we can get duplicates if a network problem makes the source cluster unreachable for more than
{{session.timeout.ms}}. Once that amount of time has passed, the source cluster rebalances the consumer
group. Later, when the network is back, the generation has changed and the consumers cannot commit
the offsets for the last batches of records consumed (in fact, all records processed during
the last {{auto.commit.interval.ms}}). So all those records are processed again when the consumers
rejoin the group.
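
To illustrate the sequence above, here is a minimal in-memory sketch. It is not MirrorMaker code and
does not use the Kafka client API: {{SourceGroup}}, {{mirror}} and {{outage_after}} are hypothetical
names, and the rebalance after {{session.timeout.ms}} is modelled simply as a generation bump that
makes the pending commit fail.

```python
# Hypothetical model of the failure; none of these names exist in Kafka.

class SourceGroup:
    """The source cluster's view of the consumer group."""

    def __init__(self):
        self.generation = 1
        self.committed = 0  # next source offset to read after a (re)join

    def rebalance(self):
        # Stands in for the rebalance triggered after session.timeout.ms.
        self.generation += 1

    def commit(self, generation, offset):
        # A commit carrying a stale generation is rejected.
        if generation != self.generation:
            return False
        self.committed = offset
        return True


def mirror(source_log, group, target, outage_after=None):
    """Copy records to the target, then commit the position to the source.

    outage_after: number of records copied before the source becomes
    unreachable for longer than the session timeout (assumption of the model).
    """
    generation = group.generation
    position = group.committed
    copied = 0
    while position < len(source_log):
        target.append(source_log[position])
        position += 1
        copied += 1
        if outage_after is not None and copied == outage_after:
            group.rebalance()
            break
    return group.commit(generation, position)


source_log = ["a", "b", "c", "d"]
group, target = SourceGroup(), []
mirror(source_log, group, target, outage_after=3)  # commit is rejected
mirror(source_log, group, target)                  # restarts from offset 0
print(target)  # ['a', 'b', 'c', 'a', 'b', 'c', 'd']
```

In this model the three records copied before the outage are written to the target twice, because
the only record of progress lives in the source cluster and could not be updated.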

Storing the offsets in the target cluster would remove this risk of duplicate records and
would be a nice feature to have.
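
As a rough sketch of the idea (again an in-memory model, not a proposed implementation),
{{TargetStore}}, {{write_batch}} and {{mirror_once}} below are hypothetical names. The sketch assumes
the target can save a batch and the matching source offset in one atomic step; how that would be
done in a real target cluster is left open here.

```python
# Hypothetical model of "offsets stored with the output"; not Kafka API.

class TargetStore:
    """A target that keeps the mirrored records and the source offset together."""

    def __init__(self):
        self.records = []
        self.source_offset = 0  # next source offset to mirror

    def write_batch(self, batch, next_offset):
        # Both updates stand in for a single atomic write (assumption).
        self.records.extend(batch)
        self.source_offset = next_offset


def mirror_once(source_log, target, batch_size=2, fail_after_batches=None):
    """Resume from the offset kept in the target.

    fail_after_batches models losing the source cluster after that many
    batches; the function then returns False without any commit to the source.
    """
    position = target.source_offset
    batches = 0
    while position < len(source_log):
        batch = source_log[position:position + batch_size]
        target.write_batch(batch, position + len(batch))
        position += len(batch)
        batches += 1
        if fail_after_batches is not None and batches == fail_after_batches:
            return False
    return True


source_log = ["a", "b", "c", "d", "e"]
target = TargetStore()
mirror_once(source_log, target, fail_after_batches=1)  # outage after one batch
mirror_once(source_log, target)                        # resumes at offset 2
print(target.records)  # ['a', 'b', 'c', 'd', 'e']
```

Because progress is read back from the target, a rebalance on the source side no longer decides
where mirroring restarts, and in this model no record is written twice.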



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
