kafka-dev mailing list archives

From "Ewen Cheslack-Postava (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-4666) Failure test for Kafka configured for consistency vs availability
Date Tue, 24 Jan 2017 22:52:26 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836790#comment-15836790 ]

Ewen Cheslack-Postava commented on KAFKA-4666:

[~ecesena] A ducktape test would be a nice way to validate this :) By "losing" data,
do you mean that the acked data never becomes visible to consumers if the first broker never
comes back? If so, this is expected. Even if you specify a smaller # of acks, data will not
be visible to consumers until it's been acked by the ISR (and there are enough to satisfy

I don't think there's anything unexpected in your test, but I agree it could be made clearer
in that section that acks=all is important if you want the producer to only get acked when
the data has been replicated sufficiently to protect against loss.
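
To make the acks point concrete, here is a toy model in Python. It is not Kafka code and not the attached test: it assumes one leader and one follower, and the function name and its parameters are made up for illustration. It only answers the question "can the producer see an ack for a record that is gone after failover?"

```python
# Toy model, not Kafka code: one leader, one follower, and the leader
# fails permanently right after the write. All names are hypothetical.

def acked_but_lost(acks, replicated_before_failure):
    """Return True if a record the producer saw acked is gone after failover.

    acks: "1" (leader only) or "all" (wait for the in-sync replicas).
    replicated_before_failure: whether the follower had fetched the record
    before the leader died.
    """
    if acks == "1":
        # The leader acks as soon as it has written the record locally.
        producer_saw_ack = True
    elif acks == "all":
        # The ack is sent only once the in-sync follower has the record.
        producer_saw_ack = replicated_before_failure
    else:
        raise ValueError("toy model only covers acks='1' and acks='all'")

    # The follower takes over as leader; the record survives only if it
    # had been replicated before the failure.
    record_survives = replicated_before_failure
    return producer_saw_ack and not record_survives


if __name__ == "__main__":
    for acks in ("1", "all"):
        for replicated in (True, False):
            print(acks, replicated, acked_but_lost(acks, replicated))
```

In this model the only combination that returns True is acks="1" with a record that was not yet replicated, which is the case the test in the attachment appears to exercise.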

(Of course, if you have unclean leader election enabled, then there are other scenarios in
which you can lose data.)
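
A rough sketch of how the settings discussed in this thread could be checked together, in Python. The function name and the dict layout are hypothetical, and the threshold of 2 for min.insync.replicas is an assumption chosen for illustration. The config keys themselves (acks, min.insync.replicas, unclean.leader.election.enable) are the real Kafka names. A missing key is treated as unsafe so the sketch does not depend on any particular version's defaults.

```python
# Hypothetical helper: flags settings that weaken durability.
# It only inspects plain dicts and does not talk to a broker.

def durability_warnings(producer_conf, topic_conf):
    """Return a list of warnings for the given producer and topic settings."""
    warnings = []

    # acks=all (alias -1): the producer is acked only after the in-sync
    # replicas have the record.
    if str(producer_conf.get("acks")) not in ("all", "-1"):
        warnings.append("producer: acks is not 'all'")

    # Unclean leader election lets an out-of-sync replica become leader,
    # which can drop records that were already acked.
    unclean = str(topic_conf.get("unclean.leader.election.enable", "true"))
    if unclean.lower() != "false":
        warnings.append("topic: unclean.leader.election.enable is not false")

    # Assumption for this sketch: require at least 2 in-sync replicas so a
    # single broker failure cannot hold the only copy of acked data.
    if int(topic_conf.get("min.insync.replicas", 1)) < 2:
        warnings.append("topic: min.insync.replicas is below 2")

    return warnings


if __name__ == "__main__":
    risky = durability_warnings({"acks": 1}, {})
    safe = durability_warnings(
        {"acks": "all"},
        {"unclean.leader.election.enable": "false", "min.insync.replicas": 2},
    )
    print(risky)
    print(safe)
```

A check like this could run before a failure test so that a misconfigured producer is reported up front rather than discovered through missing data.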

> Failure test for Kafka configured for consistency vs availability
> -----------------------------------------------------------------
>                 Key: KAFKA-4666
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4666
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Emanuele Cesena
>         Attachments: consistency_test.py
> We recently had an issue with our Kafka setup because of a misconfiguration.
> In short, we thought we had configured Kafka for durability, but we didn't set the producers
> to acks=all. During a full outage, we had situations where some partitions were "partitioned",
> meaning that the followers started without properly waiting for the right leader, and thus
> we lost data. Again, this is not an issue with Kafka, but a misconfiguration on our side.
> I think we reproduced the issue, and we built a Docker test that proves that, if the
> producer isn't set with acks=all, then data can be lost during an almost full outage. The
> test is attached.
> I was thinking of sending a PR, but wanted to run this by you first, as it's not necessarily
> proving that a feature works as expected.
> In addition, I think the documentation could be slightly improved, for instance in the section
> http://kafka.apache.org/documentation/#design_ha
> by clearly stating that there are 3 steps one should take to configure Kafka for consistency,
> the third being that producers should be set with acks=all (which is now part of the 2nd point).
> Please let me know what you think, and I can send a PR if you agree.

