kafka-dev mailing list archives

From "Joel Koshy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-1555) provide strong consistency with reasonable availability
Date Mon, 03 Nov 2014 20:20:35 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195029#comment-14195029 ]

Joel Koshy commented on KAFKA-1555:
-----------------------------------

Sounds good. In that case, can you modify it a bit? The only remaining confusion is that earlier
in that section we start by writing about acknowledgement by all replicas, but then (without
further comment) assume that this actually means acknowledgement by the current in-sync replicas.
How about the following:
Instead of _A message that has been acknowledged by all in-sync replicas..._ we can write
_A message that has been acknowledged by all replicas..._. And then say _Note that "acknowledgement
by all replicas" does not guarantee that the full set of assigned replicas has received the
message. By default, acknowledgement happens as soon as all the current in-sync replicas have
received the message. For example, if a topic is configured with only two replicas and one
fails (i.e., only one in-sync replica remains), then writes that specify request.required.acks=-1
will succeed. However, these writes could be lost if the remaining replica also fails. Although
this ensures maximum availability ..._ (from earlier comment)
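
To make the two-replica example concrete, here is a toy model of the acknowledgement rule
described above. It is only a sketch and not Kafka code: the function name and the variables
are hypothetical, and the ISR is modelled as a plain set of replica names.

```python
def acked_by_all(isr, received_by):
    """Toy model of acks=-1: the write succeeds once every replica that is
    *currently* in sync has the message, regardless of how many replicas
    are assigned to the partition."""
    return set(isr) <= set(received_by)


assigned = {"A", "B"}   # topic configured with two replicas
isr = {"A"}             # B has failed, so only the leader remains in sync
received = {"A"}        # the leader has appended the message

write_acknowledged = acked_by_all(isr, received)   # the producer sees success
fully_replicated = assigned <= received            # but B never got the message

print(write_acknowledged, fully_replicated)
```

The write is acknowledged even though only one copy exists, which is exactly the case the
proposed note warns about.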

As for the design itself: I just thought that the broker-side setting taking effect only with
a client-setting is a bit odd, especially if it does not hurt to do so with the other ack settings.

> provide strong consistency with reasonable availability
> -------------------------------------------------------
>
>                 Key: KAFKA-1555
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1555
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller
>    Affects Versions: 0.8.1.1
>            Reporter: Jiang Wu
>            Assignee: Gwen Shapira
>             Fix For: 0.8.2
>
>         Attachments: KAFKA-1555-DOCS.0.patch, KAFKA-1555-DOCS.1.patch, KAFKA-1555-DOCS.2.patch,
KAFKA-1555-DOCS.3.patch, KAFKA-1555.0.patch, KAFKA-1555.1.patch, KAFKA-1555.2.patch, KAFKA-1555.3.patch,
KAFKA-1555.4.patch, KAFKA-1555.5.patch, KAFKA-1555.5.patch, KAFKA-1555.6.patch, KAFKA-1555.8.patch,
KAFKA-1555.9.patch
>
>
> In a mission critical application, we expect that a Kafka cluster with 3 brokers can satisfy
two requirements:
> 1. When 1 broker is down, no message loss or service blocking happens.
> 2. In worse cases, such as when two brokers are down, service can be blocked, but no message
loss happens.
> We found that the current Kafka version (0.8.1.1) cannot achieve the requirements due to
three of its behaviors:
> 1. when choosing a new leader from 2 followers in ISR, the one with fewer messages may
be chosen as the leader.
> 2. even when replica.lag.max.messages=0, a follower can stay in ISR when it has fewer
messages than the leader.
> 3. ISR can contain only 1 broker, therefore acknowledged messages may be stored in only
1 broker.
> The following is an analytical proof. 
> We consider a cluster with 3 brokers and a topic with 3 replicas, and assume that at
the beginning, all 3 replicas, leader A, followers B and C, are in sync, i.e., they have the
same messages and are all in ISR.
> According to the value of request.required.acks (acks for short), there are the following
cases.
> 1. acks=0, 1, 3. Obviously these settings do not satisfy the requirement.
> 2. acks=2. Producer sends a message m. It's acknowledged by A and B. At this time, although
C hasn't received m, C is still in ISR. If A is killed, C can be elected as the new leader,
and consumers will miss m.
> 3. acks=-1. B and C restart and are removed from ISR. Producer sends a message m to A,
and receives an acknowledgement. Disk failure happens in A before B and C replicate m. Message
m is lost.
> In summary, no existing configuration can satisfy the requirements.
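
Cases 2 and 3 of the quoted description can be replayed with a small simulation. This is a
minimal sketch under stated assumptions, not broker code: the Replica class and the produce
helper are hypothetical, replication is modelled as an explicit list of followers that have
fetched the message, and leader election simply picks C as the description says it can.

```python
class Replica:
    """Hypothetical stand-in for a broker holding one partition replica."""

    def __init__(self, name):
        self.name = name
        self.log = []


def produce(message, leader, followers_that_fetched):
    """Append to the leader and to the followers that have replicated so far."""
    leader.log.append(message)
    for follower in followers_that_fetched:
        follower.log.append(message)


# Case 2: acks=2. A and B acknowledge m; C has not received it but is still in ISR.
a, b, c = Replica("A"), Replica("B"), Replica("C")
isr = [a, b, c]
produce("m", leader=a, followers_that_fetched=[b])
isr.remove(a)                 # A is killed
new_leader = c                # C is still in ISR, so it can be elected
case2_lost = "m" not in new_leader.log

# Case 3: acks=-1. B and C restarted and were removed from ISR, leaving only A.
a, b, c = Replica("A"), Replica("B"), Replica("C")
isr = [a]
produce("m", leader=a, followers_that_fetched=[])
acked = all("m" in r.log for r in isr)   # every current ISR member has m
a.log.clear()                            # disk failure on A before B and C replicate
case3_lost = acked and not any("m" in r.log for r in (a, b, c))

print(case2_lost, case3_lost)
```

In both runs the producer received an acknowledgement for m, yet no surviving leader can
serve it.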



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
