kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Joel Koshy (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (KAFKA-1647) Replication offset checkpoints (high water marks) can be lost on hard kills and restarts
Date Wed, 01 Oct 2014 23:05:36 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Joel Koshy updated KAFKA-1647:
    Assignee: Jiangjie Qin

> Replication offset checkpoints (high water marks) can be lost on hard kills and restarts
> ----------------------------------------------------------------------------------------
>                 Key: KAFKA-1647
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1647
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2
>            Reporter: Joel Koshy
>            Assignee: Jiangjie Qin
>            Priority: Critical
>              Labels: newbie++
> We ran into this scenario recently in a production environment. This can happen when
enough brokers in a cluster are taken down. i.e., a rolling bounce done properly should not
cause this issue. It can occur if all replicas for any partition are taken down.
> Here is a sample scenario:
> * Cluster of three brokers: b0, b1, b2
> * Two partitions (of some topic) with replication factor two: p0, p1
> * Initial state:
> p0: leader = b0, ISR = {b0, b1}
> p1: leader = b1, ISR = {b0, b1}
> * Do a parallel hard-kill of all brokers
> * Bring up b2, so it is the new controller
> * b2 initializes its controller context and populates its leader/ISR cache (i.e., controllerContext.partitionLeadershipInfo)
from zookeeper. The last known leaders are b0 (for p0) and b1 (for p2)
> * Bring up b1
> * The controller's onBrokerStartup procedure initiates a replica state change for all
replicas on b1 to become online. As part of this replica state change it gets the last known
leader and ISR and sends a LeaderAndIsrRequest to b1 (for p1 and p2). This LeaderAndIsr request
contains: {{p0: leader=b0; p1: leader=b1;} leaders=b1}. b0 is indicated as the leader of p0
but it is not included in the leaders field because b0 is down.
> * On receiving the LeaderAndIsrRequest, b1's replica manager will successfully make itself
(b1) the leader for p1 (and create the local replica object corresponding to p1). It will
however abort the become follower transition for p0 because the designated leader b0 is offline.
So it will not create the local replica object for p0.
> * It will then start the high water mark checkpoint thread. Since only p1 has a local
replica object, only p1's high water mark will be checkpointed to disk. p0's previously written
checkpoint  if any will be lost.
> So in summary it seems we should always create the local replica object even if the online
transition does not happen.
> Possible symptoms of the above bug could be one or more of the following (we saw 2 and
> # Data loss; yes on a hard-kill data loss is expected, but this can actually cause loss
of nearly all data if the broker becomes follower, truncates, and soon after happens to become
> # High IO on brokers that lose their high water mark then subsequently (on a successful
become follower transition) truncate their log to zero and start catching up from the beginning.
> # If the offsets topic is affected, then offsets can get reset. This is because during
an offset load we don't read past the high water mark. So if a water mark is missing then
we don't load anything (even if the offsets are there in the log).

This message was sent by Atlassian JIRA

View raw message