kafka-jira mailing list archives

From "Ivan Babrou (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-3039) Temporary loss of leader resulted in log being completely truncated
Date Wed, 30 Aug 2017 11:07:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147070#comment-16147070 ]

Ivan Babrou commented on KAFKA-3039:
------------------------------------

We also experienced this: out of 28 upgraded nodes in one rack, 4 nodes decided to nuke one partition each (a different partition on each node):

{noformat}
2017-08-30T10:17:29.509 node-93 WARN [ReplicaFetcherThread-0-10042]: Based on follower's leader epoch, leader replied with an unknown offset in requests-48. High watermark 0 will be used for truncation. (kafka.server.ReplicaFetcherThread)
2017-08-30T10:17:29.510 node-93 INFO Truncating log requests-48 to offset 0. (kafka.log.Log)
--
2017-08-30T10:17:29.536 node-93 WARN [ReplicaFetcherThread-0-10082]: Based on follower's leader epoch, leader replied with an unknown offset in requests-80. High watermark 0 will be used for truncation. (kafka.server.ReplicaFetcherThread)
2017-08-30T10:17:29.536 node-93 INFO Truncating log requests-80 to offset 0. (kafka.log.Log)
--
2017-08-30T10:26:32.203 node-87 WARN [ReplicaFetcherThread-2-10056]: Based on follower's leader epoch, leader replied with an unknown offset in requests-82. High watermark 0 will be used for truncation. (kafka.server.ReplicaFetcherThread)
2017-08-30T10:26:32.204 node-87 INFO Truncating log requests-82 to offset 0. (kafka.log.Log)
--
2017-08-30T10:27:31.755 node-89 WARN [ReplicaFetcherThread-3-10055]: Based on follower's leader epoch, leader replied with an unknown offset in requests-79. High watermark 0 will be used for truncation. (kafka.server.ReplicaFetcherThread)
2017-08-30T10:27:31.756 node-89 INFO Truncating log requests-79 to offset 0. (kafka.log.Log)
{noformat}

This happened during a rolling upgrade from 0.10.2.0 to 0.11.0.0. The nodes that truncated their logs were not leaders before the upgrade (not even the preferred leaders).
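
For reference, the documented rolling-upgrade path to 0.11.0.0 keeps the inter-broker protocol pinned to the old version while brokers are restarted and only bumps it once the whole cluster runs the new binaries. A minimal server.properties sketch of that sequence (the values below illustrate the documented procedure, they are not taken from our actual configuration):

{noformat}
# Phase 1: restart brokers one at a time on the 0.11.0.0 binaries with the protocol pinned
inter.broker.protocol.version=0.10.2
log.message.format.version=0.10.2

# Phase 2: once every broker runs 0.11.0.0, bump the protocol and do another rolling restart
inter.broker.protocol.version=0.11.0

# Phase 3: after clients are upgraded, bump the on-disk message format and roll once more
log.message.format.version=0.11.0
{noformat}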

> Temporary loss of leader resulted in log being completely truncated
> -------------------------------------------------------------------
>
>                 Key: KAFKA-3039
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3039
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.9.0.0
>         Environment: Debian 3.2.54-2 x86_64 GNU/Linux
>            Reporter: Imran Patel
>            Priority: Critical
>              Labels: reliability
>
> We had an event recently where the temporary loss of a leader for a partition (during a manual restart) resulted in the leader coming back with no high watermark state and truncating its log to zero. Logs (attached below) indicate that it did have the data but not the commit state. How is this possible?
> Leader (broker 3)
> [2015-12-18 21:19:44,666] INFO Completed load of log messages-14 with log end offset 14175963374 (kafka.log.Log)
> [2015-12-18 21:19:45,170] INFO Partition [messages,14] on broker 3: No checkpointed highwatermark is found for partition [messages,14] (kafka.cluster.Partition)
> [2015-12-18 21:19:45,238] INFO Truncating log messages-14 to offset 0. (kafka.log.Log)
> [2015-12-18 21:20:34,066] INFO Partition [messages,14] on broker 3: Expanding ISR for partition [messages,14] from 3 to 3,10 (kafka.cluster.Partition)
> Replica (broker 10)
> [2015-12-18 21:19:19,525] INFO Partition [messages,14] on broker 10: Shrinking ISR for partition [messages,14] from 3,10,4 to 10,4 (kafka.cluster.Partition)
> [2015-12-18 21:20:34,049] ERROR [ReplicaFetcherThread-0-3], Current offset 14175984203 for partition [messages,14] out of range; reset offset to 35977 (kafka.server.ReplicaFetcherThread)
> [2015-12-18 21:20:34,033] WARN [ReplicaFetcherThread-0-3], Replica 10 for partition [messages,14] reset its fetch offset from 14175984203 to current leader 3's latest offset 35977 (kafka.server.ReplicaFetcherThread)
> Some relevant config parameters:
>         offsets.topic.replication.factor = 3
>         offsets.commit.required.acks = -1
>         replica.high.watermark.checkpoint.interval.ms = 5000
>         unclean.leader.election.enable = false
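
The "No checkpointed highwatermark is found" line above refers to the replication-offset-checkpoint file that the broker rewrites in each log directory every replica.high.watermark.checkpoint.interval.ms (5000 ms with the settings above); if a partition has no entry there on startup, the broker has no recorded high watermark for it. A sketch of that file's layout (values are illustrative and the arrow annotations are not part of the file):

{noformat}
0                            <- checkpoint file format version
1                            <- number of entries that follow
messages 14 14175963374      <- topic, partition, last checkpointed high watermark
{noformat}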



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
