kafka-dev mailing list archives

From "Mayuresh Gharat (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-3040) Broker didn't report new data after change in leader
Date Mon, 04 Jan 2016 18:22:39 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15081519#comment-15081519 ]

Mayuresh Gharat commented on KAFKA-3040:
----------------------------------------

At LinkedIn, we have a separate controller log file on the broker that is the controller
for the cluster. Do you see a message like "Broker HOST-NAME starting become controller
state transition" on the broker that is the controller for your cluster?
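
For anyone who wants to run that check across several brokers, here is a minimal sketch that scans log lines for the message quoted above. The function name and the second sample line are made up for illustration, and the exact wording of the message may differ between Kafka versions.

```python
import re

# Pattern taken from the message quoted above; the wording is an assumption
# and may not match every Kafka version.
CONTROLLER_PATTERN = re.compile(
    r"Broker (\S+) starting become controller state transition"
)


def find_controller_transitions(lines):
    """Return the broker identifier from each line that matches the pattern."""
    hits = []
    for line in lines:
        match = CONTROLLER_PATTERN.search(line)
        if match:
            hits.append(match.group(1))
    return hits


# The first line is from the broker 9 log in the ticket; the second is an
# illustrative line, not copied from a real log.
sample = [
    "[2015-12-18 00:59:25,511] INFO New leader is 9 "
    "(kafka.server.ZookeeperLeaderElector$LeaderChangeListener)",
    "INFO Broker 9 starting become controller state transition",
]

print(find_controller_transitions(sample))
```

To use it on a real file, pass an open file handle instead of `sample`; which file holds the controller messages depends on the broker's logging configuration.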

> Broker didn't report new data after change in leader
> ----------------------------------------------------
>
>                 Key: KAFKA-3040
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3040
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.9.0.0
>         Environment: Debian 3.2.54-2 x86_64 GNU/Linux
>            Reporter: Imran Patel
>            Priority: Critical
>
> Recently we had an event that caused large Kafka backlogs to develop suddenly. This happened
across multiple partitions. We noticed that after a brief connection loss to ZooKeeper, Kafka
brokers were not reporting any new data to our (SimpleConsumer) consumer although the producers
were enqueueing fine. This went on until another zk blip led to a reconfiguration which suddenly
caused the consumers to "see" the data. Our consumers and our monitoring tools did not see
the offsets move during the outage window. Here is the sequence of events for a single partition
(with logs attached below).
> The brokers are running 0.9, the producer is using library version kafka_2.10:0.8.2.1
and the consumer is using kafka_2.10:0.8.0 (both are Java programs). Our monitoring tool uses
kafka-python-9.0.
> Can you tell us if this could be due to a consumer bug (e.g., the libraries being too "old"
to operate with a 0.9 broker)? Or does it look like a Kafka core issue? Please note that
we recently upgraded the brokers to 0.9 and hadn't seen a similar issue prior to that.
> - After a brief connection loss to ZooKeeper, the partition leader (broker 9 for partition
29 in the logs below) came back and shrank the ISR to itself.
> - Producers kept on successfully sending data to Kafka and the remaining replicas (brokers
3 and 4) recorded this data. AFAICT, 3 was the new leader. Broker 9 did NOT replicate this
data. It did print the ISR shrinking message over and over again.
> - The consumer, on the other hand, reported no new data, presumably because it was talking to
9 and that broker was doing nothing.
> - 6 hours later, another ZooKeeper blip caused the brokers to reconfigure and consumers
started seeing new data.
> Broker 9:
> [2015-12-16 19:46:01,523] INFO Partition [messages,29] on broker 9: Expanding ISR for
partition [messages,29] from 9,4 to 9,4,3 (kafka.cluster.Partition)
> [2015-12-18 00:59:25,511] INFO New leader is 9 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
> [2015-12-18 01:00:18,451] INFO Partition [messages,29] on broker 9: Shrinking ISR for
partition [messages,29] from 9,4,3 to 9 (kafka.cluster.Partition)
> [2015-12-18 01:00:18,458] INFO Partition [messages,29] on broker 9: Cached zkVersion
[472] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> [2015-12-18 07:04:44,552] INFO Truncating log messages-29 to offset 14169556269. (kafka.log.Log)
> [2015-12-18 07:04:44,649] INFO [ReplicaFetcherManager on broker 9] Added fetcher for
partitions List([[messages,61], initOffset 14178575900 to broker BrokerEndPoint(6,kafka006-prod.c.foo.internal,9092)]
, [[messages,13], initOffset 14156091271 to broker BrokerEndPoint(2,kafka002-prod.c.foo.internal,9092)]
, [[messages,45], initOffset 14135826155 to broker BrokerEndPoint(4,kafka004-prod.c.foo.internal,9092)]
, [[messages,41], initOffset 14157926400 to broker BrokerEndPoint(1,kafka001-prod.c.foo.internal,9092)]
, [[messages,29], initOffset 14169556269 to broker BrokerEndPoint(3,kafka003-prod.c.foo.internal,9092)]
, [[messages,57], initOffset 14175218230 to broker BrokerEndPoint(1,kafka001-prod.c.foo.internal,9092)]
) (kafka.server.ReplicaFetcherManager)
> Broker 3:
> [2015-12-18 01:00:01,763] INFO [ReplicaFetcherManager on broker 3] Removed fetcher for
partitions [messages,29] (kafka.server.ReplicaFetcherManager)
> [2015-12-18 07:09:04,631] INFO Partition [messages,29] on broker 3: Expanding ISR for
partition [messages,29] from 4,3 to 4,3,9 (kafka.cluster.Partition)
> [2015-12-18 07:09:49,693] INFO [ReplicaFetcherManager on broker 3] Removed fetcher for
partitions [messages,29] (kafka.server.ReplicaFetcherManager)
> Broker 4:
> [2015-12-18 01:00:01,783] INFO [ReplicaFetcherManager on broker 4] Removed fetcher for
partitions [messages,29] (kafka.server.ReplicaFetcherManager)
> [2015-12-18 01:00:01,866] INFO [ReplicaFetcherManager on broker 4] Added fetcher for
partitions List([[messages,29], initOffset 14169556262 to broker BrokerEndPoint(3,kafka003-prod.c.foo.internal,9092)]
) (kafka.server.ReplicaFetcherManager)
> [2015-12-18 07:09:50,191] ERROR [ReplicaFetcherThread-0-3], Error for partition [messages,29]
to broker 3:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is
not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)
> Consumer:
> 2015-12-18 01:00:01.791 [P29-Reader] INFO  com.example.utils.kafkalib.KafkaConsumer -
7 messages read from partition 29 starting from offset 14169556262
> 2015-12-18 01:00:01.791 [P29-Reader] INFO  com.example.utils.kafkalib.KafkaConsumer -
0 messages read from partition 29 starting from offset 14169556269
> 2015-12-18 07:04:44.293 [P29-Reader] INFO  com.example.utils.kafkalib.KafkaConsumer -
0 messages read from partition 29 starting from offset 14169556269
> 2015-12-18 07:04:44.303 [P29-Reader] WARN  com.example.project.consumer.kafka.PartitionReader
- Error fetching data from the Broker:kafka009-prod.c.foo.internal Reason: NotLeaderForPartitionCode
> 2015-12-18 07:04:44.304 [P29-Reader] INFO  com.example.project.consumer.kafka.PartitionData
- Attempting to connectAndRead leader for topic: messages partition: 29
> 2015-12-18 07:04:44.309 [P29-Reader] INFO  com.example.project.consumer.kafka.PartitionData
- Leader for topic: messages partition: 29 set to kafka003-prod.c.foo.internal
> 2015-12-18 07:04:44.749 [P29-Reader] INFO  com.example.utils.kafkalib.KafkaConsumer -
6514 messages read from partition 29 starting from offset 14169556269
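
The length of the stall in the consumer log above can be measured with a short script. This is a rough sketch, not part of the original report: it assumes each log entry sits on a single line (the archive wraps them), that all entries belong to one partition, and that a stall runs from the first empty read to the next non-empty read. The function names are made up for illustration.

```python
import re
from datetime import datetime

# Matches the "N messages read from partition P starting from offset O"
# entries shown in the consumer log, with the leading timestamp.
READ_PATTERN = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*\s"
    r"(\d+) messages read from partition (\d+) starting from offset (\d+)"
)


def parse_reads(lines):
    """Turn matching log lines into (timestamp, count, partition, offset)."""
    reads = []
    for line in lines:
        match = READ_PATTERN.search(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            count, partition, offset = (int(match.group(i)) for i in (2, 3, 4))
            reads.append((stamp, count, partition, offset))
    return reads


def longest_stall(reads):
    """Longest gap between the first empty read and the next non-empty read."""
    longest = None
    stall_start = None
    for stamp, count, _partition, _offset in reads:
        if count == 0:
            if stall_start is None:
                stall_start = stamp
        else:
            if stall_start is not None:
                gap = stamp - stall_start
                if longest is None or gap > longest:
                    longest = gap
            stall_start = None
    return longest


# Entries copied from the consumer log above, joined onto single lines.
prefix = "[P29-Reader] INFO  com.example.utils.kafkalib.KafkaConsumer - "
log_lines = [
    "2015-12-18 01:00:01.791 " + prefix + "7 messages read from partition 29 starting from offset 14169556262",
    "2015-12-18 01:00:01.791 " + prefix + "0 messages read from partition 29 starting from offset 14169556269",
    "2015-12-18 07:04:44.293 " + prefix + "0 messages read from partition 29 starting from offset 14169556269",
    "2015-12-18 07:04:44.749 " + prefix + "6514 messages read from partition 29 starting from offset 14169556269",
]

stall = longest_stall(parse_reads(log_lines))
print(stall)
```

On these four entries the stall comes out at a little over six hours, which matches the "6 hours later" in the description.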



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
