kafka-jira mailing list archives

From "AS (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (KAFKA-6178) Broker is listed as only ISR for all partitions it is leader of
Date Mon, 06 Nov 2017 20:57:00 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

AS updated KAFKA-6178:
----------------------
    Description: 
We're running a 15-broker cluster on Windows machines, and one of the brokers, 10, is the
only ISR member on all partitions that it is the leader of. On partitions where it isn't the leader,
it seems to follow the leader fine. This is an excerpt from 'describe':


bq. Topic: ClientQosCombined      Partition: 458  Leader: 10      Replicas: 10,6,7,8,9,0,1  Isr: 10
bq. Topic: ClientQosCombined      Partition: 459  Leader: 11      Replicas: 11,7,8,9,0,1,10  Isr: 0,10,1,9,7,11,8
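
To find every affected partition rather than reading the 'describe' output by eye, the check can be scripted. The following is a minimal sketch, assuming the output has the same field layout as the excerpt above; the function name `leader_only_partitions` and the regular expression are illustrative and not part of any Kafka tooling.

```python
import re

# One partition entry of the 'describe' output, in the layout quoted above.
# \s+ also matches newlines, so entries wrapped by a mail client still parse.
ENTRY_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def leader_only_partitions(describe_text):
    """Return (topic, partition, leader) for replicated partitions
    whose ISR contains only the leader."""
    hits = []
    for m in ENTRY_RE.finditer(describe_text):
        replicas = [int(x) for x in m.group("replicas").split(",") if x]
        isr = [int(x) for x in m.group("isr").split(",") if x]
        leader = int(m.group("leader"))
        # Skip single-replica partitions, where an ISR of one is expected.
        if len(replicas) > 1 and isr == [leader]:
            hits.append((m.group("topic"), int(m.group("partition")), leader))
    return hits


sample = """\
Topic: ClientQosCombined      Partition: 458  Leader: 10      Replicas: 10,6,7,8,9,0,1  Isr: 10
Topic: ClientQosCombined      Partition: 459  Leader: 11      Replicas: 11,7,8,9,0,1,10  Isr: 0,10,1,9,7,11,8
"""

print(leader_only_partitions(sample))
```

On the two lines from the excerpt, only partition 458 (leader 10) is reported; partition 459 has a full ISR and is skipped.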


The server.log files all seem to be pretty standard, and the only indication of this issue
is the following pattern that often repeats:


2017-11-06 20:28:25,207 [INFO] kafka.cluster.Partition [kafka-request-handler-8:] - Partition [ClientQosCombined,398] on broker 10: Expanding ISR for partition [ClientQosCombined,398] from 10 to 5,10
2017-11-06 20:28:39,382 [INFO] kafka.cluster.Partition [kafka-scheduler-1:] - Partition [ClientQosCombined,398] on broker 10: Shrinking ISR for partition [ClientQosCombined,398] from 5,10 to 10


This pattern repeats for each of the partitions that broker 10 leads. This is the only topic that
we currently have in our cluster. The __consumer_offsets topic seems completely normal in terms
of ISR counts. The controller is broker 5, which is cycling through attempting and failing to
trigger leader elections on the partitions led by broker 10. From the controller log on broker 5:


2017-11-06 20:45:04,857 [INFO] kafka.controller.KafkaController [kafka-scheduler-0:] - [Controller 5]: Starting preferred replica leader election for partitions [ClientQosCombined,375]
2017-11-06 20:45:04,857 [INFO] kafka.controller.PartitionStateMachine [kafka-scheduler-0:] - [Partition state machine on Controller 5]: Invoking state change to OnlinePartition for partitions [ClientQosCombined,375]
2017-11-06 20:45:04,857 [INFO] kafka.controller.PreferredReplicaPartitionLeaderSelector [kafka-scheduler-0:] - [PreferredReplicaPartitionLeaderSelector]: Current leader 10 for partition [ClientQosCombined,375] is not the preferred replica. Trigerring preferred replica leader election
2017-11-06 20:45:04,857 [WARN] kafka.controller.KafkaController [kafka-scheduler-0:] - [Controller 5]: Partition [ClientQosCombined,375] failed to complete preferred replica leader election. Leader is 10


I've also attached the logs and output from broker 10. Any idea what's wrong here? 
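
The Expanding/Shrinking ISR pattern from broker 10's server.log can be quantified by pairing each expansion with the next shrink of the same partition. This is a rough sketch, assuming each log entry sits on a single line (the excerpt in this message is wrapped by the mail archive) and uses the same message format as above; `isr_flap_durations` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Timestamp, event kind, and partition of an ISR change logged by kafka.cluster.Partition.
EVENT_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \[INFO\] kafka\.cluster\.Partition .*?"
    r"(?P<kind>Expanding|Shrinking) ISR for partition \[(?P<topic>[^,\]]+),(?P<part>\d+)\]"
)


def isr_flap_durations(log_text):
    """Return (topic, partition, seconds) for each expansion that was
    followed by a shrink of the same partition."""
    expanded_at = {}
    flaps = []
    for line in log_text.splitlines():
        m = EVENT_RE.search(line)
        if not m:
            continue
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
        key = (m.group("topic"), int(m.group("part")))
        if m.group("kind") == "Expanding":
            expanded_at[key] = ts
        elif key in expanded_at:
            # Shrink after a recorded expansion: measure how long the ISR stayed expanded.
            seconds = (ts - expanded_at.pop(key)).total_seconds()
            flaps.append((key[0], key[1], seconds))
    return flaps


sample_log = """\
2017-11-06 20:28:25,207 [INFO] kafka.cluster.Partition [kafka-request-handler-8:] - Partition [ClientQosCombined,398] on broker 10: Expanding ISR for partition [ClientQosCombined,398] from 10 to 5,10
2017-11-06 20:28:39,382 [INFO] kafka.cluster.Partition [kafka-scheduler-1:] - Partition [ClientQosCombined,398] on broker 10: Shrinking ISR for partition [ClientQosCombined,398] from 5,10 to 10
"""

for topic, partition, seconds in isr_flap_durations(sample_log):
    print(f"{topic}-{partition}: ISR stayed expanded for {seconds:.3f}s")
```

For the two entries quoted from server.log, the follower stays in the ISR of partition 398 for roughly 14 seconds before being dropped again.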

> Broker is listed as only ISR for all partitions it is leader of
> ---------------------------------------------------------------
>
>                 Key: KAFKA-6178
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6178
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.1.0
>         Environment: Windows
>            Reporter: AS
>              Labels: windows
>         Attachments: KafkaServiceOutput.txt, log-cleaner.log, server.log



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
