kafka-jira mailing list archives

From "Uwe Eisele (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (KAFKA-6714) KafkaController marks all Brokers as "Shutting down", though only one broker has been shut down
Date Mon, 26 Mar 2018 12:02:00 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Eisele updated KAFKA-6714:
------------------------------
    Environment: Kafka cluster on Amazon AWS EC2 r4.2xlarge instances with 5 nodes and a Zookeeper
cluster on r4.2xlarge instances with 3 nodes. The cluster is distributed across 2 availability
zones.  (was: Kafka Cluster on Amazon AWS EC2 r4.2xlarge instances with 5 nodes and a Zookeeper
Cluster on r4.2xlarge instances with 3 nodes. The Cluster is distributed across 2 availability
zones.)

> KafkaController marks all Brokers as "Shutting down", though only one broker has been
shut down
> -----------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6714
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6714
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller, core
>    Affects Versions: 0.11.0.2
>         Environment: Kafka cluster on Amazon AWS EC2 r4.2xlarge instances with 5 nodes
and a Zookeeper cluster on r4.2xlarge instances with 3 nodes. The cluster is distributed across
2 availability zones.
>            Reporter: Uwe Eisele
>            Priority: Critical
>
> In our Kafka cluster we experienced a situation in which the Kafka controller had all
brokers marked as "Shutting down", although only one broker had actually been shut down.
> The last log entry about broker state before the one reporting all brokers as shutting
down reports that no brokers are shutting down.
> As a consequence of this inconsistent state, the Kafka controller is not able to elect
any partition leader.
> {code:java}
> [2018-03-15 16:28:24,288] INFO [Controller 5]: Shutting down broker 5 (kafka.controller.KafkaController)
> [2018-03-15 16:28:24,288] DEBUG [Controller 5]: All shutting down brokers: 5 (kafka.controller.KafkaController)
> [2018-03-15 16:28:24,288] DEBUG [Controller 5]: Live brokers: 1,2,3,4 (kafka.controller.KafkaController)
> ...
> [2018-03-15 16:28:36,846] INFO [Controller 3]: Currently active brokers in the cluster:
Set(1, 2, 3, 4) (kafka.controller.KafkaController)
> [2018-03-15 16:28:36,846] INFO [Controller 3]: Currently shutting brokers in the cluster:
Set() (kafka.controller.KafkaController)
> ...
> [2018-03-19 17:57:22,273] INFO [Controller 3]: Shutting down broker 1 (kafka.controller.KafkaController)
> [2018-03-19 17:57:22,273] DEBUG [Controller 3]: All shutting down brokers: 1,5,2,3,4
(kafka.controller.KafkaController)
> [2018-03-19 17:57:22,273] DEBUG [Controller 3]: Live brokers:  (kafka.controller.KafkaController)
> ...
> [2018-03-19 17:57:22,275] ERROR Controller 3 epoch 83 encountered error while electing
leader for partition [zughaltphase_v3_intern_intern_partitioned_by_evanummer,6] due to: No
other replicas in ISR 1,3,5 for [zughaltphase_v3_intern_intern_partitioned_by_evanummer,6]
besides shutting down brokers 1,5,2,3,4. (state.change.logger) {code}
> The question is: why does the Kafka controller assume that all brokers are shutting down?
> The only place we found in the Kafka code (0.11.0.2) where the set of shutting-down brokers
is changed is line 1407 of the class _kafka.controller.KafkaController_, in the method
_doControlledShutdown_.
>  
> {code:java}
> info("Shutting down broker " + id)
> if (!controllerContext.liveOrShuttingDownBrokerIds.contains(id))
>   throw new BrokerNotAvailableException("Broker id %d does not exist.".format(id))
> controllerContext.shuttingDownBrokerIds.add(id)
> {code}
> However, if every broker had been added through this code path, we would expect to see
the log entry "Shutting down broker n" for each broker in the log file, but those entries
are not there.
>  
>  
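The check described above (every id listed under "All shutting down brokers" should have a matching "Shutting down broker n" entry) can be sketched as a small log scan. This is only a minimal sketch and not part of Kafka: the function name `unexplained_shutting_down` is hypothetical, the regular expressions are modeled solely on the log lines quoted in this report, and tracking shutdown requests per controller id is an assumption about how the entries should be grouped.

```python
import re
from collections import defaultdict

# Patterns modeled on the controller log lines quoted in the report (assumption:
# other Kafka versions may format these entries differently).
REQUEST = re.compile(r"\[Controller (\d+)\]: Shutting down broker (\d+)")
LISTED = re.compile(r"\[Controller (\d+)\]: All shutting down brokers: ([\d,]*)")


def unexplained_shutting_down(lines):
    """Return (controller_id, broker_ids) pairs for every "All shutting down
    brokers" entry that lists brokers without an earlier "Shutting down
    broker n" entry from the same controller."""
    requested = defaultdict(set)  # controller id -> broker ids it logged a shutdown for
    findings = []
    for line in lines:
        match = REQUEST.search(line)
        if match:
            requested[int(match.group(1))].add(int(match.group(2)))
            continue
        match = LISTED.search(line)
        if match:
            controller = int(match.group(1))
            listed = {int(b) for b in match.group(2).split(",") if b}
            missing = sorted(listed - requested[controller])
            if missing:
                findings.append((controller, missing))
    return findings


if __name__ == "__main__":
    # Entries copied from the log excerpt in the report, unwrapped to one line each.
    sample = [
        "[2018-03-15 16:28:24,288] INFO [Controller 5]: Shutting down broker 5 (kafka.controller.KafkaController)",
        "[2018-03-15 16:28:24,288] DEBUG [Controller 5]: All shutting down brokers: 5 (kafka.controller.KafkaController)",
        "[2018-03-19 17:57:22,273] INFO [Controller 3]: Shutting down broker 1 (kafka.controller.KafkaController)",
        "[2018-03-19 17:57:22,273] DEBUG [Controller 3]: All shutting down brokers: 1,5,2,3,4 (kafka.controller.KafkaController)",
    ]
    print(unexplained_shutting_down(sample))  # [(3, [2, 3, 4, 5])]
```

Run against the full controller log, a non-empty result would point at the entries where brokers entered the shutting-down set without a logged call to _doControlledShutdown_.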



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
