kafka-dev mailing list archives

From "huxi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-4368) Unclean shutdown breaks Kafka cluster
Date Thu, 03 Nov 2016 05:56:58 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15631715#comment-15631715 ]

huxi commented on KAFKA-4368:
-----------------------------

Could you paste the full stack traces from both the client and the server?

> Unclean shutdown breaks Kafka cluster
> -------------------------------------
>
>                 Key: KAFKA-4368
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4368
>             Project: Kafka
>          Issue Type: Bug
>          Components: producer 
>    Affects Versions: 0.9.0.1, 0.10.0.0
>            Reporter: Anukool Rattana
>            Priority: Critical
>
> My team has observed that if a broker process dies uncleanly, producers are
> blocked from sending messages to the Kafka topic.
> Here is how to reproduce the problem:
> 1) Create a Kafka 0.10 cluster with three brokers (A, B, and C).
> 2) Create a topic with replication_factor = 2.
> 3) Configure the producer to send messages with "acks=all", meaning all in-sync
> replicas must acknowledge a message before the next one is sent (sketched after
> this list).
> 4) Force IEM (IBM Endpoint Manager) to push a patch to broker A and force the
> server to reboot after the patches are installed.
> Note: min.insync.replicas = 1
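> A minimal sketch of steps 2 and 3 with the standard CLI tools (the ZooKeeper and
> broker addresses are placeholders for our environment):
>
>     # Step 2: create the topic with replication_factor = 2
>     bin/kafka-topics.sh --zookeeper zk1:2181 --create --topic logstash \
>         --partitions 3 --replication-factor 2
>
>     # Step 3: produce with acks=all; the console producer exposes this as
>     # --request-required-acks, where -1 means "all"
>     bin/kafka-console-producer.sh --broker-list brokerA:9092 --topic logstash \
>         --request-required-acks -1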
> Result: Producers are unable to send messages to the Kafka topic after the broker
> reboots and rejoins the cluster, failing with the following error message:
> [2016-09-28 09:32:41,823] WARN Error while fetching metadata with correlation id 0 : {logstash=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
> We suspect that a replication_factor of 2 is not sufficient for our Kafka
> environment, but we really need an explanation of what happens when a broker
> undergoes an unclean shutdown.
> The same issue occurred with a two-broker cluster and replication_factor = 1; a
> configuration sketch follows below.
> The workaround I used to recover the service was to clean up both the Kafka topic
> log files and the ZooKeeper data (rmr /brokers/topics/XXX and rmr /consumers/XXX),
> sketched below.
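> The cleanup as commands (XXX is the consumer group as above; the log directory is
> a placeholder for the broker's log.dirs setting; note this deletes data):
>
>     # Remove topic and consumer metadata from ZooKeeper (destructive!)
>     bin/zookeeper-shell.sh zk1:2181 rmr /brokers/topics/logstash
>     bin/zookeeper-shell.sh zk1:2181 rmr /consumers/XXX
>
>     # On each broker, remove the topic's on-disk log segments
>     rm -rf /var/kafka-logs/logstash-*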
> Note:
> Topic list after broker A came back from the reboot:
> Topic:logstash  PartitionCount:3        ReplicationFactor:2     Configs:
>         Topic: logstash Partition: 0    Leader: 1       Replicas: 1,3   Isr: 1,3
>         Topic: logstash Partition: 1    Leader: 2       Replicas: 2,1   Isr: 2,1
>         Topic: logstash Partition: 2    Leader: 3       Replicas: 3,2   Isr: 2,3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
