kafka-jira mailing list archives

From "Ismael Juma (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (KAFKA-6064) Cluster hung when the controller tried to delete a bunch of topics
Date Sat, 28 Oct 2017 14:58:00 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismael Juma resolved KAFKA-6064.
--------------------------------
    Resolution: Auto Closed

0.8.2.1 is no longer supported. Many bugs related to topic deletion have been fixed in the
releases since then. I suggest upgrading.

> Cluster hung when the controller tried to delete a bunch of topics 
> -------------------------------------------------------------------
>
>                 Key: KAFKA-6064
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6064
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.8.2.1
>         Environment: rhel 6, 12 core, 48GB 
>            Reporter: Chaitanya GSK
>              Labels: controller, kafka-0.8
>
> Hi,
> We have been running 0.8.2.1 in our Kafka cluster and had a full cluster outage when we
> programmatically tried to delete 220 topics: the controller hung and ran out of memory.
> This somehow led to an outage of the whole cluster, and the clients were not able to push
> data at the required rate. AFAIK, the controller shouldn't affect the write rate to the
> other brokers, but in this case it did. Below is the client error.
> [WARN] Failed to send kafka.producer.async request with correlation id 1613935688 to broker 44 with data for partitions [topic_2,65],[topic_2,167],[topic_3,2],[topic_4,0],[topic_5,30],[topic_2,48],[topic_2,150]
> java.io.IOException: Broken pipe
> 	at sun.nio.ch.FileDispatcherImpl.writev0(Native Method) ~[?:1.8.0_60]
> 	at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51) ~[?:1.8.0_60]
> 	at sun.nio.ch.IOUtil.write(IOUtil.java:148) ~[?:1.8.0_60]
> 	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:504) ~[?:1.8.0_60]
> 	at java.nio.channels.SocketChannel.write(SocketChannel.java:502) ~[?:1.8.0_60]
> 	at kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56) ~[stormjar.jar:?]
> 	at kafka.network.Send$class.writeCompletely(Transmission.scala:75) ~[stormjar.jar:?]
> 	at kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26) ~[stormjar.jar:?]
> 	at kafka.network.BlockingChannel.send(BlockingChannel.scala:103) ~[stormjar.jar:?]
> 	at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:73) ~[stormjar.jar:?]
> 	at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:72) ~[stormjar.jar:?]
> 	at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SyncProducer.scala:103) ~[stormjar.jar:?]
> 	at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:103) ~[stormjar.jar:?]
> 	at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:103) ~[stormjar.jar:?]
> 	at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) ~[stormjar.jar:?]
> 	at kafka.producer.SyncProducer$$anonfun$send$1.apply$mcV$sp(SyncProducer.scala:102) ~[stormjar.jar:?]
> 	at kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:102) ~[stormjar.jar:?]
> 	at kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:102) ~[stormjar.jar:?]
> 	at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) ~[stormjar.jar:?]
> 	at kafka.producer.SyncProducer.send(SyncProducer.scala:101) ~[stormjar.jar:?]
> 	at kafka.producer.async.YamasKafkaEventHandler.kafka$producer$async$YamasKafkaEventHandler$$send(YamasKafkaEventHandler.scala:481) [stormjar.jar:?]
> 	at kafka.producer.async.YamasKafkaEventHandler$$anonfun$dispatchSerializedData$2.apply(YamasKafkaEventHandler.scala:144) [stormjar.jar:?]
> 	at kafka.producer.async.YamasKafkaEventHandler$$anonfun$dispatchSerializedData$2.apply(YamasKafkaEventHandler.scala:138) [stormjar.jar:?]
> 	at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) [stormjar.jar:?]
> 	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) [stormjar.jar:?]
> 	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) [stormjar.jar:?]
> 	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) [stormjar.jar:?]
> 	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) [stormjar.jar:?]
> 	at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) [stormjar.jar:?]
> 	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) [stormjar.jar:?]
> 	at kafka.producer.async.YamasKafkaEventHandler.dispatchSerializedData(YamasKafkaEventHandler.scala:138) [stormjar.jar:?]
> 	at kafka.producer.async.YamasKafkaEventHandler.handle(YamasKafkaEventHandler.scala:79) [stormjar.jar:?]
> 	at kafka.producer.async.ProducerSendThread.tryToHandle(ProducerSendThread.scala:105) [stormjar.jar:?]
> 	at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:88) [stormjar.jar:?]
> 	at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:68) [stormjar.jar:?]
> 	at scala.collection.immutable.Stream.foreach(Stream.scala:547) [stormjar.jar:?]
> 	at kafka.producer.async.ProducerSendThread.processEvents(ProducerSendThread.scala:67) [stormjar.jar:?]
> 	at kafka.producer.async.ProducerSendThread.run(ProducerSendThread.scala:45) [stormjar.jar:?]
> We tried shifting the controller to a different broker, but that didn't help. We ultimately
> had to clean up the Kafka cluster to stabilize it.
> We are wondering whether this is a known issue. If not, we would appreciate it if anyone in
> the community could provide insight into why a hung controller would bring down the cluster
> and why deleting the topics would cause the controller to hang.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
