kafka-jira mailing list archives

From "Alex Dunayevsky (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (KAFKA-6343) OOM as the result of creation of 5k topics
Date Mon, 11 Dec 2017 12:26:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-6343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285834#comment-16285834 ]

Alex Dunayevsky edited comment on KAFKA-6343 at 12/11/17 12:25 PM:
-------------------------------------------------------------------

Ismael Juma, we have just reproduced the issue once again while keeping track of open file
handles. Here are the results:

{code:java}
$ while true; do cat /proc/sys/fs/file-nr; sleep 1; done

3024   0  300000      <--- starting topic creation 
...                               
66192  0  300000      <--- all 5k topics created
...                   <--- broker continues topic loading
98560  0  300000      <--- breaks here, this is where broker dies
1568   0  300000      <--- after broker death
{code}

The first column is the number of allocated (open) file handles, the second is the number of
allocated but unused handles, and the last column (300000) is the system-wide maximum number
of file handles.
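
For anyone repeating this measurement, a minimal sketch of the same polling loop with timestamps and labelled fields may make the output easier to correlate with the broker logs. The function names `format_file_nr` and `monitor_file_nr` are hypothetical, and the `headroom` field is simply the system-wide maximum minus the allocated count. Note that `file-nr` is system-wide; a per-process limit such as `ulimit -n` is not covered here.

```shell
#!/bin/sh
# Hypothetical helper: turn one /proc/sys/fs/file-nr line into labelled fields.
format_file_nr() {
    # Split the line on whitespace (the kernel separates the fields with tabs):
    # $1 = allocated handles, $2 = allocated but unused, $3 = system-wide maximum.
    set -- $1
    echo "allocated=$1 unused=$2 max=$3 headroom=$(( $3 - $1 ))"
}

# Hypothetical helper: sample once per second until interrupted (Linux only).
monitor_file_nr() {
    while true; do
        echo "$(date '+%H:%M:%S') $(format_file_nr "$(cat /proc/sys/fs/file-nr)")"
        sleep 1
    done
}
```

Calling `monitor_file_nr` starts the loop. Applied to the sample where the broker died, `format_file_nr '98560 0 300000'` reports a headroom of 201440, so the system-wide handle limit was still far from exhausted at that point.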



was (Author: alex.dunayevsky):
Ismael Juma, we have just reproduced the issue once again while keeping track of open file
handles. Here are the results:

{code:java}
$ while true; do cat /proc/sys/fs/file-nr; sleep 1; done
3024   0  300000      <--- starting topic creation 
...                               
66192  0  300000      <--- all 5k topics created
...                   <--- broker continues topic loading
98560  0  300000      <--- breaks here, this is where broker dies
1568   0  300000      <--- after broker death
{code}


> OOM as the result of creation of 5k topics
> ------------------------------------------
>
>                 Key: KAFKA-6343
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6343
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.10.1.1
>         Environment: RHEL 7, RAM 755GB per host
>            Reporter: Alex Dunayevsky
>
> *Reproducing*: Create 5k topics *from the code* quickly, without any delays, then wait for
> the brokers to finish loading them. Loading never completes: all brokers go down one by one
> after approximately 10-15 minutes or more, depending on the hardware.
> *Heap*: -Xmx/Xms: 5G, 10G, 50G, 256G
>  
> *Topology*: 3 brokers, 3 zk.
> *Code for 5k topic creation:*
> {code:java}
> package kafka
> import kafka.admin.AdminUtils
> import kafka.utils.{Logging, ZkUtils}
> // Creates 5000 topics back to back, with no delay between requests.
> object TestCreateTopics extends App with Logging {
>   val zkConnect = "grid978:2185"
>   // Arguments: ZooKeeper URL, session timeout (ms), connection timeout (ms).
>   val zkUtils = ZkUtils(zkConnect, 6000, 6000, isZkSecurityEnabled = false)
>   for (topic <- 1 to 5000) {
>     AdminUtils.createTopic(
>       topic             = topic.toString,
>       partitions        = 10,
>       replicationFactor = 2,
>       zkUtils           = zkUtils
>     )
>     logger.info(s"Created topic $topic")
>   }
> }
> {code}
> *Cause of death:*
> {code:java}
>     java.io.IOException: Map failed
>         at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:920)
>         at kafka.log.AbstractIndex.<init>(AbstractIndex.scala:61)
>         at kafka.log.OffsetIndex.<init>(OffsetIndex.scala:52)
>         at kafka.log.LogSegment.<init>(LogSegment.scala:67)
>         at kafka.log.Log.loadSegments(Log.scala:255)
>         at kafka.log.Log.<init>(Log.scala:108)
>         at kafka.log.LogManager.createLog(LogManager.scala:362)
>         at kafka.cluster.Partition.getOrCreateReplica(Partition.scala:94)
>         at kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:174)
>         at kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:174)
>         at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
>         at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:174)
>         at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:168)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>         at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:242)
>         at kafka.cluster.Partition.makeLeader(Partition.scala:168)
>         at kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:758)
>         at kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:757)
>         at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>         at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>         at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
>         at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
>         at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
>         at kafka.server.ReplicaManager.makeLeaders(ReplicaManager.scala:757)
>         at kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:703)
>         at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:148)
>         at kafka.server.KafkaApis.handle(KafkaApis.scala:82)
>         at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.OutOfMemoryError: Map failed
>         at sun.nio.ch.FileChannelImpl.map0(Native Method)
>         at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:917)
>         ... 28 more
> {code}
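
The numbers in the report allow a back-of-the-envelope estimate of how many index files each broker memory-maps, which is relevant because the trace fails in `FileChannelImpl.map` while constructing an `OffsetIndex`. The sketch below assumes that replicas are spread evenly over the 3 brokers, that each replica has a single log segment, and that each segment maps 2 index files. The last figure and the helper name `estimate_index_mmaps` are assumptions, not taken from the report. Comparing the result with `vm.max_map_count` is only one possible line of investigation; this message does not confirm that limit as the cause.

```shell
#!/bin/sh
# Rough estimate of memory-mapped index files per broker.
# Assumes an even replica spread across brokers and one segment per replica.
estimate_index_mmaps() {
    topics=$1; partitions=$2; replication_factor=$3; brokers=$4
    indexes_per_segment=$5   # assumption, not stated in the report
    replicas_total=$(( topics * partitions * replication_factor ))
    # Round up so the busiest broker is not undercounted.
    replicas_per_broker=$(( (replicas_total + brokers - 1) / brokers ))
    echo $(( replicas_per_broker * indexes_per_segment ))
}

# Figures from the report: 5000 topics, 10 partitions, replication factor 2, 3 brokers.
estimate=$(estimate_index_mmaps 5000 10 2 3 2)
echo "estimated mmapped index files per broker: $estimate"

# On Linux, print the per-process mapping limit for comparison if it is readable.
if [ -r /proc/sys/vm/max_map_count ]; then
    echo "vm.max_map_count: $(cat /proc/sys/vm/max_map_count)"
fi
```

Passing 1 as the last argument gives the lower bound implied by the stack trace alone, where only the offset index is known to be mapped.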



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
