Mailing-List: contact jira-help@kafka.apache.org; run by ezmlm
Precedence: bulk
Reply-To: jira@kafka.apache.org
Date: Mon, 11 Dec 2017 14:24:00 +0000 (UTC)
From: "Alex Dunayevsky (JIRA)" <jira@apache.org>
To: jira@kafka.apache.org
Message-ID: <JIRA.13124109.1512991964000.430132.1513002240993@Atlassian.JIRA>
In-Reply-To: <JIRA.13124109.1512991964000@Atlassian.JIRA>
References: <JIRA.13124109.1512991964000@Atlassian.JIRA> <JIRA.13124109.1512991964834@jira-lw-us.apache.org>
Subject: [jira] [Comment Edited] (KAFKA-6343) OOM as the result of creation
 of 5k topics
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Mon, 11 Dec 2017 14:24:06 -0000


    [ https://issues.apache.org/jira/browse/KAFKA-6343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285968#comment-16285968 ] 

Alex Dunayevsky edited comment on KAFKA-6343 at 12/11/17 2:23 PM:
------------------------------------------------------------------

Ismael Juma, once again, thank you! This time it looks like we have found the root cause.
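A rough back-of-the-envelope estimate shows why the per-process memory-map limit is a plausible culprit. This is a sketch under stated assumptions, not a measurement. It assumes that each partition replica has a single log segment whose offset index and time index are both memory-mapped, giving two mappings per replica, and that replicas are spread evenly over the brokers. `estimate_index_maps` is a hypothetical helper and is not part of Kafka. With the parameters from the issue description quoted below (5000 topics, 10 partitions, replication factor 2, 3 brokers), the estimate comes to about 66668 index mappings per broker. That already exceeds vm.max_map_count = 65530, even before counting the ~898 mappings the broker process starts with.

```shell
# Hypothetical helper: rough number of index mmaps on the busiest broker.
# Assumptions: one active segment per partition replica, with its offset
# index and time index each memory-mapped (2 maps per replica), and
# replicas spread evenly across brokers.
estimate_index_maps() {
  local topics=$1 partitions=$2 replication=$3 brokers=$4
  local replicas=$(( topics * partitions * replication ))
  # Ceiling division: at least one broker hosts this many replicas.
  local per_broker=$(( (replicas + brokers - 1) / brokers ))
  echo $(( per_broker * 2 ))
}

estimate_index_maps 5000 10 2 3   # prints 66668
```

Because the estimate is linear in partitions and replication factor, halving either one roughly halves the number of mappings per broker.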

Reproducing: 
{code:java}

// Maximum number of memory map areas a process may have:
$ /sbin/sysctl vm.max_map_count
vm.max_map_count = 65530

// Tracking the number of memory mappings of the broker process:
$ wc -l < /proc/<KAFKA_PID>/maps
898    <--- grows from this value
...
65532  <--- up to this value (slightly above vm.max_map_count = 65530). This is the point where the broker fails, so you are right!

// The fix is to raise vm.max_map_count to a larger value, e.g. 262144 (65536 * 4):
$ /sbin/sysctl -w vm.max_map_count=262144

{code}
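To see how close a broker is to the limit, the two checks above can be combined into one step. This is a minimal sketch that works on Linux only, and `map_headroom` is a hypothetical helper name. Note that `sysctl -w` changes only the value in the running kernel, and that value is lost on reboot. The commented lines at the end show one common way to make the setting persistent through a file under /etc/sysctl.d/. The file name 99-kafka.conf is an arbitrary choice.

```shell
# Hypothetical helper (Linux only): compare a process's current number of
# memory mappings with the kernel's vm.max_map_count limit.
map_headroom() {
  local pid=$1
  local maps limit
  # Each line of /proc/<pid>/maps is one mapping; fail if the PID is gone.
  maps=$(wc -l < "/proc/${pid}/maps") || return 2
  limit=$(cat /proc/sys/vm/max_map_count) || return 2
  echo "maps=${maps} limit=${limit} headroom=$(( limit - maps ))"
}

# Usage against a running broker (substitute the real PID):
#   map_headroom <KAFKA_PID>

# Persisting the higher limit across reboots (requires root):
#   echo 'vm.max_map_count=262144' > /etc/sysctl.d/99-kafka.conf
#   sysctl --system
```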

Ismael, awesome job!


> OOM as the result of creation of 5k topics
> ------------------------------------------
>
>                 Key: KAFKA-6343
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6343
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.10.1.1
>         Environment: RHEL 7, RAM 755GB per host
>            Reporter: Alex Dunayevsky
>
> *Reproducing*: Create 5k topics *from the code* in quick succession, without any delays, and wait until the brokers finish loading them. This never happens: all brokers go down one by one after approximately 10-15 minutes or more, depending on the hardware.
> *Heap* (-Xmx/-Xms): 5G, 10G, 50G, 256G
>  
> *Topology*: 3 brokers, 3 ZooKeeper nodes.
> *Code for 5k topic creation:*
> {code:java}
> package kafka
>
> import kafka.admin.AdminUtils
> import kafka.utils.{Logging, ZkUtils}
>
> // Creates 5000 topics (10 partitions, replication factor 2) back to back.
> object TestCreateTopics extends App with Logging {
>   val zkConnect = "grid978:2185"
>   val zkUtils = ZkUtils(zkConnect, 6000, 6000, isZkSecurityEnabled = false)
>   try {
>     for (topic <- 1 to 5000) {
>       AdminUtils.createTopic(
>         zkUtils           = zkUtils,
>         topic             = topic.toString,
>         partitions        = 10,
>         replicationFactor = 2
>       )
>       logger.info(s"Created topic $topic")
>     }
>   } finally {
>     zkUtils.close()
>   }
> }
> {code}
> *Cause of death:*
> {code:java}
>     java.io.IOException: Map failed
>         at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:920)
>         at kafka.log.AbstractIndex.<init>(AbstractIndex.scala:61)
>         at kafka.log.OffsetIndex.<init>(OffsetIndex.scala:52)
>         at kafka.log.LogSegment.<init>(LogSegment.scala:67)
>         at kafka.log.Log.loadSegments(Log.scala:255)
>         at kafka.log.Log.<init>(Log.scala:108)
>         at kafka.log.LogManager.createLog(LogManager.scala:362)
>         at kafka.cluster.Partition.getOrCreateReplica(Partition.scala:94)
>         at kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:174)
>         at kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:174)
>         at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
>         at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:174)
>         at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:168)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>         at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:242)
>         at kafka.cluster.Partition.makeLeader(Partition.scala:168)
>         at kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:758)
>         at kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:757)
>         at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>         at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>         at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
>         at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
>         at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
>         at kafka.server.ReplicaManager.makeLeaders(ReplicaManager.scala:757)
>         at kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:703)
>         at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:148)
>         at kafka.server.KafkaApis.handle(KafkaApis.scala:82)
>         at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.OutOfMemoryError: Map failed
>         at sun.nio.ch.FileChannelImpl.map0(Native Method)
>         at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:917)
>         ... 28 more
> {code}
> Restarting a broker results in the same OOM error, so none of the brokers are able to start again.


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)