From dev-return-90677-archive-asf-public=cust-asf.ponee.io@kafka.apache.org Fri Jan 5 00:11:06 2018 Return-Path: X-Original-To: archive-asf-public@eu.ponee.io Delivered-To: archive-asf-public@eu.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by mx-eu-01.ponee.io (Postfix) with ESMTP id B228118077B for ; Fri, 5 Jan 2018 00:11:06 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 9A094160C3A; Thu, 4 Jan 2018 23:11:06 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 731FA160C2B for ; Fri, 5 Jan 2018 00:11:05 +0100 (CET) Received: (qmail 59590 invoked by uid 500); 4 Jan 2018 23:11:04 -0000 Mailing-List: contact dev-help@kafka.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@kafka.apache.org Delivered-To: mailing list dev@kafka.apache.org Received: (qmail 59579 invoked by uid 99); 4 Jan 2018 23:11:04 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 04 Jan 2018 23:11:04 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id B00111A012D for ; Thu, 4 Jan 2018 23:11:03 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -100.011 X-Spam-Level: X-Spam-Status: No, score=-100.011 tagged_above=-999 required=6.31 tests=[RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id kXtfoV6htaZx for ; Thu, 4 Jan 2018 23:11:01 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id D68C55F1B9 for ; Thu, 4 Jan 2018 23:11:00 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 5F82CE0044 for ; Thu, 4 Jan 2018 23:11:00 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 1C500240DE for ; Thu, 4 Jan 2018 23:11:00 +0000 (UTC) Date: Thu, 4 Jan 2018 23:11:00 +0000 (UTC) From: "Ivan Babrou (JIRA)" To: dev@kafka.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Created] (KAFKA-6423) Slow shutdown with many open files MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 Ivan Babrou created KAFKA-6423: ---------------------------------- Summary: Slow shutdown with many open files Key: KAFKA-6423 URL: https://issues.apache.org/jira/browse/KAFKA-6423 Project: Kafka Issue Type: Bug Affects Versions: 0.11.0.1 Reporter: Ivan Babrou We have brokers with 20k open files and shutdown is extremely slow, progressing usually at around 60 closed file descriptors per second: {noformat} $ while true; do echo $(date) $(sudo ls /proc/6363/fd | wc -l); sleep 1; done Thu Jan 4 23:00:51 UTC 2018 9770 Thu Jan 4 23:00:52 UTC 2018 9691 Thu Jan 4 23:00:53 UTC 2018 9616 Thu Jan 4 23:00:54 UTC 2018 9561 Thu Jan 4 23:00:55 UTC 2018 9509 Thu Jan 4 23:00:56 UTC 2018 9427 Thu Jan 4 23:00:57 UTC 2018 9350 Thu Jan 4 23:00:58 UTC 2018 9260 Thu Jan 4 23:00:59 UTC 2018 9208 {noformat} If you strace the process, you can see: {noformat} $ sudo strace -f -c -p 6363 strace: Process 6363 attached with 97 threads ^Cstrace: Process 6363 detached strace: Process 6604 detached strace: Process 6605 detached strace: Process 6606 detached strace: Process 6607 detached strace: Process 6608 detached strace: Process 6609 detached strace: Process 6610 detached strace: Process 6611 detached strace: Process 6612 detached strace: Process 6613 detached strace: Process 6614 detached strace: Process 6615 detached strace: Process 6616 detached strace: Process 6617 detached strace: Process 6618 detached strace: Process 6619 detached strace: Process 6620 detached strace: Process 6621 detached strace: Process 6622 detached strace: Process 6623 detached strace: Process 6624 detached strace: Process 6625 detached strace: Process 6626 detached strace: Process 6627 detached strace: Process 6628 detached strace: Process 6629 detached strace: Process 6630 detached strace: Process 6631 detached strace: Process 6632 detached strace: Process 6633 detached strace: Process 6634 detached strace: Process 6635 detached strace: Process 6636 detached strace: Process 6637 detached strace: Process 6638 detached strace: Process 6639 detached strace: Process 6640 detached strace: Process 6641 detached strace: Process 6642 detached strace: Process 6643 detached strace: Process 6644 detached strace: Process 6645 detached strace: Process 6646 detached strace: Process 6647 detached strace: Process 6648 detached strace: Process 6649 detached strace: Process 6650 detached strace: Process 6651 detached strace: Process 6652 detached strace: Process 6653 detached strace: Process 6654 detached strace: Process 6655 detached strace: Process 6656 detached strace: Process 6657 detached strace: Process 6658 detached strace: Process 6659 detached strace: Process 6660 detached strace: Process 6661 detached strace: Process 6662 detached strace: Process 6663 detached strace: Process 6716 detached strace: Process 6717 detached strace: Process 6718 detached strace: Process 6719 detached strace: Process 6720 detached strace: Process 6721 detached strace: Process 6722 detached strace: Process 6723 detached strace: Process 6724 detached strace: Process 6725 detached strace: Process 6726 detached strace: Process 6727 detached strace: Process 6728 detached strace: Process 6729 detached strace: Process 6730 detached strace: Process 6731 detached strace: Process 6732 detached strace: Process 6733 detached strace: Process 6734 detached strace: Process 6735 detached strace: Process 6736 detached strace: Process 6737 detached strace: Process 6738 detached strace: Process 6739 detached strace: Process 6740 detached strace: Process 6741 detached strace: Process 6760 detached strace: Process 6779 detached strace: Process 6781 detached strace: Process 6783 detached strace: Process 6892 detached strace: Process 2339 detached strace: Process 2340 detached strace: Process 5122 detached strace: Process 5123 detached strace: Process 5652 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 65.19 0.859302 35804 24 19 restart_syscall 26.60 0.350656 507 692 190 futex 5.17 0.068142 2271 30 epoll_wait 1.22 0.016141 56 287 ftruncate 0.66 0.008679 20 432 close 0.38 0.005054 35 144 fsync 0.26 0.003489 12 288 open 0.19 0.002564 4 720 fstat 0.15 0.001914 3 576 lseek 0.14 0.001885 13 144 mmap 0.01 0.000114 7 16 getrusage 0.00 0.000060 15 4 write 0.00 0.000028 5 6 read 0.00 0.000027 2 12 mprotect 0.00 0.000019 5 4 epoll_ctl ------ ----------- ----------- --------- --------- ---------------- 100.00 1.318074 3379 209 total ivan@36s116:~$ htop ivan@36s116:~$ sudo strace -f -c -p 6363 strace: Process 6363 attached with 97 threads ^Cstrace: Process 6363 detached strace: Process 6604 detached strace: Process 6605 detached strace: Process 6606 detached strace: Process 6607 detached strace: Process 6608 detached strace: Process 6609 detached strace: Process 6610 detached strace: Process 6611 detached strace: Process 6612 detached strace: Process 6613 detached strace: Process 6614 detached strace: Process 6615 detached strace: Process 6616 detached strace: Process 6617 detached strace: Process 6618 detached strace: Process 6619 detached strace: Process 6620 detached strace: Process 6621 detached strace: Process 6622 detached strace: Process 6623 detached strace: Process 6624 detached strace: Process 6625 detached strace: Process 6626 detached strace: Process 6627 detached strace: Process 6628 detached strace: Process 6629 detached strace: Process 6630 detached strace: Process 6631 detached strace: Process 6632 detached strace: Process 6633 detached strace: Process 6634 detached strace: Process 6635 detached strace: Process 6636 detached strace: Process 6637 detached strace: Process 6638 detached strace: Process 6639 detached strace: Process 6640 detached strace: Process 6641 detached strace: Process 6642 detached strace: Process 6643 detached strace: Process 6644 detached strace: Process 6645 detached strace: Process 6646 detached strace: Process 6647 detached strace: Process 6648 detached strace: Process 6649 detached strace: Process 6650 detached strace: Process 6651 detached strace: Process 6652 detached strace: Process 6653 detached strace: Process 6654 detached strace: Process 6655 detached strace: Process 6656 detached strace: Process 6657 detached strace: Process 6658 detached strace: Process 6659 detached strace: Process 6660 detached strace: Process 6661 detached strace: Process 6662 detached strace: Process 6663 detached strace: Process 6716 detached strace: Process 6717 detached strace: Process 6718 detached strace: Process 6719 detached strace: Process 6720 detached strace: Process 6721 detached strace: Process 6722 detached strace: Process 6723 detached strace: Process 6724 detached strace: Process 6725 detached strace: Process 6726 detached strace: Process 6727 detached strace: Process 6728 detached strace: Process 6729 detached strace: Process 6730 detached strace: Process 6731 detached strace: Process 6732 detached strace: Process 6733 detached strace: Process 6734 detached strace: Process 6735 detached strace: Process 6736 detached strace: Process 6737 detached strace: Process 6738 detached strace: Process 6739 detached strace: Process 6740 detached strace: Process 6741 detached strace: Process 6760 detached strace: Process 6779 detached strace: Process 6781 detached strace: Process 6783 detached strace: Process 6892 detached strace: Process 2339 detached strace: Process 2340 detached strace: Process 5122 detached strace: Process 5123 detached strace: Process 5652 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 61.71 73.863265 9074 8140 3060 futex 32.70 39.143004 65 599436 sched_yield 3.02 3.610581 16337 221 epoll_wait 1.93 2.314673 89026 26 5 restart_syscall 0.18 0.211991 211991 1 accept 0.13 0.157244 45 3492 ftruncate 0.11 0.127528 24 5285 close 0.06 0.072592 41 1760 fsync 0.04 0.047197 13 3538 open 0.03 0.036148 4 8776 fstat 0.03 0.035549 6 5715 stat 0.02 0.026905 317 85 getdents 0.02 0.026150 4 6988 2 lseek 0.02 0.025100 14 1742 mmap 0.00 0.001132 15 75 2 read 0.00 0.000662 14 46 write 0.00 0.000639 23 28 msync 0.00 0.000580 6 100 getrusage 0.00 0.000443 5 95 mprotect 0.00 0.000212 6 34 epoll_ctl 0.00 0.000099 12 8 sysinfo 0.00 0.000050 13 4 times 0.00 0.000036 9 4 getrlimit 0.00 0.000023 12 2 getsockname 0.00 0.000015 8 2 uname 0.00 0.000013 4 3 fcntl 0.00 0.000013 7 2 sched_getaffinity 0.00 0.000007 4 2 setsockopt 0.00 0.000000 0 2 writev 0.00 0.000000 0 1 dup2 0.00 0.000000 0 2 1 shutdown 0.00 0.000000 0 2 clock_gettime ------ ----------- ----------- --------- --------- ---------------- 100.00 119.701851 645617 3070 total {noformat} In jstack I can see: {noformat} "pool-6-thread-1" #219 prio=5 os_prio=0 tid=0x00007f44d80a2800 nid=0x1614 runnable [0x00007f45690a6000] java.lang.Thread.State: RUNNABLE at java.io.RandomAccessFile.close0(Native Method) at java.io.RandomAccessFile.access$000(RandomAccessFile.java:59) at java.io.RandomAccessFile$1.close(RandomAccessFile.java:619) at java.io.FileDescriptor.closeAll(FileDescriptor.java:212) - locked <0x000000064c651c10> (a java.io.FileDescriptor) at java.io.RandomAccessFile.close(RandomAccessFile.java:617) at sun.nio.ch.FileChannelImpl.implCloseChannel(FileChannelImpl.java:139) at java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:115) - locked <0x000000064c651be0> (a java.lang.Object) at org.apache.kafka.common.record.FileRecords.close(FileRecords.java:171) at kafka.log.LogSegment.$anonfun$close$4(LogSegment.scala:490) at kafka.log.LogSegment$$Lambda$1176/84191269.apply$mcV$sp(Unknown Source) at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:78) at kafka.utils.Logging.swallowWarn(Logging.scala:94) at kafka.utils.Logging.swallowWarn$(Logging.scala:93) at kafka.utils.CoreUtils$.swallowWarn(CoreUtils.scala:48) at kafka.utils.Logging.swallow(Logging.scala:96) at kafka.utils.Logging.swallow$(Logging.scala:96) at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:48) at kafka.log.LogSegment.close(LogSegment.scala:490) at kafka.log.Log.$anonfun$close$2(Log.scala:537) at kafka.log.Log.$anonfun$close$2$adapted(Log.scala:537) at kafka.log.Log$$Lambda$1172/1976013682.apply(Unknown Source) at scala.collection.Iterator.foreach(Iterator.scala:929) at scala.collection.Iterator.foreach$(Iterator.scala:929) at scala.collection.AbstractIterator.foreach(Iterator.scala:1417) at scala.collection.IterableLike.foreach(IterableLike.scala:71) at scala.collection.IterableLike.foreach$(IterableLike.scala:70) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at kafka.log.Log.close(Log.scala:537) - locked <0x0000000640488d10> (a java.lang.Object) at kafka.log.LogManager.$anonfun$shutdown$7(LogManager.scala:280) at kafka.log.LogManager$$Lambda$1166/165872442.apply$mcV$sp(Unknown Source) at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:57) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {noformat} CPU is barely used at ~2%. There is some IO, but it's very far from saturating RAID0 from 12 spinning disks: https://i.imgur.com/jRuW3pp.png This happens across brokers and clusters. -- This message was sent by Atlassian JIRA (v6.4.14#64029)