Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id ACEA0200CF6 for ; Mon, 18 Sep 2017 23:46:10 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id AB58A1609DE; Mon, 18 Sep 2017 21:46:10 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id C93671609D4 for ; Mon, 18 Sep 2017 23:46:09 +0200 (CEST) Received: (qmail 62289 invoked by uid 500); 18 Sep 2017 21:46:08 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 62273 invoked by uid 99); 18 Sep 2017 21:46:08 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 18 Sep 2017 21:46:08 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 69221185374 for ; Mon, 18 Sep 2017 21:46:08 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -99.202 X-Spam-Level: X-Spam-Status: No, score=-99.202 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id g1tswA_nmqwT for ; Mon, 18 Sep 2017 21:46:06 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTP id D02585FE5F for ; Mon, 18 Sep 2017 21:46:05 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 6A4AAE0EF4 for ; Mon, 18 Sep 2017 21:46:04 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id E94282415D for ; Mon, 18 Sep 2017 21:46:01 +0000 (UTC) Date: Mon, 18 Sep 2017 21:46:01 +0000 (UTC) From: "Jason Brown (JIRA)" To: commits@cassandra.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Mon, 18 Sep 2017 21:46:10 -0000 [ https://issues.apache.org/jira/browse/CASSANDRA-13651?page=3Dcom.atla= ssian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId= =3D16170780#comment-16170780 ]=20 Jason Brown commented on CASSANDRA-13651: ----------------------------------------- Cool. Thanks for the patch. The code itself is trivial (adding a configurat= ion property), however I liked what you had before:=20 {code} if (FLUSH_DELAY > 0) eventLoop.schedule(this, FLUSH_DELAY, TimeUnit.NANOSECONDS); else eventLoop.execute(this); {code} That way if the {{FLUSH_DELAY}} is 0, skip creating a scheduled task in net= ty and just submit it for execution. wdyt? I'd like to run this in my test environment to verify if there's any change= on my side. It may take a few days due to other on-going work. > Large amount of CPU used by epoll_wait(.., .., .., 0) > ----------------------------------------------------- > > Key: CASSANDRA-13651 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1365= 1 > Project: Cassandra > Issue Type: Bug > Reporter: Corentin Chary > Assignee: Corentin Chary > Fix For: 4.x > > Attachments: cpu-usage.png > > > I was trying to profile Cassandra under my workload and I kept seeing thi= s backtrace: > {code} > epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms > io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java= (native) > io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) Native= .java:111 > io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) EpollEventLoop.j= ava:230 > io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254 > io.netty.util.concurrent.SingleThreadEventExecutor$5.run() SingleThreadEv= entExecutor.java:858 > io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.ru= n() DefaultThreadFactory.java:138 > java.lang.Thread.run() Thread.java:745 > {code} > At fist I though that the profiler might not be able to profile native co= de properly, but I wen't further and I realized that most of the CPU was us= ed by {{epoll_wait()}} calls with a timeout of zero. > Here is the output of perf on this system, which confirms that most of th= e overhead was with timeout =3D=3D 0. > {code} > Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (appro= x.): 11594448 > Overhead Trace output = = =E2=97=86 > 90.06% epfd: 0x00000047, events: 0x7f5588c0c000, maxevents: 0x00002000= , timeout: 0x00000000 = =E2=96=92 > 5.77% epfd: 0x000000b5, events: 0x7fca419ef000, maxevents: 0x00001000= , timeout: 0x00000000 = =E2=96=92 > 1.98% epfd: 0x000000b5, events: 0x7fca419ef000, maxevents: 0x00001000= , timeout: 0x000003e8 = =E2=96=92 > 0.04% epfd: 0x00000003, events: 0x2f6af77b9c00, maxevents: 0x00000020= , timeout: 0x00000000 = =E2=96=92 > 0.04% epfd: 0x0000002b, events: 0x121ebf63ac00, maxevents: 0x00000040= , timeout: 0x00000000 = =E2=96=92 > 0.03% epfd: 0x00000026, events: 0x7f51f80019c0, maxevents: 0x00000020= , timeout: 0x00000000 = =E2=96=92 > 0.02% epfd: 0x00000003, events: 0x7fe4d80019d0, maxevents: 0x00000020= , timeout: 0x00000000 > {code} > Running this time with perf record -ag for call traces: > {code} > # Children Self sys usr Trace output = =20 > # ........ ........ ........ ........ ...............................= ..................................................... > # > 8.61% 8.61% 0.00% 8.61% epfd: 0x000000a7, events: 0x7fc= a452d6000, maxevents: 0x00001000, timeout: 0x00000000 > | > ---0x1000200af313 > | =20 > --8.61%--0x7fca6117bdac > 0x7fca60459804 > epoll_wait > 2.98% 2.98% 0.00% 2.98% epfd: 0x000000a7, events: 0x7fc= a452d6000, maxevents: 0x00001000, timeout: 0x000003e8 > | > ---0x1000200af313 > 0x7fca6117b830 > 0x7fca60459804 > epoll_wait > {code} > That looks like a lot of CPU used to wait for nothing. I'm not sure if pr= ef reports a per-CPU percentage or a per-system percentage, but that would = be still be 10% of the total CPU usage of Cassandra at the minimum. > I went further and found the code of all that: We schedule a lot of {{Mes= sage::Flusher}} with a deadline of 10 usec (5 per messages I think) but net= ty+epoll only support timeouts above the milliseconds and will convert ever= ything bellow to 0. > I added some traces to netty (4.1): > {code} > diff --git a/transport-native-epoll/src/main/java/io/netty/channel/epoll/= EpollEventLoop.java b/transport-native-epoll/src/main/java/io/netty/channel= /epoll/EpollEventLoop.java > index 909088fde..8734bbfd4 100644 > --- a/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEv= entLoop.java > +++ b/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEv= entLoop.java > @@ -208,10 +208,15 @@ final class EpollEventLoop extends SingleThreadEven= tLoop { > long currentTimeNanos =3D System.nanoTime(); > long selectDeadLineNanos =3D currentTimeNanos + delayNanos(curre= ntTimeNanos); > for (;;) { > - long timeoutMillis =3D (selectDeadLineNanos - currentTimeNan= os + 500000L) / 1000000L; > + long timeoutNanos =3D selectDeadLineNanos - currentTimeNanos= + 500000L; > + long timeoutMillis =3D timeoutNanos / 1000000L; > + System.out.printf("timeoutNanos: %d, timeoutMillis: %d | dea= dline: %d - now: %d | hastask: %d\n", > + timeoutNanos, timeoutMillis, > + selectDeadLineNanos, currentTimeNanos, hasTasks() ? = 1 : 0); > if (timeoutMillis <=3D 0) { > if (selectCnt =3D=3D 0) { > int ready =3D Native.epollWait(epollFd.intValue(), e= vents, 0); > + System.out.printf("ready: %d\n", ready); > if (ready > 0) { > return ready; > } > {code} > And this gives : > {code} > timeoutNanos: 1000500000, timeoutMillis: 1000 | deadline: 200178234181651= 0 - now: 2001781341816510 | hastask: 0 > timeoutNanos: 1000500000, timeoutMillis: 1000 | deadline: 200178234208723= 9 - now: 2001781342087239 | hastask: 0 > timeoutNanos: 1000500000, timeoutMillis: 1000 | deadline: 200178234216694= 7 - now: 2001781342166947 | hastask: 0 > timeoutNanos: 508459, timeoutMillis: 0 | deadline: 2001781342297987 - now= : 2001781342289528 | hastask: 0 > ready: 0 > timeoutNanos: 508475, timeoutMillis: 0 | deadline: 2001781342357719 - now= : 2001781342349244 | hastask: 0 > ready: 0 > timeoutNanos: 509327, timeoutMillis: 0 | deadline: 2001781342394822 - now= : 2001781342385495 | hastask: 0 > ready: 0 > timeoutNanos: 509339, timeoutMillis: 0 | deadline: 2001781342430192 - now= : 2001781342420853 | hastask: 0 > ready: 0 > timeoutNanos: 509510, timeoutMillis: 0 | deadline: 2001781342461588 - now= : 2001781342452078 | hastask: 0 > ready: 0 > timeoutNanos: 509493, timeoutMillis: 0 | deadline: 2001781342495044 - now= : 2001781342485551 | hastask: 0 > ready: 0 > {code} > The nanosecond timeout all come from {{eventLoop.schedule(this, 10000, Ti= meUnit.NANOSECONDS);}} in {{Message::Flusher}}. > Knowing that, I'm not sure what would be best to do, and I have a hard ti= me understanding Message::Flusher, but to me it looks like trying to schedu= le less tasks would probably help and I didn't think anything obvious that = could be done with netty. > Changing {{if (++runsWithNoWork > 5)}} to 2 seems to help a little bit, b= ut that isn't really significant. -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org For additional commands, e-mail: commits-help@cassandra.apache.org