From user-return-64567-archive-asf-public=cust-asf.ponee.io@cassandra.apache.org Thu Oct 17 02:46:08 2019 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id BEBF3180644 for ; Thu, 17 Oct 2019 04:46:07 +0200 (CEST) Received: (qmail 17082 invoked by uid 500); 17 Oct 2019 02:46:02 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 17072 invoked by uid 99); 17 Oct 2019 02:46:02 -0000 Received: from ui-eu-02.ponee.io (HELO localhost) (116.202.110.96) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 17 Oct 2019 02:46:02 +0000 X-Mailer: LuaSocket 3.0-rc1 Content-Type: text/plain; charset=utf-8 Subject: Cassandra 3.11.4 Node the load starts to increase after few minutes to 40 on 4 CPU machine x-ponymail-agent: PonyMail Composer/0.2 Date: Thu, 17 Oct 2019 02:46:01 -0000 From: Sergio Bilello In-Reply-To: MIME-Version: 1.0 References: Message-ID: x-ponymail-sender: fbd14426bd4c58e5302921357e6bd08419fa77b7 To: Hello guys! I performed a thread dump https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTkvMTAvMTcvLS1kdW1wLnR4dC0tMC0zMC00MA==& while try to join the node with -Dcassandra.join_ring=false OR -Dcassandra.join.ring=false OR -Djoin.ring=false because the node spiked in load and latency was affecting the clients. With or without that flag the node is high in latency and I see the load sky rocketing when the number of TCP established connections increases Analyzing the /var/log/messages I am able to read Oct 17 00:23:39 prod-personalization-live-data-cassandra-08 cassandra: INFO [Service Thread] 2019-10-17 00:23:39,030 GCInspector.java:284 - G1 Young Generation GC in 255ms. G1 Eden Space: 361758720 -> 0; G1 Old Gen: 1855455944 -> 1781007048; G1 Survivor Space: 39845888 -> 32505856; Oct 17 00:23:40 prod-personalization-live-data-cassandra-08 cassandra: INFO [ScheduledTasks:1] 2019-10-17 00:23:40,352 NoSpamLogger.java:91 - Some operations were slow, details available at debug level (debug.log) Oct 17 00:23:03 prod-personalization-live-data-cassandra-08 kernel: TCP: request_sock_TCP: Possible SYN flooding on port 9042. Sending cookies. Check SNMP counters. I don't see anything on debug.log that looks to be relevant The machine is on aws with 4 cpu with 32GB Ram and 1 TB SSD i3.xlarge [sergio.bilello@prod-personalization-live-data-cassandra-08 ~]$ nodetool tpstats Pool Name Active Pending Completed Blocked All time blocked ReadStage 32 53 559304 0 0 MiscStage 0 0 0 0 0 CompactionExecutor 1 107 118 0 0 MutationStage 0 0 2695 0 0 MemtableReclaimMemory 0 0 11 0 0 PendingRangeCalculator 0 0 33 0 0 GossipStage 0 0 4314 0 0 SecondaryIndexManagement 0 0 0 0 0 HintsDispatcher 0 0 0 0 0 RequestResponseStage 0 0 421865 0 0 Native-Transport-Requests 22 0 1903400 0 0 ReadRepairStage 0 0 59078 0 0 CounterMutationStage 0 0 0 0 0 MigrationStage 0 0 0 0 0 MemtablePostFlush 0 0 32 0 0 PerDiskMemtableFlushWriter_0 0 0 11 0 0 ValidationExecutor 0 0 0 0 0 Sampler 0 0 0 0 0 MemtableFlushWriter 0 0 11 0 0 InternalResponseStage 0 0 0 0 0 ViewMutationStage 0 0 0 0 0 AntiEntropyStage 0 0 0 0 0 CacheCleanupExecutor 0 0 0 0 0 Message type Dropped READ 0 RANGE_SLICE 0 _TRACE 0 HINT 0 MUTATION 0 COUNTER_MUTATION 0 BATCH_STORE 0 BATCH_REMOVE 0 REQUEST_RESPONSE 0 PAGED_RANGE 0 READ_REPAIR 0 [sergio.bilello@prod-personalization-live-data-cassandra-08 ~]$ top - 01:44:15 up 2 days, 1:45, 4 users, load average: 34.45, 27.71, 15.37 Tasks: 140 total, 1 running, 74 sleeping, 0 stopped, 0 zombie %Cpu(s): 90.0 us, 4.5 sy, 3.0 ni, 1.1 id, 0.0 wa, 0.0 hi, 1.4 si, 0.0 st KiB Mem : 31391772 total, 250504 free, 10880364 used, 20260904 buff/cache KiB Swap: 0 total, 0 free, 0 used. 19341960 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 20712 cassand+ 20 0 194.1g 14.4g 4.6g S 392.0 48.2 74:50.48 java 20823 sergio.+ 20 0 124856 6304 3136 S 1.7 0.0 0:13.51 htop 7865 root 20 0 1062684 39880 11428 S 0.7 0.1 4:06.02 ir_agent 3557 consul 20 0 41568 30192 18832 S 0.3 0.1 13:16.37 consul 7600 root 20 0 2082700 46624 11880 S 0.3 0.1 4:14.60 ir_agent 1 root 20 0 193660 7740 5220 S 0.0 0.0 0:56.36 systemd 2 root 20 0 0 0 0 S 0.0 0.0 0:00.08 kthreadd 4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H 6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_percpu_wq 7 root 20 0 0 0 0 S 0.0 0.0 0:06.04 ksoftirqd/0 [sergio.bilello@prod-personalization-live-data-cassandra-08 ~]$ free total used free shared buff/cache available Mem: 31391772 10880916 256732 426552 20254124 19341768 Swap: 0 0 0 [sergio.bilello@prod-personalization-live-data-cassandra-08 ~]$ bash-4.2$ java -jar sjk.jar ttop -p 20712 Monitoring threads ... 2019-10-17T01:45:33.352+0000 Process summary process cpu=363.58% application cpu=261.91% (user=248.65% sys=13.26%) other: cpu=101.67% thread count: 474 heap allocation rate 583mb/s [000039] user=13.56% sys=-0.59% alloc= 11mb/s - OptionalTasks:1 [000379] user= 8.57% sys=-0.27% alloc= 18mb/s - ReadStage-19 [000380] user= 7.85% sys= 0.22% alloc= 19mb/s - Native-Transport-Requests-21 [000295] user= 7.14% sys= 0.23% alloc= 14mb/s - Native-Transport-Requests-5 [000378] user= 7.14% sys=-0.03% alloc= 22mb/s - Native-Transport-Requests-17 [000514] user= 6.42% sys= 0.12% alloc= 20mb/s - Native-Transport-Requests-85 [000293] user= 6.66% sys=-0.32% alloc= 12mb/s - Native-Transport-Requests-2 [000392] user= 6.19% sys= 0.14% alloc= 9545kb/s - Native-Transport-Requests-12 [000492] user= 5.71% sys=-0.24% alloc= 15mb/s - Native-Transport-Requests-24 [000294] user= 5.23% sys=-0.25% alloc= 14mb/s - Native-Transport-Requests-3 [000381] user= 5.47% sys=-0.52% alloc= 7430kb/s - Native-Transport-Requests-23 [000672] user= 4.52% sys= 0.25% alloc= 14mb/s - Native-Transport-Requests-270 [000296] user= 5.23% sys=-0.47% alloc= 13mb/s - ReadStage-7 [000673] user= 4.52% sys= 0.05% alloc= 13mb/s - Native-Transport-Requests-269 [000118] user= 4.28% sys= 0.28% alloc= 13mb/s - Native-Transport-Requests-1 [000570] user= 4.28% sys= 0.27% alloc= 13mb/s - ReadStage-31 [000671] user= 4.28% sys= 0.25% alloc= 13mb/s - ReadStage-271 [000376] user= 4.52% sys=-0.03% alloc= 13mb/s - Native-Transport-Requests-13 [000302] user= 4.76% sys=-0.30% alloc= 13mb/s - ReadStage-8 [000377] user= 4.28% sys= 0.17% alloc= 13mb/s - ReadStage-15 ^Cbash-4.2$ dstat You did not select any stats, using -cdngy by default. ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- usr sys idl wai hiq siq| read writ| recv send| in out | int csw 33 2 65 0 0 0| 443k 7988k| 0 0 | 0 0 |3255 2324 92 6 1 0 0 2|4732k 0 |1437k 1039k| 0 0 | 20k 24k 94 4 1 0 0 1|2588k 104k|1371k 545k| 0 0 | 13k 14k 91 6 1 0 0 2|1548k 0 |2045k 949k| 0 0 | 17k 20k 94 4 1 0 0 2|1120k 0 |1364k 372k| 0 0 | 12k 11k 95 4 0 0 0 1|3616k 0 |2137k 333k| 0 0 | 12k 7875 --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org For additional commands, e-mail: user-help@cassandra.apache.org