Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 338E717C69 for ; Fri, 30 Oct 2015 04:29:13 +0000 (UTC) Received: (qmail 40920 invoked by uid 500); 30 Oct 2015 04:29:10 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 40880 invoked by uid 500); 30 Oct 2015 04:29:10 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 40864 invoked by uid 99); 30 Oct 2015 04:29:10 -0000 Received: from Unknown (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 30 Oct 2015 04:29:10 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id CB9101809AA for ; Fri, 30 Oct 2015 04:29:09 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 3.001 X-Spam-Level: *** X-Spam-Status: No, score=3.001 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, HTML_MESSAGE=3, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd3-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=tubularlabs_com.20150623.gappssmtp.com Received: from mx1-eu-west.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id evF3iHzcShom for ; Fri, 30 Oct 2015 04:28:58 +0000 (UTC) Received: from mail-pa0-f49.google.com (mail-pa0-f49.google.com [209.85.220.49]) by mx1-eu-west.apache.org (ASF Mail Server at mx1-eu-west.apache.org) with ESMTPS id 5529B20F0A for ; Fri, 30 Oct 2015 04:28:57 +0000 (UTC) Received: by pasz6 with SMTP id z6so61401931pas.2 for ; Thu, 29 Oct 2015 21:28:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tubularlabs_com.20150623.gappssmtp.com; s=20150623; h=from:content-type:message-id:mime-version:subject:date:references :to:in-reply-to; bh=lzozbiMGDm1k/g4cRA+lMwo3cnfOudXqO4azGraiVKg=; b=BZIa6xUHZ7mo/D+d9eSbRlcFaq5LPkEEWMZ5ObQK38e1kVqBWwxz5UsOqh0gumOzQb faXppiuj8z4oOFD0LDBFrA3AgFr1guhJmbv1WRd+bFIllKJRfiVcCkD+hgYnXxLl5SI0 STji/X94RMX++SHrRdIpviFmx8ZjKGb8oAwSfe++unWUTZILPz/wWDIO90me8McuakKa fVsFPtCsvcdYQHi3bB4uesq6PiVrga9Q8zM0oKvZTxfYHXcSMfOsmRm4gW1wmtKBg3OU /slqqzK/la6VGyPDJ5faioW6TcpvUh42Qm5kn71mkR388H/1AilVAmxVRrG/M3OgMCIv yXgg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:content-type:message-id:mime-version :subject:date:references:to:in-reply-to; bh=lzozbiMGDm1k/g4cRA+lMwo3cnfOudXqO4azGraiVKg=; b=k2yArJEPWDhQpKgoAYQ+KPAAgCDz0K1uXgWRXxct5hzMOT7GqSaiP2c5ZwGcIcZLGG O1VurWmwFdq+Y0Cqq7lKReqt8OHRvR2sMnKMo6jj/G2otXu36ORYTb7RLxdmETJrFGjS LkCtbERAdzMAadW2WMuo6k8L1Tz/X7D45jhTWNxv8J9AV3DTW61Lyxs+ka9lXwYj5YsL mn7onspbNxqwi37C0MBb+Z6oDyjSsypVLSVKWaDVSMorwoEGgBOBawROlEv/TMQ7KgDL R2D9LF74Xd0W2sD7XhInj/v2L4k/snVHbWgR/J5gItVu1Jm5q581kM0Vd5eOmsAWt+kV 0iSg== X-Gm-Message-State: ALoCoQkh/lJsICIRfh2eb45sX7WA3ZYbot1AyzUFaZm43HQ9IwjZ0K3Odducr3W0Vk1kI42dadFR X-Received: by 10.68.254.170 with SMTP id aj10mr6229293pbd.97.1446179335850; Thu, 29 Oct 2015 21:28:55 -0700 (PDT) Received: from [172.31.99.243] ([199.87.81.121]) by smtp.gmail.com with ESMTPSA id qd2sm5202399pbb.68.2015.10.29.21.28.54 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 29 Oct 2015 21:28:55 -0700 (PDT) From: Jeff Ferland Content-Type: multipart/alternative; boundary="Apple-Mail=_E2230D22-3395-42DA-B761-08408036ACAB" Message-Id: <1921BF2D-159B-489D-86B5-16826700E8A3@tubularlabs.com> Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: Cassandra stalls and dropped messages not due to GC Date: Thu, 29 Oct 2015 21:28:53 -0700 References: <9864F6F7-407B-49D7-ABD5-92F1B68EBB4B@vast.com> To: user@cassandra.apache.org In-Reply-To: <9864F6F7-407B-49D7-ABD5-92F1B68EBB4B@vast.com> X-Mailer: Apple Mail (2.2104) --Apple-Mail=_E2230D22-3395-42DA-B761-08408036ACAB Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 Upgraded from 2.0.x. Using the other commit log sync method and 10 = seconds. Enabling batch mode is like swallowing a grenade. It=E2=80=99s starting to look to me like it=E2=80=99s possibly related = to brief IO spikes that are smaller than my usual graphing granularity. = It feels surprising to me that these would affect the Gossip threads, = but it=E2=80=99s the best current lead I have with my debugging right = now. More to come when I learn it. -Jeff > On Oct 29, 2015, at 6:36 PM, Graham Sanderson wrote: >=20 > you didn=E2=80=99t say what you upgraded from, but if it is 2.0.x, = then look at CASSANDRA-9504 >=20 > If so and you use > commitlog_sync: batch > Then you probably want to set > commitlog_sync_batch_window_in_ms: 1 (or 2) > Note I=E2=80=99m only slightly convinced this is the cause because of = your READ_REPAIR issues (though if you are dropping a lot of MUTATIONS = under load and your machines are overloaded, you=E2=80=99d be doing more = READ_REPAIR than usual probably) >=20 >> On Oct 29, 2015, at 8:12 PM, Jeff Ferland > wrote: >>=20 >> Using DSE 4.8.1 / 2.1.11.872, Java version 1.8.0_66 >>=20 >> We upgraded our cluster this weekend and have been having issues with = dropped mutations since then. Intensely investigating a single node and = toying with settings has revealed that GC stalls don=E2=80=99t make up = enough time to explain the 10 seconds of apparent stall that would cause = a hangup. >>=20 >> tpstats output typically shows active threads in the low single = digits and pending similar or 0. Before a failure, pending MutationStage = will skyrocket into 5+ digits. System.log regularly shows the gossiper = complaining, then slow log complaints, then logs dropped mutations. >>=20 >> For the entire minute of 00:55, the gc logging shows no single pause = > .14 seconds and most of them much smaller. Abbreviated GC log after = switching to g1gc (problem also exhibited before G1GC): >>=20 >> 2015-10-30T00:55:00.550+0000: 6752.857: [GC pause (G1 Evacuation = Pause) (young) >> 2015-10-30T00:55:02.843+0000: 6755.150: [GC pause (GCLocker Initiated = GC) (young) >> 2015-10-30T00:55:05.241+0000: 6757.548: [GC pause (G1 Evacuation = Pause) (young) >> 2015-10-30T00:55:07.755+0000: 6760.062: [GC pause (G1 Evacuation = Pause) (young) >> 2015-10-30T00:55:10.532+0000: 6762.839: [GC pause (G1 Evacuation = Pause) (young) >> 2015-10-30T00:55:13.080+0000: 6765.387: [GC pause (G1 Evacuation = Pause) (young) >> 2015-10-30T00:55:15.914+0000: 6768.221: [GC pause (G1 Evacuation = Pause) (young) >> 2015-10-30T00:55:18.619+0000: 6770.926: [GC pause (GCLocker Initiated = GC) (young) >> 2015-10-30T00:55:23.270+0000: 6775.578: [GC pause (GCLocker Initiated = GC) (young) >> 2015-10-30T00:55:28.662+0000: 6780.969: [GC pause (GCLocker Initiated = GC) (young) >> 2015-10-30T00:55:33.326+0000: 6785.633: [GC pause (G1 Evacuation = Pause) (young) >> 2015-10-30T00:55:36.600+0000: 6788.907: [GC pause (G1 Evacuation = Pause) (young) >> 2015-10-30T00:55:40.050+0000: 6792.357: [GC pause (G1 Evacuation = Pause) (young) >> 2015-10-30T00:55:43.728+0000: 6796.035: [GC pause (G1 Evacuation = Pause) (young) >> 2015-10-30T00:55:48.216+0000: 6800.523: [GC pause (G1 Evacuation = Pause) (young) >> 2015-10-30T00:55:53.621+0000: 6805.928: [GC pause (G1 Evacuation = Pause) (young) >> 2015-10-30T00:55:59.048+0000: 6811.355: [GC pause (GCLocker Initiated = GC) (young) >>=20 >> System log snippet of the pattern I=E2=80=99m seeing: >>=20 >> WARN [GossipTasks:1] 2015-10-30 00:55:25,129 Gossiper.java:747 - = Gossip stage has 1 pending tasks; skipping status check (no nodes will = be marked down) >> INFO [CompactionExecutor:210] 2015-10-30 00:55:26,006 = CompactionTask.java:141 - Compacting = [SSTableReader(path=3D'/mnt/cassandra/data/system/hints/system-hints-ka-82= 83-Data.db'), = SSTableReader(path=3D'/mnt/cassandra/data/system/hints/system-hints-ka-828= 6-Data.db'), = SSTableReader(path=3D'/mnt/cassandra/data/system/hints/system-hints-ka-828= 4-Data.db'), = SSTableReader(path=3D'/mnt/cassandra/data/system/hints/system-hints-ka-828= 5-Data.db'), = SSTableReader(path=3D'/mnt/cassandra/data/system/hints/system-hints-ka-828= 7-Data.db')] >> WARN [GossipTasks:1] 2015-10-30 00:55:26,230 Gossiper.java:747 - = Gossip stage has 3 pending tasks; skipping status check (no nodes will = be marked down) >> WARN [GossipTasks:1] 2015-10-30 00:55:27,330 Gossiper.java:747 - = Gossip stage has 5 pending tasks; skipping status check (no nodes will = be marked down) >> WARN [GossipTasks:1] 2015-10-30 00:55:28,431 Gossiper.java:747 - = Gossip stage has 7 pending tasks; skipping status check (no nodes will = be marked down) >> WARN [GossipTasks:1] 2015-10-30 00:55:29,531 Gossiper.java:747 - = Gossip stage has 10 pending tasks; skipping status check (no nodes will = be marked down) >> INFO [CqlSlowLog-Writer-thread-0] 2015-10-30 00:55:32,448 = CqlSlowLogWriter.java:151 - Recording statements with duration of 16042 = in slow log >> INFO [CqlSlowLog-Writer-thread-0] 2015-10-30 00:55:32,451 = CqlSlowLogWriter.java:151 - Recording statements with duration of 16047 = in slow log >> INFO [CqlSlowLog-Writer-thread-0] 2015-10-30 00:55:32,453 = CqlSlowLogWriter.java:151 - Recording statements with duration of 16018 = in slow log >> INFO [CqlSlowLog-Writer-thread-0] 2015-10-30 00:55:32,454 = CqlSlowLogWriter.java:151 - Recording statements with duration of 16042 = in slow log >> INFO [CqlSlowLog-Writer-thread-0] 2015-10-30 00:55:32,455 = CqlSlowLogWriter.java:151 - Recording statements with duration of 16024 = in slow log >> INFO [CqlSlowLog-Writer-thread-0] 2015-10-30 00:55:32,457 = CqlSlowLogWriter.java:151 - Recording statements with duration of 16045 = in slow log >> INFO [CqlSlowLog-Writer-thread-0] 2015-10-30 00:55:32,458 = CqlSlowLogWriter.java:151 - Recording statements with duration of 16041 = in slow log >> INFO [CqlSlowLog-Writer-thread-0] 2015-10-30 00:55:32,459 = CqlSlowLogWriter.java:151 - Recording statements with duration of 16040 = in slow log >> INFO [CqlSlowLog-Writer-thread-0] 2015-10-30 00:55:32,461 = CqlSlowLogWriter.java:151 - Recording statements with duration of 16037 = in slow log >> WARN [GossipTasks:1] 2015-10-30 00:55:32,697 = FailureDetector.java:257 - Not marking nodes down due to local pause of = 8668629909 > 5000000000 >> INFO [CompactionExecutor:212] 2015-10-30 00:55:32,864 = ColumnFamilyStore.java:900 - Enqueuing flush of compactions_in_progress: = 164 (0%) on-heap, 0 (0%) off-heap >> INFO [MemtableFlushWriter:846] 2015-10-30 00:55:32,865 = Memtable.java:347 - Writing = Memtable-compactions_in_progress@969357824(0.008KiB serialized bytes, 1 = ops, 0%/0% of on/off-heap limit) >> INFO [MemtableFlushWriter:846] 2015-10-30 00:55:32,865 = Memtable.java:382 - Completed flushing = /mnt/cassandra/data/system/compactions_in_progress/system-compactions_in_p= rogress-tmp-ka-1461769-Data.db (0.000KiB) for commitlog position = ReplayPosition(segmentId=3D1446159752729, position=3D8699402) >> INFO [ScheduledTasks:1] 2015-10-30 00:55:35,346 = MessagingService.java:930 - MUTATION messages were dropped in last 5000 = ms: 67432 for internal timeout and 0 for cross node timeout >> INFO [ScheduledTasks:1] 2015-10-30 00:55:35,350 = MessagingService.java:930 - READ_REPAIR messages were dropped in last = 5000 ms: 209 for internal timeout and 0 for cross node timeout >>=20 >> I=E2=80=99m starting to suspect I=E2=80=99ll have to take regular = stack dumps and catch a bad moment, but I figured I=E2=80=99d reach out = here before going down that road to see if somebody else had ever = experienced a similar kind of pain. >>=20 >> -Jeff >>=20 >=20 --Apple-Mail=_E2230D22-3395-42DA-B761-08408036ACAB Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8
Upgraded from 2.0.x. Using the other commit = log sync method and 10 seconds. Enabling batch mode is like swallowing a = grenade.

It=E2=80= =99s starting to look to me like it=E2=80=99s possibly related to brief = IO spikes that are smaller than my usual graphing granularity. It feels = surprising to me that these would affect the Gossip threads, but it=E2=80=99= s the best current lead I have with my debugging right now. More to come = when I learn it.

-Jeff

On Oct 29, 2015, at 6:36 PM, Graham Sanderson = <graham@vast.com> = wrote:

you didn=E2=80=99= t say what you upgraded from, but if it is 2.0.x, then look at = CASSANDRA-9504

If so = and you use
commitlog_sync: batch
Then you probably want to set
commitlog_sync_batch_window_in_ms: 1 (or 2)
Note I=E2=80=99m only slightly convinced this is the cause = because of your READ_REPAIR issues (though if you are dropping a lot of = MUTATIONS under load and your machines are overloaded, you=E2=80=99d be = doing more READ_REPAIR than usual probably)
On Oct 29, 2015, at 8:12 PM, Jeff Ferland = <jbf@tubularlabs.com> wrote:

Using DSE 4.8.1 / = 2.1.11.872, Java version 1.8.0_66

We = upgraded our cluster this weekend and have been having issues with = dropped mutations since then. Intensely investigating a single node and = toying with settings has revealed that GC stalls don=E2=80=99t make up = enough time to explain the 10 seconds of apparent stall that would cause = a hangup.

tpstats output typically shows = active threads in the low single digits and pending similar or 0. Before = a failure, pending MutationStage will skyrocket into 5+ digits. = System.log regularly shows the gossiper complaining, then slow log = complaints, then logs dropped mutations.

For = the entire minute of 00:55, the gc logging shows no single pause > = .14 seconds and most of them much smaller. Abbreviated GC log after = switching to g1gc (problem also exhibited before G1GC):

2015-10-30T00:55:00.550+0000: 6752.857: [GC pause (G1 = Evacuation Pause) (young)
2015-10-30T00:55:02.843+0000: = 6755.150: [GC pause (GCLocker Initiated GC) (young)
2015-10-30T00:55:05.241+0000: 6757.548: [GC pause (G1 = Evacuation Pause) (young)
2015-10-30T00:55:07.755+0000: = 6760.062: [GC pause (G1 Evacuation Pause) (young)
2015-10-30T00:55:10.532+0000: 6762.839: [GC pause (G1 = Evacuation Pause) (young)
2015-10-30T00:55:13.080+0000: = 6765.387: [GC pause (G1 Evacuation Pause) (young)
2015-10-30T00:55:15.914+0000: 6768.221: [GC pause (G1 = Evacuation Pause) (young)
2015-10-30T00:55:18.619+0000: = 6770.926: [GC pause (GCLocker Initiated GC) (young)
2015-10-30T00:55:23.270+0000: 6775.578: [GC pause (GCLocker = Initiated GC) (young)
2015-10-30T00:55:28.662+0000: = 6780.969: [GC pause (GCLocker Initiated GC) (young)
2015-10-30T00:55:33.326+0000: 6785.633: [GC pause (G1 = Evacuation Pause) (young)
2015-10-30T00:55:36.600+0000: = 6788.907: [GC pause (G1 Evacuation Pause) (young)
2015-10-30T00:55:40.050+0000: 6792.357: [GC pause (G1 = Evacuation Pause) (young)
2015-10-30T00:55:43.728+0000: = 6796.035: [GC pause (G1 Evacuation Pause) (young)
2015-10-30T00:55:48.216+0000: 6800.523: [GC pause (G1 = Evacuation Pause) (young)
2015-10-30T00:55:53.621+0000: = 6805.928: [GC pause (G1 Evacuation Pause) (young)
2015-10-30T00:55:59.048+0000: 6811.355: [GC pause (GCLocker = Initiated GC) (young)

System log snippet of = the pattern I=E2=80=99m seeing:

WARN =  [GossipTasks:1] 2015-10-30 00:55:25,129  Gossiper.java:747 - = Gossip stage has 1 pending tasks; skipping status check (no nodes will = be marked down)
INFO  [CompactionExecutor:210] = 2015-10-30 00:55:26,006  CompactionTask.java:141 - Compacting = [SSTableReader(path=3D'/mnt/cassandra/data/system/hints/system-hints-ka-82= 83-Data.db'), = SSTableReader(path=3D'/mnt/cassandra/data/system/hints/system-hints-ka-828= 6-Data.db'), = SSTableReader(path=3D'/mnt/cassandra/data/system/hints/system-hints-ka-828= 4-Data.db'), = SSTableReader(path=3D'/mnt/cassandra/data/system/hints/system-hints-ka-828= 5-Data.db'), = SSTableReader(path=3D'/mnt/cassandra/data/system/hints/system-hints-ka-828= 7-Data.db')]
WARN  [GossipTasks:1] 2015-10-30 = 00:55:26,230  Gossiper.java:747 - Gossip stage has 3 pending tasks; = skipping status check (no nodes will be marked down)
WARN =  [GossipTasks:1] 2015-10-30 00:55:27,330  Gossiper.java:747 - = Gossip stage has 5 pending tasks; skipping status check (no nodes will = be marked down)
WARN  [GossipTasks:1] 2015-10-30 = 00:55:28,431  Gossiper.java:747 - Gossip stage has 7 pending tasks; = skipping status check (no nodes will be marked down)
WARN =  [GossipTasks:1] 2015-10-30 00:55:29,531  Gossiper.java:747 - = Gossip stage has 10 pending tasks; skipping status check (no nodes will = be marked down)
INFO  [CqlSlowLog-Writer-thread-0] = 2015-10-30 00:55:32,448  CqlSlowLogWriter.java:151 - Recording = statements with duration of 16042 in slow log
INFO =  [CqlSlowLog-Writer-thread-0] 2015-10-30 00:55:32,451 =  CqlSlowLogWriter.java:151 - Recording statements with duration of = 16047 in slow log
INFO  [CqlSlowLog-Writer-thread-0] = 2015-10-30 00:55:32,453  CqlSlowLogWriter.java:151 - Recording = statements with duration of 16018 in slow log
INFO =  [CqlSlowLog-Writer-thread-0] 2015-10-30 00:55:32,454 =  CqlSlowLogWriter.java:151 - Recording statements with duration of = 16042 in slow log
INFO  [CqlSlowLog-Writer-thread-0] = 2015-10-30 00:55:32,455  CqlSlowLogWriter.java:151 - Recording = statements with duration of 16024 in slow log
INFO =  [CqlSlowLog-Writer-thread-0] 2015-10-30 00:55:32,457 =  CqlSlowLogWriter.java:151 - Recording statements with duration of = 16045 in slow log
INFO  [CqlSlowLog-Writer-thread-0] = 2015-10-30 00:55:32,458  CqlSlowLogWriter.java:151 - Recording = statements with duration of 16041 in slow log
INFO =  [CqlSlowLog-Writer-thread-0] 2015-10-30 00:55:32,459 =  CqlSlowLogWriter.java:151 - Recording statements with duration of = 16040 in slow log
INFO  [CqlSlowLog-Writer-thread-0] = 2015-10-30 00:55:32,461  CqlSlowLogWriter.java:151 - Recording = statements with duration of 16037 in slow log
WARN =  [GossipTasks:1] 2015-10-30 00:55:32,697 =  FailureDetector.java:257 - Not marking nodes down due to local = pause of 8668629909 > 5000000000
INFO =  [CompactionExecutor:212] 2015-10-30 00:55:32,864 =  ColumnFamilyStore.java:900 - Enqueuing flush of = compactions_in_progress: 164 (0%) on-heap, 0 (0%) off-heap
INFO  [MemtableFlushWriter:846] 2015-10-30 00:55:32,865 =  Memtable.java:347 - Writing = Memtable-compactions_in_progress@969357824(0.008KiB serialized bytes, 1 = ops, 0%/0% of on/off-heap limit)
INFO =  [MemtableFlushWriter:846] 2015-10-30 00:55:32,865 =  Memtable.java:382 - Completed flushing = /mnt/cassandra/data/system/compactions_in_progress/system-compactions_in_p= rogress-tmp-ka-1461769-Data.db (0.000KiB) for commitlog position = ReplayPosition(segmentId=3D1446159752729, position=3D8699402)
INFO  [ScheduledTasks:1] 2015-10-30 00:55:35,346 =  MessagingService.java:930 - MUTATION messages were dropped in last = 5000 ms: 67432 for internal timeout and 0 for cross node timeout
INFO  [ScheduledTasks:1] 2015-10-30 00:55:35,350 =  MessagingService.java:930 - READ_REPAIR messages were dropped in = last 5000 ms: 209 for internal timeout and 0 for cross node timeout

I=E2=80=99m starting to suspect I=E2=80=99ll = have to take regular stack dumps and catch a bad moment, but I figured = I=E2=80=99d reach out here before going down that road to see if = somebody else had ever experienced a similar kind of pain.

-Jeff



= --Apple-Mail=_E2230D22-3395-42DA-B761-08408036ACAB--