cassandra-user mailing list archives

From Dan Kinder <dkin...@turnitin.com>
Subject Re:
Date Fri, 29 Sep 2017 18:18:01 GMT
I have a theory I'm testing out, after looking at the thread stack. Here it
is:

Carried over from before the upgrade, I have
memtable_heap_space_in_mb: 16384 # 16GB
memtable_cleanup_threshold: 0.11
memtable_allocation_type: heap_buffers

But since I'm on a JBOD with 12 disks and am letting memtable_flush_writers
default to 2 per disk, the default cleanup threshold would work out to
1/(24+1) = 0.04, which is very different from 0.11.

Perhaps something about this configuration makes it lock up trying to
allocate memtables?
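
If that's what's going on, I figure the quickest experiment is to stop
overriding the threshold and let it fall back to the computed default
(roughly 0.04 given 24 flush writers, if I have the formula right), i.e.
something like this in cassandra.yaml -- just a sketch of the test I have in
mind, not a recommendation:

memtable_heap_space_in_mb: 16384
# memtable_cleanup_threshold: 0.11  # commented out; let it default to 1/(memtable_flush_writers + 1)
memtable_allocation_type: heap_buffers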

On Fri, Sep 29, 2017 at 10:35 AM, Thakrar, Jayesh <
jthakrar@conversantmedia.com> wrote:

> This looks like a potential disk / IO issue.
>
>
>
> See https://www.pythian.com/blog/guide-to-cassandra-thread-pools/
>
>
>
>
>
> From: Dan Kinder <dkinder@turnitin.com>
> Date: Friday, September 29, 2017 at 11:47 AM
> To: <user@cassandra.apache.org>
> Subject: Re:
>
>
>
> Attached is a jstack thread dump from a node that's having trouble (see
> tpstats paste below as well).
>
>
>
> Pool Name                         Active   Pending      Completed   Blocked  All time blocked
> ReadStage                              0         0         409111         0                 0
> MiscStage                              0         0              0         0                 0
> CompactionExecutor                    10        10           2481         0                 0
> MutationStage                         32   1572402       31515477         0                 0
> GossipStage                            1      1620         180091         0                 0
> RequestResponseStage                   0         0       17509006         0                 0
> ReadRepairStage                        0         0           4833         0                 0
> CounterMutationStage                   0         0              0         0                 0
> MemtableFlushWriter                    1        44             74         0                 0
> MemtablePostFlush                      1       281            158         0                 0
> ValidationExecutor                     0         0              0         0                 0
> ViewMutationStage                      0         0              0         0                 0
> CacheCleanupExecutor                   0         0              0         0                 0
> PerDiskMemtableFlushWriter_10          0         0             67         0                 0
> PerDiskMemtableFlushWriter_11          0         0             67         0                 0
> MemtableReclaimMemory                  0         0             79         0                 0
> PendingRangeCalculator                 0         0             19         0                 0
> SecondaryIndexManagement               0         0              0         0                 0
> HintsDispatcher                        1         1           1456         0                 0
> PerDiskMemtableFlushWriter_1           0         0             67         0                 0
> Native-Transport-Requests             69         0       17951984         0             26956
> PerDiskMemtableFlushWriter_2           0         0             67         0                 0
> MigrationStage                         1       201           1048         0                 0
> PerDiskMemtableFlushWriter_0           0         0             78         0                 0
> Sampler                                0         0              0         0                 0
> PerDiskMemtableFlushWriter_5           0         0             67         0                 0
> InternalResponseStage                  0         0        2754163         0                 0
> PerDiskMemtableFlushWriter_6           0         0             67         0                 0
> PerDiskMemtableFlushWriter_3           0         0             67         0                 0
> PerDiskMemtableFlushWriter_4           0         0             67         0                 0
> PerDiskMemtableFlushWriter_9           0         0             67         0                 0
> AntiEntropyStage                       0         0              0         0                 0
> PerDiskMemtableFlushWriter_7           0         0             67         0                 0
> PerDiskMemtableFlushWriter_8           0         0             67         0                 0
>
>
>
> Message type           Dropped
> READ                         0
> RANGE_SLICE                  0
> _TRACE                       0
> HINT                         0
> MUTATION                     0
> COUNTER_MUTATION             0
> BATCH_STORE                  0
> BATCH_REMOVE                 0
> REQUEST_RESPONSE             0
> PAGED_RANGE                  0
> READ_REPAIR                  0
>
>
>
> On Thu, Sep 28, 2017 at 2:27 PM, Dan Kinder <dkinder@turnitin.com> wrote:
>
> Unless I'm reading the logs wrong, it doesn't seem like the pauses are
> very long... Attached the full log as well, going back to the node restart.
>
>
>
> $ grep -i "pause\|times" /var/log/cassandra/gc.log.0.current | tail -n 40
>
> 2017-09-28T14:18:48.324-0700: 12866.695: [GC pause (G1 Evacuation Pause) (young)
>  [Times: user=0.33 sys=0.00, real=0.02 secs]
> 2017-09-28T14:18:58.198-0700: 12876.569: [GC pause (G1 Evacuation Pause) (young)
>  [Times: user=0.34 sys=0.00, real=0.03 secs]
> 2017-09-28T14:19:08.039-0700: 12886.411: [GC pause (G1 Evacuation Pause) (young)
>  [Times: user=0.30 sys=0.00, real=0.03 secs]
> 2017-09-28T14:19:17.900-0700: 12896.271: [GC pause (G1 Evacuation Pause) (young)
>  [Times: user=0.33 sys=0.00, real=0.03 secs]
> 2017-09-28T14:19:28.995-0700: 12907.366: [GC pause (G1 Evacuation Pause) (young)
>  [Times: user=0.31 sys=0.00, real=0.02 secs]
> 2017-09-28T14:19:40.576-0700: 12918.947: [GC pause (G1 Evacuation Pause) (young)
>  [Times: user=0.31 sys=0.01, real=0.02 secs]
> 2017-09-28T14:19:52.164-0700: 12930.535: [GC pause (G1 Evacuation Pause) (young)
>  [Times: user=0.32 sys=0.01, real=0.02 secs]
> 2017-09-28T14:20:03.813-0700: 12942.184: [GC pause (G1 Evacuation Pause) (young)
>  [Times: user=0.33 sys=0.00, real=0.02 secs]
> 2017-09-28T14:20:15.584-0700: 12953.955: [GC pause (G1 Evacuation Pause) (young)
>  [Times: user=0.35 sys=0.00, real=0.02 secs]
> 2017-09-28T14:20:27.261-0700: 12965.632: [GC pause (G1 Evacuation Pause) (young)
>  [Times: user=0.34 sys=0.00, real=0.03 secs]
> 2017-09-28T14:20:38.894-0700: 12977.265: [GC pause (G1 Evacuation Pause) (young)
>  [Times: user=0.34 sys=0.01, real=0.03 secs]
> 2017-09-28T14:20:50.685-0700: 12989.056: [GC pause (G1 Evacuation Pause) (young)
>  [Times: user=0.33 sys=0.00, real=0.03 secs]
> 2017-09-28T14:21:02.281-0700: 13000.652: [GC pause (G1 Evacuation Pause) (young)
>  [Times: user=0.31 sys=0.00, real=0.02 secs]
> 2017-09-28T14:21:13.975-0700: 13012.346: [GC pause (G1 Evacuation Pause) (young)
>  [Times: user=0.32 sys=0.00, real=0.02 secs]
> 2017-09-28T14:21:25.797-0700: 13024.168: [GC pause (G1 Evacuation Pause) (young)
>  [Times: user=0.32 sys=0.01, real=0.02 secs]
> 2017-09-28T14:21:37.395-0700: 13035.766: [GC pause (G1 Evacuation Pause) (young)
>  [Times: user=0.33 sys=0.00, real=0.02 secs]
> 2017-09-28T14:21:49.166-0700: 13047.537: [GC pause (G1 Evacuation Pause) (young)
>  [Times: user=0.32 sys=0.00, real=0.02 secs]
> 2017-09-28T14:22:00.887-0700: 13059.258: [GC pause (G1 Evacuation Pause) (young)
>  [Times: user=0.33 sys=0.00, real=0.03 secs]
> 2017-09-28T14:22:12.453-0700: 13070.824: [GC pause (G1 Evacuation Pause) (young)
>  [Times: user=0.34 sys=0.00, real=0.02 secs]
> 2017-09-28T14:22:24.023-0700: 13082.394: [GC pause (G1 Evacuation Pause) (young)
>  [Times: user=0.35 sys=0.00, real=0.02 secs]
>
>
>
>
>
> On Thu, Sep 28, 2017 at 2:13 PM, Jeff Jirsa <jjirsa@gmail.com> wrote:
>
> The digest mismatch exception is not a problem; that's why it's only
> logged at debug.
>
>
>
> As Thomas noted, there's a pretty good chance this is
> https://issues.apache.org/jira/browse/CASSANDRA-13754 - if you see a lot
> of GCInspector logs indicating GC pauses, that would add confidence to that
> diagnosis.
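>
> A quick way to check would be something like this (assuming the default
> log location; GCInspector only logs pauses above its reporting threshold,
> so a steady stream of these lines would add confidence):
>
> $ grep GCInspector /var/log/cassandra/system.log | tail -n 20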
>
>
>
>
>
> On Thu, Sep 28, 2017 at 2:08 PM, Dan Kinder <dkinder@turnitin.com> wrote:
>
> Thanks for the responses.
>
>
>
> @Prem yes, this is after the entire cluster is on 3.11, but no, I have not
> run upgradesstables yet.
>
>
>
> @Thomas no I don't see any major GC going on.
>
>
>
> @Jeff yeah it's fully upgraded. I decided to shut the whole thing down and
> bring it back up (thankfully this cluster is not serving live traffic). The
> nodes seemed okay for an hour or two, but now I'm seeing the issue again,
> without having bounced any nodes. This time it's ReadStage that's building
> up, and the exception I'm seeing in the logs is:
>
> DEBUG [ReadRepairStage:106] 2017-09-28 13:01:37,206 ReadCallback.java:242 - Digest mismatch:
> org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(6150926370328526396, 696a6374652e6f7267) (2f0fffe2d743cdc4c69c3eb351a3c9ca vs 00ee661ae190c2cbf0eb2fb8a51f6025)
>         at org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92) ~[apache-cassandra-3.11.0.jar:3.11.0]
>         at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:233) ~[apache-cassandra-3.11.0.jar:3.11.0]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_71]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_71]
>         at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.0.jar:3.11.0]
>         at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_71]
>
>
>
> Do you think running upgradesstables would help? Or relocatesstables? I
> presumed it shouldn't be necessary for Cassandra to function, just an
> optimization.
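>
> For reference, I assume those would amount to running something like the
> following on each node; I haven't tried either yet, so treat this as a
> sketch:
>
> $ nodetool upgradesstables -a
> $ nodetool relocatesstables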
>
>
>
> On Thu, Sep 28, 2017 at 12:49 PM, Steinmaurer, Thomas <
> thomas.steinmaurer@dynatrace.com> wrote:
>
> Dan,
>
>
>
> do you see any major GC? We have been hit by the following memory leak in
> our loadtest environment with 3.11.0.
>
> https://issues.apache.org/jira/browse/CASSANDRA-13754
>
>
>
> So, depending on the heap size and uptime, you might get into heap
> troubles.
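>
> A simple way to keep an eye on it over time would be something like:
>
> $ nodetool info | grep "Heap Memory"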
>
>
>
> Thomas
>
>
>
> From: Dan Kinder [mailto:dkinder@turnitin.com]
> Sent: Thursday, September 28, 2017 18:20
> To: user@cassandra.apache.org
> Subject:
>
>
>
> Hi,
>
> I recently upgraded our 16-node cluster from 2.2.6 to 3.11 and am seeing the
> following. The cluster does function for a while, but then some stages begin
> to back up, and the node does not recover and does not drain the tasks, even
> under no load. This happens both to MutationStage and GossipStage.
>
> I do see the following exception happen in the logs:
>
>
>
> ERROR [ReadRepairStage:2328] 2017-09-26 23:07:55,440 CassandraDaemon.java:228 - Exception in thread Thread[ReadRepairStage:2328,5,main]
> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 1 responses.
>         at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171) ~[apache-cassandra-3.11.0.jar:3.11.0]
>         at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:182) ~[apache-cassandra-3.11.0.jar:3.11.0]
>         at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82) ~[apache-cassandra-3.11.0.jar:3.11.0]
>         at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89) ~[apache-cassandra-3.11.0.jar:3.11.0]
>         at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.11.0.jar:3.11.0]
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.0.jar:3.11.0]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_91]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_91]
>         at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) ~[apache-cassandra-3.11.0.jar:3.11.0]
>         at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]
>
>
>
> But it's hard to correlate it precisely with things going bad. It is also
> very strange to me since I have both read_repair_chance and
> dclocal_read_repair_chance set to 0.0 for ALL of my tables, so it is
> confusing why ReadRepairStage would err at all.
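>
> Those settings can be confirmed with something along these lines -- the
> system_schema column names here are from memory, so treat this as a sketch:
>
> $ cqlsh -e "SELECT keyspace_name, table_name, read_repair_chance, dclocal_read_repair_chance FROM system_schema.tables;"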
>
> Anyone have thoughts on this? It's pretty muddling, and it causes nodes to
> lock up. Once it happens Cassandra can't even shut down; I have to kill -9
> it. If I can't find a resolution I'm going to need to downgrade and restore
> from backup...
>
> The only issue I found that looked similar is
> https://issues.apache.org/jira/browse/CASSANDRA-12689, but that appears to
> have been fixed in 3.10.
>
>
>
> $ nodetool tpstats
>
> Pool Name                         Active   Pending      Completed   Blocked  All time blocked
> ReadStage                              0         0         582103         0                 0
> MiscStage                              0         0              0         0                 0
> CompactionExecutor                    11        11           2868         0                 0
> MutationStage                         32   4593678       55057393         0                 0
> GossipStage                            1      2818         371487         0                 0
> RequestResponseStage                   0         0        4345522         0                 0
> ReadRepairStage                        0         0         151473         0                 0
> CounterMutationStage                   0         0              0         0                 0
> MemtableFlushWriter                    1        81             76         0                 0
> MemtablePostFlush                      1       382            139         0                 0
> ValidationExecutor                     0         0              0         0                 0
> ViewMutationStage                      0         0              0         0                 0
> CacheCleanupExecutor                   0         0              0         0                 0
> PerDiskMemtableFlushWriter_10          0         0             69         0                 0
> PerDiskMemtableFlushWriter_11          0         0             69         0                 0
> MemtableReclaimMemory                  0         0             81         0                 0
> PendingRangeCalculator                 0         0             32         0                 0
> SecondaryIndexManagement               0         0              0         0                 0
> HintsDispatcher                        0         0            596         0                 0
> PerDiskMemtableFlushWriter_1           0         0             69         0                 0
> Native-Transport-Requests             11         0        4547746         0                67
> PerDiskMemtableFlushWriter_2           0         0             69         0                 0
> MigrationStage                         1      1545            586         0                 0
> PerDiskMemtableFlushWriter_0           0         0             80         0                 0
> Sampler                                0         0              0         0                 0
> PerDiskMemtableFlushWriter_5           0         0             69         0                 0
> InternalResponseStage                  0         0          45432         0                 0
> PerDiskMemtableFlushWriter_6           0         0             69         0                 0
> PerDiskMemtableFlushWriter_3           0         0             69         0                 0
> PerDiskMemtableFlushWriter_4           0         0             69         0                 0
> PerDiskMemtableFlushWriter_9           0         0             69         0                 0
> AntiEntropyStage                       0         0              0         0                 0
> PerDiskMemtableFlushWriter_7           0         0             69         0                 0
> PerDiskMemtableFlushWriter_8           0         0             69         0                 0
>
>
>
> Message type           Dropped
> READ                         0
> RANGE_SLICE                  0
> _TRACE                       0
> HINT                         0
> MUTATION                     0
> COUNTER_MUTATION             0
> BATCH_STORE                  0
> BATCH_REMOVE                 0
> REQUEST_RESPONSE             0
> PAGED_RANGE                  0
> READ_REPAIR                  0
>
>
>
> -dan
>
> The contents of this e-mail are intended for the named addressee only. It
> contains information that may be confidential. Unless you are the named
> addressee or an authorized designee, you may not copy or use it, or
> disclose it to anyone else. If you received it in error please notify us
> immediately and then destroy it. Dynatrace Austria GmbH (registration
> number FN 91482h) is a company registered in Linz whose registered office
> is at 4040 Linz, Austria, Freistädterstraße 313
>
>
>
>
>
> --
>
> Dan Kinder
> Principal Software Engineer
> Turnitin – www.turnitin.com
> dkinder@turnitin.com
>
>
>
>
>
>
>
> --
>
> Dan Kinder
> Principal Software Engineer
> Turnitin – www.turnitin.com
> dkinder@turnitin.com
>
>
>
>
>
> --
>
> Dan Kinder
> Principal Software Engineer
> Turnitin – www.turnitin.com
> dkinder@turnitin.com
>
