cassandra-user mailing list archives

From Dan Kinder <dkin...@turnitin.com>
Subject Re:
Date Fri, 29 Sep 2017 18:57:49 GMT
There are 9 non-system tables.

This machine has 64GB of memory, and the memtable heap space is
deliberately set high to reduce compaction overhead.

So far I'm hopeful: I am seeing much higher resource utilization from the
nodes now (closer to expected).

On Fri, Sep 29, 2017 at 11:34 AM, Jeff Jirsa <jjirsa@gmail.com> wrote:

> Aside from that, a 16GB memtable is huge and probably creating a ton of
> objects on the heap. How many tables are in your cluster?
>
>
> --
> Jeff Jirsa
>
>
> On Sep 29, 2017, at 11:18 AM, Dan Kinder <dkinder@turnitin.com> wrote:
>
> I have a theory I'm testing out, after looking at the thread stack. Here
> it is:
>
> Carried over from before the upgrade, I have
> memtable_heap_space_in_mb: 16384 # 16GB
> memtable_cleanup_threshold: 0.11
> memtable_allocation_type: heap_buffers
>
> But since I'm on JBOD with 12 disks, I'm letting memtable_flush_writers
> default to 2 per disk. That yields a default cleanup threshold of
> 1/(24+1) = 0.04, which is very different from 0.11.
>
> Perhaps something about this configuration makes it lock up trying to
> allocate memtables?
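
To make the arithmetic above concrete, here is a minimal Python sketch of
the cleanup-threshold calculation (Cassandra's documented default for
memtable_cleanup_threshold is 1 / (memtable_flush_writers + 1); the values
below are the ones from the configuration quoted above):

# Default memtable_cleanup_threshold arithmetic for a 12-disk JBOD node;
# values taken from the configuration discussed in this thread.
jbod_disks = 12
flush_writers = 2 * jbod_disks                 # memtable_flush_writers default: 2 per data directory

default_cleanup = 1.0 / (flush_writers + 1)    # 1/25 = 0.04
configured_cleanup = 0.11                      # carried-over memtable_cleanup_threshold

memtable_heap_mb = 16384                       # memtable_heap_space_in_mb
print(f"default cleanup threshold:    {default_cleanup:.2f}")
print(f"configured cleanup threshold: {configured_cleanup:.2f}")
print(f"flush triggered at roughly {memtable_heap_mb * configured_cleanup:.0f} MB of dirty memtable heap")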
>
On Fri, Sep 29, 2017 at 10:35 AM, Thakrar, Jayesh <jthakrar@conversantmedia.com> wrote:
>
>> This looks like a potential disk / IO issue.
>>
>>
>>
>> See https://www.pythian.com/blog/guide-to-cassandra-thread-pools/
>>
>>
>>
>>
>>
>> *From: *Dan Kinder <dkinder@turnitin.com>
>> *Date: *Friday, September 29, 2017 at 11:47 AM
>> *To: *<user@cassandra.apache.org>
>> *Subject: *Re:
>>
>>
>>
>> I've attached a jstack thread dump from a node that's having trouble
>> (see the tpstats paste below as well).
>>
>>
>>
>> Pool Name                         Active   Pending      Completed   Blocked  All time blocked
>> ReadStage                              0         0         409111         0                 0
>> MiscStage                              0         0              0         0                 0
>> CompactionExecutor                    10        10           2481         0                 0
>> MutationStage                         32   1572402       31515477         0                 0
>> GossipStage                            1      1620         180091         0                 0
>> RequestResponseStage                   0         0       17509006         0                 0
>> ReadRepairStage                        0         0           4833         0                 0
>> CounterMutationStage                   0         0              0         0                 0
>> MemtableFlushWriter                    1        44             74         0                 0
>> MemtablePostFlush                      1       281            158         0                 0
>> ValidationExecutor                     0         0              0         0                 0
>> ViewMutationStage                      0         0              0         0                 0
>> CacheCleanupExecutor                   0         0              0         0                 0
>> PerDiskMemtableFlushWriter_10          0         0             67         0                 0
>> PerDiskMemtableFlushWriter_11          0         0             67         0                 0
>> MemtableReclaimMemory                  0         0             79         0                 0
>> PendingRangeCalculator                 0         0             19         0                 0
>> SecondaryIndexManagement               0         0              0         0                 0
>> HintsDispatcher                        1         1           1456         0                 0
>> PerDiskMemtableFlushWriter_1           0         0             67         0                 0
>> Native-Transport-Requests             69         0       17951984         0             26956
>> PerDiskMemtableFlushWriter_2           0         0             67         0                 0
>> MigrationStage                         1       201           1048         0                 0
>> PerDiskMemtableFlushWriter_0           0         0             78         0                 0
>> Sampler                                0         0              0         0                 0
>> PerDiskMemtableFlushWriter_5           0         0             67         0                 0
>> InternalResponseStage                  0         0        2754163         0                 0
>> PerDiskMemtableFlushWriter_6           0         0             67         0                 0
>> PerDiskMemtableFlushWriter_3           0         0             67         0                 0
>> PerDiskMemtableFlushWriter_4           0         0             67         0                 0
>> PerDiskMemtableFlushWriter_9           0         0             67         0                 0
>> AntiEntropyStage                       0         0              0         0                 0
>> PerDiskMemtableFlushWriter_7           0         0             67         0                 0
>> PerDiskMemtableFlushWriter_8           0         0             67         0                 0
>>
>>
>>
>> Message type           Dropped
>> READ                         0
>> RANGE_SLICE                  0
>> _TRACE                       0
>> HINT                         0
>> MUTATION                     0
>> COUNTER_MUTATION             0
>> BATCH_STORE                  0
>> BATCH_REMOVE                 0
>> REQUEST_RESPONSE             0
>> PAGED_RANGE                  0
>> READ_REPAIR                  0
>>
>>
>>
>> On Thu, Sep 28, 2017 at 2:27 PM, Dan Kinder <dkinder@turnitin.com> wrote:
>>
>> Unless I'm reading the logs wrong, it doesn't seem like the pauses are
>> very long... I've attached the full log as well, covering everything
>> since the node restart.
>>
>>
>>
>> $ grep -i "pause\|times" /var/log/cassandra/gc.log.0.current | tail -n 40
>>
>> 2017-09-28T14:18:48.324-0700: 12866.695: [GC pause (G1 Evacuation Pause) (young)
>>  [Times: user=0.33 sys=0.00, real=0.02 secs]
>> 2017-09-28T14:18:58.198-0700: 12876.569: [GC pause (G1 Evacuation Pause) (young)
>>  [Times: user=0.34 sys=0.00, real=0.03 secs]
>> 2017-09-28T14:19:08.039-0700: 12886.411: [GC pause (G1 Evacuation Pause) (young)
>>  [Times: user=0.30 sys=0.00, real=0.03 secs]
>> 2017-09-28T14:19:17.900-0700: 12896.271: [GC pause (G1 Evacuation Pause) (young)
>>  [Times: user=0.33 sys=0.00, real=0.03 secs]
>> 2017-09-28T14:19:28.995-0700: 12907.366: [GC pause (G1 Evacuation Pause) (young)
>>  [Times: user=0.31 sys=0.00, real=0.02 secs]
>> 2017-09-28T14:19:40.576-0700: 12918.947: [GC pause (G1 Evacuation Pause) (young)
>>  [Times: user=0.31 sys=0.01, real=0.02 secs]
>> 2017-09-28T14:19:52.164-0700: 12930.535: [GC pause (G1 Evacuation Pause) (young)
>>  [Times: user=0.32 sys=0.01, real=0.02 secs]
>> 2017-09-28T14:20:03.813-0700: 12942.184: [GC pause (G1 Evacuation Pause) (young)
>>  [Times: user=0.33 sys=0.00, real=0.02 secs]
>> 2017-09-28T14:20:15.584-0700: 12953.955: [GC pause (G1 Evacuation Pause) (young)
>>  [Times: user=0.35 sys=0.00, real=0.02 secs]
>> 2017-09-28T14:20:27.261-0700: 12965.632: [GC pause (G1 Evacuation Pause) (young)
>>  [Times: user=0.34 sys=0.00, real=0.03 secs]
>> 2017-09-28T14:20:38.894-0700: 12977.265: [GC pause (G1 Evacuation Pause) (young)
>>  [Times: user=0.34 sys=0.01, real=0.03 secs]
>> 2017-09-28T14:20:50.685-0700: 12989.056: [GC pause (G1 Evacuation Pause) (young)
>>  [Times: user=0.33 sys=0.00, real=0.03 secs]
>> 2017-09-28T14:21:02.281-0700: 13000.652: [GC pause (G1 Evacuation Pause) (young)
>>  [Times: user=0.31 sys=0.00, real=0.02 secs]
>> 2017-09-28T14:21:13.975-0700: 13012.346: [GC pause (G1 Evacuation Pause) (young)
>>  [Times: user=0.32 sys=0.00, real=0.02 secs]
>> 2017-09-28T14:21:25.797-0700: 13024.168: [GC pause (G1 Evacuation Pause) (young)
>>  [Times: user=0.32 sys=0.01, real=0.02 secs]
>> 2017-09-28T14:21:37.395-0700: 13035.766: [GC pause (G1 Evacuation Pause) (young)
>>  [Times: user=0.33 sys=0.00, real=0.02 secs]
>> 2017-09-28T14:21:49.166-0700: 13047.537: [GC pause (G1 Evacuation Pause) (young)
>>  [Times: user=0.32 sys=0.00, real=0.02 secs]
>> 2017-09-28T14:22:00.887-0700: 13059.258: [GC pause (G1 Evacuation Pause) (young)
>>  [Times: user=0.33 sys=0.00, real=0.03 secs]
>> 2017-09-28T14:22:12.453-0700: 13070.824: [GC pause (G1 Evacuation Pause) (young)
>>  [Times: user=0.34 sys=0.00, real=0.02 secs]
>> 2017-09-28T14:22:24.023-0700: 13082.394: [GC pause (G1 Evacuation Pause) (young)
>>  [Times: user=0.35 sys=0.00, real=0.02 secs]
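
For reference, a minimal Python sketch (assuming the G1 gc.log format shown
above, and the same log path as in the grep command) that extracts the
wall-clock pause times and summarizes them:

import re

# Pull the wall-clock ("real") pause times out of the G1 gc.log shown above
# and summarize them; long pauses would corroborate a GC problem.
pause_re = re.compile(r"real=(\d+\.\d+) secs")

pauses = []
with open("/var/log/cassandra/gc.log.0.current") as log:
    for line in log:
        match = pause_re.search(line)
        if match:
            pauses.append(float(match.group(1)))

if pauses:
    print(f"{len(pauses)} pauses, max {max(pauses):.2f}s, "
          f"avg {sum(pauses) / len(pauses):.3f}s")
    print(f"{sum(1 for p in pauses if p > 1.0)} pauses over 1s")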
>>
>>
>>
>>
>>
>> On Thu, Sep 28, 2017 at 2:13 PM, Jeff Jirsa <jjirsa@gmail.com> wrote:
>>
>> The digest mismatch exception is not a problem; that's why it's only
>> logged at debug.
>>
>>
>>
>> As Thomas noted, there's a pretty good chance this is
>> https://issues.apache.org/jira/browse/CASSANDRA-13754 - if you see a lot
>> of GCInspector logs indicating GC pauses, that would add confidence to that
>> diagnosis.
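
A quick way to check for that is a sketch along these lines: GCInspector
reports pauses in system.log as "... GC in Nms", and the log path below is
the common default (adjust for the actual install):

import re

# Scan the Cassandra system log for GCInspector pause reports; many long
# pauses here would support the CASSANDRA-13754 diagnosis.
gc_re = re.compile(r"GCInspector.*GC in (\d+)ms")

pauses_ms = []
with open("/var/log/cassandra/system.log") as log:
    for line in log:
        m = gc_re.search(line)
        if m:
            pauses_ms.append(int(m.group(1)))

print(f"{len(pauses_ms)} GCInspector reports")
if pauses_ms:
    print(f"max {max(pauses_ms)}ms, "
          f"over 1s: {sum(1 for p in pauses_ms if p > 1000)}")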
>>
>>
>>
>>
>>
>> On Thu, Sep 28, 2017 at 2:08 PM, Dan Kinder <dkinder@turnitin.com> wrote:
>>
>> Thanks for the responses.
>>
>>
>>
>> @Prem yes, this is after the entire cluster is on 3.11, but no, I have
>> not run upgradesstables yet.
>>
>>
>>
>> @Thomas no, I don't see any major GC.
>>
>>
>>
>> @Jeff yeah, it's fully upgraded. I decided to shut the whole thing down
>> and bring it back up (thankfully this cluster is not serving live
>> traffic). The nodes seemed okay for an hour or two, but now I see the
>> issue again, without me bouncing any nodes. This time it's ReadStage
>> that's building up, and the exception I'm seeing in the logs is:
>>
>> DEBUG [ReadRepairStage:106] 2017-09-28 13:01:37,206 ReadCallback.java:242 - Digest mismatch:
>> org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(6150926370328526396, 696a6374652e6f7267) (2f0fffe2d743cdc4c69c3eb351a3c9ca vs 00ee661ae190c2cbf0eb2fb8a51f6025)
>>         at org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92) ~[apache-cassandra-3.11.0.jar:3.11.0]
>>         at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:233) ~[apache-cassandra-3.11.0.jar:3.11.0]
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_71]
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_71]
>>         at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.0.jar:3.11.0]
>>         at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_71]
>>
>>
>>
>> Do you think running upgradesstables would help? Or relocatesstables? I
>> presumed they shouldn't be necessary for Cassandra to function, just an
>> optimization.
>>
>>
>>
>> On Thu, Sep 28, 2017 at 12:49 PM, Steinmaurer, Thomas <thomas.steinmaurer@dynatrace.com> wrote:
>>
>> Dan,
>>
>>
>>
>> Do you see any major GC? We have been hit by the following memory leak
>> in our load test environment with 3.11.0:
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-13754
>>
>>
>>
>> So, depending on the heap size and uptime, you might run into heap
>> trouble.
>>
>>
>>
>> Thomas
>>
>>
>>
>> *From:* Dan Kinder [mailto:dkinder@turnitin.com]
>> *Sent:* Thursday, September 28, 2017 18:20
>> *To:* user@cassandra.apache.org
>> *Subject:*
>>
>>
>>
>> Hi,
>>
>> I recently upgraded our 16-node cluster from 2.2.6 to 3.11 and am seeing
>> the following: the cluster does function for a while, but then some
>> stages begin to back up, and the affected node does not recover and does
>> not drain the tasks, even under no load. This happens to both
>> MutationStage and GossipStage.
>>
>> I do see the following exception in the logs:
>>
>>
>>
>> ERROR [ReadRepairStage:2328] 2017-09-26 23:07:55,440 CassandraDaemon.java:228 - Exception in thread Thread[ReadRepairStage:2328,5,main]
>> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 1 responses.
>>         at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171) ~[apache-cassandra-3.11.0.jar:3.11.0]
>>         at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:182) ~[apache-cassandra-3.11.0.jar:3.11.0]
>>         at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82) ~[apache-cassandra-3.11.0.jar:3.11.0]
>>         at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89) ~[apache-cassandra-3.11.0.jar:3.11.0]
>>         at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.11.0.jar:3.11.0]
>>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.0.jar:3.11.0]
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_91]
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_91]
>>         at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) ~[apache-cassandra-3.11.0.jar:3.11.0]
>>         at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]
>>
>>
>>
>> But it's hard to correlate precisely with things going bad. It is also
>> very strange to me, since I have both read_repair_chance and
>> dclocal_read_repair_chance set to 0.0 for ALL of my tables, so it is
>> confusing why ReadRepairStage would error at all.
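
One way to double-check those settings is a sketch along these lines, using
the DataStax Python driver to read the table options straight out of
system_schema.tables (the contact point is a placeholder):

from cassandra.cluster import Cluster

# Report any non-system table where read repair is not fully disabled;
# the contact point below is a placeholder for a real node address.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

rows = session.execute(
    "SELECT keyspace_name, table_name, read_repair_chance, "
    "dclocal_read_repair_chance FROM system_schema.tables"
)
for row in rows:
    if row.keyspace_name.startswith("system"):
        continue
    if row.read_repair_chance > 0.0 or row.dclocal_read_repair_chance > 0.0:
        print(f"{row.keyspace_name}.{row.table_name}: "
              f"read_repair_chance={row.read_repair_chance}, "
              f"dclocal_read_repair_chance={row.dclocal_read_repair_chance}")

Note that even with both chances at 0.0, a digest mismatch among the
replicas contacted for a read at a consistency level above ONE still
triggers a blocking read repair, so some ReadRepairStage activity can occur
regardless.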
>>
>> Anyone have thoughts on this? It's pretty muddling, and it causes nodes
>> to lock up. Once it happens, Cassandra can't even shut down; I have to
>> kill -9 the process. If I can't find a resolution I'm going to need to
>> downgrade and restore from backup...
>>
>> The only issue I found that looked similar is
>> https://issues.apache.org/jira/browse/CASSANDRA-12689, but that appears
>> to have been fixed in 3.10.
>>
>>
>>
>> $ nodetool tpstats
>>
>> Pool Name                         Active   Pending      Completed   Blocked  All time blocked
>> ReadStage                              0         0         582103         0                 0
>> MiscStage                              0         0              0         0                 0
>> CompactionExecutor                    11        11           2868         0                 0
>> MutationStage                         32   4593678       55057393         0                 0
>> GossipStage                            1      2818         371487         0                 0
>> RequestResponseStage                   0         0        4345522         0                 0
>> ReadRepairStage                        0         0         151473         0                 0
>> CounterMutationStage                   0         0              0         0                 0
>> MemtableFlushWriter                    1        81             76         0                 0
>> MemtablePostFlush                      1       382            139         0                 0
>> ValidationExecutor                     0         0              0         0                 0
>> ViewMutationStage                      0         0              0         0                 0
>> CacheCleanupExecutor                   0         0              0         0                 0
>> PerDiskMemtableFlushWriter_10          0         0             69         0                 0
>> PerDiskMemtableFlushWriter_11          0         0             69         0                 0
>> MemtableReclaimMemory                  0         0             81         0                 0
>> PendingRangeCalculator                 0         0             32         0                 0
>> SecondaryIndexManagement               0         0              0         0                 0
>> HintsDispatcher                        0         0            596         0                 0
>> PerDiskMemtableFlushWriter_1           0         0             69         0                 0
>> Native-Transport-Requests             11         0        4547746         0                67
>> PerDiskMemtableFlushWriter_2           0         0             69         0                 0
>> MigrationStage                         1      1545            586         0                 0
>> PerDiskMemtableFlushWriter_0           0         0             80         0                 0
>> Sampler                                0         0              0         0                 0
>> PerDiskMemtableFlushWriter_5           0         0             69         0                 0
>> InternalResponseStage                  0         0          45432         0                 0
>> PerDiskMemtableFlushWriter_6           0         0             69         0                 0
>> PerDiskMemtableFlushWriter_3           0         0             69         0                 0
>> PerDiskMemtableFlushWriter_4           0         0             69         0                 0
>> PerDiskMemtableFlushWriter_9           0         0             69         0                 0
>> AntiEntropyStage                       0         0              0         0                 0
>> PerDiskMemtableFlushWriter_7           0         0             69         0                 0
>> PerDiskMemtableFlushWriter_8           0         0             69         0                 0
>>
>>
>>
>> Message type           Dropped
>> READ                         0
>> RANGE_SLICE                  0
>> _TRACE                       0
>> HINT                         0
>> MUTATION                     0
>> COUNTER_MUTATION             0
>> BATCH_STORE                  0
>> BATCH_REMOVE                 0
>> REQUEST_RESPONSE             0
>> PAGED_RANGE                  0
>> READ_REPAIR                  0
>>
>>
>>
>> -dan
>>
>
>


-- 
Dan Kinder
Principal Software Engineer
Turnitin – www.turnitin.com
dkinder@turnitin.com
