cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CASSANDRA-8667) ConcurrentMarkSweep loop
Date Mon, 24 Aug 2015 16:01:46 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-8667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-8667.
---------------------------------------
       Resolution: Cannot Reproduce
    Fix Version/s:     (was: 2.0.x)

> ConcurrentMarkSweep loop 
> -------------------------
>
>                 Key: CASSANDRA-8667
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8667
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: dse 4.5.4 (cassandra 2.0.11.82), aws i2.2xlarge nodes
>            Reporter: Gil Ganz
>         Attachments: cassandra-env.sh, cassandra.yaml
>
>
> Hey,
> We are having an issue with nodes that, for some reason, get into a full GC loop and never recover. It can happen on any node from time to time, but recently we have a node (which was added to the cluster 2 days ago) that hits this every time.
> The scenario is like this:
> With almost no reads/writes going to the cluster (<500 reads or writes per second), the node is up for 10-20 minutes doing compactions of big column families, and then full GC starts to kick in, doing loops of 60-second CMS collections even though the heap is not full. Compaction becomes really slow, and the node starts to look down to other nodes.
> from system.log :
> INFO [ScheduledTasks:1] 2015-01-21 23:02:29,552 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 36444 ms for 1 collections, 6933307656 used; max is 10317987840
> from gc.log.0:
> 2015-01-21T23:01:53.072-0800: 1541.643: [CMS2015-01-21T23:01:56.440-0800: 1545.011: [CMS-concurrent-mark: 13.914/13.951 secs] [Times: user=62.39 sys=7.05, real=13.95 secs]
>  (concurrent mode failure)CMS: Large block 0x0000000000000000
> : 6389749K->6389759K(6389760K), 36.1323980 secs] 10076149K->6685617K(10076160K), [CMS Perm : 28719K->28719K(47840K)]After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 0
> Max   Chunk Size: 0
> Number of Blocks: 0
> Tree      Height: 0
> After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 24576
> Max   Chunk Size: 24576
> Number of Blocks: 1
> Av.  Block  Size: 24576
> Tree      Height: 1
> , 36.1327700 secs] [Times: user=40.90 sys=0.00, real=36.14 secs]
> Heap after GC invocations=236 (full 19):
>  par new generation   total 3686400K, used 295857K [0x000000057ae00000, 0x0000000674e00000, 0x0000000674e00000)
>   eden space 3276800K,   9% used [0x000000057ae00000, 0x000000058ceec4c0, 0x0000000642e00000)
>   from space 409600K,   0% used [0x000000065be00000, 0x000000065be00000, 0x0000000674e00000)
>   to   space 409600K,   0% used [0x0000000642e00000, 0x0000000642e00000, 0x000000065be00000)
>  concurrent mark-sweep generation total 6389760K, used 6389759K [0x0000000674e00000, 0x00000007fae00000, 0x00000007fae00000)
>  concurrent-mark-sweep perm gen total 48032K, used 28719K [0x00000007fae00000, 0x00000007fdce8000, 0x0000000800000000)
> }
> 2015-01-21T23:02:29.204-0800: 1577.776: Total time for which application threads were stopped: 36.1334050 seconds
> 2015-01-21T23:02:29.239-0800: 1577.810: Total time for which application threads were stopped: 0.0060230 seconds
> 2015-01-21T23:02:29.239-0800: 1577.811: [GC [1 CMS-initial-mark: 6389759K(6389760K)] 6769792K(10076160K), 0.3112760 secs] [Times: user=0.00 sys=0.00, real=0.31 secs]
> 2015-01-21T23:02:29.551-0800: 1578.122: Total time for which application threads were stopped: 0.3118580 seconds
> 2015-01-21T23:02:29.551-0800: 1578.122: [CMS-concurrent-mark-start]
> 2015-01-21T23:02:29.635-0800: 1578.206: Total time for which application threads were stopped: 0.0060250 seconds
> The machines are i2.2xlarge (8 cores, 60 GB RAM), with the data directory on ephemeral SSD. Heap size is 10 GB with a 4 GB new gen (following a DSE recommendation to solve another issue with many ParNew GCs going on).
> It's a 2-DC cluster: 8 nodes in the west, 17 in the east (the main DC). The workload is read-heavy: 15k writes per second and at least that many reads per second right now due to the problems, though it was as high as 35k reads per second in the past.
> The yaml and env files are attached.
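As an aside for anyone triaging similar reports: the GCInspector line quoted above can be checked mechanically. Below is a minimal Python sketch that parses one such line and reports the pause length and heap occupancy at collection time. The regex is inferred from the single log line in this report (Cassandra 2.0-era GCInspector format); the field names and the long-pause threshold are my own assumptions, not part of Cassandra's tooling.

```python
import re

# Pattern inferred from the GCInspector line quoted in this report;
# not an official log-format specification.
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<pause_ms>\d+) ms for (?P<count>\d+) "
    r"collections, (?P<used>\d+) used; max is (?P<max>\d+)"
)

# Hypothetical threshold: treat anything over 1 second as worth flagging.
LONG_PAUSE_MS = 1000

def parse_gc_line(line):
    """Return GC stats from a GCInspector log line, or None if no match."""
    m = GC_LINE.search(line)
    if not m:
        return None
    return {
        "collector": m.group("collector"),
        "pause_ms": int(m.group("pause_ms")),
        "collections": int(m.group("count")),
        # Heap occupancy as a percentage of the max heap at collection time.
        "heap_used_pct": 100.0 * int(m.group("used")) / int(m.group("max")),
    }

# The exact line from the report above.
sample = ("INFO [ScheduledTasks:1] 2015-01-21 23:02:29,552 GCInspector.java "
          "(line 116) GC for ConcurrentMarkSweep: 36444 ms for 1 collections, "
          "6933307656 used; max is 10317987840")

stats = parse_gc_line(sample)
if stats and stats["pause_ms"] > LONG_PAUSE_MS:
    print("long %s pause: %d ms, heap %.1f%% full"
          % (stats["collector"], stats["pause_ms"], stats["heap_used_pct"]))
```

Note that for this sample the heap was only about 67% full when the 36-second pause hit, which matches the reporter's observation that the loops start "even though the heap is not full" and is consistent with the "concurrent mode failure" in the gc.log excerpt.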



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
