cassandra-commits mailing list archives

From "Alex Liu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6225) GCInspector should not wait after ConcurrentMarkSweep GC to flush memtables and reduce cache size
Date Mon, 21 Oct 2013 23:57:42 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801330#comment-13801330
] 

Alex Liu commented on CASSANDRA-6225:
-------------------------------------

You can change the thresholds in cassandra.yaml that control when the cache reduction and memtable flush are triggered:
{code}
flush_largest_memtables_at: 1.0
reduce_cache_sizes_at: 1.0
{code}
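As a rough illustration of how these thresholds behave, assuming (as GCInspector does) that they are compared against heap usage as a fraction of the maximum heap — the class and method names below are illustrative, not Cassandra's actual code:

```java
// Sketch of threshold evaluation: flush when used/max crosses the
// configured fraction. Names here are illustrative, not Cassandra APIs.
public class ThresholdCheck {
    static boolean shouldFlush(long usedBytes, long maxBytes, double flushAt) {
        return (double) usedBytes / maxBytes >= flushAt;
    }

    public static void main(String[] args) {
        long max = 6358564864L;   // max heap from the GC log in this ticket
        long used = 5229917848L;  // ~82% of heap still used after a CMS pass
        System.out.println(shouldFlush(used, max, 0.75)); // true: would flush
        System.out.println(shouldFlush(used, max, 1.0));  // false: never flushes
    }
}
```

With the thresholds set to 1.0, the flush and cache reduction are effectively disabled; lowering them makes the mitigation fire earlier.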

> GCInspector should not wait after ConcurrentMarkSweep GC to flush memtables and reduce cache size
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6225
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6225
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Cassandra 1.2.9, SunOS, Java 7
>            Reporter: Billow Gao
>
> In GCInspector.logGCResults, Cassandra won't flush memtables or reduce cache sizes until there has been a ConcurrentMarkSweep GC. This caused a long pause in the service, and other nodes could mark it as DEAD.
> In our stress test, we used 64 concurrent threads to write data to Cassandra. Heap usage grew quickly and reached the maximum.
> We saw several ConcurrentMarkSweep GCs that freed very little memory until a memtable flush was called. The other nodes marked the node as DOWN when GC took more than 20 seconds.
> {code}
> INFO [ScheduledTasks:1] 2013-10-18 15:42:36,176 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 27481 ms for 1 collections, 5229917848 used; max is 6358564864
> INFO [ScheduledTasks:1] 2013-10-18 15:43:14,013 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 27729 ms for 1 collections, 5381504752 used; max is 6358564864
> INFO [ScheduledTasks:1] 2013-10-18 15:43:50,565 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 29867 ms for 1 collections, 5479631256 used; max is 6358564864
> INFO [ScheduledTasks:1] 2013-10-18 15:44:23,457 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 28166 ms for 1 collections, 5545752344 used; max is 6358564864
> INFO [ScheduledTasks:1] 2013-10-18 15:44:58,290 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 29377 ms for 2 collections, 5343255456 used; max is 6358564864
> {code}
> {code}
> INFO [GossipTasks:1] 2013-10-18 15:42:29,004 Gossiper.java (line 803) InetAddress /1.2.3.4 is now DOWN
> INFO [GossipTasks:1] 2013-10-18 15:43:06,901 Gossiper.java (line 803) InetAddress /1.2.3.4 is now DOWN
> INFO [GossipTasks:1] 2013-10-18 15:44:18,254 Gossiper.java (line 803) InetAddress /1.2.3.4 is now DOWN
> INFO [GossipTasks:1] 2013-10-18 15:44:48,507 Gossiper.java (line 803) InetAddress /1.2.3.4 is now DOWN
> INFO [GossipTasks:1] 2013-10-18 15:45:32,375 Gossiper.java (line 803) InetAddress /1.2.3.4 is now DOWN
> {code}
> We found two solutions to fix the long pause that resulted in a DOWN status.
> 1. We reduced the maximum heap to 3G. The behavior was the same, but GC was faster (under 20 seconds), so no nodes were marked as DOWN.
> 2. We ran a cron job on the Cassandra server that periodically calls nodetool -h localhost flush.
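> The cron workaround above can be expressed as a crontab entry; the 5-minute interval here is an assumption and should be tuned to how quickly the heap fills:
> {code}
# Hypothetical crontab line: flush all memtables every 5 minutes
*/5 * * * * nodetool -h localhost flush
> {code}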
> Flushing only after a full GC just makes things worse and wastes the time spent on GC. In a heavily loaded system, you can have several full GCs before a flush finishes (a flush may take more than 30 seconds).
> Ideally, GCInspector should have better logic for when to flush memtables:
> 1. Flush memtables/reduce cache sizes when heap usage reaches a threshold (smaller than the full-GC threshold).
> 2. Prevent overly frequent flushes by remembering the last run time.
> If we flush before a full GC, the full GC will release the memory occupied by the memtables, reducing heap usage considerably. Otherwise, full GC will be called again and again until a flush finishes.
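> The two-part proposal above could be sketched as follows. This is a hypothetical helper, not Cassandra's actual API; the threshold value and debounce interval are assumptions:
> {code}
// Hypothetical sketch of the proposed logic: trigger a flush when heap
// usage crosses a threshold set below the full-GC point, and debounce by
// remembering when the last flush was triggered.
public class ProactiveFlusher {
    private final double flushThreshold;  // fraction of max heap, e.g. 0.70
    private final long minIntervalMillis; // minimum gap between flushes
    private long lastFlushAt = Long.MIN_VALUE / 2; // "never flushed yet"

    public ProactiveFlusher(double flushThreshold, long minIntervalMillis) {
        this.flushThreshold = flushThreshold;
        this.minIntervalMillis = minIntervalMillis;
    }

    /** Returns true when the caller should flush the largest memtable now. */
    public boolean maybeFlush(long usedBytes, long maxBytes, long nowMillis) {
        boolean overThreshold = (double) usedBytes / maxBytes >= flushThreshold;
        boolean tooSoon = nowMillis - lastFlushAt < minIntervalMillis;
        if (overThreshold && !tooSoon) {
            lastFlushAt = nowMillis;
            return true;
        }
        return false;
    }
}
> {code}
> Checked from a periodic task (as GCInspector already is), this would flush before heap pressure forces a long CMS pause, and the debounce keeps a second flush from piling up while the first one (which can take 30+ seconds) is still running.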



--
This message was sent by Atlassian JIRA
(v6.1#6144)
