incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alain RODRIGUEZ <arodr...@gmail.com>
Subject Re: GC taking a long time
Date Thu, 06 Feb 2014 11:19:19 GMT
Hi Robert,

The heap, and GC are things a bit tricky to tune,

I recently read a post about heap, explaining how heap works and how to
tune it :
http://tech.shift.com/post/74311817513/cassandra-tuning-the-jvm-for-read-heavy-workloads

There is plenty of this kind of blogs or articles on the web.

Be careful, tuning highly depends on your workload and hardware. You
shouldn't use configuration found on these posts as is. You need to test,
incrementally and monitor how it behave.

If you are not able to find a solution, there is also a lot of
professionals, consultants (like Datastax or Aaron Morton) whose job is to
help with Cassandra integrations, including heap and GC tuning.

Hope this will help somehow.


2014-01-29 16:51 GMT+01:00 Robert Wille <rwille@fold3.com>:

> Forget about what I said about there not being any load during the night.
> I forgot about my unit tests. They would have been running at this time and
> they run against this cluster.
>
> I also forgot to provide JVM information:
>
> java version "1.7.0_17"
> Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
>
> Thanks
>
> Robert
>
> From: Robert Wille <rwille@fold3.com>
> Reply-To: <user@cassandra.apache.org>
> Date: Wednesday, January 29, 2014 at 4:06 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: GC taking a long time
>
> I read through the recent thread "Cassandra mad GC", which seemed very
> similar to my situation, but didn't really help.
>
> Here is what I get from my logs when I grep for GCInspector. Note that
> this is the middle of the night on a dev server, so there should have been
> almost no load.
>
>  INFO [ScheduledTasks:1] 2014-01-29 02:41:16,579 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 341 ms for 1 collections, 8001582816 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:41:29,135 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 350 ms for 1 collections, 8027555576used; max is
> 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:41:41,646 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 364 ms for 1 collections, 8075851136used; max is
> 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:41:54,223 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 375 ms for 1 collections, 8124762400used; max is
> 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:42:24,258 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 22995 ms for 2 collections, 7385470288
> used; max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:45:21,328 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 218 ms for 1 collections, 7582480104 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:45:33,418 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 222 ms for 1 collections, 7584743872 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:45:45,527 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 217 ms for 1 collections, 7588514264 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:45:57,594 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 223 ms for 1 collections, 7590223632 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:46:09,686 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 226 ms for 1 collections, 7592826720 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:46:21,867 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 229 ms for 1 collections, 7595464520 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:46:33,869 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 227 ms for 1 collections, 7597109672 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:46:45,962 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 230 ms for 1 collections, 7599909296 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:46:57,964 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 230 ms for 1 collections, 7601584048 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:47:10,018 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 229 ms for 1 collections, 7604217952 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:47:22,136 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 236 ms for 1 collections, 7605867784 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:47:34,277 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 239 ms for 1 collections, 7607521456 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:47:46,292 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 235 ms for 1 collections, 7610667376 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:47:58,537 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 261 ms for 1 collections, 7650345088 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:48:10,783 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 269 ms for 1 collections, 7653016592 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:48:23,786 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 298 ms for 1 collections, 7716831032 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:48:35,988 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 308 ms for 1 collections, 7745178616 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:48:48,434 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 319 ms for 1 collections, 7796207088 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:49:00,902 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 320 ms for 1 collections, 7821378680 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:49:13,344 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 338 ms for 1 collections, 7859905288 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:49:25,471 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 352 ms for 1 collections, 7911145688 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:49:38,473 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 359 ms for 1 collections, 7938204144 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:49:50,895 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 368 ms for 1 collections, 7988088408 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:50:03,345 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 379 ms for 1 collections, 8014757368 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:50:15,920 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 384 ms for 1 collections, 8044074584 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:50:28,484 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 397 ms for 1 collections, 8094146216 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:50:41,040 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 405 ms for 1 collections, 8118675056 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:50:53,628 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 408 ms for 1 collections, 8121201008 used;
> max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:51:19,816 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 26041 ms for 1 collections, 7385748352
> used; max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 02:59:38,271 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 20970 ms for 1 collections, 7379533664
> used; max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 03:00:07,020 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 25835 ms for 1 collections, 7386103248
> used; max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 03:00:28,777 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 20278 ms for 1 collections, 7380769872
> used; max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 03:00:55,514 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 24909 ms for 1 collections, 7382846344
> used; max is 8126464000
>  INFO [ScheduledTasks:1] 2014-01-29 03:01:23,741 GCInspector.java (line
> 116) GC for ConcurrentMarkSweep: 26223 ms for 1 collections, 7387539176
> used; max is 8126464000
>
> There are various long GC's sprinkled throughout my logs. Sometimes it
> recovers from them. Sometime it goes into a death spiral.
>
> About a minute prior to the big meltdown at 02:51:19 there was a lot of
> activity on the local CF. About every 10 milliseconds the following is
> repeated:
>
>  INFO [MutationStage:552] 2014-01-29 02:50:35,489 ColumnFamilyStore.java
> (line 740) Enqueuing flush of Memtable-local@1059338449(73/3672
> serialized/live bytes, 2 ops)
>  INFO [FlushWriter:45] 2014-01-29 02:50:35,489 Memtable.java (line 333)
> Writing Memtable-local@1059338449(73/3672 serialized/live bytes, 2 ops)
>  INFO [FlushWriter:45] 2014-01-29 02:50:35,492 Memtable.java (line 373)
> Completed flushing
> /var/lib/cassandra/data/system/local/system-local-jb-104855-Data.db (93
> bytes) for commitlog position ReplayPosition(segmentId=1390945513179,
> position=25932849)
>
> It seems strange to be repeatedly flushing a table at that frequency.
> Other than that, I don't see anything odd in the logs just before GC went
> off the rails.
>
> I have not done any tuning. All my JVM settings are default. For the
> record, here's what I get from ps:
>
> java -ea
> -javaagent:/usr/local/apache-cassandra-2.0.4/bin/../lib/jamm-0.2.5.jar
> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms7830M -Xmx7830M
> -Xmn800M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
> -XX:StringTableSize=1000003 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:+UseCondCardMark ...
>
> I have a half-dozen keyspaces with a half-dozen tables per keyspace, and a
> total of several gigabytes of data. Jut a one-node cluster. Nothing really
> remarkable. Any tables with records that could get really wide have been
> sharded. No record should be bigger than a few MB.
>
> On an unrelated note, I had to kill my server because it wouldn't do
> anything but GC. Now I cannot start it. It errors out with the following:
>
> java.lang.IllegalStateException: Unfinished compactions reference missing
> sstables. This should never happen since compactions are marked finished
> before we start removing the old sstables.
>         at
> org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:495)
>         at
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:264)
>         at
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:461)
>         at
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:504)
> java.lang.IllegalStateException: Unfinished compactions reference missing
> sstables. This should never happen since compactions are marked finished
> before we start removing the old sstables.
>         at
> org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:495)
>         at
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:264)
>         at
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:461)
>         at
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:504)
> Exception encountered during startup: Unfinished compactions reference
> missing sstables. This should never happen since compactions are marked
> finished before we start removing the old sstables.
>
> I have a single node in my cluster. Is there any way to recover, or is my
> data toast? I can restore my data, so this isn't a primary issue for me.
> The GC issue, on the other hand, is preventing me from moving to production.
>
> Robert
>
>

Mime
View raw message