incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Laing, Michael" <michael.la...@nytimes.com>
Subject Re: GC taking a long time
Date Thu, 06 Feb 2014 11:34:10 GMT
for the restart issue see
CASSANDRA-6008<https://issues.apache.org/jira/browse/CASSANDRA-6008>
and
6086


On Thu, Feb 6, 2014 at 12:19 PM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:

> Hi Robert,
>
> The heap, and GC are things a bit tricky to tune,
>
> I recently read a post about heap, explaining how heap works and how to
> tune it :
> http://tech.shift.com/post/74311817513/cassandra-tuning-the-jvm-for-read-heavy-workloads
>
> There is plenty of this kind of blogs or articles on the web.
>
> Be careful, tuning highly depends on your workload and hardware. You
> shouldn't use configuration found on these posts as is. You need to test,
> incrementally and monitor how it behave.
>
> If you are not able to find a solution, there is also a lot of
> professionals, consultants (like Datastax or Aaron Morton) whose job is to
> help with Cassandra integrations, including heap and GC tuning.
>
> Hope this will help somehow.
>
>
> 2014-01-29 16:51 GMT+01:00 Robert Wille <rwille@fold3.com>:
>
>  Forget about what I said about there not being any load during the
>> night. I forgot about my unit tests. They would have been running at this
>> time and they run against this cluster.
>>
>> I also forgot to provide JVM information:
>>
>> java version "1.7.0_17"
>> Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
>> Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
>>
>> Thanks
>>
>> Robert
>>
>> From: Robert Wille <rwille@fold3.com>
>> Reply-To: <user@cassandra.apache.org>
>> Date: Wednesday, January 29, 2014 at 4:06 AM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: GC taking a long time
>>
>> I read through the recent thread "Cassandra mad GC", which seemed very
>> similar to my situation, but didn’t really help.
>>
>> Here is what I get from my logs when I grep for GCInspector. Note that
>> this is the middle of the night on a dev server, so there should have been
>> almost no load.
>>
>>  INFO [ScheduledTasks:1] 2014-01-29 02:41:16,579 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 341 ms for 1 collections, 8001582816 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:41:29,135 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 350 ms for 1 collections, 8027555576used; max is
>> 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:41:41,646 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 364 ms for 1 collections, 8075851136used; max is
>> 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:41:54,223 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 375 ms for 1 collections, 8124762400used; max is
>> 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:42:24,258 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 22995 ms for 2 collections, 7385470288
>> used; max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:45:21,328 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 218 ms for 1 collections, 7582480104 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:45:33,418 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 222 ms for 1 collections, 7584743872 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:45:45,527 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 217 ms for 1 collections, 7588514264 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:45:57,594 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 223 ms for 1 collections, 7590223632 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:46:09,686 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 226 ms for 1 collections, 7592826720 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:46:21,867 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 229 ms for 1 collections, 7595464520 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:46:33,869 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 227 ms for 1 collections, 7597109672 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:46:45,962 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 230 ms for 1 collections, 7599909296 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:46:57,964 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 230 ms for 1 collections, 7601584048 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:47:10,018 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 229 ms for 1 collections, 7604217952used; max is
>> 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:47:22,136 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 236 ms for 1 collections, 7605867784used; max is
>> 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:47:34,277 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 239 ms for 1 collections, 7607521456used; max is
>> 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:47:46,292 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 235 ms for 1 collections, 7610667376 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:47:58,537 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 261 ms for 1 collections, 7650345088 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:48:10,783 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 269 ms for 1 collections, 7653016592 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:48:23,786 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 298 ms for 1 collections, 7716831032 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:48:35,988 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 308 ms for 1 collections, 7745178616 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:48:48,434 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 319 ms for 1 collections, 7796207088 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:49:00,902 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 320 ms for 1 collections, 7821378680 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:49:13,344 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 338 ms for 1 collections, 7859905288 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:49:25,471 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 352 ms for 1 collections, 7911145688 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:49:38,473 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 359 ms for 1 collections, 7938204144 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:49:50,895 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 368 ms for 1 collections, 7988088408 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:50:03,345 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 379 ms for 1 collections, 8014757368 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:50:15,920 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 384 ms for 1 collections, 8044074584 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:50:28,484 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 397 ms for 1 collections, 8094146216 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:50:41,040 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 405 ms for 1 collections, 8118675056 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:50:53,628 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 408 ms for 1 collections, 8121201008 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:51:19,816 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 26041 ms for 1 collections, 7385748352
>> used; max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:59:38,271 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 20970 ms for 1 collections, 7379533664
>> used; max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 03:00:07,020 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 25835 ms for 1 collections, 7386103248
>> used; max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 03:00:28,777 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 20278 ms for 1 collections, 7380769872
>> used; max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 03:00:55,514 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 24909 ms for 1 collections, 7382846344
>> used; max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 03:01:23,741 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 26223 ms for 1 collections, 7387539176
>> used; max is 8126464000
>>
>> There are various long GC’s sprinkled throughout my logs. Sometimes it
>> recovers from them. Sometime it goes into a death spiral.
>>
>> About a minute prior to the big meltdown at 02:51:19 there was a lot of
>> activity on the local CF. About every 10 milliseconds the following is
>> repeated:
>>
>>  INFO [MutationStage:552] 2014-01-29 02:50:35,489 ColumnFamilyStore.java
>> (line 740) Enqueuing flush of Memtable-local@1059338449(73/3672
>> serialized/live bytes, 2 ops)
>>  INFO [FlushWriter:45] 2014-01-29 02:50:35,489 Memtable.java (line 333)
>> Writing Memtable-local@1059338449(73/3672 serialized/live bytes, 2 ops)
>>  INFO [FlushWriter:45] 2014-01-29 02:50:35,492 Memtable.java (line 373)
>> Completed flushing
>> /var/lib/cassandra/data/system/local/system-local-jb-104855-Data.db (93
>> bytes) for commitlog position ReplayPosition(segmentId=1390945513179,
>> position=25932849)
>>
>> It seems strange to be repeatedly flushing a table at that frequency.
>> Other than that, I don’t see anything odd in the logs just before GC went
>> off the rails.
>>
>> I have not done any tuning. All my JVM settings are default. For the
>> record, here’s what I get from ps:
>>
>> java -ea
>> -javaagent:/usr/local/apache-cassandra-2.0.4/bin/../lib/jamm-0.2.5.jar
>> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms7830M -Xmx7830M
>> -Xmn800M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
>> -XX:StringTableSize=1000003 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
>> -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
>> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:+UseCondCardMark …
>>
>> I have a half-dozen keyspaces with a half-dozen tables per keyspace, and
>> a total of several gigabytes of data. Jut a one-node cluster. Nothing
>> really remarkable. Any tables with records that could get really wide have
>> been sharded. No record should be bigger than a few MB.
>>
>> On an unrelated note, I had to kill my server because it wouldn’t do
>> anything but GC. Now I cannot start it. It errors out with the following:
>>
>> java.lang.IllegalStateException: Unfinished compactions reference missing
>> sstables. This should never happen since compactions are marked finished
>> before we start removing the old sstables.
>>         at
>> org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:495)
>>         at
>> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:264)
>>         at
>> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:461)
>>         at
>> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:504)
>> java.lang.IllegalStateException: Unfinished compactions reference missing
>> sstables. This should never happen since compactions are marked finished
>> before we start removing the old sstables.
>>         at
>> org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:495)
>>         at
>> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:264)
>>         at
>> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:461)
>>         at
>> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:504)
>> Exception encountered during startup: Unfinished compactions reference
>> missing sstables. This should never happen since compactions are marked
>> finished before we start removing the old sstables.
>>
>> I have a single node in my cluster. Is there any way to recover, or is my
>> data toast? I can restore my data, so this isn’t a primary issue for me.
>> The GC issue, on the other hand, is preventing me from moving to production.
>>
>> Robert
>>
>>
>

Mime
View raw message