Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 30542106D6 for ; Thu, 6 Feb 2014 11:34:44 +0000 (UTC) Received: (qmail 95254 invoked by uid 500); 6 Feb 2014 11:34:41 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 95234 invoked by uid 500); 6 Feb 2014 11:34:40 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 95224 invoked by uid 99); 6 Feb 2014 11:34:39 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 06 Feb 2014 11:34:39 +0000 X-ASF-Spam-Status: No, hits=-6.5 required=5.0 tests=ENV_AND_HDR_SPF_MATCH,HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS,USER_IN_DEF_SPF_WL X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of michael.laing@nytimes.com designates 209.85.216.175 as permitted sender) Received: from [209.85.216.175] (HELO mail-qc0-f175.google.com) (209.85.216.175) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 06 Feb 2014 11:34:33 +0000 Received: by mail-qc0-f175.google.com with SMTP id x13so2821158qcv.20 for ; Thu, 06 Feb 2014 03:34:10 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:content-type; bh=ejxKsMm5Xutkhke9rmEDvp9AiEj5OVX4fKLDB6PeT04=; b=ES1MDgPtuR7HFBspLVoSJrCh/ZwhLsUdjv3tZVKN2vYTMlKHxgcmDymgSyGB/JsGjN aBep2tV9xewb/n+rDJUOfpSc1ysoxXGwkLn8RQPL8mhW0Z6ZJYLZyZdTB9R3Xdc++m6l OV5kPy0C5W2ccW3XNrrEdCchPMw6ukGXS+quyuc0TGc7+Zn9Mm7mZEm4pA9auVqyiVhv ubJDDvaaRN/BSvG8/GrgmkmpLOtxMevOihsg4zO9W0LxJNWqdjGVzBBPYjh0VQ2ZpZ9V laIrhMq0EWJoABimPEuoRCc57CqyYdUlV2Engp4YEul4vc50iCAv1Sf7wZdKaO6vE49D Itiw== X-Gm-Message-State: ALoCoQmHmcYwU7LitibFO1F2LACsdVcwY6ey8Cm9P9yRvuGKUsvBdon2a59eWOOLxbK271viIAPH MIME-Version: 1.0 X-Received: by 10.224.28.72 with SMTP id l8mr11702389qac.6.1391686450676; Thu, 06 Feb 2014 03:34:10 -0800 (PST) Received: by 10.140.50.178 with HTTP; Thu, 6 Feb 2014 03:34:10 -0800 (PST) In-Reply-To: References: Date: Thu, 6 Feb 2014 12:34:10 +0100 Message-ID: Subject: Re: GC taking a long time From: "Laing, Michael" To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=001a11c1c9a6881df204f1bb4091 X-Virus-Checked: Checked by ClamAV on apache.org --001a11c1c9a6881df204f1bb4091 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable for the restart issue see CASSANDRA-6008 and 6086 On Thu, Feb 6, 2014 at 12:19 PM, Alain RODRIGUEZ wrote= : > Hi Robert, > > The heap, and GC are things a bit tricky to tune, > > I recently read a post about heap, explaining how heap works and how to > tune it : > http://tech.shift.com/post/74311817513/cassandra-tuning-the-jvm-for-read-= heavy-workloads > > There is plenty of this kind of blogs or articles on the web. > > Be careful, tuning highly depends on your workload and hardware. You > shouldn't use configuration found on these posts as is. You need to test, > incrementally and monitor how it behave. > > If you are not able to find a solution, there is also a lot of > professionals, consultants (like Datastax or Aaron Morton) whose job is t= o > help with Cassandra integrations, including heap and GC tuning. > > Hope this will help somehow. > > > 2014-01-29 16:51 GMT+01:00 Robert Wille : > > Forget about what I said about there not being any load during the >> night. I forgot about my unit tests. They would have been running at thi= s >> time and they run against this cluster. >> >> I also forgot to provide JVM information: >> >> java version "1.7.0_17" >> Java(TM) SE Runtime Environment (build 1.7.0_17-b02) >> Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode) >> >> Thanks >> >> Robert >> >> From: Robert Wille >> Reply-To: >> Date: Wednesday, January 29, 2014 at 4:06 AM >> To: "user@cassandra.apache.org" >> Subject: GC taking a long time >> >> I read through the recent thread "Cassandra mad GC", which seemed very >> similar to my situation, but didn=E2=80=99t really help. >> >> Here is what I get from my logs when I grep for GCInspector. Note that >> this is the middle of the night on a dev server, so there should have be= en >> almost no load. >> >> INFO [ScheduledTasks:1] 2014-01-29 02:41:16,579 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 341 ms for 1 collections, 8001582816 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:41:29,135 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 350 ms for 1 collections, 8027555576use= d; max is >> 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:41:41,646 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 364 ms for 1 collections, 8075851136use= d; max is >> 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:41:54,223 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 375 ms for 1 collections, 8124762400use= d; max is >> 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:42:24,258 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 22995 ms for 2 collections, 7385470288 >> used; max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:45:21,328 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 218 ms for 1 collections, 7582480104 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:45:33,418 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 222 ms for 1 collections, 7584743872 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:45:45,527 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 217 ms for 1 collections, 7588514264 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:45:57,594 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 223 ms for 1 collections, 7590223632 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:46:09,686 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 226 ms for 1 collections, 7592826720 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:46:21,867 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 229 ms for 1 collections, 7595464520 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:46:33,869 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 227 ms for 1 collections, 7597109672 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:46:45,962 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 230 ms for 1 collections, 7599909296 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:46:57,964 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 230 ms for 1 collections, 7601584048 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:47:10,018 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 229 ms for 1 collections, 7604217952use= d; max is >> 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:47:22,136 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 236 ms for 1 collections, 7605867784use= d; max is >> 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:47:34,277 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 239 ms for 1 collections, 7607521456use= d; max is >> 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:47:46,292 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 235 ms for 1 collections, 7610667376 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:47:58,537 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 261 ms for 1 collections, 7650345088 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:48:10,783 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 269 ms for 1 collections, 7653016592 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:48:23,786 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 298 ms for 1 collections, 7716831032 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:48:35,988 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 308 ms for 1 collections, 7745178616 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:48:48,434 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 319 ms for 1 collections, 7796207088 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:49:00,902 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 320 ms for 1 collections, 7821378680 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:49:13,344 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 338 ms for 1 collections, 7859905288 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:49:25,471 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 352 ms for 1 collections, 7911145688 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:49:38,473 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 359 ms for 1 collections, 7938204144 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:49:50,895 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 368 ms for 1 collections, 7988088408 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:50:03,345 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 379 ms for 1 collections, 8014757368 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:50:15,920 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 384 ms for 1 collections, 8044074584 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:50:28,484 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 397 ms for 1 collections, 8094146216 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:50:41,040 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 405 ms for 1 collections, 8118675056 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:50:53,628 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 408 ms for 1 collections, 8121201008 us= ed; >> max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:51:19,816 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 26041 ms for 1 collections, 7385748352 >> used; max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 02:59:38,271 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 20970 ms for 1 collections, 7379533664 >> used; max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 03:00:07,020 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 25835 ms for 1 collections, 7386103248 >> used; max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 03:00:28,777 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 20278 ms for 1 collections, 7380769872 >> used; max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 03:00:55,514 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 24909 ms for 1 collections, 7382846344 >> used; max is 8126464000 >> INFO [ScheduledTasks:1] 2014-01-29 03:01:23,741 GCInspector.java (line >> 116) GC for ConcurrentMarkSweep: 26223 ms for 1 collections, 7387539176 >> used; max is 8126464000 >> >> There are various long GC=E2=80=99s sprinkled throughout my logs. Someti= mes it >> recovers from them. Sometime it goes into a death spiral. >> >> About a minute prior to the big meltdown at 02:51:19 there was a lot of >> activity on the local CF. About every 10 milliseconds the following is >> repeated: >> >> INFO [MutationStage:552] 2014-01-29 02:50:35,489 ColumnFamilyStore.java >> (line 740) Enqueuing flush of Memtable-local@1059338449(73/3672 >> serialized/live bytes, 2 ops) >> INFO [FlushWriter:45] 2014-01-29 02:50:35,489 Memtable.java (line 333) >> Writing Memtable-local@1059338449(73/3672 serialized/live bytes, 2 ops) >> INFO [FlushWriter:45] 2014-01-29 02:50:35,492 Memtable.java (line 373) >> Completed flushing >> /var/lib/cassandra/data/system/local/system-local-jb-104855-Data.db (93 >> bytes) for commitlog position ReplayPosition(segmentId=3D1390945513179, >> position=3D25932849) >> >> It seems strange to be repeatedly flushing a table at that frequency. >> Other than that, I don=E2=80=99t see anything odd in the logs just befor= e GC went >> off the rails. >> >> I have not done any tuning. All my JVM settings are default. For the >> record, here=E2=80=99s what I get from ps: >> >> java -ea >> -javaagent:/usr/local/apache-cassandra-2.0.4/bin/../lib/jamm-0.2.5.jar >> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=3D42 -Xms7830M -Xmx783= 0M >> -Xmn800M -XX:+HeapDumpOnOutOfMemoryError -Xss256k >> -XX:StringTableSize=3D1000003 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC >> -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=3D8 >> -XX:MaxTenuringThreshold=3D1 -XX:CMSInitiatingOccupancyFraction=3D75 >> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:+UseCondCardMark =E2= =80=A6 >> >> I have a half-dozen keyspaces with a half-dozen tables per keyspace, and >> a total of several gigabytes of data. Jut a one-node cluster. Nothing >> really remarkable. Any tables with records that could get really wide ha= ve >> been sharded. No record should be bigger than a few MB. >> >> On an unrelated note, I had to kill my server because it wouldn=E2=80=99= t do >> anything but GC. Now I cannot start it. It errors out with the following= : >> >> java.lang.IllegalStateException: Unfinished compactions reference missin= g >> sstables. This should never happen since compactions are marked finished >> before we start removing the old sstables. >> at >> org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeft= overs(ColumnFamilyStore.java:495) >> at >> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:= 264) >> at >> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.ja= va:461) >> at >> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:5= 04) >> java.lang.IllegalStateException: Unfinished compactions reference missin= g >> sstables. This should never happen since compactions are marked finished >> before we start removing the old sstables. >> at >> org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeft= overs(ColumnFamilyStore.java:495) >> at >> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:= 264) >> at >> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.ja= va:461) >> at >> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:5= 04) >> Exception encountered during startup: Unfinished compactions reference >> missing sstables. This should never happen since compactions are marked >> finished before we start removing the old sstables. >> >> I have a single node in my cluster. Is there any way to recover, or is m= y >> data toast? I can restore my data, so this isn=E2=80=99t a primary issue= for me. >> The GC issue, on the other hand, is preventing me from moving to product= ion. >> >> Robert >> >> > --001a11c1c9a6881df204f1bb4091 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
for the restart issue see=C2=A0CASSANDRA-6008=C2=A0and 6086


On Thu, Feb 6= , 2014 at 12:19 PM, Alain RODRIGUEZ <arodrime@gmail.com> wr= ote:
Hi Robert,
<= br>
The heap, and GC are things a bit tricky to tune,
<= br>
I recently read a post about heap, explaining how heap works and= how to tune it : http://tech.s= hift.com/post/74311817513/cassandra-tuning-the-jvm-for-read-heavy-workloads=

There is plenty of this kind of blogs or articles on th= e web.

Be careful, tuning highly depends on your w= orkload and hardware. You shouldn't use configuration found on these po= sts as is. You need to test, incrementally and monitor how it behave.

If you are not able to find a solution, there is also a= lot of professionals, consultants (like Datastax or Aaron Morton) whose jo= b is to help with Cassandra integrations, including heap and GC tuning.

Hope this will help somehow.


2014-01-29 16:51 GMT+01:00 Robert = Wille <rwille@fold3.com>:

Forget about what I said about there not being any load during the nig= ht. I forgot about my unit tests. They would have been running at this time= and they run against this cluster.=C2=A0

I also f= orgot to provide JVM information:

java version "1.7.0_17"
Java(T= M) SE Runtime Environment (build 1.7.0_17-b02)
Java HotSpot(TM) 6= 4-Bit Server VM (build 23.7-b01, mixed mode)

Thanks

Robert

From: Robert Wille <rwille@fold3.com>
Reply-To: <user@cassandra.apache.org> Date: Wednesday, January 29, 2014 = at 4:06 AM
To: "user@cassandra.apache.= org" <user@cassandra.apache.org>
Subject: GC taking a long time
=

I read through the rec= ent thread "Cassandra mad GC", which seemed very similar to my si= tuation, but didn=E2=80=99t really help.

Here is what I get from my logs when I grep for GCInspe= ctor. Note that this is the middle of the night on a dev server, so there s= hould have been almost no load.

=C2=A0INFO [Schedu= ledTasks:1] 2014-01-29 02:41:16,579 GCInspector.java (line 116) GC for Conc= urrentMarkSweep: 341 ms for 1 collections, 8001582816 used; max is 8126464000=
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:41:29,135 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 350 ms for 1 collections, 8027555576= used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:41:41,646 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 364 ms for 1 collections, 8075851136= used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:41:54,223 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 375 ms for 1 collections, 8124762400= used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:42:24,258 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 22995 ms for 2 collections, 7385470= 288 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:45:21,328 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 218 ms for 1 collections, 758248010= 4 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:45:33,418 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 222 ms for 1 collections, 758474387= 2 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:45:45,527 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 217 ms for 1 collections, 758851426= 4 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:45:57,594 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 223 ms for 1 collections, 759022363= 2 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:46:09,686 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 226 ms for 1 collections, 759282672= 0 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:46:21,867 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 229 ms for 1 collections, 759546452= 0 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:46:33,869 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 227 ms for 1 collections, 759710967= 2 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:46:45,962 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 230 ms for 1 collections, 759990929= 6 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:46:57,964 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 230 ms for 1 collections, 760158404= 8 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:47:10,018 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 229 ms for 1 collections, 7604217952= used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:47:22,136 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 236 ms for 1 collections, 7605867784= used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:47:34,277 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 239 ms for 1 collections, 7607521456= used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:47:46,292 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 235 ms for 1 collections, 761066737= 6 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:47:58,537 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 261 ms for 1 collections, 765034508= 8 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:48:10,783 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 269 ms for 1 collections, 765301659= 2 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-2= 9 02:48:23,786 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 298 = ms for 1 collections, 7716831032 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:48:35,988 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 308 ms for 1 collections, 774517861= 6 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-2= 9 02:48:48,434 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 319 = ms for 1 collections, 7796207088 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:49:00,902 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 320 ms for 1 collections, 782137868= 0 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-2= 9 02:49:13,344 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 338 = ms for 1 collections, 7859905288 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:49:25,471 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 352 ms for 1 collections, 791114568= 8 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-2= 9 02:49:38,473 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 359 = ms for 1 collections, 7938204144 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:49:50,895 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 368 ms for 1 collections, 798808840= 8 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-2= 9 02:50:03,345 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 379 = ms for 1 collections, 8014757368 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:50:15,920 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 384 ms for 1 collections, 804407458= 4 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-2= 9 02:50:28,484 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 397 = ms for 1 collections, 8094146216 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:50:41,040 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 405 ms for 1 collections, 811867505= 6 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-2= 9 02:50:53,628 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 408 = ms for 1 collections, 8121201008 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 02:51:19,816 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 26041 ms for 1 collections, 7385748= 352 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01= -29 02:59:38,271 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 20= 970 ms for 1 collections, 7379533664 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 03:00:07,020 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 25835 ms for 1 collections, 7386103= 248 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01= -29 03:00:28,777 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 20= 278 ms for 1 collections, 7380769872 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01-29 03:00:55,514 GCInspector.java= (line 116) GC for ConcurrentMarkSweep: 24909 ms for 1 collections, 7382846= 344 used; max is 8126464000
=C2=A0INFO [ScheduledTasks:1] 2014-01= -29 03:01:23,741 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 26= 223 ms for 1 collections, 7387539176 used; max is 8126464000

There are various long GC=E2=80=99s sprinkled thr= oughout my logs. Sometimes it recovers from them. Sometime it goes into a d= eath spiral.

About a minute prior to the big meltd= own at 02:51:19 there was a lot of activity on the local CF. About every 10= milliseconds the following is repeated:

=C2=A0INFO [MutationStage:552] 2014-01-29 02:50:35= ,489 ColumnFamilyStore.java (line 740) Enqueuing flush of Memtable-local@10= 59338449(73/3672 serialized/live bytes, 2 ops)
=C2=A0INFO [FlushW= riter:45] 2014-01-29 02:50:35,489 Memtable.java (line 333) Writing Memtable= -local@1059338449(73/3672 serialized/live bytes, 2 ops)
=C2=A0INFO [FlushWriter:45] 2014-01-29 02:50:35,492 Memtable.java (lin= e 373) Completed flushing /var/lib/cassandra/data/system/local/system-local= -jb-104855-Data.db (93 bytes) for commitlog position ReplayPosition(segment= Id=3D1390945513179, position=3D25932849)

It seems strange to be repeatedly flushing a tabl= e at that frequency. Other than that, I don=E2=80=99t see anything odd in t= he logs just before GC went off the rails.

I have = not done any tuning. All my JVM settings are default. For the record, here= =E2=80=99s what I get from ps:

java -ea -javaagent:/usr/local/apache-cassandra-2.0.4/b= in/../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy= =3D42 -Xms7830M -Xmx7830M -Xmn800M -XX:+HeapDumpOnOutOfMemoryError -Xss256k= -XX:StringTableSize=3D1000003 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX= :+CMSParallelRemarkEnabled -XX:SurvivorRatio=3D8 -XX:MaxTenuringThreshold= =3D1 -XX:CMSInitiatingOccupancyFraction=3D75 -XX:+UseCMSInitiatingOccupancy= Only -XX:+UseTLAB -XX:+UseCondCardMark =E2=80=A6

I have a half-dozen keyspaces with a half-dozen tables = per keyspace, and a total of several gigabytes of data. Jut a one-node clus= ter. Nothing really remarkable. Any tables with records that could get real= ly wide have been sharded. No record should be bigger than a few MB.=C2=A0<= /div>

On an unrelated note, I had to kill my server because i= t wouldn=E2=80=99t do anything but GC. Now I cannot start it. It errors out= with the following:

java.lang.IllegalStateEx= ception: Unfinished compactions reference missing sstables. This should nev= er happen since compactions are marked finished before we start removing th= e old sstables.
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.cassandra.db.ColumnFamilySto= re.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:495)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.cassandra.service.CassandraDaem= on.setup(CassandraDaemon.java:264)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.cassandra.service.CassandraD= aemon.activate(CassandraDaemon.java:461)
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.ja= va:504)
java.lang.IllegalStateException: Unfinished compactions r= eference missing sstables. This should never happen since compactions are m= arked finished before we start removing the old sstables.
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.cassandra.db.ColumnFamilySto= re.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:495)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.cassandra.service.CassandraDaem= on.setup(CassandraDaemon.java:264)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.cassandra.service.CassandraD= aemon.activate(CassandraDaemon.java:461)
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.ja= va:504)
Exception encountered during startup: Unfinished compacti= ons reference missing sstables. This should never happen since compactions = are marked finished before we start removing the old sstables.

I have a single node in my cluster. Is there any = way to recover, or is my data toast? I can restore my data, so this isn=E2= =80=99t a primary issue for me. The GC issue, on the other hand, is prevent= ing me from moving to production.

Robert



--001a11c1c9a6881df204f1bb4091--