From: aaron morton <aaron@thelastpickle.com>
Reply-To: user@cassandra.apache.org
To: user@cassandra.apache.org
Subject: Re: Cassandra freezes
Date: Thu, 4 Apr 2013 07:11:21 +0530

> INFO [ScheduledTasks:1] 2013-04-03 08:47:40,757 GCInspector.java (line 122) GC for ParNew: 40370 ms for 3 collections, 565045688 used; max is 2038693888

This is the JVM pausing the application for 40 seconds to complete GC.
You have two choices: use a bigger heap (4GB to 8GB) or run a lower workload.

cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com
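
For anyone following along, this is what "a bigger heap" can look like in practice. A minimal sketch, assuming the stock conf/cassandra-env.sh that ships with 1.1 and a node with RAM to spare; the sizes are placeholders, not a recommendation:

    # In conf/cassandra-env.sh: pin the heap instead of letting the script auto-size it.
    # The two variables need to be set as a pair.
    MAX_HEAP_SIZE="8G"
    HEAP_NEWSIZE="800M"    # the stock script sizes the new generation at roughly 100MB per core

    # After restarting the node, confirm the new ceiling took effect:
    # nodetool -h localhost info | grep "Heap Memory"
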
On 3/04/2013, at 7:54 PM, Joel Samuelsson <samuelsson.joel@gmail.com> wrote:

> Thanks for your suggestions. I'll get back to you with the tests you suggested, probably tomorrow. In the meantime though I have a few questions. You say:
> - 2GB of JVM heap to be insufficient to run this workload against Cassandra
>
> I realise that the one node cluster has a maximum workload. It did however work fine for a few tests, and then performance deteriorated. Currently I can't even complete a test run since the server won't respond in time - even though I haven't run a test since yesterday. Shouldn't the server "recover" sooner or later and perform as well as it did during the first few tests? If not automatically, what can I do to help it? I tried nodetool flush but saw no performance improvement.
>
> And just FYI in case it changes anything, I don't immediately read back the written rows. There are 100 000 rows being written and 100 000 rows being read in parallel. The rows being read were written to the cluster before the tests were run.
>
>
> 2013/4/3 Andras Szerdahelyi <andras.szerdahelyi@ignitionone.com>
> Wrt the cfhistograms output, you are supposed to consider "Offset" versus the values of each column as a separate histogram - one histogram per column. Also, AFAIK these counters are reset after each invocation, so you are always looking at data from between two invocations of cfhistograms. With that in mind, to me your cfhistograms say:
> - you encountered 200k rows with a single column
>
> - most of your write latencies are agreeable, but - and I can not comment on how much a commit log write (an append) would affect this, as I have durable_writes:false on all my data - that's a long tail you have there, into the hundreds of milliseconds, which can not be OK.
> Depending on how often your memtables are switched (emptied and queued for flushing to disk) and how valuable the updates received between two of these switches are, you may want to disable durable writes on the KS with "durable_writes=false", or at the very least place the commit log folder on its own drive. Again, I'm not absolutely sure this could affect write latency.
>
> - 38162520 reads served with a single sstable read, that's great
>
> - a big chunk of these reads are served from page cache or memtables (the latter being more likely since, as I understand it, you immediately read back the written column and you work with unique row keys), but again you have a long drop-off
>
> 16410 mutations / sec with a 1k payload - let's say that is 20MB/s into memory with overhead. A third of the 2GB heap for memtables = 666MB: a memtable switch every ~30 seconds.
> I'm not sure your write performance can be attributed to GC only, so can you sit through one of these tests with:
>
> watch -n2 iostat -xmd devicename ( look for avgqu-sz and what your peak write throughput is )
> watch -n2 nodetool tpstats ( see if you have flushes blocked )
>
> In the end I think you'll either find
> - 2GB of JVM heap to be insufficient to run this workload against Cassandra
> - or your single disk serving your data directory being unable to keep up with flushing 20MB/s of sustained writes every 30 seconds ( unlikely unless you are on EC2 EBS drives )
>
> Also, just to be sure: restart Cassandra before the test and confirm your benchmark application is doing what you think it's doing in terms of reads/writes, with nodetool cfstats.
>
> Regards,
> Andras
>
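
A compact way to run both of the checks Andras suggests side by side - a sketch that assumes the data directory sits on a device called sda and that sysstat is installed:

    #!/bin/sh
    # Rough monitoring loop for one test run; the device name is an assumption.
    DEVICE=${1:-sda}

    while true; do
        date
        # Two reports: the second one covers the last 2 seconds.
        # Watch avgqu-sz and the peak wMB/s.
        iostat -xmd "$DEVICE" 2 2
        # Thread pools: a growing "Blocked" count on FlushWriter means flushes are backing up.
        nodetool -h localhost tpstats | grep -E 'Pool Name|FlushWriter|MutationStage|ReadStage'
    done

Capturing this for a whole run makes it easier to line the GC pauses up with disk queue depth and blocked flushes afterwards.
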
> From: Joel Samuelsson
> Reply-To: "user@cassandra.apache.org"
> Date: Wednesday 3 April 2013 11:55
> To: "user@cassandra.apache.org"
> Subject: Re: Cassandra freezes
>
> It seems this problem is back and I am unsure how to solve it. I have a test setup like this:
> 4 machines run 8 processes each. Each process has 2 threads, one for writing 100 000 rows and one for reading another 100 000 rows. Each machine (and process) reads and writes the exact same rows, so it is essentially the same 200 000 rows being read / written.
> The Cassandra cluster is a one node cluster.
> The first 10-20 runs of the test described above go smoothly; after that, tests take increasingly long with GC happening almost all the time.
>
> Here is my CASS-FREEZE-001 form answers:
>
> How big is your JVM heap?
> 2GB
>
> How many CPUs?
> A virtual environment so I can't be perfectly sure, but according to their specification, "8 cores".
>
> Garbage collection taking long? ( look for log lines from GCInspector )
> Yes, these are a few lines seen during one test run:
> INFO [ScheduledTasks:1] 2013-04-03 08:47:40,757 GCInspector.java (line 122) GC for ParNew: 40370 ms for 3 collections, 565045688 used; max is 2038693888
> INFO [ScheduledTasks:1] 2013-04-03 08:48:24,720 GCInspector.java (line 122) GC for ParNew: 39840 ms for 2 collections, 614065528 used; max is 2038693888
> INFO [ScheduledTasks:1] 2013-04-03 08:49:09,319 GCInspector.java (line 122) GC for ParNew: 37666 ms for 2 collections, 682352952 used; max is 2038693888
> INFO [ScheduledTasks:1] 2013-04-03 08:50:02,577 GCInspector.java (line 122) GC for ParNew: 44590 ms for 1 collections, 792861352 used; max is 2038693888
>
>
> Running out of heap? ( "heap is .. full" log lines )
> Yes. Same run as above:
> WARN [ScheduledTasks:1] 2013-04-03 08:54:35,108 GCInspector.java (line 139) Heap is 0.8596674853032178 full. You may need to reduce memtable and/or cache sizes. Cassandra is now reducing cache sizes to free up memory. Adjust reduce_cache_sizes_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
> WARN [ScheduledTasks:1] 2013-04-03 08:54:36,831 GCInspector.java (line 145) Heap is 0.8596674853032178 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
>
>
> Any tasks backing up / being dropped? ( nodetool tpstats and ".. dropped in last .. ms" log lines )
> Yes. Same run as above:
> INFO [ScheduledTasks:1] 2013-04-03 08:52:04,943 MessagingService.java (line 673) 31 MUTATION messages dropped in last 5000ms
> INFO [ScheduledTasks:1] 2013-04-03 08:52:04,944 MessagingService.java (line 673) 8 READ messages dropped in last 5000ms
>
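
To put numbers on how often this happens across a run, a quick pass over the log is enough; the path below is the packaged default and may differ on your install:

    # Tally GC pauses and dropped messages for one run (log path is an assumption).
    LOG=/var/log/cassandra/system.log

    # Longest ParNew pauses reported by GCInspector, in ms (largest last)
    grep 'GC for ParNew' "$LOG" |
      awk '{ for (i = 2; i <= NF; i++) if ($i == "ms") print $(i - 1) }' |
      sort -n | tail

    # Number of 5-second windows in which MUTATION or READ messages were dropped
    grep -c 'messages dropped in last' "$LOG"
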
> Are writes really slow? ( nodetool cfhistograms Keyspace ColumnFamily )
> Not sure how to interpret the output of nodetool cfhistograms, but here it is (I hope it's fairly readable):
>   Offset   SSTables  Write Latency  Read Latency  Row Size  Column Count
>        1   38162520              0             0         0        200000
>        2          0             22             0         0             0
>        3          0           1629             0         0             0
>        4          0           9990             0         0             0
>        5          0          40169             0         0             0
>        6          0         161538             0         0             0
>        7          0         487266             0         0             0
>        8          0        1096601             0         0             0
>       10          0        4842978             0         0             0
>       12          0        7976003             0         0             0
>       14          0        8673230             0         0             0
>       17          0        9805730             0         0             0
>       20          0        5083707             0         0             0
>       24          0        2541157             0         0             0
>       29          0         768916             0         0             0
>       35          0         220440             0         0             0
>       42          0         112915             0         0             0
>       50          0          71469             0         0             0
>       60          0          48909             0         0             0
>       72          0          50714             0         0             0
>       86          0          45390             0         0             0
>      103          0          41975             0         0             0
>      124          0          40371             0         0             0
>      149          0          37103             0         0             0
>      179          0          44631             0         0             0
>      215          0          43957             0         0             0
>      258          0          32499             1         0             0
>      310          0          18446      23056779         0             0
>      372          0          13113      12580639         0             0
>      446          0           9862       1017347         0             0
>      535          0           7480        784506         0             0
>      642          0           5473        274274         0             0
>      770          0           4084         56379         0             0
>      924          0           3046         27979         0             0
>     1109          0           2205         20206    200000             0
>     1331          0           1658         16947         0             0
>     1597          0           1228         16969         0             0
>     1916          0            896         15848         0             0
>     2299          0            542         13928         0             0
>     2759          0            379         11782         0             0
>     3311          0            326          9761         0             0
>     3973          0            540          8997         0             0
>     4768          0            450          7938         0             0
>     5722          0            270          6552         0             0
>     6866          0            170          6022         0             0
>     8239          0            146          6474         0             0
>     9887          0            166          7969         0             0
>    11864          0            176         53725         0             0
>    14237          0            203         10260         0             0
>    17084          0            255          6827         0             0
>    20501          0            312         27462         0             0
>    24601          0            445         11523         0             0
>    29521          0            736          9904         0             0
>    35425          0            909         20539         0             0
>    42510          0            896         14280         0             0
>    51012          0            904         12443         0             0
>    61214          0            715         11956         0             0
>    73457          0            652         10040         0             0
>    88148          0            474          7992         0             0
>   105778          0            256          5043         0             0
>   126934          0            113          2370         0             0
>   152321          0             75          1189         0             0
>   182785          0             39           690         0             0
>   219342          0             44           550         0             0
>   263210          0             69           551         0             0
>   315852          0             35           419         0             0
>   379022          0             35           564         0             0
>   454826          0             52           504         0             0
>   545791          0             79           749         0             0
>   654949          0             61           737         0             0
>   785939          0             30           399         0             0
>   943127          0             57           611         0             0
>  1131752          0             78           488         0             0
>  1358102          0             23           302         0             0
>  1629722          0             28           240         0             0
>  1955666          0             48           294         0             0
>  2346799          0             28           306         0             0
>  2816159          0             19           224         0             0
>  3379391          0             37           212         0             0
>  4055269          0             24           237         0             0
>  4866323          0             13           137         0             0
>  5839588          0             11            99         0             0
>  7007506          0              4           115         0             0
>  8409007          0             16           194         0             0
> 10090808          0             12           156         0             0
> 12108970          0             12            54         0             0
> 14530764          0             24           147         0             0
> 17436917          0             10           114         0             0
> 20924300          0              3            66         0             0
> 25109160          0             22           100         0             0
>
> Some of the write latencies look really bad, but since they have column count 0 for most rows, I am not sure what to make of it.
>
>
> How much is lots of data?
> Lots of data might have been an exaggeration, but the test is as described above. Each row read or written is about 1kb in size, so each test run generates 4 (machines) * 8 (processes) * 2 (read and write) * 100 000 (rows) * 1kb (row size) = 6250 MB read or written (half of each).
>
> Wide or skinny rows?
> Skinny rows, only a single column is used for each row.
>
> Mutations/sec?
> The test, when run on a freshly rebooted Cassandra, takes around 390 seconds, and 6400000 rows are read / written during this time period, so around 16410 mutations / second (unless I misunderstood the word mutation).
>
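
Since the interesting part of the histogram is the tail, a small awk pass over the same output can summarise it. A sketch only, assuming the 1.1 cfhistograms layout shown above (six whitespace-separated columns, latency offsets in microseconds) and a made-up keyspace name:

    # What fraction of the write-latency histogram sits past 10ms?
    # "MyKeyspace" is a placeholder; rawData is the column family from the schema below.
    nodetool -h localhost cfhistograms MyKeyspace rawData |
      awk 'NF == 6 && $1 ~ /^[0-9]+$/ {
             writes += $3
             if ($1 >= 10000) slow += $3    # offsets of 10000us and up, i.e. 10ms or more
           }
           END { if (writes) printf "%d of %d writes (%.2f%%) took >= 10ms\n",
                                    slow, writes, 100 * slow / writes }'

Comparing that percentage between an early, healthy run and a degraded one should make the drift obvious, since the counters only cover the window between two invocations.
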
> Which Compaction Strategy are you using?
> SizeTieredCompactionStrategy
>
> Output of show schema ( cassandra-cli ) for the relevant Keyspace/CF might help as well:
> create column family rawData
>   with column_type = 'Standard'
>   and comparator = 'UTF8Type'
>   and default_validation_class = 'BytesType'
>   and key_validation_class = 'BytesType'
>   and read_repair_chance = 0.1
>   and dclocal_read_repair_chance = 0.0
>   and gc_grace = 864000
>   and min_compaction_threshold = 4
>   and max_compaction_threshold = 32
>   and replicate_on_write = true
>   and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>   and caching = 'KEYS_ONLY'
>   and column_metadata = [
>     {column_name : 'created',
>      validation_class : LongType},
>     {column_name : 'socketHash',
>      validation_class : Int32Type},
>     {column_name : 'data',
>      validation_class : UTF8Type},
>     {column_name : 'guid',
>      validation_class : UTF8Type},
>     {column_name : 'evaluated',
>      validation_class : Int32Type,
>      index_name : 'rawData_evaluated_idx_1',
>      index_type : 0}]
>   and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
>
> Only the "data" column is used during the test.
>
>
> What consistency are you doing your writes with?
> I am writing with consistency level ONE.
>
>
> What are the values for these settings in cassandra.yaml?
>
> memtable_total_space_in_mb: No value set in cassandra.yaml, so 1/3 of heap according to documentation (2GB / 3)
> memtable_flush_writers: No value set in cassandra.yaml, but only one data directory so I assume it is 1.
> memtable_flush_queue_size: 4
> compaction_throughput_mb_per_sec: 16
> concurrent_writes: 32
>
>
> Which version of Cassandra?
> 1.1.8
>
>
> Hope this helps you help me :)
> Best regards,
> Joel Samuelsson
>
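
Given the defaults listed above, one knob worth experimenting with on a 2GB heap is an explicit memtable cap instead of the 1/3-of-heap default - a sketch, assuming the packaged config path:

    # Current values (the path is the package default and may differ on your install)
    CONF=/etc/cassandra/cassandra.yaml
    grep -nE 'memtable_total_space_in_mb|memtable_flush_writers|memtable_flush_queue_size|flush_largest_memtables_at' "$CONF"

    # Example of an explicit cap to try: less memtable space means smaller, more frequent
    # flushes and more heap headroom for everything else. Illustrative value only; edit
    # cassandra.yaml and restart the node for it to take effect.
    #   memtable_total_space_in_mb: 400
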
> 2013/3/22 Joel Samuelsson <samuelsson.joel@gmail.com>
> Thanks for the GC suggestion. It seems we didn't have enough CPU power to handle both the data and GC. Increasing the number of CPU cores made everything run smoothly at the same load.
>
>
> 2013/3/21 Andras Szerdahelyi <andras.szerdahelyi@ignitionone.com>
> Neat!
> Thanks.
>
> From: Sylvain Lebresne <sylvain@datastax.com>
> Reply-To: "user@cassandra.apache.org"
> Date: Thursday 21 March 2013 10:10
> To: "user@cassandra.apache.org"
> Subject: Re: Cassandra freezes
>
> Prior to 1.2 the index summaries were not saved on disk, and were thus computed on startup while the sstable was loaded. In 1.2 they are now saved on disk to make startup faster (https://issues.apache.org/jira/browse/CASSANDRA-2392). That being said, if the index_interval value used by a saved summary doesn't match the current one while the sstable is loaded, the summary is recomputed anyway, so restarting a node should always take a new index_interval setting into account.
>
> --
> Sylvain
>
>
> On Thu, Mar 21, 2013 at 9:43 AM, Andras Szerdahelyi <andras.szerdahelyi@ignitionone.com> wrote:
> I can not find the reference that notes having to run upgradesstables when you change this. I really hope such complex assumptions are not forming in my head just on their own and there actually exists some kind of reliable reference that clears this up :-) but,
>
> # index_interval controls the sampling of entries from the primary
> # row index in terms of space versus time. The larger the interval,
> # the smaller and less effective the sampling will be. In technical
> # terms, the interval corresponds to the number of index entries that
> # are skipped between taking each sample. All the sampled entries
> # must fit in memory. Generally, a value between 128 and 512 here
> # coupled with a large key cache size on CFs results in the best trade
> # offs. This value is not often changed, however if you have many
> # very small rows (many to an OS page), then increasing this will
> # often lower memory usage without an impact on performance.
>
> it is ( very ) safe to assume the row index is re-built/updated when new sstables are built.
> Obviously the sample of this index will have to follow this process very closely.
>
> It is possible however that the sample itself is not persisted and is built at startup, as opposed to *only* when the index changes ( which is what I thought was happening ).
> It shouldn't be too difficult to verify this, but I'd appreciate it if someone who has looked at this before could confirm whether this is the case.
>
> Thanks,
> Andras
>
> On 21/03/13 09:13, "Michal Michalski" <michalm@opera.com> wrote:
>
> >About index_interval:
> >
> >> 1) you have to rebuild sstables ( not an issue if you are evaluating, doing
> >> test writes.. etc, not so much in production )
> >
> >Are you sure of this? As I understand indexes, it's not required because
> >this parameter defines an interval of the in-memory index sample, which is
> >created during C* startup based on the primary on-disk index file. The
> >fact that heap usage is reduced immediately after a C* restart seems to
> >confirm this, but maybe I miss something?
> >
> >M.
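
On the question of whether the sample itself is persisted, one low-tech way to check - sketched under the assumption of the default 1.1 data directory layout and a made-up keyspace name:

    # Paths and the keyspace name are placeholders; adjust for your layout.
    DATA=/var/lib/cassandra/data/MyKeyspace/rawData

    # On 1.2+ a persisted summary should show up as a *-Summary.db component per sstable
    # (CASSANDRA-2392); on 1.1 this should count zero, i.e. the sample lives only on the heap.
    ls "$DATA" | grep -c 'Summary\.db'

    # Before/after comparison for a changed index_interval: note the heap figure, edit
    # index_interval in cassandra.yaml, restart the node, let it settle, then check again.
    nodetool -h localhost info | grep 'Heap Memory'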