From: Stefan Reek
Date: Mon, 05 Mar 2012 10:06:47 +0100
To: user@cassandra.apache.org
Subject: Re: Impact of old data on performance

Thanks for the help Dan.
Is there anyone else that can shed some light on this issue?
For example, what are the options for reducing the time taken by the mark/remark pauses?

cheers,

Stefan

On 03/01/2012 06:51 PM, Dan Retzlaff wrote:
I've never had to deal with GC tuning since our cluster has relatively few (but large) columns. So I'll leave further comment to others, but it sounds like you're on the right track.

On Thu, Mar 1, 2012 at 2:13 AM, Stefan Reek <stefan@unitedgames.com> wrote:
Swap is disabled on the machines, so I'm sure the JVM is not swapping out.
Cassandra is the only process running on those machines, so network contention shouldn't be an issue either.
Which leaves Garbage collection.

In the logs I can see that the Garbage Collection takes a long time, for example:

GCInspector.java (line 130) GC for ConcurrentMarkSweep: 384113 ms, -84703136 reclaimed leaving 1018203472 used; max is 17208836096

But I configured the Garbage Collector to use the ConcurrentMarkSweep algorithm. As far as I understand, this means that there will only be two pauses during a collection: the initial mark and the remark stage.

As I also have extra garbage collection logging enabled on one node, I checked to see if the mark/remark times sometimes exceed the 10 second RPC timeout and I found some logging statements like this:

422208.422: [CMS-concurrent-mark: 0.434/20.004 secs] [Times: user=13.40 sys=0.75, real=20.00 secs]

Does this mean that the mark phase took over 10 seconds and all other Cassandra threads were stopped during this time? If so, that would explain why I sometimes get timeouts on requests and also see Dropped Messages in my logs.
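
For reference, a line like the one above can be picked apart with a short script. This is only a sketch: the regular expression is fitted to the single sample line quoted here, the names `parse_cms_phase` and `RPC_TIMEOUT_SECS` are made up for illustration, and the meaning of the two slash-separated numbers is deliberately left unlabeled. One thing worth keeping in mind when reading the result: in HotSpot's CMS collector, phases named `CMS-concurrent-*` are designed to run alongside application threads, while the stop-the-world phases are initial mark and remark, so those are the entries to compare against the timeout.

```python
import re

# RPC timeout mentioned in the message, in seconds (illustrative constant).
RPC_TIMEOUT_SECS = 10.0

# Fitted to the one sample line only; other JVM versions or logging flags
# may print a different layout.
PHASE_RE = re.compile(
    r"(?P<uptime>\d+\.\d+): \[(?P<phase>CMS-[\w-]+): "
    r"(?P<first>\d+\.\d+)/(?P<second>\d+\.\d+) secs\]"
    r".*?real=(?P<real>\d+\.\d+) secs"
)


def parse_cms_phase(line):
    """Return the fields of a CMS phase log line, or None if it does not match."""
    m = PHASE_RE.search(line)
    if m is None:
        return None
    return {
        "uptime": float(m.group("uptime")),
        "phase": m.group("phase"),
        "first": float(m.group("first")),    # number before the slash
        "second": float(m.group("second")),  # number after the slash
        "real": float(m.group("real")),      # wall-clock time from [Times: ...]
    }


sample = (
    "422208.422: [CMS-concurrent-mark: 0.434/20.004 secs] "
    "[Times: user=13.40 sys=0.75, real=20.00 secs]"
)
info = parse_cms_phase(sample)

# Concurrent phases are not stop-the-world pauses by design, so a long
# duration here is not by itself evidence of a 20-second freeze.
is_concurrent = info["phase"].startswith("CMS-concurrent-")
exceeds_timeout = info["real"] > RPC_TIMEOUT_SECS
print(info["phase"], info["real"], is_concurrent, exceeds_timeout)
```

A long concurrent phase can still point at CPU starvation or a very busy heap, so it is a reason to look at the neighbouring initial-mark and remark entries rather than a diagnosis on its own.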

Cheers,

Stefan



On 02/29/2012 07:48 PM, Dan Retzlaff wrote:
First, to be clear, I'm not an expert, but I suggested "cfstats" because it can show unhealthy signs. That said, yours looks okay to me: few live SSTables per column family, reasonable quantity of data... The next things I'd verify are that (1) the JVM isn't swapping out during these periods of bad performance, (2) GC completes in less than your RPC timeout (check the logs), and (3) your internode network connection isn't degraded by another application.
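
Check (2) can be roughly automated. The sketch below assumes the GCInspector entries in the Cassandra log look like the sample quoted elsewhere in this thread ("GC for ConcurrentMarkSweep: 384113 ms, ..."); the function name `slow_collections` and the 10,000 ms default threshold are illustrative, so substitute your own RPC timeout. The reported duration may include concurrent phases, so a large value is a prompt to read the detailed GC log rather than proof of a pause of that length.

```python
import re

# Pattern fitted to the single GCInspector sample from this thread.
GC_RE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms, "
    r"(?P<reclaimed>-?\d+) reclaimed leaving (?P<used>\d+) used; "
    r"max is (?P<max>\d+)"
)


def slow_collections(lines, threshold_ms=10_000):
    """Yield (collector, duration_ms, heap_used_fraction) for slow collections."""
    for line in lines:
        m = GC_RE.search(line)
        if m and int(m.group("ms")) > threshold_ms:
            used_fraction = int(m.group("used")) / int(m.group("max"))
            yield m.group("collector"), int(m.group("ms")), used_fraction


log_lines = [
    "GCInspector.java (line 130) GC for ConcurrentMarkSweep: 384113 ms, "
    "-84703136 reclaimed leaving 1018203472 used; max is 17208836096",
]
hits = list(slow_collections(log_lines))
for collector, ms, fraction in hits:
    print(f"{collector}: {ms / 1000:.1f} s, heap {fraction:.1%} used afterwards")
```

To run it against a real log, pass an open file object instead of `log_lines`; the path to the log file depends on your installation.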

The fact that *writes* are timing out is interesting, since existing data should have little to no effect on those.

BTW, since ColumnFamily7 sees most of your action, you might consider whether a key/row cache is appropriate. By my count it has 2M rows and 7M reads since startup. But this fine-tuning is probably a distraction from whatever's causing your timeouts.
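
As a rough cross-check of the row figure, dividing the live space used by the mean compacted row size gives an order-of-magnitude estimate. This is an assumption about how such a count might be derived, not necessarily how it was done here, and it ignores overwritten data that has not been compacted away as well as index overhead. The numbers are the ColumnFamily7 values from the cfstats output in this thread.

```python
# ColumnFamily7 values copied from the cfstats output in this thread.
space_used_live = 6_031_951_162  # bytes, "Space used (live)"
mean_row_size = 2_737            # bytes, "Compacted row mean size"
read_count = 7_506_944           # "Read Count" since startup

# Crude estimate: total bytes divided by the average row size.
estimated_rows = space_used_live / mean_row_size
reads_per_row = read_count / estimated_rows

print(f"~{estimated_rows / 1e6:.1f}M rows, "
      f"~{reads_per_row:.1f} reads per row since startup")
```

A low reads-per-row ratio only says the reads are few relative to the data size; whether a cache helps depends on how skewed the access pattern is, which cfstats alone does not show.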

Hope that helps,
Dan

On Wed, Feb 29, 2012 at 5:14 AM, Stefan Reek <stefan@unitedgames.com> wrote:
Hi Dan,

Thanks for answering.
I included the output of cfstats below.
I hope you can say something about our problems with it.

cheers,

Stefan

Keyspace: Keyspace1
    Read Count: 60703419
    Read Latency: 1.1790332096286043 ms.
    Write Count: 105871791
    Write Latency: 0.019847457393065166 ms.
    Pending Tasks: 0
        Column Family: ColumnFamily1
        SSTable count: 0
        Space used (live): 0
        Space used (total): 0
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: ColumnFamily2
        SSTable count: 2
        Space used (live): 482828
        Space used (total): 482828
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 3
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 9
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 258
        Compacted row maximum size: 275
        Compacted row mean size: 266

        Column Family: ColumnFamily3
        SSTable count: 0
        Space used (live): 0
        Space used (total): 0
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: ColumnFamily4
        SSTable count: 1
        Space used (live): 19793722
        Space used (total): 19793722
        Memtable Columns Count: 306
        Memtable Data Size: 12731
        Memtable Switch Count: 3
        Read Count: 2309
        Read Latency: NaN ms.
        Write Count: 3234
        Write Latency: 0.021 ms.
        Pending Tasks: 0
        Key cache: disabled
        Row cache: disabled
        Compacted row minimum size: 465813
        Compacted row maximum size: 4784014
        Compacted row mean size: 3295196

        Column Family: ColumnFamily5
        SSTable count: 6
        Space used (live): 364775507
        Space used (total): 364775507
        Memtable Columns Count: 14193
        Memtable Data Size: 552090
        Memtable Switch Count: 3
        Read Count: 211451
        Read Latency: NaN ms.
        Write Count: 26567
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache: disabled
        Row cache: disabled
        Compacted row minimum size: 187
        Compacted row maximum size: 333233
        Compacted row mean size: 662

        Column Family: ColumnFamily6
        SSTable count: 0
        Space used (live): 0
        Space used (total): 0
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache: disabled
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: ColumnFamily7
        SSTable count: 12
        Space used (live): 6031951162
        Space used (total): 6031951162
        Memtable Columns Count: 480647
        Memtable Data Size: 29424092
        Memtable Switch Count: 43
        Read Count: 7506944
        Read Latency: 0.761 ms.
        Write Count: 3022099
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache: disabled
        Row cache: disabled
        Compacted row minimum size: 239
        Compacted row maximum size: 93925
        Compacted row mean size: 2737

        Column Family: ColumnFamily8
        SSTable count: 1
        Space used (live): 241104
        Space used (total): 241104
        Memtable Columns Count: 2
        Memtable Data Size: 94
        Memtable Switch Count: 3
        Read Count: 685
        Read Latency: NaN ms.
        Write Count: 143
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 254
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 254
        Compacted row maximum size: 479
        Compacted row mean size: 305

        Column Family: ColumnFamily9
        SSTable count: 0
        Space used (live): 0
        Space used (total): 0
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: ColumnFamily10
        SSTable count: 3
        Space used (live): 22874666
        Space used (total): 22874666
        Memtable Columns Count: 11886
        Memtable Data Size: 519166
        Memtable Switch Count: 3
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 73970
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache: disabled
        Row cache: disabled
        Compacted row minimum size: 161
        Compacted row maximum size: 103903
        Compacted row mean size: 11504

        Column Family: ColumnFamily11
        SSTable count: 3
        Space used (live): 6533310
        Space used (total): 6533310
        Memtable Columns Count: 8
        Memtable Data Size: 334
        Memtable Switch Count: 3
        Read Count: 5431
        Read Latency: NaN ms.
        Write Count: 197
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 653
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 262
        Compacted row maximum size: 17149
        Compacted row mean size: 359

        Column Family: ColumnFamily12
        SSTable count: 1
        Space used (live): 5504796
        Space used (total): 5504796
        Memtable Columns Count: 722
        Memtable Data Size: 5495928
        Memtable Switch Count: 3
        Read Count: 6413
        Read Latency: NaN ms.
        Write Count: 5784
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache: disabled
        Row cache: disabled
        Compacted row minimum size: 167
        Compacted row maximum size: 5498089
        Compacted row mean size: 261974

        Column Family: ColumnFamily13
        SSTable count: 3
        Space used (live): 18208017
        Space used (total): 18208017
        Memtable Columns Count: 40
        Memtable Data Size: 1730
        Memtable Switch Count: 3
        Read Count: 4617
        Read Latency: NaN ms.
        Write Count: 132
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 1604
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 512
        Compacted row maximum size: 1776
        Compacted row mean size: 683

        Column Family: ColumnFamily14
        SSTable count: 9
        Space used (live): 1631088415
        Space used (total): 1631088415
        Memtable Columns Count: 355843
        Memtable Data Size: 15750980
        Memtable Switch Count: 1142
        Read Count: 37874988
        Read Latency: 0.830 ms.
        Write Count: 96285506
        Write Latency: 0.032 ms.
        Pending Tasks: 0
        Key cache: disabled
        Row cache: disabled
        Compacted row minimum size: 599
        Compacted row maximum size: 1124
        Compacted row mean size: 861

        Column Family: ColumnFamily15
        SSTable count: 2
        Space used (live): 69991756
        Space used (total): 69991756
        Memtable Columns Count: 315483
        Memtable Data Size: 14827701
        Memtable Switch Count: 65
        Read Count: 5653028
        Read Latency: 6.888 ms.
        Write Count: 47666
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache: disabled
        Row cache: disabled
        Compacted row minimum size: 160
        Compacted row maximum size: 5683855
        Compacted row mean size: 35031

        Column Family: ColumnFamily16
        SSTable count: 1
        Space used (live): 44605875
        Space used (total): 44605875
        Memtable Columns Count: 20273
        Memtable Data Size: 645383
        Memtable Switch Count: 3
        Read Count: 373155
        Read Latency: 0.058 ms.
        Write Count: 199847
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache: disabled
        Row cache: disabled
        Compacted row minimum size: 209
        Compacted row maximum size: 82030
        Compacted row mean size: 617

        Column Family: ColumnFamily17
        SSTable count: 1
        Space used (live): 5609
        Space used (total): 5609
        Memtable Columns Count: 166
        Memtable Data Size: 2075
        Memtable Switch Count: 3
        Read Count: 17630
        Read Latency: NaN ms.
        Write Count: 542
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache: disabled
        Row cache: disabled
        Compacted row minimum size: 522
        Compacted row maximum size: 563
        Compacted row mean size: 545

        Column Family: ColumnFamily18
        SSTable count: 3
        Space used (live): 20705514
        Space used (total): 20705514
        Memtable Columns Count: 164
        Memtable Data Size: 5892
        Memtable Switch Count: 3
        Read Count: 5605
        Read Latency: NaN ms.
        Write Count: 318
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 2314
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 521
        Compacted row maximum size: 1239
        Compacted row mean size: 604

        Column Family: ColumnFamily19
        SSTable count: 1
        Space used (live): 29516821
        Space used (total): 29516821
        Memtable Columns Count: 8402
        Memtable Data Size: 268574
        Memtable Switch Count: 3
        Read Count: 7587671
        Read Latency: 0.193 ms.
        Write Count: 72597
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache: disabled
        Row cache: disabled
        Compacted row minimum size: 186
        Compacted row maximum size: 18443
        Compacted row mean size: 270

        Column Family: ColumnFamily20
        SSTable count: 1
        Space used (live): 711618
        Space used (total): 711618
        Memtable Columns Count: 16
        Memtable Data Size: 496
        Memtable Switch Count: 3
        Read Count: 454
        Read Latency: NaN ms.
        Write Count: 97
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 385
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 238
        Compacted row maximum size: 426
        Compacted row mean size: 303

        Column Family: ColumnFamily21
        SSTable count: 6
        Space used (live): 499913576
        Space used (total): 499913576
        Memtable Columns Count: 79019
        Memtable Data Size: 4820159
        Memtable Switch Count: 11
        Read Count: 65780
        Read Latency: NaN ms.
        Write Count: 6129833
        Write Latency: 0.036 ms.
        Pending Tasks: 0
        Key cache: disabled
        Row cache: disabled
        Compacted row minimum size: 278
        Compacted row maximum size: 288
        Compacted row mean size: 284

        Column Family: ColumnFamily22
        SSTable count: 3
        Space used (live): 378024
        Space used (total): 378024
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 724
        Read Latency: NaN ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache: disabled
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: ColumnFamily23
        SSTable count: 3
        Space used (live): 11425
        Space used (total): 11425
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 1
        Read Count: 578
        Read Latency: NaN ms.
        Write Count: 1
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache: disabled
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: ColumnFamily24
        SSTable count: 2
        Space used (live): 20577518
        Space used (total): 20577518
        Memtable Columns Count: 306
        Memtable Data Size: 12731
        Memtable Switch Count: 3
        Read Count: 18
        Read Latency: NaN ms.
        Write Count: 3234
        Write Latency: 0.013 ms.
        Pending Tasks: 0
        Key cache: disabled
        Row cache: disabled
        Compacted row minimum size: 465334
        Compacted row maximum size: 4552398
        Compacted row mean size: 2815873

        Column Family: ColumnFamily25
        SSTable count: 1
        Space used (live): 2888
        Space used (total): 2888
        Memtable Columns Count: 2
        Memtable Data Size: 68
        Memtable Switch Count: 3
        Read Count: 1385944
        Read Latency: NaN ms.
        Write Count: 19
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache: disabled
        Row cache: disabled
        Compacted row minimum size: 223
        Compacted row maximum size: 234
        Compacted row mean size: 230

        Column Family: ColumnFamily26
        SSTable count: 0
        Space used (live): 0
        Space used (total): 0
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

----------------
Keyspace: system
    Read Count: 14
    Read Latency: 123.1865 ms.
    Write Count: 6
    Write Latency: 0.11666666666666667 ms.
    Pending Tasks: 0
        Column Family: LocationInfo
        SSTable count: 1
        Space used (live): 2871
        Space used (total): 2871
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 2
        Read Count: 2
        Read Latency: NaN ms.
        Write Count: 6
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 1
        Key cache size: 1
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 203
        Compacted row maximum size: 554
        Compacted row mean size: 378

        Column Family: HintsColumnFamily
        SSTable count: 2
        Space used (live): 4876356
        Space used (total): 4876356
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 12
        Read Latency: NaN ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 2
        Key cache size: 2
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0
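
A dump of this length is hard to scan by eye, so here is a minimal sketch of how the cfstats text above could be parsed and ranked by "Compacted row maximum size". The function names (parse_cfstats, to_number, widest_rows) are hypothetical helpers, not part of nodetool or Cassandra, and the parser assumes the "Key: value" layout shown above. The sample reuses a few lines from ColumnFamily12 and ColumnFamily14. It only surfaces candidates for a closer look and says nothing about the cause of the pauses.

```python
# Sketch: parse `nodetool cfstats` text (layout as pasted above) and list
# the column families with the largest compacted rows.
# All names here are illustrative helpers, not a Cassandra API.

SAMPLE = """\
        Column Family: ColumnFamily12
        SSTable count: 1
        Read Latency: NaN ms.
        Key cache: disabled
        Compacted row maximum size: 5498089
        Compacted row mean size: 261974

        Column Family: ColumnFamily14
        SSTable count: 9
        Read Latency: 0.830 ms.
        Key cache: disabled
        Compacted row maximum size: 1124
        Compacted row mean size: 861
"""


def to_number(value):
    """Convert '9' -> 9, '0.830 ms.' -> 0.83, 'NaN ms.' -> nan; else keep text."""
    token = value.split()[0] if value else ""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return value  # e.g. "disabled"


def parse_cfstats(text):
    """Return one dict per column family, in the order they appear."""
    families = []
    keyspace = None
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # blank lines and '----' separators
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Keyspace":
            keyspace, current = value, None
        elif key == "Column Family":
            current = {"Keyspace": keyspace, "Column Family": value}
            families.append(current)
        elif current is not None:
            # Keyspace-level totals (before the first column family) are skipped.
            current[key] = to_number(value)
    return families


def widest_rows(families, limit=3):
    """Column families sorted by largest compacted row, biggest first."""
    key = "Compacted row maximum size"
    sized = [cf for cf in families if isinstance(cf.get(key), int)]
    return sorted(sized, key=lambda cf: cf[key], reverse=True)[:limit]


if __name__ == "__main__":
    for cf in widest_rows(parse_cfstats(SAMPLE)):
        print(cf["Column Family"], cf["Compacted row maximum size"])
```

To run it against a real node, the full cfstats output would be passed to parse_cfstats in place of SAMPLE.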



On 02/28/2012 06:51 PM, Dan Retzlaff wrote:
Hi Stefan. Can you share the output of nodetool cfstats?

On Tue, Feb 28, 2012 at 1:50 AM, Stefan Reek <stefan@unitedgames.com> wrote:
Hi All,

We are running a 3-node cluster with Cassandra 0.6.13.
We are in the process of upgrading to 1.x, but can't do so for a while because we can't take the cluster offline.
Until now 0.6.13 has run without problems, but lately we have been seeing performance issues:
we are getting timeouts on both reads and writes.
The cluster contains a lot of "old" data which is only read very occasionally and not written to at all.
Is it possible for this data to negatively impact the performance of the cluster?
If so, would it help to move this data to another columnfamily/keyspace?

Cheers,

Stefan

--------------010307060704040705000809--