Date: Mon, 13 Feb 2012 09:21:17 +0100
Subject: Re: keycache persisted to disk ?
From: Peter Schuller <peter.schuller@infidyne.com>
To: user@cassandra.apache.org

> I actually have the opposite 'problem'. I have a pair of servers that have
> been static since mid last week, but have seen performance vary
> significantly (x10) for exactly the same query. I hypothesised it was
> various caches so I shut down Cassandra, flushed the O/S buffer cache and
> then brought it back up. The performance wasn't significantly different to
> the pre-flush performance

I don't get this thread at all :)

Why would restarting with clean caches be expected to *improve*
performance? And why is key cache loading involved, other than to delay
start-up and hopefully pre-populate caches for better (not worse)
performance?

If you want to figure out why queries seem to be slow relative to
normal, you'll need to monitor the behavior of the nodes. Look at disk
I/O statistics primarily (anyone reading this who runs Cassandra and
isn't intimately familiar with "iostat -x -k 1" should go and read up
on it right away; make sure you understand the utilization and average
queue size columns), CPU usage, whether compaction is happening, etc.
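
For anyone unsure what those two iostat columns mean, here is a rough
sketch of how they can be derived from two samples of a device's line
in /proc/diskstats on Linux. It is an illustration, not iostat's actual
code: the names DiskSample, parse_diskstats_line and
utilization_and_queue are made up for this example, and it assumes the
documented kernel layout in which the 10th and 11th statistics fields
are "ms spent doing I/O" and "weighted ms spent doing I/O".

```python
from typing import NamedTuple, Tuple


class DiskSample(NamedTuple):
    name: str
    io_ticks_ms: int        # ms during which the device had any I/O in flight
    weighted_ticks_ms: int  # same, but weighted by the number of requests in flight


def parse_diskstats_line(line: str) -> DiskSample:
    """Parse one /proc/diskstats line (major, minor, name, then the stat fields)."""
    parts = line.split()
    # parts[3] is stat field 1, so stat fields 10 and 11 are parts[12] and parts[13].
    return DiskSample(parts[2], int(parts[12]), int(parts[13]))


def utilization_and_queue(
    before: DiskSample, after: DiskSample, interval_ms: float
) -> Tuple[float, float]:
    """Return (utilization in percent, average queue size) over the interval."""
    busy_ms = after.io_ticks_ms - before.io_ticks_ms
    weighted_ms = after.weighted_ticks_ms - before.weighted_ticks_ms
    util_pct = 100.0 * busy_ms / interval_ms
    avg_queue = weighted_ms / interval_ms
    return util_pct, avg_queue
```

Read this way, utilization near 100% means the device had requests
outstanding for almost the whole interval, and an average queue size
well above 1 means requests were piling up behind each other. To try
it, read the same device's line from /proc/diskstats twice, one second
apart, and pass 1000 as interval_ms.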

One easy way to see sudden bursts of poor behavior is to be heavily
reliant on cache, and then have sudden decreases in performance due to
compaction evicting data from page cache while also generating more
I/O.

But that's total speculation. It is also the case that you cannot
expect consistent performance on EC2 and that might be it.

But my #1 advice: Log into the node while it is being slow, and
observe. Figure out what the bottleneck is. iostat, top, nodetool
tpstats, nodetool netstats, nodetool compactionstats.
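
If it helps, the tools above can be captured in one go while the node
is misbehaving. The following is only a minimal sketch: the function
name collect_snapshot is made up, and the extra arguments to iostat
(two one-second samples) and top (one batch-mode iteration) are
illustrative choices, not recommendations. Tools that are not on the
PATH are skipped rather than treated as errors.

```python
import shutil
import subprocess

# Commands named in the text above; the iostat and top arguments are assumptions.
DEFAULT_COMMANDS = [
    ["iostat", "-x", "-k", "1", "2"],
    ["top", "-b", "-n", "1"],
    ["nodetool", "tpstats"],
    ["nodetool", "netstats"],
    ["nodetool", "compactionstats"],
]


def collect_snapshot(commands=None, timeout_s=30):
    """Run each command and return {command line: its output or a short note}."""
    results = {}
    for argv in (DEFAULT_COMMANDS if commands is None else commands):
        label = " ".join(argv)
        if shutil.which(argv[0]) is None:
            results[label] = "(not found on PATH, skipped)"
            continue
        try:
            proc = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout_s
            )
            results[label] = proc.stdout + proc.stderr
        except (subprocess.TimeoutExpired, OSError) as exc:
            results[label] = "(failed: %s)" % exc
    return results


if __name__ == "__main__":
    for label, output in collect_snapshot().items():
        print("=== " + label + " ===")
        print(output)
```

Taking a snapshot during a slow period and another during a normal one
makes it easier to see which of disk, CPU, thread pools or compaction
actually differs.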

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)