Sender: scode@scode.org
Date: Sat, 2 Oct 2010 22:45:29 +0200
Message-ID: <AANLkTikFD-DJ2hUNK5iw=YA-LRwDm-LzSLXhoaFNFhR4@mail.gmail.com>
Subject: Re: UnavailableException when data grows
From: Peter Schuller <peter.schuller@infidyne.com>
To: user@cassandra.apache.org
Content-Type: text/plain; charset=UTF-8

> And I'm still getting UnavailableException and TimedOutException when the
> Cassandra daemon is doing either Compaction or Garbage collection...

Have you specifically correlated this? If so, which one, or both?

GC should not cause unavailable exceptions on a healthy cluster with
healthy nodes. If GCs are causing nodes to not respond for so long
that they cause exceptions, you may be swapping. Have you monitored to
see whether you are actively swapping in/out during GCs (or at all,
for that matter)? Excessive GC pause times should be logged by
Cassandra in the system log, regardless of JVM options. So, check your
Cassandra log for messages from GCInspector about pause times.
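
To make both checks concrete, here is a minimal Python sketch. It
assumes a Linux node, where /proc/vmstat exposes the cumulative
pswpin/pswpout counters, and it assumes that GCInspector log lines
contain the string "GCInspector" followed by a duration written as
"<N> ms". The exact log layout varies between Cassandra versions, so
treat the regular expression as a guess to adapt. The function names
and the 1000 ms threshold are mine, for illustration only.

```python
import re


def read_swap_counters(vmstat_text):
    """Return (pswpin, pswpout) page counts parsed from /proc/vmstat content."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


# Assumption: a GCInspector line reports its pause as "<digits> ms".
_PAUSE_MS = re.compile(r"(\d+)\s*ms\b")


def gc_pauses(log_lines, threshold_ms=1000):
    """Yield (pause_ms, line) for GCInspector lines at or above threshold_ms."""
    for line in log_lines:
        if "GCInspector" not in line:
            continue
        match = _PAUSE_MS.search(line)
        if match and int(match.group(1)) >= threshold_ms:
            yield int(match.group(1)), line.rstrip("\n")
```

Call read_swap_counters() on the contents of /proc/vmstat twice, a few
seconds apart; if either counter has grown, the node swapped during
that interval. Feed gc_pauses() the lines of your Cassandra system log
and compare the timestamps of long pauses with the times of the
exceptions.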

I presume you are running with default JVM parameters?

With respect to compaction, I'm still interested, as in my original
response in this thread, in what your data looks like and whether
compaction is CPU bound or I/O bound. It's quite possible that
compaction is having an adverse effect; if it is, I would suspect it
is due to disk I/O rather than CPU load (unless you are saturating
your cluster so that CPU load is the dominating factor). The primary
adverse effects expected from compaction, other than additional CPU
load, are:

* Generation of additional I/O which will directly affect normal traffic.
* Effects on the operating system buffer cache may increase the I/O
load resulting from normal traffic (thus making the preceding issue
even more significant).

Are the nodes that are slow to respond during compaction I/O bound?
Check with, for example, "iostat -x 1" (the utilization value and
average queue size being the most important columns).
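
If you would rather capture this over a whole compaction than watch it
scroll by, the two columns can be pulled out of saved iostat output
with a few lines of Python. This is only a sketch: it assumes the
device table starts with a header line beginning with "Device", that
utilization is labelled "%util", and that the queue size column is
called "avgqu-sz" (older sysstat) or "aqu-sz" (newer sysstat). It also
assumes numbers use "." as the decimal separator, so run iostat with
LC_ALL=C if your locale differs. The function names and the 90%
threshold are illustrative choices, not a recommendation.

```python
def parse_iostat_extended(output, columns=("%util", "avgqu-sz", "aqu-sz")):
    """Parse `iostat -x` text into {device: {column: value}} for the last report."""
    header = None
    devices = {}
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            header = None  # a blank line ends the current device table
            continue
        if fields[0].startswith("Device"):
            header = fields
            devices = {}  # a new report starts; keep only the latest one
            continue
        if header and len(fields) == len(header):
            row = {}
            for name, value in zip(header[1:], fields[1:]):
                if name in columns:
                    row[name] = float(value)
            devices[fields[0]] = row
    return devices


def saturated(devices, util_threshold=90.0):
    """Return the sorted names of devices whose %util is at or above the threshold."""
    return sorted(
        name for name, row in devices.items()
        if row.get("%util", 0.0) >= util_threshold
    )
```

A device that sits near 100% utilization with a growing queue while
compaction runs, and drops back afterwards, is a strong hint that
compaction is I/O bound on that node.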

There is some work planned/happening towards lessening the impact of
compactions, and it would be of interest to know the circumstances of
compaction problems that people do have.

-- 
/ Peter Schuller