Date: Sat, 2 Oct 2010 22:45:29 +0200
Subject: Re: UnavailableException when data grows
From: Peter Schuller
To: user@cassandra.apache.org

> And I'm still getting UnavailableException and TimedOutException when there
> Cassandra daemon is doing
> either Compaction or Garbage collection...

Have you specifically correlated this? If so, which one, or both?

GC should not cause unavailable exceptions on a healthy cluster with
healthy nodes. If GCs are causing nodes to not respond for so long
that they cause exceptions, you may be swapping. Have you monitored to
see whether you are actively swapping in/out during GCs (or at all,
for that matter)?

Excessive GC pause times should be logged by Cassandra in the system
log, regardless of JVM options. So, check your Cassandra log for
messages from GCInspector about pause times.

I presume you are running with default JVM parameters?

With respect to compaction, I'm still interested, as in my original
response in this thread, in what your data looks like and whether
compaction is CPU bound or I/O bound.

It's quite possible that compaction is having an adverse effect; if it
is, I would suspect it is due to disk I/O rather than CPU load (unless
you are saturating your cluster so that CPU load is the dominating
factor).

The primary adverse effects expected from compaction, other than
additional CPU load, are:

* Generation of additional I/O which will directly affect normal traffic.
* Effects on the operating system buffer cache may increase the I/O
  load resulting from normal traffic (thus making the preceding issue
  even more significant).

Are the nodes that are slow to respond during compaction I/O bound?
Check with, for example, "iostat -x 1" (utilization value and average
queue size being the most important columns).

There is some work planned/happening towards lessening the impact of
compactions, and it would be of interest to know the circumstances of
compaction problems that people do have.

-- 
/ Peter Schuller
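
For the swapping question above, one way to watch for active swap-in/swap-out while a GC or compaction is running is to sample the kernel's swap counters. What follows is only a minimal sketch, assuming a Linux node where /proc/vmstat exposes the cumulative `pswpin` and `pswpout` page counters; the function names (`parse_swap_counters`, `swap_delta`, `watch_swap`) are made up for illustration and are not part of Cassandra or any tool.

```python
import time


def parse_swap_counters(vmstat_text):
    """Return (pswpin, pswpout) cumulative page counts from /proc/vmstat-style text.

    Counters that are missing from the text are reported as 0.
    """
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


def swap_delta(before, after):
    """Pages swapped in and out between two (pswpin, pswpout) samples."""
    return after[0] - before[0], after[1] - before[1]


def watch_swap(interval=1.0, samples=10, path="/proc/vmstat"):
    """Print per-interval swap activity; assumes a Linux /proc filesystem."""
    with open(path) as f:
        prev = parse_swap_counters(f.read())
    for _ in range(samples):
        time.sleep(interval)
        with open(path) as f:
            cur = parse_swap_counters(f.read())
        pages_in, pages_out = swap_delta(prev, cur)
        print(f"pages swapped in: {pages_in}  out: {pages_out}")
        prev = cur
```

Calling `watch_swap()` on a node during a period that lines up with the exceptions would show whether the counters are moving; non-zero deltas during GC pauses would be consistent with the swapping theory, while steady zeros would point elsewhere.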