Subject: Re: UnavailableException when data grows
From: Rana Aich <aichrana@gmail.com>
To: user@cassandra.apache.org
Date: Mon, 27 Sep 2010 14:55:20 -0700

Hi Peter,

Thanks for your detailed questions.

I have an 8-machine cluster: KVSHIGH1,2,3,4 and KVSLOW1,2,3,4. As the
names suggest, the KVSLOW boxes have little disk space (roughly 350 GB),
whereas the KVSHIGH boxes have 1.5 TB.
Yet my nodetool output shows the following:

192.168.202.202  Down  319.94 GB  7200044730783885730400843868815072654    |<--|
192.168.202.4    Up    382.39 GB  23719654286404067863958492664769598669   |   ^
192.168.202.2    Up    106.81 GB  36701505058375526444137310055285336988   v   |
192.168.202.3    Up    149.81 GB  65098486053779167479528707238121707074   |   ^
192.168.202.201  Up    154.72 GB  79420606800360567885560534277526521273   v   |
192.168.202.204  Up    72.91 GB   85219217446418416293334453572116009608   |   ^
192.168.202.1    Up    29.78 GB   87632302962564279114105239858760976120   v   |
192.168.202.203  Up    9.35 GB    87790520647700936489181912967436646309   |-->|

As you can see, one of our KVSLOW boxes (192.168.202.202) is already
down; it's 100% full. Meanwhile, a box with 1.5 TB of disk
(192.168.202.1) holds only 29.78 GB! I'm using RandomPartitioner. When I
run the client program, the Cassandra daemon takes around 85-130% CPU.
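[Aside: the tokens above are far from evenly spaced over
RandomPartitioner's 0..2**127 token space; RandomPartitioner only
balances load when the initial tokens themselves are balanced, and the
node with the lowest token (192.168.202.202) also inherits the large
wrap-around range. A minimal sketch of computing evenly spaced initial
tokens; the node list is copied from the ring above, everything else is
purely illustrative:]

    # Evenly spaced initial tokens for RandomPartitioner, whose token
    # space is 0 .. 2**127. Node list copied from the nodetool ring
    # output above; illustrative only.
    NODES = [
        "192.168.202.202", "192.168.202.4",   "192.168.202.2",
        "192.168.202.3",   "192.168.202.201", "192.168.202.204",
        "192.168.202.1",   "192.168.202.203",
    ]

    RING = 2 ** 127

    for i, node in enumerate(NODES):
        # Each node gets token i * RING / N, splitting the ring evenly.
        print("%-16s %d" % (node, i * RING // len(NODES)))
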

Regards,

Rana

On Mon, Sep 27, 2010 at 2:31 PM, Peter Schuller <peter.schuller@infidyne.com> wrote:
> > How can I handle this kind of situation?
>
> In terms of surviving the problem, a retry on the client side might
> help, assuming the problem is temporary.
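
[A minimal sketch of such a client-side retry, assuming a Thrift-style
client where Cassandra signals this condition with UnavailableException;
the request callable, attempt count and back-off values are all
placeholders:]

    import time

    class UnavailableException(Exception):
        """Stand-in for the Thrift cassandra.ttypes.UnavailableException."""

    def with_retries(request, attempts=3, base_delay=0.5):
        # Retry a Cassandra call on UnavailableException, backing off
        # exponentially between attempts; re-raise once we give up.
        for attempt in range(attempts):
            try:
                return request()
            except UnavailableException:
                if attempt == attempts - 1:
                    raise
                time.sleep(base_delay * 2 ** attempt)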

> However, certainly the fact that you're seeing an issue to begin with
> is interesting, and the way to avoid it would depend on what the
> problem is. My understanding is that the UnavailableException
> indicates that the node you are talking to was unable to read
> from/write to a sufficient number of nodes to satisfy your consistency
> level. Presumably either because individual requests failed to return
> in time, or because the node considers other nodes to be flat out
> down.
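
[To make that concrete: at consistency level QUORUM the coordinator
needs floor(RF/2) + 1 live replicas, so with RF=3 it can tolerate one
dead replica but not two. A hypothetical helper illustrating the check,
not Cassandra's actual code:]

    # Illustration of the availability check described above.
    # QUORUM needs floor(RF / 2) + 1 live replicas; ONE needs 1; ALL needs RF.
    def required_replicas(rf, level):
        return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[level]

    def would_be_unavailable(rf, live, level="QUORUM"):
        return live < required_replicas(rf, level)

    # With RF=3 and one replica down, QUORUM still succeeds...
    assert not would_be_unavailable(3, 2)
    # ...but with two replicas down it fails up front.
    assert would_be_unavailable(3, 1)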

> Can you correlate these issues with server-side activity on the nodes,
> such as background compaction, commitlog rotation or memtable
> flushing? Do you see your nodes saying that other nodes in the cluster
> are "DOWN" and "UP" (flapping)?
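
[One rough way to spot flapping is to scan each node's system.log for
the gossip state-change messages; the exact "is now UP" / "is now dead"
phrasing assumed below may vary by version, so treat the pattern as an
assumption and adjust it to what your logs actually say:]

    import re
    import sys

    # Rough sketch: count gossip state changes per node in a Cassandra
    # system.log. The log-line phrasing matched here is an assumption.
    PATTERN = re.compile(r"InetAddress /?([\d.]+) is now (UP|dead)")

    def flap_counts(path):
        counts = {}
        with open(path) as log:
            for line in log:
                match = PATTERN.search(line)
                if match:
                    node = match.group(1)
                    counts[node] = counts.get(node, 0) + 1
        return counts

    if __name__ == "__main__":
        for node, n in sorted(flap_counts(sys.argv[1]).items()):
            print("%-16s %d state changes" % (node, n))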

> How large is the data set in total (in terms of sstable size on disk),
> and how much memory do you have in your machines (going to page
> cache)?

> Have you observed the behavior of your nodes during compaction; in
> particular, whether compaction is CPU bound or I/O bound? (That would
> tend to depend on the data; generally, the larger the individual
> values, the more disk bound you'd tend to be.)

> Just trying to zero in on what the likely root cause is in this case.

> --
> / Peter Schuller
