From: Chris Were
Reply-To: chris@chriswere.com
Date: Tue, 10 Nov 2009 11:23:08 -0800
Subject: Re: Timeout Exception
To: cassandra-user@incubator.apache.org
Message-ID: <35bb42690911101123y795c80erb18c2091fe960ae2@mail.gmail.com>
References: <35bb42690911092025l109b871exa58ff629d624e299@mail.gmail.com>

There's no error on the source node other than the Timeout. It appears to be
occurring across multiple CF's (the majority of which are normal columns).
I don't know an exact number but some of the CF's would have ~3 million rows.

It seems odd that the error sometimes says received 1 response, but it still
times out, as I only have one node.

As for load, CPU usage is certainly not a bottleneck. "top" consistently
shows ~10-20% waiting.

Chris.

On Mon, Nov 9, 2009 at 9:22 PM, Jonathan Ellis <jbellis@gmail.com> wrote:

> What's causing the timeout? An error on the source node, or just
> slowness? If the latter, how many rows are in your multiget?
>
> On Mon, Nov 9, 2009 at 10:25 PM, Chris Were <chris.were@gmail.com> wrote:
> >
> > I'm getting a Timeout Exception every now and again (currently every couple
> > of minutes or so).
> > Using revision 833288. Quorum set to ONE. My cassandra instance has been
> > running for two days and the data directory is around 16GB. I'm not sure
> > what the problem is, but let me know of any tests I can do to help reduce
> > the problem further. There are two variations on the exception, I have
> > pasted them both below.
> >
> > ERROR [pool-1-thread-63] 2009-11-09 20:17:27,579 Cassandra.java (line org.apache.cassandra.service.Cassandra$Processor) Internal error processing get_slice
> > java.lang.RuntimeException: java.util.concurrent.TimeoutException: Operation timed out - received only 0 responses from .
> >         at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:103)
> >         at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:177)
> >         at org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:252)
> >         at org.apache.cassandra.service.CassandraServer.get_slice(CassandraServer.java:215)
> >         at org.apache.cassandra.service.Cassandra$Processor$get_slice.process(Cassandra.java:668)
> >         at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:624)
> >         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >         at java.lang.Thread.run(Thread.java:636)
> > Caused by: java.util.concurrent.TimeoutException: Operation timed out - received only 0 responses from .
> >         at org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:79)
> >         at org.apache.cassandra.service.StorageProxy.strongRead(StorageProxy.java:408)
> >         at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:333)
> >         at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:95)
> >         ... 9 more
> >
> > ERROR [pool-1-thread-19] 2009-11-09 11:29:18,731 Cassandra.java (line org.apache.cassandra.service.Cassandra$Processor) Internal error processing get_slice
> > java.lang.RuntimeException: java.util.concurrent.TimeoutException: Operation timed out - received only 1 responses from /10.121.217.5 .
> >         at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:103)
> >         at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:177)
> >         at org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:252)
> >         at org.apache.cassandra.service.CassandraServer.get_slice(CassandraServer.java:215)
> >         at org.apache.cassandra.service.Cassandra$Processor$get_slice.process(Cassandra.java:668)
> >         at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:624)
> >         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >         at java.lang.Thread.run(Thread.java:636)
> > Caused by: java.util.concurrent.TimeoutException: Operation timed out - received only 1 responses from /10.121.217.5 .
> >         at org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:79)
> >         at org.apache.cassandra.service.StorageProxy.strongRead(StorageProxy.java:408)
> >         at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:333)
> >         at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:95)
> >         ... 9 more
> >
> > Cheers,
> > Chris
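
To make concrete why the "received only 1 responses" variant looks odd on a single node, here is a minimal sketch of a wait-for-N-responses handler with a deadline. It is a hypothetical illustration only: the class name ResponseWaiter and its methods are invented for this example and are not taken from the Cassandra source (the real code path is QuorumResponseHandler.get in the traces above, whose internals are not shown in this thread). In a handler shaped like this one, requiring one response and having received one, the wait would return rather than time out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration; NOT the Cassandra QuorumResponseHandler implementation.
class ResponseWaiter {
    // Number of replies that must arrive before the read may return (assumed: 1).
    private final int required;
    private final List<String> responses = new ArrayList<>();

    ResponseWaiter(int required) {
        this.required = required;
    }

    // Called by whatever delivers a reply; wakes any thread blocked in get().
    synchronized void response(String message) {
        responses.add(message);
        notifyAll();
    }

    // Blocks until `required` replies have arrived or the deadline passes.
    synchronized List<String> get(long timeout, TimeUnit unit)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (responses.size() < required) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                // Message format mirrors the log text quoted in this thread.
                throw new TimeoutException("Operation timed out - received only "
                        + responses.size() + " responses");
            }
            // Waits on this object's monitor for at most the remaining time.
            TimeUnit.NANOSECONDS.timedWait(this, remaining);
        }
        return new ArrayList<>(responses);
    }
}
```

With required set to 1, a call to get() made after one response() returns immediately, and a call with no responses fails once the deadline passes with "received only 0 responses". That matches the first variant of the exception but not the second, which is what makes the second one worth investigating.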