From: Jonathan Ellis
Date: Mon, 19 Oct 2009 22:48:57 -0500
Subject: Re: Encountering timeout exception when running get_key_range
To: cassandra-user@incubator.apache.org

Got it. I will have a look tomorrow.

On Mon, Oct 19, 2009 at 10:45 PM, Ramzi Rabah wrote:
> Hi Jonathan:
>
> Here is the storage_conf.xml for one of the servers
> http://email.slicezero.com/storage-conf.xml
>
> and here is the zipped data:
> http://email.slicezero.com/datastoreDeletion.tgz
>
> Thanks
> Ray
>
> On Mon, Oct 19, 2009 at 8:30 PM, Jonathan Ellis wrote:
>> Yes, please. You'll probably have to use something like
>> http://www.getdropbox.com/ if you don't have a public web server to
>> stash it temporarily.
>>
>> On Mon, Oct 19, 2009 at 10:28 PM, Ramzi Rabah wrote:
>>> Hi Jonathan the data is about 60 MB. Would you like me to send it to you?
>>>
>>> On Mon, Oct 19, 2009 at 8:20 PM, Jonathan Ellis wrote:
>>>> Is the data on 6, 9, or 10 small enough that you could tar.gz it up
>>>> for me to use to reproduce over here?
>>>>
>>>> On Mon, Oct 19, 2009 at 10:17 PM, Ramzi Rabah wrote:
>>>>> So my cluster has 4 nodes node6, node8, node9 and node10. I turned
>>>>> them all off.
>>>>> 1- I started node6 by itself and still got the problem.
>>>>> 2- I started node8 by itself and it ran fine (returned no keys)
>>>>> 3- I started node9 by itself and still got the problem.
>>>>> 4- I started node10 by itself and still got the problem.
>>>>>
>>>>> Ray
>>>>>
>>>>> On Mon, Oct 19, 2009 at 7:44 PM, Jonathan Ellis wrote:
>>>>>> That's really strange... Can you reproduce on a single-node cluster?
>>>>>>
>>>>>> On Mon, Oct 19, 2009 at 9:34 PM, Ramzi Rabah wrote:
>>>>>>> The rows are very small. There are a handful of columns per row
>>>>>>> (approximately about 4-5 columns per row).
>>>>>>> Each column has a name which is a String (20-30 characters long), and
>>>>>>> the value is an empty array of bytes (new byte[0]).
>>>>>>> I just use the names of the columns, and don't need to store any
>>>>>>> values in this Column Family.
>>>>>>>
>>>>>>> -- Ray
>>>>>>>
>>>>>>> On Mon, Oct 19, 2009 at 7:24 PM, Jonathan Ellis wrote:
>>>>>>>> Can you tell me anything about the nature of your rows? Many/few
>>>>>>>> columns? Large/small column values?
>>>>>>>>
>>>>>>>> On Mon, Oct 19, 2009 at 9:17 PM, Ramzi Rabah wrote:
>>>>>>>>> Hi Jonathan
>>>>>>>>> I actually spoke too early. Now even if I restart the servers it still
>>>>>>>>> gives a timeout exception.
>>>>>>>>> As far as the sstable files are, not sure which ones are the sstables,
>>>>>>>>> but here is the list of files in the data directory that are prepended
>>>>>>>>> with the column family name:
>>>>>>>>> DatastoreDeletionSchedule-1-Data.db
>>>>>>>>> DatastoreDeletionSchedule-1-Filter.db
>>>>>>>>> DatastoreDeletionSchedule-1-Index.db
>>>>>>>>> DatastoreDeletionSchedule-5-Data.db
>>>>>>>>> DatastoreDeletionSchedule-5-Filter.db
>>>>>>>>> DatastoreDeletionSchedule-5-Index.db
>>>>>>>>> DatastoreDeletionSchedule-7-Data.db
>>>>>>>>> DatastoreDeletionSchedule-7-Filter.db
>>>>>>>>> DatastoreDeletionSchedule-7-Index.db
>>>>>>>>> DatastoreDeletionSchedule-8-Data.db
>>>>>>>>> DatastoreDeletionSchedule-8-Filter.db
>>>>>>>>> DatastoreDeletionSchedule-8-Index.db
>>>>>>>>>
>>>>>>>>> I am not currently doing any system stat collection.
>>>>>>>>>
>>>>>>>>> On Mon, Oct 19, 2009 at 6:41 PM, Jonathan Ellis wrote:
>>>>>>>>>> How many sstable files are in the data directories for the
>>>>>>>>>> columnfamily you are querying?
>>>>>>>>>>
>>>>>>>>>> How many are there after you restart and it is happy?
>>>>>>>>>>
>>>>>>>>>> Are you doing system stat collection with munin or ganglia or some such?
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 19, 2009 at 8:25 PM, Ramzi Rabah wrote:
>>>>>>>>>>> Hi Jonathan I updated to 4.1 and I still get the same exception when I
>>>>>>>>>>> call get_key_range.
>>>>>>>>>>> I checked all the server logs, and there is only one exception being
>>>>>>>>>>> thrown by whichever server I am connecting to.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Ray
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 19, 2009 at 4:52 PM, Jonathan Ellis wrote:
>>>>>>>>>>>> No, it's smart enough to avoid scanning.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 19, 2009 at 6:49 PM, Ramzi Rabah wrote:
>>>>>>>>>>>>> Hi Jonathan thanks for the reply, I will update the code to 0.4.1 and
>>>>>>>>>>>>> will check all the logs on all the machines.
>>>>>>>>>>>>> Just a simple question, when you do a get_key_range and you specify ""
>>>>>>>>>>>>> and "" for start and end, and the limit is 25, if there are too many
>>>>>>>>>>>>> entries, does it do a scan to find out the start or is it smart enough
>>>>>>>>>>>>> to know what the start key is?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 4:42 PM, Jonathan Ellis wrote:
>>>>>>>>>>>>>> You should check the other nodes for potential exceptions keeping them
>>>>>>>>>>>>>> from replying.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Without seeing that it's hard to say if this is caused by an old bug,
>>>>>>>>>>>>>> but you should definitely upgrade to 0.4.1 either way :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 5:51 PM, Ramzi Rabah wrote:
>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am running into problems with get_key_range.
>>>>>>>>>>>>>>> I have OrderPreservingPartitioner defined in storage-conf.xml and I am using
>>>>>>>>>>>>>>> a columnfamily that looks like
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     Name="DatastoreDeletionSchedule"
>>>>>>>>>>>>>>>     />
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My command is client.get_key_range("Keyspace1", "DatastoreDeletionSchedule",
>>>>>>>>>>>>>>>     "", "", 25, ConsistencyLevel.ONE);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It usually works fine but after a day or so from server writes into
>>>>>>>>>>>>>>> this column family, I started getting
>>>>>>>>>>>>>>> ERROR [pool-1-thread-36] 2009-10-19 17:24:28,223 Cassandra.java (line
>>>>>>>>>>>>>>> 770) Internal error processing get_key_range
>>>>>>>>>>>>>>> java.lang.RuntimeException: java.util.concurrent.TimeoutException:
>>>>>>>>>>>>>>> Operation timed out.
>>>>>>>>>>>>>>>     at org.apache.cassandra.service.StorageProxy.getKeyRange(StorageProxy.java:560)
>>>>>>>>>>>>>>>     at org.apache.cassandra.service.CassandraServer.get_key_range(CassandraServer.java:595)
>>>>>>>>>>>>>>>     at org.apache.cassandra.service.Cassandra$Processor$get_key_range.process(Cassandra.java:766)
>>>>>>>>>>>>>>>     at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:609)
>>>>>>>>>>>>>>>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>>>>>>>>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>>>>>>>>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>>>>>>>>>>>>>>>     at java.lang.Thread.run(Thread.java:619)
>>>>>>>>>>>>>>> Caused by: java.util.concurrent.TimeoutException: Operation timed out.
>>>>>>>>>>>>>>>     at org.apache.cassandra.net.AsyncResult.get(AsyncResult.java:97)
>>>>>>>>>>>>>>>     at org.apache.cassandra.service.StorageProxy.getKeyRange(StorageProxy.java:556)
>>>>>>>>>>>>>>>     ... 7 more
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I still get the timeout exceptions even though the servers have been
>>>>>>>>>>>>>>> idle for 2 days. When I restart the cassandra servers, it seems to
>>>>>>>>>>>>>>> work fine again. Any ideas what could be wrong?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> By the way, I am using version:apache-cassandra-incubating-0.4.0-rc2
>>>>>>>>>>>>>>> Not sure if this is fixed in the 0.4.1 version
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> Ray