Subject: Re: Encountering timeout exception when running get_key_range
From: Ramzi Rabah
To: cassandra-user@incubator.apache.org
Date: Wed, 21 Oct 2009 12:01:26 -0700

Hi Jonathan,

I am still running into the timeout issue even after reducing
GCGraceSeconds to 1 hour (we have tons of deletes happening in our
app). Which part of Cassandra is responsible for deleting the
tombstone records, and how often does it run?

On Tue, Oct 20, 2009 at 12:02 PM, Ramzi Rabah wrote:
> Thank you so much Jonathan.
>
> Data is test data so I'll just wipe it out and restart after updating
> GCGraceSeconds.
> Thanks for your help.
>
> Ray
>
> On Tue, Oct 20, 2009 at 11:39 AM, Jonathan Ellis wrote:
>> The problem is you have a few MB of actual data and a few hundred MB
>> of tombstones (data marked deleted). So what happens is get_key_range
>> spends a long, long time iterating through the tombstoned rows,
>> looking for keys that actually still exist.
>>
>> We're going to redesign this for CASSANDRA-344, but for the 0.4
>> series, you should restart with GCGraceSeconds much lower (e.g. 3600),
>> delete your old data files, and reload your data fresh. (Instead of
>> reloading, you can use "nodeprobe compact" on each node to force a
>> major compaction, but it will take much longer since you have so many
>> tombstones.)
>>
>> -Jonathan
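For reference, the two knobs Jonathan mentions. GCGraceSeconds is set in
storage-conf.xml; a sketch, with his suggested 3600 (the element name is
assumed from later 0.x configs, so verify it against your own file):

    <!-- seconds a tombstone must age before compaction may purge it -->
    <GCGraceSeconds>3600</GCGraceSeconds>

A major compaction is then forced per node from the command line; the
flag syntax below is assumed for the 0.4-era nodeprobe tool:

    bin/nodeprobe -host node6 compact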
>>
>> On Mon, Oct 19, 2009 at 10:45 PM, Ramzi Rabah wrote:
>>> Hi Jonathan:
>>>
>>> Here is the storage-conf.xml for one of the servers:
>>> http://email.slicezero.com/storage-conf.xml
>>>
>>> and here is the zipped data:
>>> http://email.slicezero.com/datastoreDeletion.tgz
>>>
>>> Thanks
>>> Ray
>>>
>>> On Mon, Oct 19, 2009 at 8:30 PM, Jonathan Ellis wrote:
>>>> Yes, please. You'll probably have to use something like
>>>> http://www.getdropbox.com/ if you don't have a public web server to
>>>> stash it temporarily.
>>>>
>>>> On Mon, Oct 19, 2009 at 10:28 PM, Ramzi Rabah wrote:
>>>>> Hi Jonathan, the data is about 60 MB. Would you like me to send it to you?
>>>>>
>>>>> On Mon, Oct 19, 2009 at 8:20 PM, Jonathan Ellis wrote:
>>>>>> Is the data on 6, 9, or 10 small enough that you could tar.gz it up
>>>>>> for me to use to reproduce over here?
>>>>>>
>>>>>> On Mon, Oct 19, 2009 at 10:17 PM, Ramzi Rabah wrote:
>>>>>>> So my cluster has 4 nodes: node6, node8, node9 and node10. I turned
>>>>>>> them all off.
>>>>>>> 1. I started node6 by itself and still got the problem.
>>>>>>> 2. I started node8 by itself and it ran fine (returned no keys).
>>>>>>> 3. I started node9 by itself and still got the problem.
>>>>>>> 4. I started node10 by itself and still got the problem.
>>>>>>>
>>>>>>> Ray
>>>>>>>
>>>>>>> On Mon, Oct 19, 2009 at 7:44 PM, Jonathan Ellis wrote:
>>>>>>>> That's really strange... Can you reproduce on a single-node cluster?
>>>>>>>>
>>>>>>>> On Mon, Oct 19, 2009 at 9:34 PM, Ramzi Rabah wrote:
>>>>>>>>> The rows are very small. There are a handful of columns per row
>>>>>>>>> (about 4-5 columns per row).
>>>>>>>>> Each column has a name which is a String (20-30 characters long), and
>>>>>>>>> the value is an empty array of bytes (new byte[0]).
>>>>>>>>> I just use the names of the columns, and don't need to store any
>>>>>>>>> values in this Column Family.
>>>>>>>>>
>>>>>>>>> -- Ray
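Ray's write pattern (the column name carries the data; the value stays
empty) would look roughly like this. This is a sketch only: the ColumnPath
constructor shape and insert signature are assumed from the 0.4-era Thrift
API, the row key and column name are made up for illustration, and `client`
is an already-opened org.apache.cassandra.service.Cassandra.Client:

    // one name-only column: a 20-30 character string name, empty value
    ColumnPath path = new ColumnPath("DatastoreDeletionSchedule",   // column family
                                     null,                          // no super column
                                     "delete-me-2009-10-19-0001".getBytes("UTF-8"));
    client.insert("Keyspace1", "some-row-key", path,
                  new byte[0],                  // empty value: the name is the payload
                  System.currentTimeMillis(),   // client-supplied timestamp
                  ConsistencyLevel.ONE);

Deleting many such columns later is what produces the tombstones discussed
above, one per deleted column or row.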
>>>>>>>>>
>>>>>>>>> On Mon, Oct 19, 2009 at 7:24 PM, Jonathan Ellis wrote:
>>>>>>>>>> Can you tell me anything about the nature of your rows? Many/few
>>>>>>>>>> columns? Large/small column values?
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 19, 2009 at 9:17 PM, Ramzi Rabah wrote:
>>>>>>>>>>> Hi Jonathan,
>>>>>>>>>>> I actually spoke too early. Now even if I restart the servers it still
>>>>>>>>>>> gives a timeout exception.
>>>>>>>>>>> As for the sstable files, I'm not sure which ones are the sstables,
>>>>>>>>>>> but here is the list of files in the data directory that are prefixed
>>>>>>>>>>> with the column family name:
>>>>>>>>>>> DatastoreDeletionSchedule-1-Data.db
>>>>>>>>>>> DatastoreDeletionSchedule-1-Filter.db
>>>>>>>>>>> DatastoreDeletionSchedule-1-Index.db
>>>>>>>>>>> DatastoreDeletionSchedule-5-Data.db
>>>>>>>>>>> DatastoreDeletionSchedule-5-Filter.db
>>>>>>>>>>> DatastoreDeletionSchedule-5-Index.db
>>>>>>>>>>> DatastoreDeletionSchedule-7-Data.db
>>>>>>>>>>> DatastoreDeletionSchedule-7-Filter.db
>>>>>>>>>>> DatastoreDeletionSchedule-7-Index.db
>>>>>>>>>>> DatastoreDeletionSchedule-8-Data.db
>>>>>>>>>>> DatastoreDeletionSchedule-8-Filter.db
>>>>>>>>>>> DatastoreDeletionSchedule-8-Index.db
>>>>>>>>>>>
>>>>>>>>>>> I am not currently doing any system stat collection.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 19, 2009 at 6:41 PM, Jonathan Ellis wrote:
>>>>>>>>>>>> How many sstable files are in the data directories for the
>>>>>>>>>>>> columnfamily you are querying?
>>>>>>>>>>>>
>>>>>>>>>>>> How many are there after you restart and it is happy?
>>>>>>>>>>>>
>>>>>>>>>>>> Are you doing system stat collection with munin or ganglia or some such?
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 19, 2009 at 8:25 PM, Ramzi Rabah wrote:
>>>>>>>>>>>>> Hi Jonathan, I updated to 0.4.1 and I still get the same exception when I
>>>>>>>>>>>>> call get_key_range.
>>>>>>>>>>>>> I checked all the server logs, and there is only one exception being
>>>>>>>>>>>>> thrown, by whichever server I am connecting to.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>> Ray
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 4:52 PM, Jonathan Ellis wrote:
>>>>>>>>>>>>>> No, it's smart enough to avoid scanning.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 6:49 PM, Ramzi Rabah wrote:
>>>>>>>>>>>>>>> Hi Jonathan, thanks for the reply. I will update the code to 0.4.1 and
>>>>>>>>>>>>>>> will check all the logs on all the machines.
>>>>>>>>>>>>>>> Just a simple question: when you do a get_key_range and you specify ""
>>>>>>>>>>>>>>> and "" for start and end, and the limit is 25, if there are too many
>>>>>>>>>>>>>>> entries, does it do a scan to find the start or is it smart enough
>>>>>>>>>>>>>>> to know what the start key is?
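Since (per Jonathan's answer above) the start key is resolved without a
scan, paging through the whole range 25 keys at a time is a reasonable
pattern. A minimal sketch, assuming the start bound is inclusive (so each
page re-fetches the previous last key), the 0.4-era Thrift connection
classes, and a hypothetical host/class name; the get_key_range signature
itself is the one used in this thread:

    import java.util.List;
    import org.apache.cassandra.service.Cassandra;
    import org.apache.cassandra.service.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class KeyRangePager {
        public static void main(String[] args) throws Exception {
            TTransport transport = new TSocket("node6", 9160);  // default Thrift port
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();

            List<String> page = client.get_key_range("Keyspace1", "DatastoreDeletionSchedule",
                                                     "", "", 25, ConsistencyLevel.ONE);
            while (!page.isEmpty()) {
                for (String key : page) {
                    System.out.println(key);  // process each key
                }
                String last = page.get(page.size() - 1);
                // re-anchor at the last key seen; ask for one extra and drop the overlap
                page = client.get_key_range("Keyspace1", "DatastoreDeletionSchedule",
                                            last, "", 26, ConsistencyLevel.ONE);
                if (!page.isEmpty() && page.get(0).equals(last)) {
                    page.remove(0);
                }
            }
            transport.close();
        }
    }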
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 4:42 PM, Jonathan Ellis wrote:
>>>>>>>>>>>>>>>> You should check the other nodes for potential exceptions keeping them
>>>>>>>>>>>>>>>> from replying.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Without seeing that it's hard to say if this is caused by an old bug,
>>>>>>>>>>>>>>>> but you should definitely upgrade to 0.4.1 either way :)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 5:51 PM, Ramzi Rabah wrote:
>>>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am running into problems with get_key_range. I have
>>>>>>>>>>>>>>>>> OrderPreservingPartitioner defined in storage-conf.xml and I am using
>>>>>>>>>>>>>>>>> a columnfamily that looks like
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     <ColumnFamily ...
>>>>>>>>>>>>>>>>>                   Name="DatastoreDeletionSchedule"
>>>>>>>>>>>>>>>>>                   />
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> My command is client.get_key_range("Keyspace1", "DatastoreDeletionSchedule",
>>>>>>>>>>>>>>>>>            "", "", 25, ConsistencyLevel.ONE);
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It usually works fine, but after a day or so of server writes into
>>>>>>>>>>>>>>>>> this column family I started getting
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ERROR [pool-1-thread-36] 2009-10-19 17:24:28,223 Cassandra.java (line
>>>>>>>>>>>>>>>>> 770) Internal error processing get_key_range
>>>>>>>>>>>>>>>>> java.lang.RuntimeException: java.util.concurrent.TimeoutException:
>>>>>>>>>>>>>>>>> Operation timed out.
>>>>>>>>>>>>>>>>>        at org.apache.cassandra.service.StorageProxy.getKeyRange(StorageProxy.java:560)
>>>>>>>>>>>>>>>>>        at org.apache.cassandra.service.CassandraServer.get_key_range(CassandraServer.java:595)
>>>>>>>>>>>>>>>>>        at org.apache.cassandra.service.Cassandra$Processor$get_key_range.process(Cassandra.java:766)
>>>>>>>>>>>>>>>>>        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:609)
>>>>>>>>>>>>>>>>>        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>>>>>>>>>>>>>>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>>>>>>>>>>>>>>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>>>>>>>>>>>>>>>>>        at java.lang.Thread.run(Thread.java:619)
>>>>>>>>>>>>>>>>> Caused by: java.util.concurrent.TimeoutException: Operation timed out.
>>>>>>>>>>>>>>>>>        at org.apache.cassandra.net.AsyncResult.get(AsyncResult.java:97)
>>>>>>>>>>>>>>>>>        at org.apache.cassandra.service.StorageProxy.getKeyRange(StorageProxy.java:556)
>>>>>>>>>>>>>>>>>        ... 7 more
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I still get the timeout exceptions even though the servers have been
>>>>>>>>>>>>>>>>> idle for 2 days. When I restart the cassandra servers, it seems to
>>>>>>>>>>>>>>>>> work fine again. Any ideas what could be wrong?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> By the way, I am using version apache-cassandra-incubating-0.4.0-rc2.
>>>>>>>>>>>>>>>>> Not sure if this is fixed in the 0.4.1 version.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>> Ray
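A note on how this failure reaches the client: the TimeoutException in the
trace above is thrown inside the server, and "Internal error processing
get_key_range" is what generated Thrift code logs when a handler throws an
uncaught exception, which it reports back to the caller as an application
exception. A hedged sketch of guarding the call (the exact exception type
the client sees is assumed from Thrift's usual behavior; `client` is an
open Cassandra.Client as in the pager example above):

    try {
        List<String> keys = client.get_key_range("Keyspace1", "DatastoreDeletionSchedule",
                                                 "", "", 25, ConsistencyLevel.ONE);
    } catch (TApplicationException e) {
        // corresponds to the server-side "Internal error processing get_key_range";
        // retrying is unlikely to help until the tombstones are compacted away
        System.err.println("range query failed server-side: " + e.getMessage());
    } catch (TException e) {
        System.err.println("transport error: " + e.getMessage());
    }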