Subject: Re: Encountering timeout exception when running get_key_range
From: Ramzi Rabah
To: cassandra-user@incubator.apache.org
Date: Wed, 21 Oct 2009 13:38:44 -0700

Hi Jonathan,

In what scenario will there be only one version of a row? And in that
scenario, does this mean that these tombstone records will never ever be
deleted?

Also, on a higher level, what I am trying to do is provide some garbage
collection of entries in the Cassandra hash that have expired (kind of
like time-to-live functionality for records in Cassandra). Do you have
any insights on how that can be accomplished? (One possible approach is
sketched at the end of this thread.)

On Wed, Oct 21, 2009 at 12:30 PM, Ramzi Rabah wrote:
> I opened https://issues.apache.org/jira/browse/CASSANDRA-507
>
> Ray
>
> On Wed, Oct 21, 2009 at 12:07 PM, Jonathan Ellis wrote:
>> The compaction code removes tombstones, and it runs whenever you have
>> enough sstable fragments.
>>
>> I think I know what is happening -- as an optimization, if there is
>> only one version of a row it will just copy it to the new sstable.
>> This means it won't clean out tombstones.
>>
>> Can you file a bug at https://issues.apache.org/jira/browse/CASSANDRA ?
>>
>> -Jonathan
>>
>> On Wed, Oct 21, 2009 at 2:01 PM, Ramzi Rabah wrote:
>>> Hi Jonathan, I am still running into the timeout issue even after
>>> reducing GCGraceSeconds to 1 hour (we have tons of deletes
>>> happening in our app). Which part of Cassandra is responsible for
>>> deleting the tombstone records, and how often does it run?
>>>
>>> On Tue, Oct 20, 2009 at 12:02 PM, Ramzi Rabah wrote:
>>>> Thank you so much Jonathan.
>>>>
>>>> Data is test data so I'll just wipe it out and restart after updating
>>>> GCGraceSeconds.
>>>> Thanks for your help.
>>>>
>>>> Ray
>>>>
>>>> On Tue, Oct 20, 2009 at 11:39 AM, Jonathan Ellis wrote:
>>>>> The problem is you have a few MB of actual data and a few hundred MB
>>>>> of tombstones (data marked deleted). So what happens is get_key_range
>>>>> spends a long, long time iterating through the tombstoned rows,
>>>>> looking for keys that actually still exist.
>>>>>
>>>>> We're going to redesign this for CASSANDRA-344, but for the 0.4
>>>>> series, you should restart with GCGraceSeconds much lower (e.g. 3600),
>>>>> delete your old data files, and reload your data fresh. (Instead of
>>>>> reloading, you can use "nodeprobe compact" on each node to force a
>>>>> major compaction, but it will take much longer since you have so many
>>>>> tombstones.)
>>>>>
>>>>> -Jonathan
>>>>>
>>>>> On Mon, Oct 19, 2009 at 10:45 PM, Ramzi Rabah wrote:
>>>>>> Hi Jonathan:
>>>>>>
>>>>>> Here is the storage-conf.xml for one of the servers:
>>>>>> http://email.slicezero.com/storage-conf.xml
>>>>>>
>>>>>> and here is the zipped data:
>>>>>> http://email.slicezero.com/datastoreDeletion.tgz
>>>>>>
>>>>>> Thanks
>>>>>> Ray
>>>>>>
>>>>>> On Mon, Oct 19, 2009 at 8:30 PM, Jonathan Ellis wrote:
>>>>>>> Yes, please. You'll probably have to use something like
>>>>>>> http://www.getdropbox.com/ if you don't have a public web server to
>>>>>>> stash it temporarily.
>>>>>>>
>>>>>>> On Mon, Oct 19, 2009 at 10:28 PM, Ramzi Rabah wrote:
>>>>>>>> Hi Jonathan, the data is about 60 MB. Would you like me to send it to you?
>>>>>>>>
>>>>>>>> On Mon, Oct 19, 2009 at 8:20 PM, Jonathan Ellis wrote:
>>>>>>>>> Is the data on 6, 9, or 10 small enough that you could tar.gz it up
>>>>>>>>> for me to use to reproduce over here?
>>>>>>>>>
>>>>>>>>> On Mon, Oct 19, 2009 at 10:17 PM, Ramzi Rabah wrote:
>>>>>>>>>> So my cluster has 4 nodes: node6, node8, node9 and node10. I turned
>>>>>>>>>> them all off.
>>>>>>>>>> 1- I started node6 by itself and still got the problem.
>>>>>>>>>> 2- I started node8 by itself and it ran fine (returned no keys).
>>>>>>>>>> 3- I started node9 by itself and still got the problem.
>>>>>>>>>> 4- I started node10 by itself and still got the problem.
>>>>>>>>>>
>>>>>>>>>> Ray
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 19, 2009 at 7:44 PM, Jonathan Ellis wrote:
>>>>>>>>>>> That's really strange... Can you reproduce on a single-node cluster?
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 19, 2009 at 9:34 PM, Ramzi Rabah wrote:
>>>>>>>>>>>> The rows are very small. There are a handful of columns per row
>>>>>>>>>>>> (approximately 4-5 columns per row).
>>>>>>>>>>>> Each column has a name which is a String (20-30 characters long), and
>>>>>>>>>>>> the value is an empty array of bytes (new byte[0]).
>>>>>>>>>>>> I just use the names of the columns, and don't need to store any
>>>>>>>>>>>> values in this Column Family.
>>>>>>>>>>>>
>>>>>>>>>>>> -- Ray
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 19, 2009 at 7:24 PM, Jonathan Ellis wrote:
>>>>>>>>>>>>> Can you tell me anything about the nature of your rows? Many/few
>>>>>>>>>>>>> columns? Large/small column values?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 9:17 PM, Ramzi Rabah wrote:
>>>>>>>>>>>>>> Hi Jonathan,
>>>>>>>>>>>>>> I actually spoke too early. Now even if I restart the servers it still
>>>>>>>>>>>>>> gives a timeout exception.
>>>>>>>>>>>>>> As far as the sstable files go, I'm not sure which ones are the sstables,
>>>>>>>>>>>>>> but here is the list of files in the data directory that are prepended
>>>>>>>>>>>>>> with the column family name:
>>>>>>>>>>>>>> DatastoreDeletionSchedule-1-Data.db
>>>>>>>>>>>>>> DatastoreDeletionSchedule-1-Filter.db
>>>>>>>>>>>>>> DatastoreDeletionSchedule-1-Index.db
>>>>>>>>>>>>>> DatastoreDeletionSchedule-5-Data.db
>>>>>>>>>>>>>> DatastoreDeletionSchedule-5-Filter.db
>>>>>>>>>>>>>> DatastoreDeletionSchedule-5-Index.db
>>>>>>>>>>>>>> DatastoreDeletionSchedule-7-Data.db
>>>>>>>>>>>>>> DatastoreDeletionSchedule-7-Filter.db
>>>>>>>>>>>>>> DatastoreDeletionSchedule-7-Index.db
>>>>>>>>>>>>>> DatastoreDeletionSchedule-8-Data.db
>>>>>>>>>>>>>> DatastoreDeletionSchedule-8-Filter.db
>>>>>>>>>>>>>> DatastoreDeletionSchedule-8-Index.db
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am not currently doing any system stat collection.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 6:41 PM, Jonathan Ellis wrote:
>>>>>>>>>>>>>>> How many sstable files are in the data directories for the
>>>>>>>>>>>>>>> columnfamily you are querying?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> How many are there after you restart and it is happy?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Are you doing system stat collection with munin or ganglia or some such?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 8:25 PM, Ramzi Rabah wrote:
>>>>>>>>>>>>>>>> Hi Jonathan, I updated to 0.4.1 and I still get the same exception
>>>>>>>>>>>>>>>> when I call get_key_range.
>>>>>>>>>>>>>>>> I checked all the server logs, and there is only one exception being
>>>>>>>>>>>>>>>> thrown, by whichever server I am connecting to.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>> Ray
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 4:52 PM, Jonathan Ellis wrote:
>>>>>>>>>>>>>>>>> No, it's smart enough to avoid scanning.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 6:49 PM, Ramzi Rabah wrote:
>>>>>>>>>>>>>>>>>> Hi Jonathan, thanks for the reply. I will update the code to 0.4.1
>>>>>>>>>>>>>>>>>> and will check all the logs on all the machines.
>>>>>>>>>>>>>>>>>> Just a simple question: when you do a get_key_range and you specify ""
>>>>>>>>>>>>>>>>>> and "" for start and end, and the limit is 25, if there are too many
>>>>>>>>>>>>>>>>>> entries, does it do a scan to find out the start or is it smart enough
>>>>>>>>>>>>>>>>>> to know what the start key is?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 4:42 PM, Jonathan Ellis wrote:
>>>>>>>>>>>>>>>>>>> You should check the other nodes for potential exceptions keeping them
>>>>>>>>>>>>>>>>>>> from replying.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Without seeing that it's hard to say if this is caused by an old bug,
>>>>>>>>>>>>>>>>>>> but you should definitely upgrade to 0.4.1 either way :)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 5:51 PM, Ramzi Rabah wrote:
>>>>>>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I am running into problems with get_key_range.
>>>>>>>>>>>>>>>>>>>> I have OrderPreservingPartitioner defined in storage-conf.xml and I am
>>>>>>>>>>>>>>>>>>>> using a columnfamily that looks like
>>>>>>>>>>>>>>>>>>>> <ColumnFamily Name="DatastoreDeletionSchedule" />
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> My command is client.get_key_range("Keyspace1", "DatastoreDeletionSchedule",
>>>>>>>>>>>>>>>>>>>>                    "", "", 25, ConsistencyLevel.ONE);
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It usually works fine, but after a day or so of server writes into
>>>>>>>>>>>>>>>>>>>> this column family, I started getting
>>>>>>>>>>>>>>>>>>>> ERROR [pool-1-thread-36] 2009-10-19 17:24:28,223 Cassandra.java (line
>>>>>>>>>>>>>>>>>>>> 770) Internal error processing get_key_range
>>>>>>>>>>>>>>>>>>>> java.lang.RuntimeException: java.util.concurrent.TimeoutException:
>>>>>>>>>>>>>>>>>>>> Operation timed out.
>>>>>>>>>>>>>>>>>>>>        at org.apache.cassandra.service.StorageProxy.getKeyRange(StorageProxy.java:560)
>>>>>>>>>>>>>>>>>>>>        at org.apache.cassandra.service.CassandraServer.get_key_range(CassandraServer.java:595)
>>>>>>>>>>>>>>>>>>>>        at org.apache.cassandra.service.Cassandra$Processor$get_key_range.process(Cassandra.java:766)
>>>>>>>>>>>>>>>>>>>>        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:609)
>>>>>>>>>>>>>>>>>>>>        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>>>>>>>>>>>>>>>>>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>>>>>>>>>>>>>>>>>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>>>>>>>>>>>>>>>>>>>>        at java.lang.Thread.run(Thread.java:619)
>>>>>>>>>>>>>>>>>>>> Caused by: java.util.concurrent.TimeoutException: Operation timed out.
>>>>>>>>>>>>>>>>>>>>        at org.apache.cassandra.net.AsyncResult.get(AsyncResult.java:97)
>>>>>>>>>>>>>>>>>>>>        at org.apache.cassandra.service.StorageProxy.getKeyRange(StorageProxy.java:556)
>>>>>>>>>>>>>>>>>>>>        ... 7 more
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I still get the timeout exceptions even though the servers have been
>>>>>>>>>>>>>>>>>>>> idle for 2 days. When I restart the Cassandra servers, it seems to
>>>>>>>>>>>>>>>>>>>> work fine again. Any ideas what could be wrong?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> By the way, I am using version apache-cassandra-incubating-0.4.0-rc2.
>>>>>>>>>>>>>>>>>>>> Not sure if this is fixed in the 0.4.1 version.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>> Ray
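
A rough sketch of the client-side time-to-live cleanup Ray asks about at
the top of this thread, assuming the 0.4-era Thrift API shown above
(get_key_range taking keyspace, column family, start/end keys, a limit,
and a consistency level, and returning the matching keys). The
expiry-prefixed key layout and the RowRemover hook standing in for the
Thrift remove call are illustrative assumptions, not something from the
thread:

    import java.util.List;

    import org.apache.cassandra.service.Cassandra;
    import org.apache.cassandra.service.ConsistencyLevel;

    public class TtlSweeper {

        // Hypothetical hook standing in for the Thrift remove call,
        // whose exact 0.4 signature does not appear in this thread.
        public interface RowRemover {
            void remove(String key) throws Exception;
        }

        // Page through the deletion-schedule column family and remove
        // rows whose expiry has passed. Assumes keys are prefixed with
        // a 13-digit millisecond timestamp, so under
        // OrderPreservingPartitioner they come back sorted by expiry
        // and the sweep can stop early.
        public static void sweep(Cassandra.Client client, RowRemover remover)
                throws Exception {
            String start = "";
            long now = System.currentTimeMillis();
            while (true) {
                List<String> keys = client.get_key_range(
                        "Keyspace1", "DatastoreDeletionSchedule",
                        start, "", 25, ConsistencyLevel.ONE);
                if (keys.isEmpty()) {
                    return;
                }
                for (String key : keys) {
                    long expiry = Long.parseLong(key.substring(0, 13)); // assumed key layout
                    if (expiry > now) {
                        return; // keys sorted by expiry; the rest are still live
                    }
                    remover.remove(key); // tombstones the row; see GCGraceSeconds below
                }
                // The start key is assumed inclusive, so resume just
                // past the last key already seen.
                start = keys.get(keys.size() - 1) + "\u0000";
            }
        }
    }

Note the irony the thread itself surfaces: each remove creates a
tombstone, which is exactly what makes get_key_range slow until
compaction purges them.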
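Jonathan's GCGraceSeconds suggestion, as it might look in a 0.4-style
storage-conf.xml. The element name is taken from the setting's name and
its placement here is an assumption, so check your own storage-conf.xml
for where it actually lives:

    <Storage>
      ...
      <!-- Seconds to keep tombstones before compaction is allowed to
           purge them; 3600 (one hour) per Jonathan's advice for this
           delete-heavy workload. -->
      <GCGraceSeconds>3600</GCGraceSeconds>
      ...
    </Storage>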
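For anyone reproducing this, a minimal standalone version of the failing
call, assuming the Thrift-generated classes live in
org.apache.cassandra.service (as the stack trace above suggests) and a
plain socket transport on the usual Thrift port:

    import java.util.List;

    import org.apache.cassandra.service.Cassandra;
    import org.apache.cassandra.service.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class KeyRangeProbe {
        public static void main(String[] args) throws Exception {
            // 9160 is Cassandra's usual Thrift port; adjust to match
            // your storage-conf.xml.
            TTransport transport = new TSocket("localhost", 9160);
            transport.open();
            Cassandra.Client client =
                    new Cassandra.Client(new TBinaryProtocol(transport));

            // The same call that times out in the thread: first 25
            // keys across the whole range.
            List<String> keys = client.get_key_range(
                    "Keyspace1", "DatastoreDeletionSchedule",
                    "", "", 25, ConsistencyLevel.ONE);
            for (String key : keys) {
                System.out.println(key);
            }

            transport.close();
        }
    }

Per Jonathan's note that the server is smart enough to avoid scanning
for the start key, subsequent pages would pass the last key returned as
the new start rather than restarting from "".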