Subject: Re: Encountering timeout exception when running get_key_range
From: Ramzi Rabah
To: cassandra-user@incubator.apache.org
Date: Wed, 21 Oct 2009 12:01:26 -0700

Hi Jonathan,

I am still running into the timeout issue even after reducing
GCGraceSeconds to 1 hour (we have tons of deletes happening in our
app). Which part of Cassandra is responsible for deleting the
tombstone records, and how often does it run?

On Tue, Oct 20, 2009 at 12:02 PM, Ramzi Rabah wrote:
> Thank you so much Jonathan.
>
> Data is test data so I'll just wipe it out and restart after updating
> GCGraceSeconds.
> Thanks for your help.
>
> Ray
>
> On Tue, Oct 20, 2009 at 11:39 AM, Jonathan Ellis wrote:
>> The problem is you have a few MB of actual data and a few hundred MB
>> of tombstones (data marked deleted). So what happens is get_key_range
>> spends a long, long time iterating through the tombstoned rows,
>> looking for keys that actually still exist.
>>
>> We're going to redesign this for CASSANDRA-344, but for the 0.4
>> series, you should restart with GCGraceSeconds much lower (e.g. 3600),
>> delete your old data files, and reload your data fresh. (Instead of
>> reloading, you can use "nodeprobe compact" on each node to force a
>> major compaction, but it will take much longer since you have so many
>> tombstones.)
>>
>> -Jonathan
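For reference, the two knobs Jonathan mentions. GCGraceSeconds is set in
storage-conf.xml; a sketch, with his suggested 3600 (the element name is
assumed from later 0.x configs, so verify it against your own file):

    <!-- seconds a tombstone must age before compaction may purge it -->
    <GCGraceSeconds>3600</GCGraceSeconds>

A major compaction is then forced per node from the command line; the
flag syntax below is assumed for the 0.4-era nodeprobe tool:

    bin/nodeprobe -host node6 compact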
>>
>> On Mon, Oct 19, 2009 at 10:45 PM, Ramzi Rabah wrote:
>>> Hi Jonathan:
>>>
>>> Here is the storage-conf.xml for one of the servers:
>>> http://email.slicezero.com/storage-conf.xml
>>>
>>> and here is the zipped data:
>>> http://email.slicezero.com/datastoreDeletion.tgz
>>>
>>> Thanks
>>> Ray
>>>
>>> On Mon, Oct 19, 2009 at 8:30 PM, Jonathan Ellis wrote:
>>>> Yes, please. You'll probably have to use something like
>>>> http://www.getdropbox.com/ if you don't have a public web server to
>>>> stash it temporarily.
>>>>
>>>> On Mon, Oct 19, 2009 at 10:28 PM, Ramzi Rabah wrote:
>>>>> Hi Jonathan, the data is about 60 MB. Would you like me to send it to you?
>>>>>
>>>>> On Mon, Oct 19, 2009 at 8:20 PM, Jonathan Ellis wrote:
>>>>>> Is the data on 6, 9, or 10 small enough that you could tar.gz it up
>>>>>> for me to use to reproduce over here?
>>>>>>
>>>>>> On Mon, Oct 19, 2009 at 10:17 PM, Ramzi Rabah wrote:
>>>>>>> So my cluster has 4 nodes: node6, node8, node9 and node10. I turned
>>>>>>> them all off.
>>>>>>> 1. I started node6 by itself and still got the problem.
>>>>>>> 2. I started node8 by itself and it ran fine (returned no keys).
>>>>>>> 3. I started node9 by itself and still got the problem.
>>>>>>> 4. I started node10 by itself and still got the problem.
>>>>>>>
>>>>>>> Ray
>>>>>>>
>>>>>>> On Mon, Oct 19, 2009 at 7:44 PM, Jonathan Ellis wrote:
>>>>>>>> That's really strange... Can you reproduce on a single-node cluster?
>>>>>>>>
>>>>>>>> On Mon, Oct 19, 2009 at 9:34 PM, Ramzi Rabah wrote:
>>>>>>>>> The rows are very small. There are a handful of columns per row
>>>>>>>>> (about 4-5 columns per row).
>>>>>>>>> Each column has a name which is a String (20-30 characters long), and
>>>>>>>>> the value is an empty array of bytes (new byte[0]).
>>>>>>>>> I just use the names of the columns, and don't need to store any
>>>>>>>>> values in this Column Family.
>>>>>>>>>
>>>>>>>>> -- Ray
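Ray's write pattern (the column name carries the data; the value stays
empty) would look roughly like this. This is a sketch only: the ColumnPath
constructor shape and insert signature are assumed from the 0.4-era Thrift
API, the row key and column name are made up for illustration, and `client`
is an already-opened org.apache.cassandra.service.Cassandra.Client:

    // one name-only column: a 20-30 character string name, empty value
    ColumnPath path = new ColumnPath("DatastoreDeletionSchedule",   // column family
                                     null,                          // no super column
                                     "delete-me-2009-10-19-0001".getBytes("UTF-8"));
    client.insert("Keyspace1", "some-row-key", path,
                  new byte[0],                  // empty value: the name is the payload
                  System.currentTimeMillis(),   // client-supplied timestamp
                  ConsistencyLevel.ONE);

Deleting many such columns later is what produces the tombstones discussed
above, one per deleted column or row.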
>>>>>>>>>
>>>>>>>>> On Mon, Oct 19, 2009 at 7:24 PM, Jonathan Ellis wrote:
>>>>>>>>>> Can you tell me anything about the nature of your rows? Many/few
>>>>>>>>>> columns? Large/small column values?
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 19, 2009 at 9:17 PM, Ramzi Rabah wrote:
>>>>>>>>>>> Hi Jonathan,
>>>>>>>>>>> I actually spoke too early. Now even if I restart the servers it still
>>>>>>>>>>> gives a timeout exception.
>>>>>>>>>>> As for the sstable files, I'm not sure which ones are the sstables,
>>>>>>>>>>> but here is the list of files in the data directory that are prefixed
>>>>>>>>>>> with the column family name:
>>>>>>>>>>> DatastoreDeletionSchedule-1-Data.db
>>>>>>>>>>> DatastoreDeletionSchedule-1-Filter.db
>>>>>>>>>>> DatastoreDeletionSchedule-1-Index.db
>>>>>>>>>>> DatastoreDeletionSchedule-5-Data.db
>>>>>>>>>>> DatastoreDeletionSchedule-5-Filter.db
>>>>>>>>>>> DatastoreDeletionSchedule-5-Index.db
>>>>>>>>>>> DatastoreDeletionSchedule-7-Data.db
>>>>>>>>>>> DatastoreDeletionSchedule-7-Filter.db
>>>>>>>>>>> DatastoreDeletionSchedule-7-Index.db
>>>>>>>>>>> DatastoreDeletionSchedule-8-Data.db
>>>>>>>>>>> DatastoreDeletionSchedule-8-Filter.db
>>>>>>>>>>> DatastoreDeletionSchedule-8-Index.db
>>>>>>>>>>>
>>>>>>>>>>> I am not currently doing any system stat collection.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 19, 2009 at 6:41 PM, Jonathan Ellis wrote:
>>>>>>>>>>>> How many sstable files are in the data directories for the
>>>>>>>>>>>> columnfamily you are querying?
>>>>>>>>>>>>
>>>>>>>>>>>> How many are there after you restart and it is happy?
>>>>>>>>>>>>
>>>>>>>>>>>> Are you doing system stat collection with munin or ganglia or some such?
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 19, 2009 at 8:25 PM, Ramzi Rabah wrote:
>>>>>>>>>>>>> Hi Jonathan, I updated to 0.4.1 and I still get the same exception when I
>>>>>>>>>>>>> call get_key_range.
>>>>>>>>>>>>> I checked all the server logs, and there is only one exception being
>>>>>>>>>>>>> thrown, by whichever server I am connecting to.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>> Ray
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 4:52 PM, Jonathan Ellis wrote:
>>>>>>>>>>>>>> No, it's smart enough to avoid scanning.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 6:49 PM, Ramzi Rabah wrote:
>>>>>>>>>>>>>>> Hi Jonathan, thanks for the reply. I will update the code to 0.4.1 and
>>>>>>>>>>>>>>> will check all the logs on all the machines.
>>>>>>>>>>>>>>> Just a simple question: when you do a get_key_range and you specify ""
>>>>>>>>>>>>>>> and "" for start and end, and the limit is 25, if there are too many
>>>>>>>>>>>>>>> entries, does it do a scan to find the start or is it smart enough
>>>>>>>>>>>>>>> to know what the start key is?
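Since (per Jonathan's answer above) the start key is resolved without a
scan, paging through the whole range 25 keys at a time is a reasonable
pattern. A minimal sketch, assuming the start bound is inclusive (so each
page re-fetches the previous last key), the 0.4-era Thrift connection
classes, and a hypothetical host/class name; the get_key_range signature
itself is the one used in this thread:

    import java.util.List;
    import org.apache.cassandra.service.Cassandra;
    import org.apache.cassandra.service.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class KeyRangePager {
        public static void main(String[] args) throws Exception {
            TTransport transport = new TSocket("node6", 9160);  // default Thrift port
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();

            List<String> page = client.get_key_range("Keyspace1", "DatastoreDeletionSchedule",
                                                     "", "", 25, ConsistencyLevel.ONE);
            while (!page.isEmpty()) {
                for (String key : page) {
                    System.out.println(key);  // process each key
                }
                String last = page.get(page.size() - 1);
                // re-anchor at the last key seen; ask for one extra and drop the overlap
                page = client.get_key_range("Keyspace1", "DatastoreDeletionSchedule",
                                            last, "", 26, ConsistencyLevel.ONE);
                if (!page.isEmpty() && page.get(0).equals(last)) {
                    page.remove(0);
                }
            }
            transport.close();
        }
    }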
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 4:42 PM, Jonathan Ellis wrote:
>>>>>>>>>>>>>>>> You should check the other nodes for potential exceptions keeping them
>>>>>>>>>>>>>>>> from replying.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Without seeing that it's hard to say if this is caused by an old bug,
>>>>>>>>>>>>>>>> but you should definitely upgrade to 0.4.1 either way :)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 5:51 PM, Ramzi Rabah wrote:
>>>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am running into problems with get_key_range. I have
>>>>>>>>>>>>>>>>> OrderPreservingPartitioner defined in storage-conf.xml and I am using
>>>>>>>>>>>>>>>>> a columnfamily that looks like
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     <ColumnFamily ...
>>>>>>>>>>>>>>>>>                   Name="DatastoreDeletionSchedule"
>>>>>>>>>>>>>>>>>                   />
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> My command is client.get_key_range("Keyspace1", "DatastoreDeletionSchedule",
>>>>>>>>>>>>>>>>>            "", "", 25, ConsistencyLevel.ONE);
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It usually works fine, but after a day or so of server writes into
>>>>>>>>>>>>>>>>> this column family I started getting
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ERROR [pool-1-thread-36] 2009-10-19 17:24:28,223 Cassandra.java (line
>>>>>>>>>>>>>>>>> 770) Internal error processing get_key_range
>>>>>>>>>>>>>>>>> java.lang.RuntimeException: java.util.concurrent.TimeoutException:
>>>>>>>>>>>>>>>>> Operation timed out.
>>>>>>>>>>>>>>>>>        at org.apache.cassandra.service.StorageProxy.getKeyRange(StorageProxy.java:560)
>>>>>>>>>>>>>>>>>        at org.apache.cassandra.service.CassandraServer.get_key_range(CassandraServer.java:595)
>>>>>>>>>>>>>>>>>        at org.apache.cassandra.service.Cassandra$Processor$get_key_range.process(Cassandra.java:766)
>>>>>>>>>>>>>>>>>        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:609)
>>>>>>>>>>>>>>>>>        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>>>>>>>>>>>>>>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>>>>>>>>>>>>>>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>>>>>>>>>>>>>>>>>        at java.lang.Thread.run(Thread.java:619)
>>>>>>>>>>>>>>>>> Caused by: java.util.concurrent.TimeoutException: Operation timed out.
>>>>>>>>>>>>>>>>>        at org.apache.cassandra.net.AsyncResult.get(AsyncResult.java:97)
>>>>>>>>>>>>>>>>>        at org.apache.cassandra.service.StorageProxy.getKeyRange(StorageProxy.java:556)
>>>>>>>>>>>>>>>>>        ... 7 more
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I still get the timeout exceptions even though the servers have been
>>>>>>>>>>>>>>>>> idle for 2 days. When I restart the cassandra servers, it seems to
>>>>>>>>>>>>>>>>> work fine again. Any ideas what could be wrong?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> By the way, I am using version apache-cassandra-incubating-0.4.0-rc2.
>>>>>>>>>>>>>>>>> Not sure if this is fixed in the 0.4.1 version.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>> Ray
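A note on how this failure reaches the client: the TimeoutException in the
trace above is thrown inside the server, and "Internal error processing
get_key_range" is what generated Thrift code logs when a handler throws an
uncaught exception, which it reports back to the caller as an application
exception. A hedged sketch of guarding the call (the exact exception type
the client sees is assumed from Thrift's usual behavior; `client` is an
open Cassandra.Client as in the pager example above):

    try {
        List<String> keys = client.get_key_range("Keyspace1", "DatastoreDeletionSchedule",
                                                 "", "", 25, ConsistencyLevel.ONE);
    } catch (TApplicationException e) {
        // corresponds to the server-side "Internal error processing get_key_range";
        // retrying is unlikely to help until the tombstones are compacted away
        System.err.println("range query failed server-side: " + e.getMessage());
    } catch (TException e) {
        System.err.println("transport error: " + e.getMessage());
    }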