cassandra-user mailing list archives

From Ramzi Rabah <rra...@playdom.com>
Subject Re: Encountering timeout exception when running get_key_range
Date Tue, 20 Oct 2009 19:02:40 GMT
Thank you so much Jonathan.

The data is test data, so I'll just wipe it out and restart after
updating GCGraceSeconds.
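(Assuming GCGraceSeconds can be set as a top-level element in
storage-conf.xml in the 0.4 series, which is how I read your advice
quoted below, the change would look something like this, with 3600 being
the value you suggested:

    <GCGraceSeconds>3600</GCGraceSeconds>

and then I'd delete the old data files and reload, as you described.)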
Thanks for your help.

Ray

On Tue, Oct 20, 2009 at 11:39 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
> The problem is you have a few MB of actual data and a few hundred MB
> of tombstones (data marked deleted).  So what happens is get_key_range
> spends a long, long time iterating through the tombstoned rows,
> looking for keys that actually still exist.
>
> We're going to redesign this for CASSANDRA-344, but for the 0.4
> series, you should restart with GCGraceSeconds much lower (e.g. 3600),
> delete your old data files, and reload your data fresh.  (Instead of
> reloading, you can use "nodeprobe compact" on each node to force a
> major compaction, but it will take much longer since you have so many
> tombstones.)
>
> -Jonathan
>
> On Mon, Oct 19, 2009 at 10:45 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>> Hi Jonathan:
>>
>> Here is the storage_conf.xml for one of the servers
>> http://email.slicezero.com/storage-conf.xml
>>
>> and here is the zipped data:
>> http://email.slicezero.com/datastoreDeletion.tgz
>>
>> Thanks
>> Ray
>>
>>
>>
>>
>> On Mon, Oct 19, 2009 at 8:30 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>> Yes, please.  You'll probably have to use something like
>>> http://www.getdropbox.com/ if you don't have a public web server to
>>> stash it temporarily.
>>>
>>> On Mon, Oct 19, 2009 at 10:28 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>> Hi Jonathan, the data is about 60 MB. Would you like me to send it to you?
>>>>
>>>>
>>>> On Mon, Oct 19, 2009 at 8:20 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>>> Is the data on 6, 9, or 10 small enough that you could tar.gz it up
>>>>> for me to use to reproduce over here?
>>>>>
>>>>> On Mon, Oct 19, 2009 at 10:17 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>>>> So my cluster has 4 nodes: node6, node8, node9 and node10. I turned
>>>>>> them all off.
>>>>>> 1- I started node6 by itself and still got the problem.
>>>>>> 2- I started node8 by itself and it ran fine (returned no keys).
>>>>>> 3- I started node9 by itself and still got the problem.
>>>>>> 4- I started node10 by itself and still got the problem.
>>>>>>
>>>>>> Ray
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 19, 2009 at 7:44 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>>>>> That's really strange...  Can you reproduce on a single-node cluster?
>>>>>>>
>>>>>>> On Mon, Oct 19, 2009 at 9:34 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>>>>>> The rows are very small. There are only a handful of columns per
>>>>>>>> row (about 4-5).
>>>>>>>> Each column has a name which is a String (20-30 characters long), and
>>>>>>>> the value is an empty array of bytes (new byte[0]).
>>>>>>>> I just use the names of the columns, and don't need to store any
>>>>>>>> values in this Column Family.
>>>>>>>>
>>>>>>>> -- Ray
>>>>>>>>
>>>>>>>> On Mon, Oct 19, 2009 at 7:24 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>>>>>>> Can you tell me anything about the nature of your rows?  Many/few
>>>>>>>>> columns?  Large/small column values?
>>>>>>>>>
>>>>>>>>> On Mon, Oct 19, 2009 at 9:17 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>>>>>>>> Hi Jonathan
>>>>>>>>>> I actually spoke too early. Now even if I restart the servers it
>>>>>>>>>> still gives a timeout exception.
>>>>>>>>>> As far as the sstable files go, I'm not sure which ones are the
>>>>>>>>>> sstables, but here is the list of files in the data directory that
>>>>>>>>>> are prefixed with the column family name:
>>>>>>>>>> DatastoreDeletionSchedule-1-Data.db
>>>>>>>>>> DatastoreDeletionSchedule-1-Filter.db
>>>>>>>>>> DatastoreDeletionSchedule-1-Index.db
>>>>>>>>>> DatastoreDeletionSchedule-5-Data.db
>>>>>>>>>> DatastoreDeletionSchedule-5-Filter.db
>>>>>>>>>> DatastoreDeletionSchedule-5-Index.db
>>>>>>>>>> DatastoreDeletionSchedule-7-Data.db
>>>>>>>>>> DatastoreDeletionSchedule-7-Filter.db
>>>>>>>>>> DatastoreDeletionSchedule-7-Index.db
>>>>>>>>>> DatastoreDeletionSchedule-8-Data.db
>>>>>>>>>> DatastoreDeletionSchedule-8-Filter.db
>>>>>>>>>> DatastoreDeletionSchedule-8-Index.db
>>>>>>>>>>
>>>>>>>>>> I am not currently doing any system stat collection.
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 19, 2009 at 6:41 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>>>>>>>>> How many sstable files are in the data directories for the
>>>>>>>>>>> columnfamily you are querying?
>>>>>>>>>>>
>>>>>>>>>>> How many are there after you restart and it is happy?
>>>>>>>>>>>
>>>>>>>>>>> Are you doing system stat collection with munin or ganglia or some such?
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 19, 2009 at 8:25 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>>>>>>>>>> Hi Jonathan, I updated to 0.4.1 and I still get the same exception
>>>>>>>>>>>> when I call get_key_range.
>>>>>>>>>>>> I checked all the server logs, and there is only one exception being
>>>>>>>>>>>> thrown by whichever server I am connecting to.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Ray
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 19, 2009 at 4:52 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>>>>>>>>>>> No, it's smart enough to avoid scanning.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 6:49 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>>>>>>>>>>>> Hi Jonathan, thanks for the reply. I will update the code to 0.4.1
>>>>>>>>>>>>>> and will check all the logs on all the machines.
>>>>>>>>>>>>>> Just a simple question: when you do a get_key_range and you specify
>>>>>>>>>>>>>> "" and "" for start and end, and the limit is 25, if there are too
>>>>>>>>>>>>>> many entries, does it do a scan to find out the start or is it smart
>>>>>>>>>>>>>> enough to know what the start key is?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 4:42 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>>>>>>>>>>>>> You should check the other nodes for potential exceptions keeping
>>>>>>>>>>>>>>> them from replying.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Without seeing that, it's hard to say if this is caused by an old
>>>>>>>>>>>>>>> bug, but you should definitely upgrade to 0.4.1 either way :)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 5:51 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am running into problems with get_key_range. I have
>>>>>>>>>>>>>>>> OrderPreservingPartitioner defined in storage-conf.xml and I am
>>>>>>>>>>>>>>>> using a columnfamily that looks like
>>>>>>>>>>>>>>>>     <ColumnFamily CompareWith="BytesType"
>>>>>>>>>>>>>>>>                   Name="DatastoreDeletionSchedule"
>>>>>>>>>>>>>>>>                   />
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> My command is client.get_key_range("Keyspace1", "DatastoreDeletionSchedule",
>>>>>>>>>>>>>>>>                                    "", "", 25, ConsistencyLevel.ONE);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It usually works fine, but after a day or so of server writes into
>>>>>>>>>>>>>>>> this column family, I started getting
>>>>>>>>>>>>>>>> ERROR [pool-1-thread-36] 2009-10-19 17:24:28,223 Cassandra.java (line
>>>>>>>>>>>>>>>> 770) Internal error processing get_key_range
>>>>>>>>>>>>>>>> java.lang.RuntimeException: java.util.concurrent.TimeoutException:
>>>>>>>>>>>>>>>> Operation timed out.
>>>>>>>>>>>>>>>>        at org.apache.cassandra.service.StorageProxy.getKeyRange(StorageProxy.java:560)
>>>>>>>>>>>>>>>>        at org.apache.cassandra.service.CassandraServer.get_key_range(CassandraServer.java:595)
>>>>>>>>>>>>>>>>        at org.apache.cassandra.service.Cassandra$Processor$get_key_range.process(Cassandra.java:766)
>>>>>>>>>>>>>>>>        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:609)
>>>>>>>>>>>>>>>>        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>>>>>>>>>>>>>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>>>>>>>>>>>>>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>>>>>>>>>>>>>>>>        at java.lang.Thread.run(Thread.java:619)
>>>>>>>>>>>>>>>> Caused by: java.util.concurrent.TimeoutException: Operation timed out.
>>>>>>>>>>>>>>>>        at org.apache.cassandra.net.AsyncResult.get(AsyncResult.java:97)
>>>>>>>>>>>>>>>>        at org.apache.cassandra.service.StorageProxy.getKeyRange(StorageProxy.java:556)
>>>>>>>>>>>>>>>>        ... 7 more
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I still get the timeout exceptions even though the servers have
>>>>>>>>>>>>>>>> been idle for 2 days. When I restart the cassandra servers, it
>>>>>>>>>>>>>>>> seems to work fine again. Any ideas what could be wrong?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> By the way, I am using version apache-cassandra-incubating-0.4.0-rc2.
>>>>>>>>>>>>>>>> Not sure if this is fixed in the 0.4.1 version.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>> Ray
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
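For anyone reproducing this, here is a minimal sketch of the client call
discussed above, written against the stock Thrift client for the 0.4
series. The host name, the default Thrift port 9160, the
org.apache.cassandra.service package for the generated classes, and the
List<String> return type are assumptions inferred from the stack trace
and the call shown in the thread, not details confirmed here:

    import java.util.List;
    import org.apache.cassandra.service.Cassandra;
    import org.apache.cassandra.service.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class KeyRangeTest {
        public static void main(String[] args) throws Exception {
            // Connect to a single Cassandra node over Thrift (port 9160 assumed).
            TTransport transport = new TSocket("node6", 9160);
            TBinaryProtocol protocol = new TBinaryProtocol(transport);
            Cassandra.Client client = new Cassandra.Client(protocol);
            transport.open();
            try {
                // Same call as in the original report: scan from the beginning
                // of the key range ("" to "") and return at most 25 keys.
                List<String> keys = client.get_key_range(
                        "Keyspace1", "DatastoreDeletionSchedule", "", "", 25,
                        ConsistencyLevel.ONE);
                for (String key : keys) {
                    System.out.println(key);
                }
            } finally {
                transport.close();
            }
        }
    }

With hundreds of MB of tombstones on disk, this call is where the
TimeoutException above shows up, which matches Jonathan's explanation.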
