incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Encountering timeout exception when running get_key_range
Date Tue, 20 Oct 2009 03:48:57 GMT
Got it.  I will have a look tomorrow.

On Mon, Oct 19, 2009 at 10:45 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
> Hi Jonathan:
>
> Here is the storage_conf.xml for one of the servers
> http://email.slicezero.com/storage-conf.xml
>
> and here is the zipped data:
> http://email.slicezero.com/datastoreDeletion.tgz
>
> Thanks
> Ray
>
>
>
>
> On Mon, Oct 19, 2009 at 8:30 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>> Yes, please.  You'll probably have to use something like
>> http://www.getdropbox.com/ if you don't have a public web server to
>> stash it temporarily.
>>
>> On Mon, Oct 19, 2009 at 10:28 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>> Hi Jonathan, the data is about 60 MB. Would you like me to send it to you?
>>>
>>>
>>> On Mon, Oct 19, 2009 at 8:20 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>> Is the data on 6, 9, or 10 small enough that you could tar.gz it up
>>>> for me to use to reproduce over here?
>>>>
>>>> On Mon, Oct 19, 2009 at 10:17 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>>> So my cluster has 4 nodes: node6, node8, node9 and node10. I turned
>>>>> them all off.
>>>>> 1- I started node6 by itself and still got the problem.
>>>>> 2- I started node8 by itself and it ran fine (returned no keys)
>>>>> 3- I started node9 by itself and still got the problem.
>>>>> 4- I started node10 by itself and still got the problem.
>>>>>
>>>>> Ray
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Oct 19, 2009 at 7:44 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>>>> That's really strange...  Can you reproduce on a single-node cluster?
>>>>>>
>>>>>> On Mon, Oct 19, 2009 at 9:34 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>>>>> The rows are very small. There are a handful of columns per row
>>>>>>> (about 4-5 columns per row).
>>>>>>> Each column has a name which is a String (20-30 characters long), and
>>>>>>> the value is an empty array of bytes (new byte[0]).
>>>>>>> I only use the names of the columns and don't need to store any
>>>>>>> values in this Column Family.
>>>>>>>
>>>>>>> -- Ray
>>>>>>>
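Ray's description above (names-only columns with empty values, in a column family declared with `CompareWith="BytesType"`) can be modeled roughly as below. This is a minimal sketch, not Cassandra code: it assumes BytesType orders column names by unsigned, byte-wise lexicographic comparison, and the class and method names (`BytesTypeRowSketch`, `buildRow`, `columnNames`) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical model of one row in a CompareWith="BytesType" column family.
// Assumption: BytesType sorts names by unsigned, byte-wise lexicographic order.
class BytesTypeRowSketch {
    // Compare byte by byte as unsigned values; a name that is a prefix of a
    // longer name sorts first.
    static final Comparator<byte[]> BYTES_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    // Builds a names-only row: every column value is an empty byte array.
    static TreeMap<byte[], byte[]> buildRow(String... names) {
        TreeMap<byte[], byte[]> row = new TreeMap<>(BYTES_ORDER);
        for (String name : names) {
            row.put(name.getBytes(StandardCharsets.UTF_8), new byte[0]);
        }
        return row;
    }

    // Decodes the column names back to strings, in the row's sorted order.
    static String[] columnNames(TreeMap<byte[], byte[]> row) {
        return row.keySet().stream()
                .map(bytes -> new String(bytes, StandardCharsets.UTF_8))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        TreeMap<byte[], byte[]> row = buildRow("datastore-b", "datastore-a", "datastore");
        System.out.println(Arrays.toString(columnNames(row)));
        // prints [datastore, datastore-a, datastore-b]
    }
}
```

A `byte[]` has identity-based `equals`, so the explicit comparator is what makes the `TreeMap` order and deduplicate column names by content.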
>>>>>>> On Mon, Oct 19, 2009 at 7:24 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>>>>>> Can you tell me anything about the nature of your rows?  Many/few
>>>>>>>> columns?  Large/small column values?
>>>>>>>>
>>>>>>>> On Mon, Oct 19, 2009 at 9:17 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>>>>>>> Hi Jonathan,
>>>>>>>>> I actually spoke too early. Now even if I restart the servers, it still
>>>>>>>>> gives a timeout exception.
>>>>>>>>> As for the sstable files, I'm not sure which ones are the sstables,
>>>>>>>>> but here is the list of files in the data directory that are prefixed
>>>>>>>>> with the column family name:
>>>>>>>>> DatastoreDeletionSchedule-1-Data.db
>>>>>>>>> DatastoreDeletionSchedule-1-Filter.db
>>>>>>>>> DatastoreDeletionSchedule-1-Index.db
>>>>>>>>> DatastoreDeletionSchedule-5-Data.db
>>>>>>>>> DatastoreDeletionSchedule-5-Filter.db
>>>>>>>>> DatastoreDeletionSchedule-5-Index.db
>>>>>>>>> DatastoreDeletionSchedule-7-Data.db
>>>>>>>>> DatastoreDeletionSchedule-7-Filter.db
>>>>>>>>> DatastoreDeletionSchedule-7-Index.db
>>>>>>>>> DatastoreDeletionSchedule-8-Data.db
>>>>>>>>> DatastoreDeletionSchedule-8-Filter.db
>>>>>>>>> DatastoreDeletionSchedule-8-Index.db
>>>>>>>>>
>>>>>>>>> I am not currently doing any system stat collection.
>>>>>>>>>
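Ray's file list answers Jonathan's sstable question once the files are grouped. Assuming the usual on-disk layout of this era, in which each sstable is a Data/Index/Filter set sharing a `<ColumnFamily>-<generation>-` prefix, the twelve files above would be four sstables (generations 1, 5, 7 and 8). A minimal sketch of that grouping, using a hypothetical helper `SSTableCounter` that is not part of Cassandra:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Cassandra). Assumes data files are named
// <ColumnFamily>-<generation>-<Component>.db, one Data/Index/Filter set per sstable.
class SSTableCounter {
    private static final Pattern FILE_NAME = Pattern.compile("^(.+)-(\\d+)-(\\w+)\\.db$");

    // Maps each generation number to the components found for it.
    static Map<Integer, Set<String>> generations(String columnFamily, List<String> files) {
        Map<Integer, Set<String>> byGeneration = new TreeMap<>();
        for (String file : files) {
            Matcher m = FILE_NAME.matcher(file);
            if (m.matches() && m.group(1).equals(columnFamily)) {
                int generation = Integer.parseInt(m.group(2));
                byGeneration.computeIfAbsent(generation, g -> new TreeSet<>()).add(m.group(3));
            }
        }
        return byGeneration;
    }

    public static void main(String[] args) {
        String cf = "DatastoreDeletionSchedule";
        List<String> files = Arrays.asList(
                cf + "-1-Data.db", cf + "-1-Filter.db", cf + "-1-Index.db",
                cf + "-5-Data.db", cf + "-5-Filter.db", cf + "-5-Index.db",
                cf + "-7-Data.db", cf + "-7-Filter.db", cf + "-7-Index.db",
                cf + "-8-Data.db", cf + "-8-Filter.db", cf + "-8-Index.db");
        Map<Integer, Set<String>> gens = generations(cf, files);
        System.out.println(gens.size() + " sstables: " + gens.keySet());
        // prints 4 sstables: [1, 5, 7, 8]
    }
}
```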
>>>>>>>>> On Mon, Oct 19, 2009 at 6:41 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>>>>>>>> How many sstable files are in the data directories for the
>>>>>>>>>> columnfamily you are querying?
>>>>>>>>>>
>>>>>>>>>> How many are there after you restart and it is happy?
>>>>>>>>>>
>>>>>>>>>> Are you doing system stat collection with munin or ganglia or some such?
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 19, 2009 at 8:25 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>>>>>>>>> Hi Jonathan, I updated to 0.4.1 and I still get the same exception when I
>>>>>>>>>>> call get_key_range.
>>>>>>>>>>> I checked all the server logs, and there is only one exception being
>>>>>>>>>>> thrown, by whichever server I am connecting to.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Ray
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 19, 2009 at 4:52 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>>>>>>>>>> No, it's smart enough to avoid scanning.
>>>>>>>>>>>>
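One way to see why no scan is needed: under OrderPreservingPartitioner the keys are kept in sorted order, so an empty start key can be served by seeking to the first key and reading at most `count` keys. The sketch below is a simplified, single-node model of that idea, with a `TreeSet` standing in for the sorted key index and Java `String` order standing in for the partitioner's token order. It is not Cassandra's implementation; `KeyRangeSketch.getKeyRange` is a hypothetical name that only mirrors the shape of the Thrift call (start, finish, count).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical single-node model of a key range query over sorted keys.
// Not Cassandra's implementation; String order stands in for token order.
class KeyRangeSketch {
    // Returns up to `count` keys k with start <= k <= finish, in sorted order.
    // An empty start means "from the first key"; an empty finish means "no upper bound".
    static List<String> getKeyRange(NavigableSet<String> keys, String start, String finish, int count) {
        // Begin iterating at the start position rather than scanning from the beginning.
        NavigableSet<String> tail = start.isEmpty() ? keys : keys.tailSet(start, true);
        List<String> result = new ArrayList<>();
        for (String key : tail) {
            if (result.size() >= count) {
                break; // limit reached: stop reading
            }
            if (!finish.isEmpty() && key.compareTo(finish) > 0) {
                break; // past the end of the requested range
            }
            result.add(key);
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableSet<String> keys = new TreeSet<>(Arrays.asList("k3", "k1", "k4", "k2"));
        System.out.println(getKeyRange(keys, "", "", 2));      // prints [k1, k2]
        System.out.println(getKeyRange(keys, "k2", "k3", 25)); // prints [k2, k3]
    }
}
```

The work done is proportional to the number of keys returned (bounded by `count`), not to the total number of keys stored.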
>>>>>>>>>>>> On Mon, Oct 19, 2009 at 6:49 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>>>>>>>>>>> Hi Jonathan, thanks for the reply. I will update the code to 0.4.1 and
>>>>>>>>>>>>> will check all the logs on all the machines.
>>>>>>>>>>>>> Just a simple question: when you do a get_key_range and you specify ""
>>>>>>>>>>>>> and "" for start and end, and the limit is 25, if there are too many
>>>>>>>>>>>>> entries, does it do a scan to find the start, or is it smart enough
>>>>>>>>>>>>> to know what the start key is?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 4:42 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>>>>>>>>>>>> You should check the other nodes for potential exceptions keeping them
>>>>>>>>>>>>>> from replying.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Without seeing that, it's hard to say if this is caused by an old bug,
>>>>>>>>>>>>>> but you should definitely upgrade to 0.4.1 either way :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 5:51 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am running into problems with get_key_range. I have
>>>>>>>>>>>>>>> OrderPreservingPartitioner defined in storage-conf.xml and I am using
>>>>>>>>>>>>>>> a columnfamily that looks like
>>>>>>>>>>>>>>>     <ColumnFamily CompareWith="BytesType"
>>>>>>>>>>>>>>>                   Name="DatastoreDeletionSchedule"
>>>>>>>>>>>>>>>                   />
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My command is client.get_key_range("Keyspace1", "DatastoreDeletionSchedule",
>>>>>>>>>>>>>>>                    "", "", 25, ConsistencyLevel.ONE);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It usually works fine, but after a day or so of server writes into
>>>>>>>>>>>>>>> this column family, I started getting
>>>>>>>>>>>>>>> ERROR [pool-1-thread-36] 2009-10-19 17:24:28,223 Cassandra.java (line 770) Internal error processing get_key_range
>>>>>>>>>>>>>>> java.lang.RuntimeException: java.util.concurrent.TimeoutException: Operation timed out.
>>>>>>>>>>>>>>>        at org.apache.cassandra.service.StorageProxy.getKeyRange(StorageProxy.java:560)
>>>>>>>>>>>>>>>        at org.apache.cassandra.service.CassandraServer.get_key_range(CassandraServer.java:595)
>>>>>>>>>>>>>>>        at org.apache.cassandra.service.Cassandra$Processor$get_key_range.process(Cassandra.java:766)
>>>>>>>>>>>>>>>        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:609)
>>>>>>>>>>>>>>>        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>>>>>>>>>>>>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>>>>>>>>>>>>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>>>>>>>>>>>>>>>        at java.lang.Thread.run(Thread.java:619)
>>>>>>>>>>>>>>> Caused by: java.util.concurrent.TimeoutException: Operation timed out.
>>>>>>>>>>>>>>>        at org.apache.cassandra.net.AsyncResult.get(AsyncResult.java:97)
>>>>>>>>>>>>>>>        at org.apache.cassandra.service.StorageProxy.getKeyRange(StorageProxy.java:556)
>>>>>>>>>>>>>>>        ... 7 more
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I still get the timeout exceptions even though the servers have been
>>>>>>>>>>>>>>> idle for 2 days. When I restart the Cassandra servers, it seems to
>>>>>>>>>>>>>>> work fine again. Any ideas what could be wrong?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> By the way, I am using version apache-cassandra-incubating-0.4.0-rc2.
>>>>>>>>>>>>>>> Not sure if this is fixed in the 0.4.1 version.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> Ray
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
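Ray's stack trace shows `StorageProxy.getKeyRange` blocking in `AsyncResult.get` until it gives up with a `TimeoutException`, which is then wrapped in a `RuntimeException`. The general wait-with-deadline pattern is sketched below with `CompletableFuture` standing in for Cassandra's `AsyncResult`; the class name `CoordinatorTimeoutSketch`, the method `awaitReply` and the 100 ms timeout are hypothetical. It illustrates Jonathan's point that, from the coordinator's side, a node that never replies (for example because it hit its own exception) shows up only as a timeout.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of a coordinator waiting on a replica's reply with a deadline.
// CompletableFuture stands in for Cassandra's AsyncResult; names are illustrative.
class CoordinatorTimeoutSketch {
    static String awaitReply(CompletableFuture<String> reply, long timeoutMillis) {
        try {
            return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The caller cannot tell a slow replica from one that will never answer;
            // both surface as a TimeoutException wrapped in a RuntimeException.
            throw new RuntimeException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        }
    }

    public static void main(String[] args) {
        // A reply that is already available.
        System.out.println(awaitReply(CompletableFuture.completedFuture("keys"), 100));
        // A reply that never arrives, e.g. because the replica failed without answering.
        try {
            awaitReply(new CompletableFuture<>(), 100);
        } catch (RuntimeException e) {
            System.out.println("caught " + e.getCause().getClass().getName());
            // prints caught java.util.concurrent.TimeoutException
        }
    }
}
```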
