incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: TimedOutException
Date Fri, 18 Dec 2009 01:53:03 GMT
Yes.  I don't think this was in the beta2 release notes, but it will
be in the notes for 0.5 final:
https://svn.apache.org/repos/asf/incubator/cassandra/branches/cassandra-0.5/NEWS.txt

On Thu, Dec 17, 2009 at 6:43 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
> OK, I believe the problem was that when I upgraded to a newer build of
> Cassandra, I upgraded the servers one by one, restarting each in turn.
> So at one point some of my nodes were running a build 2 days older
> than the others, and that seems to have caused the inter-node
> messaging to go haywire.
>
> I stopped all the nodes at the same time and restarted all of them,
> and it seems like the problem is fixed.
> Cheers
> Ramzi
>
>
> On Thu, Dec 17, 2009 at 8:55 AM, Ramzi Rabah <rrabah@playdom.com> wrote:
>> I added some debugging code to capture the time a read takes
>> (getColumnFamily) and the time the weakRemoteRead round trip takes.
>> The time it takes to read the columns is negligible, so the problem
>> doesn't seem to be in getColumnFamily. The weakRemoteRead round trip,
>> however, takes more than 5 seconds in some cases. Looking at more
>> debugging output, the log indicates that the packets are being sent
>> by weakRemoteRead to the correct target node, but for some reason
>> the target node's log has no record of it handling the get at all.
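The timing instrumentation described above can be sketched as a small wrapper that measures wall-clock time around a call and flags calls slower than a threshold. This is a minimal, hypothetical Java sketch: `ReadTimer`, `timed`, and the 5-second threshold are illustrative assumptions (the threshold only mirrors the "more than 5 seconds" observation, not a Cassandra setting), and the lambdas merely stand in for `getColumnFamily` and `weakRemoteRead`, whose real signatures are not shown in this thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper (not a Cassandra API): times an operation and
// reports calls that exceed a threshold.
final class ReadTimer {
    private final long thresholdNanos;
    private final AtomicLong slowCalls = new AtomicLong();

    ReadTimer(long threshold, TimeUnit unit) {
        this.thresholdNanos = unit.toNanos(threshold);
    }

    /** Runs op, logs to stderr if it took longer than the threshold, and returns its result. */
    <T> T timed(String label, Callable<T> op) throws Exception {
        long start = System.nanoTime();
        try {
            return op.call();
        } finally {
            long elapsed = System.nanoTime() - start;
            if (elapsed > thresholdNanos) {
                slowCalls.incrementAndGet();
                System.err.printf("%s took %d ms%n", label,
                        TimeUnit.NANOSECONDS.toMillis(elapsed));
            }
        }
    }

    long slowCalls() {
        return slowCalls.get();
    }

    public static void main(String[] args) throws Exception {
        ReadTimer timer = new ReadTimer(5, TimeUnit.SECONDS);
        // Stand-in for the local read (getColumnFamily in the thread).
        String local = timer.timed("local read", () -> "columns");
        // Stand-in for the remote round trip (weakRemoteRead in the thread).
        String remote = timer.timed("remote round trip", () -> "response");
        System.out.println(local + " " + remote + " slow=" + timer.slowCalls());
    }
}
```

Timing the local read and the remote round trip separately is what lets the delay be pinned on the messaging path rather than the storage read, as in the observation above.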
>>
>> A couple of other things to note:
>> 1- I restarted the nodes one after another while there was traffic
>> going to them. I don't know if that would throw off Cassandra, or
>> whether the whole thing is a network congestion problem.
>> 2- Read stats at the keyspace level show a NaN value for read
>> latency, which seems like a bug?
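On the NaN read latency: one plausible explanation, not confirmed in this thread, is a mean computed as total latency divided by the number of reads when no reads were recorded, since in Java `0.0 / 0` on doubles evaluates to NaN instead of throwing. The hypothetical `LatencyStats` class below illustrates that arithmetic and one way to guard against it; it is not Cassandra's stats code.

```java
// Hypothetical latency accumulator (not Cassandra's implementation) showing
// why an empty sample window can report NaN.
final class LatencyStats {
    private long count = 0;
    private double totalMillis = 0.0;

    void record(double millis) {
        count++;
        totalMillis += millis;
    }

    /** Naive mean: with no samples this is 0.0 / 0, which is NaN in double arithmetic. */
    double naiveMean() {
        return totalMillis / count;
    }

    /** Guarded mean: returns 0.0 when nothing has been recorded. */
    double guardedMean() {
        return count == 0 ? 0.0 : totalMillis / count;
    }

    public static void main(String[] args) {
        LatencyStats stats = new LatencyStats();
        System.out.println(stats.naiveMean());   // NaN
        System.out.println(stats.guardedMean()); // 0.0
        stats.record(1.5);
        stats.record(2.5);
        System.out.println(stats.naiveMean());   // 2.0
    }
}
```

Whether an empty window should read as 0.0, NaN, or "n/a" is a reporting choice; NaN at least keeps "no samples" distinguishable from "very fast".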
>>
>> Thanks
>> Ramzi
>>
>> On Wed, Dec 16, 2009 at 12:07 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>> On Wed, Dec 16, 2009 at 12:46 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>> We are observing an increasing number of TimedOutExceptions on
>>>> Cassandra 0.5 trunk, although the load seems fairly low (about 400
>>>> reads/writes per second).
>>>> cfstats reports that operations take less than 2 ms on average.
>>>>
>>>> Two things I have noticed looking at the source code:
>>>>
>>>> 1- TimedOutExceptions are silently swallowed by Cassandra and not
>>>> reported in the logs, even at debug level.
>>>
>>> It's reported to the client.  Hardly "swallowed" :)
>>>
>>>> 2- readstats does not account for these long-running queries that
>>>> time out.
>>>
>>> Right.  But the CF-level stats do.
>>>
>>>> I'm wondering, what could be causing the system to go haywire like
>>>> this?
>>>
>>> Hard to say without more information.  One shot in the dark is that
>>> get_key_range is sometimes a major offender, as are workloads that
>>> do lots of deletes + re-inserts for the same keys.
>>>
>>> -Jonathan
>>>
>>
>
