cassandra-user mailing list archives

From Peter Schuller <peter.schul...@infidyne.com>
Subject Re: propertyfilesnitch problem
Date Thu, 10 Nov 2011 07:07:47 GMT
> 2. With the same setup, after each period as defined by
> dynamic_snitch_reset_interval_in_ms, the LOCAL_QUORUM performance greatly
> degrades before drastically improving again within a minute.

This part sounds to me like one or more nodes in the cluster are
either broken and not responding at all, or overloaded. Restarts will
tend to temporarily cause additional pressure on nodes (particularly
I/O due to cache eviction issues).

Because the dynamic snitch cannot know that a node is slow (after a
reset) until requests actually start timing out, it can take up to
rpc_timeout before the node gets snitched away. That sounds like what
you're seeing: on every reset, an rpc_timeout period of poor latency
for clients.
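
To make that reasoning concrete, here is a toy model of it in Python. It
is only a sketch, not how the dynamic snitch is actually implemented: the
function name, the two-replica setup, the "worst latency seen" score and
the tie-breaking order are all assumptions made for illustration. It shows
that when scores are wiped and one replica never answers, nothing is
learned about that replica until the first request to it times out.

```python
def requests_hitting_dead_replica(rpc_timeout_ms, request_interval_ms, duration_ms):
    """Toy model: send times (ms after a reset) of requests routed to a dead replica."""
    # Replica order used when scores tie, e.g. right after a reset (assumption).
    replicas = ["dead", "healthy"]
    # Time for a request to each replica to resolve; a dead replica only
    # "resolves" when the coordinator gives up after rpc_timeout_ms.
    resolve_ms = {"dead": rpc_timeout_ms, "healthy": 5}
    scores = {}     # replica -> worst latency seen since the reset
    pending = []    # (completion_time_ms, replica, latency_ms)
    hit_dead = []
    for now in range(0, duration_ms, request_interval_ms):
        # Fold in only the results that have completed by now.
        finished = [p for p in pending if p[0] <= now]
        pending = [p for p in pending if p[0] > now]
        for _, replica, latency in finished:
            scores[replica] = max(scores.get(replica, 0), latency)
        # Pick the lowest score; replicas with no samples yet score 0.
        target = min(replicas, key=lambda r: scores.get(r, 0))
        if target == "dead":
            hit_dead.append(now)
        pending.append((now + resolve_ms[target], target, resolve_ms[target]))
    return hit_dead


if __name__ == "__main__":
    times = requests_hitting_dead_replica(10_000, 1_000, 30_000)
    print(len(times), "requests hit the dead replica; last one sent at", times[-1], "ms")
```

With a hypothetical 10 second timeout and one request per second, the ten
requests sent in the first 10 seconds after the reset all go to the dead
replica; with a 60 second timeout the bad window would last a full minute.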

Is rpc_timeout 60 seconds?

> 4. With dynamic snitch turned on, QUORUM operations' performance is about
> the same as using LOCAL_QUORUM when the dynamic snitch is off or the first
> minute after a restart with the snitch turned on.

This is strange, unless it is coincidental.

Can you be more specific about the performance characteristics you're
seeing when degraded? For example:

* High latency, or timeouts?
* Are you getting Unavailable exceptions?
* Are you maintaining the same overall throughput or is there a
feedback mechanism such that when queries have high latency the
request rate decreases?
* Which data points are you using to consider something degraded?
What's matching in the QUORUM case and the LOCAL_QUORUM case without
the dynamic snitch?
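
For the questions above, one way to get comparable numbers is to record
client-side latency per request and summarize each test run the same way.
The sketch below is just an illustration: the function names and the
convention of recording a timed-out request as None are assumptions, not
something from your setup.

```python
import math


def percentile(sorted_vals, p):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(p / 100 * len(sorted_vals)))
    return sorted_vals[rank - 1]


def summarize(latencies_ms):
    """Summarize one run; a latency of None means the request timed out."""
    ok = sorted(x for x in latencies_ms if x is not None)
    timeouts = len(latencies_ms) - len(ok)
    return {
        "requests": len(latencies_ms),
        "timeouts": timeouts,
        "timeout_rate": timeouts / len(latencies_ms) if latencies_ms else 0.0,
        "p50_ms": percentile(ok, 50) if ok else None,
        "p99_ms": percentile(ok, 99) if ok else None,
    }


if __name__ == "__main__":
    print(summarize([5, 7, 6, None, 200]))
```

Comparing these summaries for the QUORUM run and the LOCAL_QUORUM run,
together with the request rate over the same period, would separate "high
latency" from "timeouts" and show whether throughput dropped as well.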
-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
