cassandra-user mailing list archives

From Jonathan Ellis <>
Subject Re: Dont bogart that connection my friend
Date Fri, 03 Dec 2010 22:36:26 GMT
Am I understanding correctly that you had all connections going to one
Cassandra node, which caused one of the *other* nodes to die, and
spreading the connections around the cluster fixed it?

On Fri, Dec 3, 2010 at 4:00 AM, Daniel Doubleday
<> wrote:
> Hi all
> I found an anti-pattern the other day that I wanted to share, although it's
> a pretty special case.
> It's a special case because our production cluster is somewhat unusual: 3
> servers, rf = 3. We do consistent reads/writes with quorum.
> I ran a long series of reads (as many reads as fast as I could) over one
> connection. Since all queries could be handled by that node, the overall
> latency is determined by its own latency and that of the faster of the other
> two nodes (because the quorum is satisfied with 2 reads). What happens then
> is that after a couple of minutes one of the other two nodes goes into 100%
> I/O wait and drops most of its read messages, leaving it practically dead
> while the other 2 nodes keep responding at an average of ~10ms. The node
> that died was only a little slower (~13ms average), but it inevitably queues
> up messages. Its average response time increases to the timeout (10 secs)
> flat. It never recovers.
> It happened every time, and it wasn't always the same node that died.
> The solution was to return the connection to the pool and get a new one for
> every read, to balance the load on the client side.
> Obviously this will not happen in a cluster where the percentage of all rows
> on one node is small enough. But the same thing will probably happen if you
> scan by continuous tokens (meaning that you will read from the same node for
> a long time).
> Cheers,
> Daniel Doubleday
>, Berlin
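
The queueing effect described in the quoted message can be checked with a little arithmetic. The sketch below is only a toy model, not a measurement of Cassandra: it assumes the client sends the next read as soon as the quorum answers (after ~10ms) and that the slower replica handles one read at a time in ~13ms. The function name and its parameters are made up for illustration.

```python
def backlog_ms(num_reads, fast_ms=10.0, slow_ms=13.0):
    """Delay the next read would face on the slow replica after num_reads reads.

    Toy model (assumption): every read reaches the slow replica, which
    serves reads strictly one after another.
    """
    arrival = 0.0       # time the current read reaches the slow replica
    slow_free_at = 0.0  # time the slow replica finishes its queued work
    for _ in range(num_reads):
        # The slow replica starts this read once it has arrived and all
        # earlier reads are done.
        start = max(arrival, slow_free_at)
        slow_free_at = start + slow_ms
        # The quorum is satisfied after fast_ms, so the client sends the
        # next read without waiting for the slow replica.
        arrival += fast_ms
    return slow_free_at - arrival
```

In this model the slow replica falls behind by slow_ms - fast_ms (3ms with the figures above) on every read, so the backlog grows without bound for as long as the reads keep coming. That is consistent with a node that is only slightly slower never recovering.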

Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
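
The client-side fix from the quoted message (return the connection after every read and take a fresh one) might look roughly like the sketch below. It does not use any real Cassandra client library: RoundRobinPool, borrow, release, and run_reads are hypothetical names, and a "connection" is represented by just a host name.

```python
import itertools
from collections import Counter


class RoundRobinPool:
    """Hypothetical pool that hands out one connection per host in rotation."""

    def __init__(self, hosts):
        self._cycle = itertools.cycle(hosts)

    def borrow(self):
        # A real pool would return an open client connection; this sketch
        # returns the host name as a stand-in.
        return next(self._cycle)

    def release(self, conn):
        # Nothing to clean up in this sketch.
        pass


def run_reads(pool, num_reads, read):
    """Issue num_reads reads, borrowing a fresh connection for each one."""
    per_host = Counter()
    for _ in range(num_reads):
        conn = pool.borrow()
        try:
            read(conn)
            per_host[conn] += 1
        finally:
            # Return the connection after every read so that the next read
            # goes through a different node.
            pool.release(conn)
    return per_host
```

For example, run_reads(RoundRobinPool(["node1", "node2", "node3"]), 9, lambda conn: None) sends three reads through each of the three placeholder hosts instead of nine through one.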
