cassandra-user mailing list archives

From Jonathan Ellis <>
Subject Re: Don't bogart that connection, my friend
Date Sat, 04 Dec 2010 19:21:47 GMT
Ah, got it.  Thanks for clearing that up!

On Sat, Dec 4, 2010 at 11:56 AM, Daniel Doubleday
<> wrote:
> Ah, ok. No, that was not the case.
> The client that did the long-running scan didn't wait for the slowest node.
> Only other clients that asked the slow node directly were affected.
> Sorry about the confusion.
> On 04.12.10 05:44, Jonathan Ellis wrote:
>> That makes sense, but this shouldn't make requests last for the
>> timeout duration -- at quorum, it should be responding to the client
>> as soon as it gets that second-fastest reply (sketched below, after the
>> quoted thread). If I'm understanding right that this was making the
>> response to the client block until the overwhelmed node timed out,
>> that's a bug. What version of Cassandra is this?
>> On Fri, Dec 3, 2010 at 7:27 PM, Daniel Doubleday
>> <>  wrote:
>>> Yes.
>>> I thought that would make sense, no? I had guessed that the quorum read
>>> forces the slowest of the 3 nodes to keep pace with the faster ones. But
>>> it can't, no matter how small the performance difference is, so its queue
>>> just fills up. For example, if reads arrive every ~10 ms but the slow
>>> node needs ~13 ms each, it falls ~3 ms further behind with every read,
>>> without bound.
>>> Also, when saying 'practically dead' and 'never recovers', I meant for
>>> the time I kept the reads up. As soon as I stopped the scan, it
>>> recovered. It just was not able to recover under the load, because for
>>> that it would have to become faster than the other nodes, and with full
>>> queues that just wouldn't happen.
>>> By changing the node for every read, I would hit the slower node every
>>> couple of reads. This forced the client to wait for the slower node.
>>> I guess to change that behavior you would need something like the dynamic
>>> snitch: ask only as many replica nodes as necessary to satisfy the
>>> quorum, and ask the remaining nodes only when reads fail. But that would
>>> probably increase latency and cause other problems. Since you probably
>>> don't want to run the cluster at a load at which the weakest node of a
>>> replication group can't keep up, I don't think this is an issue at all.
>>> Just wanted to keep others from shooting themselves in the foot as I did.
>>> On 03.12.10 23:36, Jonathan Ellis wrote:
>>>> Am I understanding correctly that you had all connections going to one
>>>> Cassandra node, which caused one of the *other* nodes to die, and that
>>>> spreading the connections around the cluster fixed it?
>>>> On Fri, Dec 3, 2010 at 4:00 AM, Daniel Doubleday
>>>> <>    wrote:
>>>>> Hi all
>>>>> I found an anti-pattern the other day which I wanted to share, although
>>>>> it's a pretty special case.
>>>>> Special case because our production cluster is somewhat strange: 3
>>>>> servers, rf = 3. We do consistent reads/writes at quorum.
>>>>> I ran a long series of reads (loads of reads, as fast as I could) over
>>>>> a single connection. Since that node holds a replica of every row, it
>>>>> could answer every query itself, and the overall latency is determined
>>>>> by its own response and the faster of the other two nodes (because the
>>>>> quorum is satisfied with 2 reads). What happens then is that after a
>>>>> couple of minutes one of the other two nodes goes into 100% I/O wait
>>>>> and drops most of its read messages, leaving it practically dead while
>>>>> the other 2 nodes keep responding at an average of ~10 ms. The node
>>>>> that died was only a little slower (~13 ms average), but it inevitably
>>>>> queues up messages. Its average response time climbs to the timeout
>>>>> (10 secs) flat, and it never recovers.
>>>>> This happened every time, and it wasn't always the same node that died.
>>>>> The solution was to return the connection to the pool and get a new one
>>>>> for every read, balancing the load on the client side (see the sketch
>>>>> below, after the quoted thread).
>>>>> Obviously this will not happen in a cluster where the percentage of all
>>>>> rows on any one node is small enough. But the same thing will probably
>>>>> happen if you scan by contiguous tokens (meaning that you will read
>>>>> from the same node for a long time).
>>>>> Cheers,
>>>>> Daniel Doubleday
>>>>>, Berlin

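To make the quorum timing discussed above concrete, here is a minimal sketch, not Cassandra's actual internals (the class and method names are invented for illustration): the coordinator sends the read to all three replicas but completes the client response as soon as the second reply arrives. The slowest replica never gates the client's latency, yet it still receives every request, which is why its queue can keep growing.

    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.atomic.AtomicInteger;

    public class QuorumReadSketch {

        // Complete the client response as soon as `quorum` replies are in;
        // later replies no longer affect latency.
        static CompletableFuture<String> quorumRead(
                List<CompletableFuture<String>> replies, int quorum) {
            CompletableFuture<String> result = new CompletableFuture<>();
            AtomicInteger received = new AtomicInteger();
            for (CompletableFuture<String> reply : replies) {
                reply.thenAccept(value -> {
                    if (received.incrementAndGet() == quorum) {
                        result.complete(value); // second-fastest reply wins
                    }
                });
            }
            return result;
        }

        // Simulate a replica that answers after latencyMs.
        static CompletableFuture<String> replicaRead(ExecutorService pool,
                                                     long latencyMs) {
            return CompletableFuture.supplyAsync(() -> {
                try {
                    Thread.sleep(latencyMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                return "row";
            }, pool);
        }

        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(3);
            // Latencies from the thread: ~10 ms, ~13 ms, and one node that
            // has queued up to the 10-second timeout.
            List<CompletableFuture<String>> replies = List.of(
                    replicaRead(pool, 10),
                    replicaRead(pool, 13),
                    replicaRead(pool, 10_000));
            long start = System.nanoTime();
            quorumRead(replies, 2).get();
            System.out.printf("client saw ~%d ms%n",
                    (System.nanoTime() - start) / 1_000_000);
            pool.shutdownNow();
        }
    }

Run as-is, the client sees ~13 ms even though one replica takes 10 seconds, which matches Daniel's observation that only clients asking the slow node directly were affected.
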
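And a sketch of the client-side fix Daniel describes: instead of pinning one connection for the whole scan, borrow a connection per read and return it immediately, so coordinator duty rotates across the nodes. The pool API below (NodePool, Connection) is hypothetical, standing in for whatever borrow/release calls your client library provides.

    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;

    public class ConnectionRotationSketch {

        interface Connection {
            byte[] read(byte[] key);
            void close();
        }

        // Hypothetical pool that round-robins borrows across all nodes.
        static class NodePool {
            private final List<String> hosts;
            private final AtomicInteger next = new AtomicInteger();

            NodePool(List<String> hosts) { this.hosts = hosts; }

            Connection borrow() {
                String host = hosts.get(
                        Math.floorMod(next.getAndIncrement(), hosts.size()));
                return open(host); // a real pool would keep sockets per host
            }

            void release(Connection c) { c.close(); }

            private Connection open(String host) {
                // Stand-in for a real client socket to `host`.
                return new Connection() {
                    public byte[] read(byte[] key) { return new byte[0]; }
                    public void close() { }
                };
            }
        }

        // Anti-pattern: one borrow for the whole scan pins every read to a
        // single coordinator node.
        static void scanPinned(NodePool pool, Iterable<byte[]> keys) {
            Connection c = pool.borrow();
            try {
                for (byte[] key : keys) c.read(key);
            } finally {
                pool.release(c);
            }
        }

        // The fix: borrow per read, so successive reads are coordinated by
        // different nodes and the client periodically waits on (and thereby
        // throttles itself to) the slowest node.
        static void scanBalanced(NodePool pool, Iterable<byte[]> keys) {
            for (byte[] key : keys) {
                Connection c = pool.borrow();
                try {
                    c.read(key);
                } finally {
                    pool.release(c);
                }
            }
        }

        public static void main(String[] args) {
            NodePool pool = new NodePool(List.of("cass1", "cass2", "cass3"));
            List<byte[]> keys = List.of(
                    "a".getBytes(), "b".getBytes(), "c".getBytes());
            scanBalanced(pool, keys);
        }
    }

With rf = 3 on 3 nodes this doesn't change which replicas hold the data, only which node the client connects to and waits on; that is exactly why the rotating client hit, and had to wait for, the slow node every couple of reads.
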
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
