cassandra-commits mailing list archives

From "Jason Brown (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-13039) Mutation time mostly spent in LinkedBlockingQueue.put() when writing with ONE
Date Tue, 03 Jan 2017 22:29:58 GMT


Jason Brown commented on CASSANDRA-13039:

The only difference between QUORUM and ONE is the number of replica nodes whose acks you are
waiting for. If you are using a token-aware client, it will send the request to a replica,
but that node *still* waits for the ack from its own write; the ack just comes from within
the same process (and doesn't need to wait on the network).
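A minimal Java sketch of the point above, assuming a simplified coordinator model: `AckWaitSketch`, `blockFor` and `write` are hypothetical names, not Cassandra's classes, and the network hop is simulated with a sleep. The coordinator is itself a replica; its local write runs on a separate executor and acks in-process. At ONE that local ack satisfies the wait, while at QUORUM with RF=2 the coordinator must also wait for the remote replica. In both cases it still blocks on a latch until enough acks arrive.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a coordinator write; these are not Cassandra's classes.
class AckWaitSketch {
    enum Consistency { ONE, QUORUM }

    // Acks the coordinator blocks for: 1 for ONE, a majority of RF for QUORUM.
    static int blockFor(Consistency cl, int rf) {
        return cl == Consistency.ONE ? 1 : rf / 2 + 1;
    }

    // Sends the write to all rf replicas and waits for blockFor(cl, rf) acks.
    // Returns true if enough acks arrived within 5 seconds.
    static boolean write(Consistency cl, int rf, long networkDelayMillis) {
        CountDownLatch acks = new CountDownLatch(blockFor(cl, rf));
        ExecutorService localStage = Executors.newSingleThreadExecutor(); // stands in for a local write stage
        ExecutorService network = Executors.newCachedThreadPool();        // stands in for remote replicas
        try {
            // Local replica: the ack is delivered in-process, with no network hop.
            localStage.execute(acks::countDown);
            // Remote replicas: each ack arrives after a simulated network delay.
            for (int i = 1; i < rf; i++) {
                network.execute(() -> {
                    try {
                        Thread.sleep(networkDelayMillis);
                    } catch (InterruptedException e) {
                        return; // simulation cleanup: shutdownNow() interrupts pending remote acks
                    }
                    acks.countDown();
                });
            }
            // The coordinator waits at every consistency level; only the count differs.
            return acks.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            localStage.shutdownNow();
            network.shutdownNow();
        }
    }
}
```

For example, `AckWaitSketch.write(AckWaitSketch.Consistency.ONE, 2, 50)` returns as soon as the local ack lands, while the QUORUM call with the same arguments also waits out the simulated 50 ms remote ack.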

> Mutation time mostly spent in LinkedBlockingQueue.put() when writing with ONE
> -----------------------------------------------------------------------------
>                 Key: CASSANDRA-13039
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination
>            Reporter: Corentin Chary
>         Attachments: mutation-linkedlist-block.png, profiler-snapshot.nps
> On a setup with a sustained write load of 70kQPS per node and an RF of 2, it looks like
> most of the mutation time is spent in OutboundTcpConnection.enqueue() -> backlog.put().
> backlog is an unbounded LinkedBlockingQueue, which means that .put() can only block while
> waiting to acquire a lock. I strongly suspect that this is caused by the use of drainTo() in CoalescingStrategies,
> which causes contention for the producers.
> On the other hand, not using drainTo() could lead to starvation of the consumers.
> Possible solutions:
> - Allow multiple connections per size and per host in OutboundTcpConnectionPool
> - Switch from drainTo to multiple take()
> - Switch to ConcurrentLinkedQueue (which is lock-free); this also means the consumer needs active polling.
> Maybe a good solution would be something hybrid: a bounded LinkedBlockingQueue and an
> unbounded ConcurrentLinkedQueue. This way you get low latency when you don't have a lot of
> messages, and throughput when you do.
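A hypothetical sketch of the backlog pattern described in the report: many producer threads call `put()` on an unbounded `LinkedBlockingQueue`, and a single consumer waits for one message and then batches the rest with `drainTo()`. `BacklogSketch`, `enqueue`, `nextBatch` and `simulate` are illustrative names, not Cassandra's `OutboundTcpConnection` code. One plausible lock interaction, based on OpenJDK's implementation: `put()` holds the queue's put lock and, when it inserts into an empty queue, also briefly takes the take lock to wake a waiting consumer, while `drainTo()` holds the take lock for the whole drain. A consumer that keeps emptying the queue therefore sends producers to the take lock often.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the reported backlog pattern; not Cassandra's code.
class BacklogSketch {
    // Unbounded: put() never waits for free space, only for the queue's locks.
    final BlockingQueue<String> backlog = new LinkedBlockingQueue<>();

    // Producer side (many mutation threads): enqueue() -> backlog.put().
    void enqueue(String message) {
        try {
            backlog.put(message);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Consumer side (one connection thread): wait up to 100 ms for a message,
    // then drain whatever else is queued into the same batch.
    List<String> nextBatch(int maxBatch) {
        List<String> batch = new ArrayList<>();
        try {
            String first = backlog.poll(100, TimeUnit.MILLISECONDS);
            if (first == null) return batch;
            batch.add(first);
            backlog.drainTo(batch, maxBatch - 1); // holds the take lock while draining
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return batch;
    }

    // Runs `producers` threads that each enqueue `perProducer` messages while
    // the calling thread consumes; returns how many messages were received.
    static int simulate(int producers, int perProducer, int maxBatch) {
        BacklogSketch sketch = new BacklogSketch();
        List<Thread> threads = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            final int id = p;
            Thread t = new Thread(() -> {
                for (int i = 0; i < perProducer; i++) sketch.enqueue("m" + id + "-" + i);
            });
            threads.add(t);
            t.start();
        }
        int total = producers * perProducer;
        int received = 0;
        while (received < total) {
            received += sketch.nextBatch(maxBatch).size();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return received;
    }
}
```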
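A sketch of the third option from the list (ConcurrentLinkedQueue with active polling), assuming a single consumer thread per connection. Producers `offer()` into a `ConcurrentLinkedQueue`, which takes no lock. Because there is no blocking `take()`, the consumer polls: when idle it yields for a few rounds and then parks for 50 microseconds. The class name, the 100-round yield count and the park interval are hypothetical tuning choices; the park interval is roughly the extra latency an idle connection pays when a new message arrives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch of "switch to ConcurrentLinkedQueue"; not Cassandra's code.
class PollingBacklogSketch {
    // Lock-free: offer() and poll() use CAS operations rather than a lock.
    final Queue<String> backlog = new ConcurrentLinkedQueue<>();

    void enqueue(String message) {
        backlog.offer(message);
    }

    // Active polling: with no blocking take(), the consumer retries until it
    // finds work or maxWaitNanos elapses. It yields first, then parks briefly
    // so an idle connection does not keep a core busy.
    List<String> nextBatch(int maxBatch, long maxWaitNanos) {
        List<String> batch = new ArrayList<>();
        long deadline = System.nanoTime() + maxWaitNanos;
        int idleRounds = 0;
        while (batch.isEmpty() && System.nanoTime() - deadline < 0) {
            String m;
            while (batch.size() < maxBatch && (m = backlog.poll()) != null) {
                batch.add(m);
            }
            if (batch.isEmpty()) {
                if (idleRounds++ < 100) {
                    Thread.yield();
                } else {
                    LockSupport.parkNanos(TimeUnit.MICROSECONDS.toNanos(50));
                }
            }
        }
        return batch;
    }

    // Same harness shape as a blocking-queue test: producers enqueue while the
    // calling thread consumes; returns how many messages were received.
    static int simulate(int producers, int perProducer, int maxBatch) {
        PollingBacklogSketch sketch = new PollingBacklogSketch();
        List<Thread> threads = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            final int id = p;
            Thread t = new Thread(() -> {
                for (int i = 0; i < perProducer; i++) sketch.enqueue("m" + id + "-" + i);
            });
            threads.add(t);
            t.start();
        }
        int total = producers * perProducer;
        int received = 0;
        while (received < total) {
            received += sketch.nextBatch(maxBatch, TimeUnit.MILLISECONDS.toNanos(100)).size();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return received;
    }
}
```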
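A sketch of the hybrid idea, under stated assumptions. Producers `offer()` to a bounded `LinkedBlockingQueue` and fall back to an unbounded `ConcurrentLinkedQueue` once it is full. The consumer drains both and only blocks (on the bounded queue) when both are empty, so an idle connection wakes as soon as the next message arrives. `HybridBacklogSketch`, its capacity and the 10 ms poll timeout are hypothetical. This particular sketch has two caveats: FIFO order is not preserved across the two queues, and a message that lands in the overflow queue just as the consumer goes to sleep can wait up to the poll timeout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a bounded blocking queue plus a lock-free overflow queue.
class HybridBacklogSketch {
    final BlockingQueue<String> fast;                              // bounded: blocking wake-up when idle
    final Queue<String> overflow = new ConcurrentLinkedQueue<>();  // unbounded, lock-free: used under load

    HybridBacklogSketch(int fastCapacity) {
        fast = new LinkedBlockingQueue<>(fastCapacity);
    }

    // offer() never blocks: it returns false when the bounded queue is full,
    // and the message then goes to the lock-free overflow queue.
    void enqueue(String message) {
        if (!fast.offer(message)) overflow.offer(message);
    }

    List<String> nextBatch(int maxBatch) {
        List<String> batch = new ArrayList<>();
        fast.drainTo(batch, maxBatch);
        String m;
        while (batch.size() < maxBatch && (m = overflow.poll()) != null) {
            batch.add(m);
        }
        if (batch.isEmpty()) {
            try {
                // Both queues empty: block on the bounded queue for low-latency wake-up.
                m = fast.poll(10, TimeUnit.MILLISECONDS);
                if (m != null) batch.add(m);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return batch;
    }

    // Producers enqueue while the calling thread consumes; returns messages received.
    static int simulate(int producers, int perProducer, int fastCapacity, int maxBatch) {
        HybridBacklogSketch sketch = new HybridBacklogSketch(fastCapacity);
        List<Thread> threads = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            final int id = p;
            Thread t = new Thread(() -> {
                for (int i = 0; i < perProducer; i++) sketch.enqueue("m" + id + "-" + i);
            });
            threads.add(t);
            t.start();
        }
        int total = producers * perProducer;
        int received = 0;
        while (received < total) {
            received += sketch.nextBatch(maxBatch).size();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return received;
    }
}
```

The design choice here is that the lock-free path is only taken once the bounded queue is full, so at low load every message still goes through the blocking queue that can wake the consumer directly.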

This message was sent by Atlassian JIRA
