From Igor Katkov <>
Subject Re: Storage proxy write latency is too high
Date Mon, 05 Oct 2009 18:46:33 GMT
In case anyone is following this, here is an update:

I was able to narrow it down to the Cassandra-to-Cassandra link. Storage proxy
latency depends on the amount of data written per key: the more data is
transferred, the higher the latency. No surprise here.
The client connects to daemon "A" and sends a key-value pair. "A" accepts the
thrift message, deserializes it into an object, sees that the key belongs to
daemon "B", serializes it to bytes once again (internal format now) and invokes
MessagingService, which in turn writes to a socket. As soon as "B" delivers
the write acknowledgment over a different connection, the client call is
released. Cassandra's MessagingService uses Java NIO to connect to other
Cassandra daemons, and all connections are unidirectional. So in theory it
should be very fast. But it's not.

What does look suspicious is an apparent cap on network usage: only ~4% of the
1Gbps link is used regardless of "value" size. With smaller values I get
better throughput, with larger ones (200Kb) worse.
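
A rough back-of-the-envelope check of what such a cap would cost per write.
This is only a sketch under stated assumptions: "200Kb" is read as 200
kilobytes, the ~4% figure is treated as the sustained rate available to a
single write, and serialization and protocol overhead are ignored. The
function and constant names are made up for illustration.

```python
LINK_BPS = 1_000_000_000      # 1 Gbps link, as stated above
UTILIZATION = 0.04            # observed ~4% usage (assumed sustained rate)
VALUE_BYTES = 200 * 1024      # assumption: "200Kb" means 200 kilobytes


def transfer_ms(value_bytes, link_bps=LINK_BPS, utilization=UTILIZATION):
    """Time in milliseconds to push value_bytes through the usable bandwidth."""
    effective_bps = link_bps * utilization
    return value_bytes * 8 / effective_bps * 1000


if __name__ == "__main__":
    print("at observed cap: %.2f ms" % transfer_ms(VALUE_BYTES))
    print("at line rate:    %.2f ms" % transfer_ms(VALUE_BYTES, utilization=1.0))
```

Under these assumptions a 200 KB value needs about 41 ms on the wire at 4%
utilization, versus under 2 ms at line rate.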

As a temporary workaround, the client could be made responsible for
identifying which Cassandra instance it should send a key to. With a 200kb
value it's ~10 times faster.
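
A minimal sketch of what such client-side routing could look like. It is not
Cassandra's code: it assumes a hash-partitioned ring where a node owns the
tokens up to and including its own, takes the token as the MD5 digest of the
key read as an integer (the real token derivation may differ), and ignores
replication. The node names, the Ring class and its methods are hypothetical.

```python
import bisect
import hashlib


def key_token(key):
    """Illustrative token: MD5 digest of the key as a non-negative integer."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Maps a token to the node that owns it on a sorted token ring."""

    def __init__(self, node_tokens):
        # node_tokens: {node_name: token}; keep entries sorted by token.
        self._entries = sorted((tok, name) for name, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._entries]

    def owner_for_token(self, token):
        # First node whose token is >= the given token; wrap to the
        # lowest-token node when the token is past the last node.
        idx = bisect.bisect_left(self._tokens, token)
        if idx == len(self._tokens):
            idx = 0
        return self._entries[idx][1]

    def owner(self, key):
        return self.owner_for_token(key_token(key))


if __name__ == "__main__":
    # Three nodes with evenly spaced tokens over the 128-bit MD5 space.
    ring = Ring({name: i * (2 ** 128 // 3) for i, name in enumerate("ABC")})
    print(ring.owner("some-key"))  # the client would connect to this node
```

The client would open its thrift connection to whichever node owner()
returns, so the write is local and the node-to-node hop is skipped.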

On Thu, Oct 1, 2009 at 6:51 PM, Igor Katkov <> wrote:

> Hi,
> I have the following puzzle:
> Storage proxy write latency ~235ms
> CF write latency <1 ms
> I have 3 nodes in the cluster, Cassandra v.0.4. Tokens evenly distributed.
> The client connects to a node and inserts a key with ConsistencyLevel.ONE
> If it happens to be a local write, the operation is fast, the same speed as
> in a one-node setup. JMX shows write latency <1 ms
> If it happens to be a remote insert, StorageProxy sends it to the proper node.
> This operation is slow. JMX shows write latency ~ 235ms.
> At the same time, on the remote node JMX shows the same <1ms write latency.
> So it's not the remote node being sluggish, it's something else.
> There are no pending tasks on remote node - JMX counters are always zero,
> network is 1Gb, idle. So I can't blame it.
> I profiled Cassandra server in JProfiler, could not find a thing. All this
> extra time is spent inside QuorumResponseHandler waiting for the condition
> to signal. Which should happen as soon as response is received.
> There is one pooled TCP connection open to the remote host. Hardly a
> bottleneck, and the ThreadPoolExecutors look OK.
> Any ideas why write latency is so high?

