cassandra-user mailing list archives

From Aaron Morton <aa...@thelastpickle.com>
Subject Re: Read Latency
Date Wed, 06 Oct 2010 21:14:49 GMT
That's a lot of questions, I'll try to answer some...

Read/Write latency as reported for a CF is the time taken to perform the local read or write on
that node only.

Read/Write latency reported on o.a.c.service.StorageProxy is the time taken to process
a complete request, including local and remote reads when CL is > ONE. It includes the
time taken to compare the values read from the different nodes, which may include running
the read *again* to request full values from the nodes (rather than digest values).
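
To make that concrete, here is a rough sketch (in Python, with made-up names, it is a toy model and not the actual StorageProxy code) of what the coordinator does for a CL > ONE read. The point is that the digest comparison, and the possible second round of full reads, all sit inside the StorageProxy number but not the per-CF number:

# Illustrative only: a toy model of the CL > ONE read path, not Cassandra code.
# read_data/read_digest/reconcile and the replica list are all made-up names.
from concurrent.futures import ThreadPoolExecutor
import hashlib

def coordinator_read(key, replicas, read_data, read_digest, blockfor):
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        # One replica is asked for the full row, the others only for a digest,
        # and the requests go out in parallel, not one after the other.
        data_future = pool.submit(read_data, replicas[0], key)
        digest_futures = [pool.submit(read_digest, r, key)
                          for r in replicas[1:blockfor]]

        row = data_future.result()
        row_digest = hashlib.md5(row).digest()

        if all(f.result() == row_digest for f in digest_futures):
            # Replicas agree: latency is roughly the slowest of the parallel
            # reads plus the comparison, so it is more than one local read.
            return row

        # Digest mismatch: read the full data again from every replica and
        # reconcile (this is the read-repair path, and it is the slow one).
        full_reads = [pool.submit(read_data, r, key) for r in replicas[:blockfor]]
        return reconcile(f.result() for f in full_reads)

def reconcile(rows):
    # Stand-in for resolving by timestamp and repairing stale replicas.
    return max(rows)

Roughly speaking, the per-CF latency in cfstats covers only the read_data/read_digest work on each node; everything else in that function is extra time that the StorageProxy figure (and your client) sees.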

With logging at DEBUG level these messages in the log may give you some insight into what's
happening (most of this is guesswork on my part, others please correct me if I am wrong):

-  strongread/weakread reading digest / data
  Tells you a read is starting: strongread lines are CL > ONE reads, weakread lines are CL ONE
reads. "digest" or "data" says whether that node will return a hash or the full data.

- quorumResponseHandler:
  Tells you that the number of nodes required by the requested CL have responded to the coordinator.

- Digest mismatch:
  Tells you the nodes did not agree and the coordinator needs to sort it out. I think this
results in Read Repair and the full data (not a digest) being read from each node.

So the read path will be slowed down if a Read Repair is needed. This can be helped by nodetool repair.
It can also be slowed if there are a lot of SSTables on disk; nodetool compact is your friend
there.
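
If you want a feel for how often your reads take the slow path, something like the following will count those markers in a DEBUG log. The log location and the exact message text are assumptions on my part, so adjust them for your setup:

# Count the read-path log markers mentioned above in a DEBUG-level log.
# The path and the exact strings are assumptions; adjust to your log4j config.
from collections import Counter

MARKERS = ("weakread", "strongread", "quorumResponseHandler", "Digest mismatch")
counts = Counter()

with open("/var/log/cassandra/system.log") as log:   # assumed location
    for line in log:
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1

for marker in MARKERS:
    print(marker, counts[marker])

A high Digest mismatch count relative to strongread would point at Read Repair as the thing slowing down your quorum reads.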

Reads to multiple nodes are done in parallel, not serially. I don't know enough about the network
parts to help you out there.

Hope that helps you dig a little deeper.
Aaron


On 07 Oct, 2010, at 02:55 AM, Wayne <wav100@gmail.com> wrote:

I have been seeing some strange trends in read latency that I wanted to throw out there to
find some explanations. We are running 0.6.5 in a 10 node cluster with rf=3. We find that the read
latency reported by cfstats is always about 1/4 of the actual time it takes to get the
data back to the python client. We are not using any higher level clients, and we usually
do Quorum reads (rf=3). If we ask for 1 copy (vs. 2 for Quorum) it is around 2x the
time reported in cfstats. This is true whether we have a 0.8 ms read or a 5000 ms read. It
is always around 4x the time for a Quorum read and 2x the time for a single value read. This
tells me that much of the time spent waiting for a read has nothing to do with random disk read
latency, which is contrary to what we expected.
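
To make the comparison concrete, the client-side numbers above come from timing around the raw Thrift call, along the lines of the sketch below (do_read is just a placeholder for the actual get/get_slice call at the consistency level under test; the helper itself is only illustrative):

# Illustrative client-side timing: wrap the raw Thrift read and compare the
# result against the Read Latency that nodetool cfstats reports for the CF.
# do_read is a placeholder for the real call (get/get_slice at QUORUM or ONE).
import time

def time_reads(do_read, keys):
    latencies_ms = []
    for key in keys:
        start = time.time()
        do_read(key)                      # the raw Thrift read being measured
        latencies_ms.append((time.time() - start) * 1000.0)
    latencies_ms.sort()
    return {
        "avg_ms": sum(latencies_ms) / len(latencies_ms),
        "p50_ms": latencies_ms[len(latencies_ms) // 2],
        "max_ms": latencies_ms[-1],
    }

The gap between those numbers and cfstats is what the questions below are about.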

What is that extra time being used for? Waiting 2 ms for a read value at the client when the
value is retrieved in 1 ms leaves 1 ms that is unexplained. Is the node being requested by
the client doing some "work" that equals the time spent by the node actually serving up the
data? Is this the Thrift server packaging up the response to the client?

Are reads really more CPU bound? We have lower-end CPUs in our nodes; is that part of the
cause?

What is cfstats actually reporting? Is it not really reporting ALL of the time required
to service a read? I assume it is not reporting the time to send the result to the requesting
node.

How much of this time is network time? Would Infiniband or a lower-latency network architecture
reduce any of these times? If we want to reduce a 2 ms read to a 1 ms read, what will help us
get there? We have cached keys, which gives us a cfstats read latency under 1 ms (~0.85 ms),
but it still takes 2 ms to get to the client (single read).

Why does a quorum read double everything? It seems quorum reads are serialized and not parallel.
Is that true, and if so why? Obviously it takes more time to get 2 values and compare them than to
get one value, but if that is always 2x+ then the adjustable consistency of Cassandra comes
at a very high price.

Any other suggestions for decreasing read latency? Faster disks don't seem as useful as faster
CPUs. We have worked hard to reduce the cfstats-reported read latency and have been successful.
How can we reduce the time from there to the requesting client? What is the anatomy of a read,
from client request to result? Where does the time usually go, and what can help speed each
step up? Caching is the obvious answer, but assume we are already caching what can be cached
(keys).

Thanks in advance for any advice or explanations anyone might have.


