incubator-cassandra-user mailing list archives

From Tyler Hobbs <ty...@datastax.com>
Subject Re: read request distribution
Date Sat, 10 Nov 2012 23:15:50 GMT
When you read at quorum, a normal read query will be sent to one replica
(possibly the same node that's coordinating) and a digest query will be
sent to *one* other replica, not both.  Which replicas get picked for these
is determined by the dynamic snitch, which will favor replicas that are
responding with the lowest latency.  That's why you'll see more queries
going to replicas with lower latencies.
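The replica choice described above can be sketched roughly as follows. This is a simplified model, not Cassandra's code. It assumes RF = 3 and that the snitch just ranks replicas by a single per-node latency score. The real dynamic snitch keeps decaying latency samples and applies a badness threshold. The node names and scores are hypothetical.

```python
from typing import Dict, List, Tuple


def quorum(replication_factor: int) -> int:
    """Number of replicas that must answer a QUORUM read: floor(RF/2) + 1."""
    return replication_factor // 2 + 1


def plan_quorum_read(replicas: List[str],
                     latency_score: Dict[str, float]) -> Tuple[str, List[str]]:
    """Return (data_replica, digest_replicas) for one QUORUM read.

    Simplified model: replicas are ranked by latency score (lower is better).
    The best one gets the full data request, the next quorum-1 replicas get
    digest requests, and the remaining replicas are not contacted at all.
    """
    ranked = sorted(replicas, key=lambda r: latency_score[r])
    needed = quorum(len(replicas))
    return ranked[0], ranked[1:needed]


if __name__ == "__main__":
    # Hypothetical latency scores (ms) for a three-node cluster with RF=3.
    scores = {"node1": 72.0, "node2": 87.0, "node3": 168.0}
    data, digests = plan_quorum_read(["node1", "node2", "node3"], scores)
    print(quorum(3), data, digests)  # 2 node1 ['node2']
```

With RF = 3 the quorum is 2, so only two of the three replicas do any work for a given read. The slowest one is skipped whenever the snitch ranks it last.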

The Read Count number in nodetool cfstats is for local reads, not
coordination of a read request.
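The difference between coordinating a request and serving a local read can be illustrated with a rough simulation. It is not Cassandra code, and it rests on several assumptions. Coordination is spread round-robin, in the spirit of the Round Robin LB strategy Wei reports Hector using. Each QUORUM read becomes one data read and one digest read on the two replicas with the lowest observed latency. Read repair and any coordinator preference for local reads are ignored. The base latencies and the ±50% jitter model are made up.

```python
import random
from collections import Counter
from typing import Dict, Tuple


def simulate(reads: int, base_latency_ms: Dict[str, float],
             seed: int = 42) -> Tuple[Counter, Counter]:
    """Tally coordinated requests vs. local reads per node for QUORUM reads (RF=3)."""
    rng = random.Random(seed)
    nodes = sorted(base_latency_ms)
    coordinated: Counter = Counter()
    local_reads: Counter = Counter()
    for i in range(reads):
        # Round-robin client: the coordinator rotates evenly across the nodes.
        coordinated[nodes[i % len(nodes)]] += 1
        # Observed latency = base latency with +/-50% random jitter (hypothetical).
        observed = {n: base * rng.uniform(0.5, 1.5) for n, base in base_latency_ms.items()}
        # With RF=3 every node is a replica; the two fastest serve the data
        # read and the digest read, and each counts as one local read.
        for node in sorted(observed, key=observed.get)[:2]:
            local_reads[node] += 1
    return coordinated, local_reads


if __name__ == "__main__":
    base = {"node1": 70.0, "node2": 87.0, "node3": 168.0}  # hypothetical values
    coordinated, local_reads = simulate(30_000, base)
    print("coordinated:", dict(coordinated))
    print("local reads:", dict(local_reads))
```

The `coordinated` tally stays perfectly even. The `local_reads` tally, which is the analogue of cfstats Read Count, skews toward the faster nodes. The exact split depends entirely on the invented jitter model.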


On Fri, Nov 9, 2012 at 8:16 PM, Wei Zhu <wz1975@yahoo.com> wrote:

> I think the rows whose keys fall into the token range of the high-latency
> node are likely to have more columns than those on the other nodes. I have
> three nodes with RF = 3, so every node has all the data, and CL =
> QUORUM, meaning each request is sent to all three nodes and the response is
> sent back to the client when two of them respond. What exactly does "Read
> Count" from "nodetool cfstats" mean, then? Should it be the same across all
> the nodes? I checked Hector; it uses the Round Robin LB strategy. I
> also tested writes, and they are distributed evenly across the cluster.
> Below is the output from nodetool. Does anyone have a clue what might have
> happened?
>
> Node 1:
> Read Count: 318679
> Read Latency: 72.47641436367003 ms.
> Write Count: 158680
> Write Latency: 0.07918750315099571 ms.
> Node 2:
> Read Count: 251079
> Read Latency: 86.91948475579399 ms.
> Write Count: 158450
> Write Latency: 0.1744694540864626 ms.
> Node 3:
> Read Count: 149876
> Read Latency: 168.14125553123915 ms.
> Write Count: 157896
> Write Latency: 0.06468631250949992 ms.
>
> nodetool ring
> Address         DC          Rack        Status State   Load            Effective-Ownership Token
>                                                                                            113427455640312821154458202477256070485
> 10.1.3.152      datacenter1 rack1       Up     Normal  35.85 GB        100.00%             0
> 10.1.3.153      datacenter1 rack1       Up     Normal  35.86 GB        100.00%             56713727820156410577229101238628035242
> 10.1.3.155      datacenter1 rack1       Up     Normal  35.85 GB        100.00%             113427455640312821154458202477256070485
>
>
> Keyspace: benchmark:
>   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>   Durable Writes: true
>     Options: [replication_factor:3]
>
> I am really confused by the Read Count number from nodetool cfstats.
>
> I'd really appreciate any hints.
> -Wei
>
>   ------------------------------
> *From:* Wei Zhu <wz1975@yahoo.com>
> *To:* Cassandra usergroup <user@cassandra.apache.org>
> *Sent:* Thursday, November 8, 2012 9:37 PM
> *Subject:* read request distribution
>
> Hi All,
> I am running a benchmark on Cassandra. I have a three-node cluster with
> RF=3. I generated 6M rows with sequence numbers from 1 to 6M, so the rows
> should be evenly distributed among the three nodes, disregarding the
> replicas.
> The benchmark issues only read requests, for randomly generated keys from
> 1 to 6M. Oddly, nodetool cfstats reports that one node has only half as
> many reads as another, with the third node in the middle, so the ratio
> is roughly 2:3:4. The node with the most reads actually has the lowest
> latency, and the one with the fewest reads reports the highest latency.
> The difference is pretty big: the slowest node's latency is almost double
> that of the fastest.
> All three nodes have exactly the same hardware, and the data size on each
> node is the same, since RF is three and every node has the complete data.
> I am using Hector as the client, and the random read requests number in
> the millions. I can't think of a reasonable explanation. Can someone please
> shed some light?
>
> Thanks.
> -Wei
>
>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>
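Wei's expectation that sequential keys spread evenly over the three nodes' primary ranges can be checked with a rough sketch. It assumes RandomPartitioner, which the 0..2**127 token values in the ring output suggest, although the partitioner is not named in the thread. Under RandomPartitioner the token is the absolute value of the key's MD5 digest read as a signed 128-bit integer. The sketch also assumes the keys were sent as decimal strings, which is a guess about the benchmark's encoding. The evenly spaced tokens i * 2**127 / 3 reproduce the three tokens shown by nodetool ring.

```python
import bisect
import hashlib
from collections import Counter
from typing import List


def balanced_tokens(node_count: int) -> List[int]:
    """Evenly spaced initial tokens over RandomPartitioner's 0..2**127 range."""
    return [i * 2**127 // node_count for i in range(node_count)]


def random_partitioner_token(key: bytes) -> int:
    """abs() of the MD5 digest read as a signed big-endian 128-bit integer."""
    return abs(int.from_bytes(hashlib.md5(key).digest(), "big", signed=True))


def primary_owner(token: int, ring: List[int]) -> int:
    """Index of the node whose range (previous token, own token] contains token."""
    idx = bisect.bisect_left(ring, token)
    return idx if idx < len(ring) else 0  # wrap around past the highest token


if __name__ == "__main__":
    ring = balanced_tokens(3)
    print(ring)
    # Hypothetical key encoding: decimal strings "1".."60000" (a sample of the 6M keys).
    owners = Counter(primary_owner(random_partitioner_token(str(k).encode()), ring)
                     for k in range(1, 60_001))
    print(sorted(owners.items()))
```

Each primary range receives roughly a third of the keys. With RF = 3, every node also stores replicas of the other two ranges, so key placement alone would not explain the uneven Read Count.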
