Anthony, We used the Ec2Snitch for one set of runs, but for another set we're using the PropertyFileSnitch.

With the PropertyFileSnitch we see:

Address         DC          Rack        Status State   Load            Owns    Token
                                                                               85070591730234615865843651857942052865
192.168.2.1     us-east     1b          Up     Normal  60.59 MB        50.00%  0
192.168.2.6     us-west     1c          Up     Normal  26.5 MB         0.00%   1
192.168.2.2     us-east     1b          Up     Normal  29.86 MB        50.00%  85070591730234615865843651857942052864
192.168.2.7     us-west     1c          Up     Normal  60.63 MB        0.00%   85070591730234615865843651857942052865
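
In case the underlying config matters, here's roughly how the PropertyFileSnitch side is set up. This is a sketch reconstructed from the ring output above; the default entry is just a placeholder, not our actual value:

# conf/cassandra.yaml
endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch

# conf/cassandra-topology.properties (format is IP=DC:RACK)
192.168.2.1=us-east:1b
192.168.2.2=us-east:1b
192.168.2.6=us-west:1c
192.168.2.7=us-west:1c
# fallback for nodes not listed above (placeholder value)
default=us-east:1b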

While with the Ec2Snitch we see:
Address         DC          Rack        Status State   Load            Owns    Token
                                                                               85070591730234615865843651857942052865
107.20.68.176   us-east     1b          Up     Normal  59.95 MB        50.00%  0
204.236.179.193 us-west     1c          Up     Normal  53.67 MB        0.00%   1
184.73.133.171  us-east     1b          Up     Normal  60.65 MB        50.00%  85070591730234615865843651857942052864
204.236.166.4   us-west     1c          Up     Normal  26.33 MB        0.00%   85070591730234615865843651857942052865

What's also strange is that the Load on the nodes changes as well. For example, node 204.236.166.4 is sometimes very low (~26KB), while other times it's closer to 30MB. We see the same kind of variability in both clusters.

For both clusters, we're running stress tests with the following options:

--consistency-level=LOCAL_QUORUM --threads=4 --replication-strategy=NetworkTopologyStrategy --strategy-properties=us-east:2,us-west:2 --column-size=128 --keep-going --num-keys=100000 -r
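
For reference, with strategy properties of us-east:2,us-west:2 the local replication factor is 2, so (assuming the standard quorum formula of floor(RF/2) + 1) LOCAL_QUORUM here works out to floor(2/2) + 1 = 2, i.e. both replicas in the local DC have to ack every write before it completes.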

Any clues as to what is going on here are greatly appreciated.

Thanks
CM

On Sat, Sep 17, 2011 at 12:15 PM, Ikeda Anthony <anthony.ikeda.dev@gmail.com> wrote:
What snitch do you have configured? We typically see a proper spread of data across all our nodes equally.

Anthony


On 17/09/2011, at 10:06 AM, Chris Marino wrote:

Hi, I have a question about what to expect when running a cluster across datacenters with Local Quorum consistency.

My simplistic assumption was that an 8-node cluster split across 2 data centers and running with local quorum would perform roughly the same as a 4-node cluster in one data center.

I'm 95% certain we've set up the keyspace so that the entire range is in one datacenter and the client is local. I see the keyspace split across all the local nodes, with remote nodes owning 0%. Yet when I run the stress tests against this configuration with local quorum, I see dramatically different results from when I ran the same tests against a 4 node cluster.  I'm still 5% unsure of this because the documentation on how to configure this is pretty thin.

My understanding of Local Quorum was that once the data was written to a local quorum, the commit would complete. I also believed that this would eliminate any WAN latency required for replication to the other DC.

It's not just that the split cluster runs slower, it's also that there is enormous variability in identical tests, sometimes by a factor of 2 or more. It seems as though the WAN latency is not only impacting performance, but also introducing wide variation in overall performance.

Should WAN latency be completely hidden with local quorum? Or are there second-order issues involved that will impact performance?

I'm running in EC2 across the us-east/us-west regions. I already know how unpredictable EC2 performance can be, but what I'm seeing here is far beyond normal performance variability for EC2.

Is there something obvious that I'm missing that would explain why the results are so different?

Here's the config when we run a 2x2 cluster:

Address         DC          Rack        Status State   Load            Owns    Token                                       
                                                                               85070591730234615865843651857942052865      
192.168.2.1     us-east     1b          Up     Normal  25.26 MB        50.00%  0                                           
192.168.2.6     us-west     1c          Up     Normal  12.68 MB        0.00%   1                                           
192.168.2.2     us-east     1b          Up     Normal  12.56 MB        50.00%  85070591730234615865843651857942052864      
192.168.2.7     us-west     1c          Up     Normal  25.48 MB        0.00%   85070591730234615865843651857942052865      

Thanks in advance.
CM