cassandra-user mailing list archives

From "Jack Krupansky"
Subject Re: horizontal query scaling issues follow on
Date Thu, 17 Jul 2014 19:01:45 GMT
How many partitions are you spreading those 18 million rows over? That many rows in a single
partition will not be a sweet spot for Cassandra. It’s not exceeding any hard limit (2 billion),
but some internal operations may cache the partition rather than the logical row.

And all those rows in a single partition would certainly not be a test of “horizontal scaling”
(adding nodes to handle more data – more token values or partitions.)
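
To illustrate the point about partitions, here is a rough sketch in Python. It is not how Cassandra's partitioner actually assigns token ranges: the MD5 hash, the modulo step, and the names `owner` and `rows_per_node` are all stand-ins assumed for illustration. It only shows that rows sharing one partition key all land on one node, while rows with distinct partition keys spread out.

```python
import hashlib
from collections import Counter

def owner(partition_key: str, node_count: int) -> int:
    """Map a partition key to a node index.

    Assumption: MD5 plus modulo is a simplified stand-in for a real
    partitioner and token ring; it is used here only for illustration.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return token % node_count

def rows_per_node(partition_keys, node_count):
    """Count how many rows each node would hold for the given keys."""
    counts = Counter(owner(key, node_count) for key in partition_keys)
    return [counts.get(node, 0) for node in range(node_count)]

# Scaled-down data set: 18,000 rows instead of 18 million.
one_partition = ["all-rows"] * 18_000                    # every row shares one key
many_partitions = [f"row-{i}" for i in range(18_000)]   # one key per row

single = rows_per_node(one_partition, 4)
spread = rows_per_node(many_partitions, 4)
print("single partition:", single)
print("many partitions: ", spread)
```

In the single-partition case one node holds every row, so adding nodes cannot help reads of that data.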

-- Jack Krupansky

From: Diane Griffith 
Sent: Thursday, July 17, 2014 1:33 PM
To: user 
Subject: horizontal query scaling issues follow on

This is a follow-on repost to clarify what we are trying to do, providing information that
was missing or not clear.

Goal:  Verify horizontal scaling for random, non-duplicating key reads using the simplest
(minimal) configuration possible.


A couple of years ago we did similar performance testing with Cassandra for both read and write
performance and found excellent (essentially linear) horizontal scalability.  That project
got put on hold.  We are now moving forward with an operational system and are having scaling
issues.

During the prior testing (3 years ago) we were using a much older version of Cassandra (0.8
or older), the Thrift API, and Amazon AWS rather than OpenStack VMs.  We are now using the
latest Cassandra and the CQL interface.  We did try moving from OpenStack to AWS/EC2, but that
did not materially change our (poor) results.

Test Procedure:

  a. Inserted 54 million cells in 18 million rows (so 3 cells per row), using randomly generated
row keys. That was to be our data control for the test.
  b. Spawned a client on a different VM to query 100k rows, repeated for 100 reps.  Each
row key queried is drawn randomly from the set of existing row keys, and then not re-used,
so all 10 million row queries use a different (valid) row key.  This test is a specific use
case of our system that we are trying to show will scale.
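
For reference, the key selection in step b could be sketched roughly as below. This is Python and not the actual test client; the function name `plan_query_batches` and the scaled-down sizes are assumptions made for illustration. It draws keys without replacement, so no key is queried twice across all reps.

```python
import random

def plan_query_batches(row_keys, batch_size, reps, seed=None):
    """Split a random, non-repeating sample of existing keys into batches.

    Hypothetical helper: each returned batch is one rep of the test, and
    no key appears in more than one batch.
    """
    total = batch_size * reps
    if total > len(row_keys):
        raise ValueError("not enough distinct keys for a non-repeating test")
    rng = random.Random(seed)
    chosen = rng.sample(row_keys, total)  # sampling without replacement
    return [chosen[i * batch_size:(i + 1) * batch_size] for i in range(reps)]

# The full test is 100k keys x 100 reps = 10 million queries out of
# 18 million rows; the sizes below are scaled down by a factor of 1000.
existing_keys = [f"key-{i}" for i in range(18_000)]
batches = plan_query_batches(existing_keys, batch_size=100, reps=100, seed=42)
queried = [key for batch in batches for key in batch]
print(len(batches), "batches,", len(set(queried)), "distinct keys queried")
```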

  a. The 2-node test performed better than the 1-node test, but the 4-node test performed worse
than the 2-node test.  So the results did not show horizontal scaling.


  a. We have the replication factor set to 1, as we were trying to keep the control test simple
to prove out horizontal scaling.
  b. When we tried adding threading to see if it would help, it had interesting side behavior
that did not prove out horizontal scaling.
  c. We are using CQL rather than the Thrift API, on Cassandra 2.0.6.

Does anyone have any feedback on whether either threading or a higher replication factor is
necessary to show horizontal scaling of Cassandra, versus the minimal approach of just continuing
to add nodes to help throughput?
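
To make the threading question concrete, here is a rough Python sketch of the kind of client-side comparison meant. It does not talk to Cassandra: `fetch_row` is a hypothetical stub that sleeps for 1 ms in place of a real single-row read, and `measure_throughput` is an assumed helper name. It only shows how throughput could be compared across different client thread counts, to check whether the client rather than the cluster is the limit.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_row(row_key):
    """Hypothetical stand-in for one single-row read.

    Assumption: a real test would replace this body with a driver call;
    the 1 ms sleep only simulates a network round trip.
    """
    time.sleep(0.001)
    return row_key

def measure_throughput(row_keys, worker_count):
    """Issue one read per key from a thread pool; return (rows, rows/sec)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        results = list(pool.map(fetch_row, row_keys))
    elapsed = time.perf_counter() - start
    return len(results), len(results) / elapsed

sample_keys = [f"key-{i}" for i in range(200)]
count_1, rate_1 = measure_throughput(sample_keys, worker_count=1)
count_8, rate_8 = measure_throughput(sample_keys, worker_count=8)
print(f"1 thread:  {rate_1:.0f} rows/sec")
print(f"8 threads: {rate_8:.0f} rows/sec")
```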

Any suggestions for the minimal configuration necessary to show scaling of our query use case:
100k requests for random, non-repeating keys constantly coming in over a period of time?


