We are trying to set up a Cassandra cluster and have strict low-latency requirements for reads. In our tests we are not seeing the performance we had hoped for, and we would like to know whether anyone has thoughts on:
1. Whether these latencies are expected for our data and machine configuration.
2. If not, what can we do to improve our read times?
We set up 4 boxes as a ring running Cassandra 1.1.5 and created a keyspace with a replication factor of 3 and strategy_class SimpleStrategy. The column family under test has 12 columns, 4 of which form a composite key.
We then wrote 192,000 rows of randomly generated test data into the column family. Most columns are either random UUIDs or short strings. One column, however, is a blob of around 1 KB (we later reduced the size of this blob, but that didn't seem to change our read times much).
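For a sense of scale, here is a back-of-the-envelope estimate of how much blob data this test set represents. It is a minimal Java sketch under loudly stated assumptions: "around 1K" is taken as exactly 1024 bytes per row, the other columns, Cassandra's per-column overhead and compression are ignored, and tokens are assumed to be evenly balanced across the 4 nodes. The class and method names are illustrative only.

```java
/**
 * Rough blob-volume estimate for the test data set.
 * Assumptions: one 1024-byte blob per row; UUID/short-string columns,
 * storage overhead and compression are ignored; even token balance.
 */
public class DataVolumeEstimate {
    static final long ROWS = 192_000L;
    static final long BLOB_BYTES = 1024L;   // assumed value for "around 1K"
    static final int REPLICATION_FACTOR = 3;
    static final int NODES = 4;

    /** Blob bytes for a single copy of the data set. */
    static long rawBytes() {
        return ROWS * BLOB_BYTES;
    }

    /** Blob bytes stored cluster-wide once each row is kept RF times. */
    static long replicatedBytes() {
        return rawBytes() * REPLICATION_FACTOR;
    }

    /** Average blob bytes per node, assuming an evenly balanced ring. */
    static double bytesPerNode() {
        return (double) replicatedBytes() / NODES;
    }

    static double toMiB(double bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        System.out.printf("raw: %.1f MiB%n", toMiB(rawBytes()));
        System.out.printf("replicated: %.1f MiB%n", toMiB(replicatedBytes()));
        System.out.printf("per node: %.1f MiB%n", toMiB(bytesPerNode()));
    }
}
```

Under these assumptions this prints 187.5 MiB raw, 562.5 MiB replicated and about 140.6 MiB per node, which is small next to 32 GB of RAM and suggests the data set could sit largely in the OS page cache.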
Running a query like “select * from <table_name> where atag=<foo>”, where ‘atag’ is the first column of the composite key, from either JDBC or Hector (equivalent code) results in read times of 200-300 ms from a remote host on the same network. The query returns around 800 results. Running the same query on one of the Cassandra hosts gives a read time of ~110-130 ms.
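When comparing remote and local timings like these, it can help to report a median and a tail percentile over many repeated runs, after discarding warm-up iterations, rather than relying on a few individual timings. The following is a minimal Java sketch assuming the query can be wrapped in a `Runnable`; `LatencyProbe`, `measure` and `percentile` are hypothetical names, and the dummy loop in `main` is only a placeholder for the real JDBC or Hector call.

```java
import java.util.Arrays;

/**
 * Times a query repeatedly and reports latency percentiles.
 * The query is any Runnable; in practice it would wrap the JDBC/Hector call.
 */
public class LatencyProbe {

    // Written by the placeholder workload so the JIT cannot treat it as dead code.
    static volatile long sink;

    /** Nearest-rank percentile (p in (0, 100]) of already-sorted samples. */
    static double percentile(double[] sorted, double p) {
        if (sorted.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        int index = Math.max(0, Math.min(sorted.length - 1, rank - 1));
        return sorted[index];
    }

    /** Runs warm-up calls, then returns sorted per-call latencies in ms. */
    static double[] measure(Runnable query, int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) {
            query.run(); // results discarded: JIT, connection setup, caches
        }
        double[] millis = new double[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            query.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        Arrays.sort(millis);
        return millis;
    }

    public static void main(String[] args) {
        // Hypothetical placeholder; substitute the real "select * ... where atag=?" call.
        Runnable query = () -> {
            long sum = 0;
            for (int i = 0; i < 100_000; i++) {
                sum += i;
            }
            sink = sum;
        };
        double[] samples = measure(query, 20, 200);
        System.out.printf("median %.3f ms, p95 %.3f ms, max %.3f ms%n",
                percentile(samples, 50), percentile(samples, 95),
                samples[samples.length - 1]);
    }
}
```

Running the same probe once from the remote host and once on a Cassandra node gives directly comparable distributions for the two cases.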
Using read consistency level ONE instead of QUORUM reduces read latency by ~20 ms.
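With a replication factor of 3, QUORUM requires responses from floor(3/2) + 1 = 2 replicas, while ONE requires just one, so a QUORUM read has to wait for the second replica to answer. This small Java sketch only spells out that arithmetic; the class and method names are illustrative.

```java
/**
 * Replicas a coordinator must hear from at ONE and QUORUM,
 * for a keyspace with a single replication factor (e.g. SimpleStrategy).
 */
public class ConsistencyReplicas {

    /** QUORUM = floor(RF / 2) + 1 replicas. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** ONE always waits for a single replica. */
    static int one() {
        return 1;
    }

    public static void main(String[] args) {
        int rf = 3; // replication factor of the test keyspace
        System.out.println("ONE waits for " + one() + " replica(s)");
        System.out.println("QUORUM waits for " + quorum(rf) + " replica(s)");
    }
}
```

For RF = 3 this prints 1 replica for ONE and 2 for QUORUM.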
Enabling the row cache did not noticeably change performance. Moreover, the row cache size reported by nodetool was tiny. Here is a snapshot of nodetool info after running a few read tests:
Key Cache : size 2448 (bytes), capacity 104857584 (bytes), 231 hits, 266 requests, 1.000 recent hit rate, 14400 save period in seconds
Row Cache : size 96 (bytes), capacity 4194304000 (bytes), 9 hits, 13 requests, NaN recent hit rate, 0 save period in seconds
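The hits and requests counters in that snapshot can be turned into cumulative hit rates, which may differ from the "recent hit rate" column since that figure appears to cover only a recent window. A minimal Java sketch, with the numbers copied from the snapshot above; `CacheHitRate` and its methods are hypothetical names.

```java
/**
 * Cumulative cache hit rates and capacities from nodetool info counters.
 */
public class CacheHitRate {

    /** Hits divided by requests; NaN when there were no requests. */
    static double hitRate(long hits, long requests) {
        return requests == 0 ? Double.NaN : (double) hits / requests;
    }

    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // Counters copied from the nodetool snapshot.
        System.out.printf("key cache: %.3f hit rate, capacity %.1f MiB%n",
                hitRate(231, 266), toMiB(104_857_584L));
        System.out.printf("row cache: %.3f hit rate, capacity %.1f MiB, used %d bytes%n",
                hitRate(9, 13), toMiB(4_194_304_000L), 96);
    }
}
```

This gives about 0.868 for the key cache (of roughly 100 MiB capacity) and about 0.692 for the row cache, whose 4000 MiB capacity held only 96 bytes at the time of the snapshot.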
Machine configuration:
CPU: Intel(R) Xeon(R) CPU L5640
OS: Solaris 5.10
RAM: 32 GB
Hard disk: 1 TB magnetic disk (not SSD)