We've recently been testing some of the higher-performance instance classes on EC2, specifically the hi1.4xlarge, with Cassandra. For those who aren't familiar with them, they have two SSDs and 10 GigE networking.
While we have observed much improved raw performance over our current instances, we are seeing a fairly large gap between Cassandra's throughput and the raw hardware performance. We have particularly noticed the gap in streaming performance when bootstrapping a new node. I want to make sure we have configured these instances correctly to get the best performance out of Cassandra.
When bootstrapping a new node into a small ring with a 35 GB streaming payload, we see a maximum streaming rate of 5-8 MB/sec while the new node joins. We are running 1.2.6 with vnodes (256 tokens per node). In our tests the ring is small enough that all streaming comes from a single node.
To test hardware performance for this use case, we ran an rsync of the sstables from one node to the next (to/from the same file systems) and observed a consistent rate of 115 MB/sec.
The only changes we've made to the config (aside from dirs/hosts) are:
concurrent_reads: 128                    # default: 32
concurrent_writes: 128                   # default: 32
rpc_server_type: hsha                    # default: sync
compaction_throughput_mb_per_sec: 256    # default: 16
read_request_timeout_in_ms: 6000         # default: 10000
endpoint_snitch: Ec2Snitch               # default: SimpleSnitch
We use a 10 GB heap with a 2 GB new generation, on the Oracle 1.7.0_25 JVM.
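Concretely, the heap is set via the standard variables in cassandra-env.sh (only these two are changed from stock):

```
MAX_HEAP_SIZE="10G"
HEAP_NEWSIZE="2G"
```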
I've raised our streaming throughput limit from 200 MB/sec to 800 MB/sec on both the sending and receiving nodes, but that doesn't appear to make a difference.
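For the record, the limit was raised at runtime on both ends roughly like this (the hostname is a placeholder; note that, if I read the docs correctly, this setting and the matching yaml option `stream_throughput_outbound_megabits_per_sec` are denominated in megabits/sec, default 200):

```
# Raise the streaming cap on both the sending and receiving node.
nodetool -h <node> setstreamthroughput 800
```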
The disks are in RAID 0 (2 x 1 TB SSD) with a read-ahead of 512 sectors, formatted XFS.
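The read-ahead was checked and set per device with blockdev (a sketch; /dev/md0 stands in for our actual md device):

```
# Read-ahead is expressed in 512-byte sectors.
blockdev --getra /dev/md0
blockdev --setra 512 /dev/md0
```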
The nodes in the ring average about 23% CPU, with spikes up to a maximum of 45%.
As I mentioned, on the same boxes with the same workloads, I've seen up to 115 MB/sec transfers with rsync.
Any suggestions for what to adjust to get better streaming performance? Streaming at roughly 5% of what a single rsync can do over the same path seems limited.