We are evaluating Cassandra for one of our storage needs. I am running a benchmark with YCSB (http://github.com/brianfrankcooper/YCSB/wiki) to gauge Cassandra's performance.
Setup: a 5-node Cassandra cluster with replication factor 3, running CentOS 5.5 on Amazon EC2.
Sample test data: a single field of 8 KB per record.
Load: 3-4 clients with 100 threads each. The clients run in the same network (same availability zone) on EC2.
I have run tests ranging from 1 million to 12 million inserts. I am getting a throughput of around 5 MB/s, both on the network and on the disk.
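To put that figure in terms of operations, here is a rough back-of-the-envelope sketch. It assumes the 5 MB/s is almost entirely the 8 KB values (ignoring keys, protocol overhead, and replication traffic) and that 1 MB = 1024 KB; the function name is just for illustration.

```python
# Rough conversion from byte throughput to insert rate.
# Assumptions (not measured): payload is only the 8 KB value,
# 1 MB = 1024 KB, no protocol or replication overhead counted.
def inserts_per_second(throughput_mb_per_s, record_kb):
    return throughput_mb_per_s * 1024 / record_kb

rate = inserts_per_second(5, 8)
print(rate)  # 640.0

# Time a 12 million insert run would take at that rate, in hours.
hours = 12_000_000 / rate / 3600
print(round(hours, 1))  # 5.2
```

So under those assumptions, 5 MB/s works out to roughly 640 inserts per second.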
1) Is there any tuning I can do to improve the performance? I am trying to max out the network and/or disk I/O, but for some reason throughput always stays steady at around 5 MB/s.
2) I also notice that the load is not evenly distributed across the nodes. I tried setting the tokens using the formula suggested in the Token Selection section of the Operations wiki page. That actually led to a more unbalanced load distribution (which the doc warned can happen if the key distribution is not even).
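For reference, the token calculation can be sketched as below. This assumes RandomPartitioner, whose token space runs from 0 to 2**127, with tokens spaced evenly as i * 2**127 / N. The partitioner is an assumption here, and the function name is only illustrative.

```python
# Evenly spaced initial tokens for an N-node ring.
# Assumption: RandomPartitioner (token range 0 .. 2**127).
# With an order-preserving partitioner, tokens would have to follow
# the actual key distribution instead.
def initial_tokens(node_count):
    ring_size = 2 ** 127
    return [i * ring_size // node_count for i in range(node_count)]

tokens = initial_tokens(5)
for node, token in enumerate(tokens):
    print(f"node {node}: {token}")
```

Each value would go into the corresponding node's initial token setting. Evenly spaced tokens only balance the load if the keys hash uniformly over the ring.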
Any suggestions/pointers are welcome. Thanks.