I am benchmarking Cassandra on a cluster of 12 data nodes with 16 clients (each client uses 32 threads), using both a custom pycassa client and YCSB.

The maximum throughput I get with the pycassa client is about 70k reads/second,
whereas with YCSB it is about 120k reads/second.

Any thoughts on why I see such a large difference in performance?

Here is a description of the setup.

Pycassa client (a simple Python script):
1. Each pycassa client starts 4 threads, and each thread issues 76896 queries.
2. A shell script launches 4 threads per core, pinned with the Unix taskset command, on a single 8-core node (8 * 4 * 76896 queries).
3. Another shell script scales the single-node script out to 16 nodes (16 * 8 * 4 * 76896 queries in total).
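
For reference, the per-process part of step 1 looks roughly like the sketch below. This is a simplified illustration, not my exact script: `run_client` and `query_fn` are placeholder names, and in the real client `query_fn` would be a pycassa read such as a `ColumnFamily.get` call.

```python
import threading
import time


def run_client(query_fn, keys_per_thread, num_threads=4):
    """Start num_threads threads; thread i calls query_fn once per key in keys_per_thread[i].

    Returns (total_ops, ops_per_second) for this one client process.
    query_fn is a placeholder for the real read call (assumption).
    """
    counts = [0] * num_threads

    def worker(idx):
        # Each thread only touches its own slot in counts.
        for key in keys_per_thread[idx]:
            query_fn(key)
            counts[idx] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start

    total = sum(counts)
    rate = total / elapsed if elapsed > 0 else float("inf")
    return total, rate
```

The reported reads/second is the sum of these per-process rates over all cores and all nodes.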

I tried to keep the YCSB configuration as similar as possible to my custom pycassa benchmarking setup.


I launched 16 YCSB clients on 16 nodes, where each client uses 32 threads and has to query 32 * 76896 keys, i.e. 100% reads.
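
As a sanity check that both setups issue the same amount of work, the totals can be computed from the numbers above. This is only a small arithmetic sketch; the constant and function names are made up for illustration.

```python
QUERIES_PER_THREAD = 76896
CLIENT_NODES = 16

# pycassa: 4 threads per core on 8 cores per node
PYCASSA_THREADS_PER_NODE = 8 * 4
# YCSB: one client per node with 32 threads
YCSB_THREADS_PER_NODE = 32


def total_queries(nodes, threads_per_node, queries_per_thread):
    """Total number of read queries issued across all client nodes."""
    return nodes * threads_per_node * queries_per_thread


pycassa_total = total_queries(CLIENT_NODES, PYCASSA_THREADS_PER_NODE, QUERIES_PER_THREAD)
ycsb_total = total_queries(CLIENT_NODES, YCSB_THREADS_PER_NODE, QUERIES_PER_THREAD)

print(pycassa_total, ycsb_total)  # both are 39370752
```

So both benchmarks run 512 client threads in total and issue the same number of reads; only the client implementation differs.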

The dataset is different in each case, but both datasets have:

1. the same total number of records.
2. the same number of fields.
3. almost the same field length.

Could you please let me know why I see this large performance difference, and whether there is any way to improve the operations/second of the pycassa client?