cassandra-user mailing list archives

From Pradeep Kumar Mantha <>
Subject Pycassa vs YCSB results.
Date Fri, 01 Feb 2013 00:25:20 GMT

I am trying to benchmark Cassandra on a 12-data-node cluster with 16
clients (each client uses 32 threads), using both a custom pycassa client and YCSB.

The maximum throughput I achieved with the pycassa client is about
70k reads/second, whereas with YCSB it is ~120k reads/second.
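One way to compare the two numbers is Little's law (mean latency = concurrency / throughput), assuming each of the 16 x 32 = 512 client threads keeps exactly one synchronous read in flight at a time (a closed-loop benchmark). This is a back-of-envelope sketch under that assumption only; it ignores ramp-up, client-side CPU time and any idle threads.

```python
# Back-of-envelope: in a closed-loop benchmark where every thread waits for
# one read to finish before issuing the next, Little's law gives
#   throughput = concurrency / mean_latency
# so the implied mean latency is concurrency / throughput.

CLIENTS = 16
THREADS_PER_CLIENT = 32
concurrency = CLIENTS * THREADS_PER_CLIENT  # at most 512 reads in flight


def implied_latency_ms(reads_per_second: float, in_flight: int = concurrency) -> float:
    """Mean per-read latency in milliseconds implied by Little's law."""
    return in_flight / reads_per_second * 1000.0


pycassa_ms = implied_latency_ms(70_000)   # observed pycassa throughput
ycsb_ms = implied_latency_ms(120_000)     # observed YCSB throughput
print(f"pycassa: {pycassa_ms:.2f} ms/read, YCSB: {ycsb_ms:.2f} ms/read")
```

Under that assumption, each pycassa read takes roughly 7.3 ms end to end versus roughly 4.3 ms for YCSB, so the gap corresponds to about 3 ms of extra time per request somewhere between the client thread and the cluster.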

Any thoughts on why I see such a large difference in performance?

Here is a description of the setup.

Pycassa client (a simple Python script):
1. Each pycassa client starts 4 threads, and each thread queries 76896 keys.
2. A shell script uses the Unix taskset command to launch 4 threads per core
on a single 8-core node (8 * 4 * 76896 queries).
3. Another shell script scales the single-node script out to 16 nodes
(total queries: 16 * 8 * 4 * 76896).
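The threaded client described above can be sketched with the standard library alone. This is a minimal, hypothetical harness, not the original script: `run_benchmark` and `read_key` are illustrative names, the contiguous key partitioning is an assumption, and `read_key` stands in for whatever read the real client performs (for example a pycassa `ColumnFamily.get(key)` call). Each thread issues one synchronous read at a time, and the function returns the aggregate reads/second for one process.

```python
import threading
import time
from typing import Callable, Sequence


def run_benchmark(read_key: Callable[[str], object],
                  keys: Sequence[str],
                  num_threads: int = 4) -> float:
    """Split `keys` across `num_threads` threads, have each thread read its
    share sequentially, and return the aggregate reads/second."""
    # Hypothetical partitioning: each thread gets a contiguous slice of keys.
    chunk = (len(keys) + num_threads - 1) // num_threads
    slices = [keys[i * chunk:(i + 1) * chunk] for i in range(num_threads)]
    counts = [0] * num_threads  # each thread only writes its own slot

    def worker(idx: int) -> None:
        for key in slices[idx]:
            read_key(key)  # one synchronous read in flight per thread
            counts[idx] += 1

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return sum(counts) / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    def fake_read(key: str) -> None:
        # Hypothetical stand-in for a Cassandra read: sleep for 1 ms.
        time.sleep(0.001)

    demo_keys = [f"user{i}" for i in range(400)]
    print(f"{run_benchmark(fake_read, demo_keys):.0f} reads/second")
```

Swapping the stub for a real read would make this measure one client process; running 8 such processes per node, each pinned with taskset as in step 2, would approximate the original layout. One process per core also keeps CPython's global interpreter lock from serializing the Python-side work of all 32 threads on a node.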

I tried to keep the YCSB configuration as similar as possible to my custom
pycassa benchmarking setup.


I launched 16 YCSB clients on 16 nodes; each client uses 32 threads and
queries 32 * 76896 keys, i.e. a 100% read workload.
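A quick check that both setups issue the same total number of reads, assuming each YCSB thread handles 76896 keys (which is what 32 * 76896 keys per 32-thread client implies):

```python
NODES = 16
KEYS_PER_THREAD = 76_896

# Custom pycassa setup: 8 cores per node x 4 threads per core.
pycassa_threads_per_node = 8 * 4
pycassa_total = NODES * pycassa_threads_per_node * KEYS_PER_THREAD

# YCSB setup: one client per node, 32 threads, 32 * 76896 keys per client.
ycsb_total = NODES * (32 * KEYS_PER_THREAD)

print(pycassa_total, ycsb_total)  # 39370752 39370752
```

Both come to 39,370,752 reads with 512 concurrent threads, so the two runs differ in achieved rate, not in workload volume or concurrency.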

The dataset is different in each case, but both have:

1. the same total number of records.
2. the same number of fields.
3. nearly the same field length.

Could you please let me know why I see this large performance difference,
and whether there is any way I can improve the operations/second using pycassa?

