On Thu, Jan 31, 2013 at 4:49 PM, Pradeep Kumar Mantha <firstname.lastname@example.org> wrote:
Thanks. Please find the script attached.

Just to reiterate: it is a simple Python script that starts 4 threads. The script is scheduled on each of 8 cores using the Unix taskset command, giving 32 threads per node, and is then scaled out to 16 nodes.

thanks
pradeep

On Thu, Jan 31, 2013 at 4:38 PM, Tyler Hobbs <email@example.com> wrote:
Can you provide the python script that you're using?
(I'm moving this thread to the pycassa mailing list (firstname.lastname@example.org), which is a better place for this discussion.)
On Thu, Jan 31, 2013 at 6:25 PM, Pradeep Kumar Mantha <email@example.com> wrote:
Hi,

I am trying to benchmark Cassandra on a 12-data-node cluster using 16 clients (each client uses 32 threads), once with a custom pycassa client and once with YCSB.

The maximum throughput I achieved with the pycassa client is around 70k reads/second, whereas with YCSB it is ~120k reads/second.

Any thoughts on why I see this huge difference in performance?

Here is the description of the setup.
Pycassa client (a simple Python script):
1. Each pycassa client starts 4 threads, and each thread issues 76896 queries.
2. A shell script uses the Unix taskset command to launch 4 threads per core on a single 8-core node (8 * 4 * 76896 queries).
3. Another shell script scales the single-node shell script out to 16 nodes (total queries: 16 * 8 * 4 * 76896).

I tried to keep the YCSB configuration as similar as possible to my custom pycassa benchmarking setup.

YCSB:
Launched 16 YCSB clients on 16 nodes, where each client uses 32 threads and needs to query 32 * 76896 keys, i.e. 100% reads.

The dataset is different in each case, but has:
1. the same total number of records.
2. the same number of fields.
3. almost the same field length.

Could you please let me know why I see this huge performance difference, and whether there is any way I can improve the operations/second using the pycassa client?

thanks
pradeep
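
For readers following along, the per-process workload described in the message above can be sketched roughly as below. This is not the attached script: `fake_read`, `run_client`, `total_queries`, and the `user%d` key format are hypothetical names introduced here for illustration, and the stub read would be replaced by a real call such as pycassa's `ColumnFamily.get(key)` in an actual run. Only the counts (4 threads, 8 cores, 16 nodes, 76896 queries per thread) come from the post.

```python
import threading
import time

# Counts taken from the post.
QUERIES_PER_THREAD = 76896
THREADS_PER_PROCESS = 4
PROCESSES_PER_NODE = 8   # one process pinned per core with taskset
NODES = 16


def total_queries(nodes=NODES, procs=PROCESSES_PER_NODE,
                  threads=THREADS_PER_PROCESS, per_thread=QUERIES_PER_THREAD):
    """Total reads issued across the whole client fleet."""
    return nodes * procs * threads * per_thread


def fake_read(key):
    # Hypothetical stand-in for a real read, e.g. ColumnFamily.get(key).
    return {"field0": key}


def run_client(read_fn, keys_per_thread, n_threads=THREADS_PER_PROCESS):
    """Start n_threads threads, each issuing keys_per_thread reads.

    Returns (reads_completed, elapsed_seconds).
    """
    counts = [0] * n_threads

    def worker(idx):
        base = idx * keys_per_thread
        for i in range(keys_per_thread):
            read_fn("user%d" % (base + i))  # assumed key format
            counts[idx] += 1                # each thread owns its own slot

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts), time.time() - start


if __name__ == "__main__":
    ops, secs = run_client(fake_read, keys_per_thread=1000)
    print("%d reads in %.3f s" % (ops, secs))
    print("reads per node: %d" % total_queries(nodes=1))
    print("reads in total: %d" % total_queries())
```

With the numbers from the post, `total_queries(nodes=1)` is 8 * 4 * 76896 = 2,460,672 reads per node and `total_queries()` is 16 * 8 * 4 * 76896 = 39,370,752 reads overall, which matches the 16 * 32 * 76896 keys of the YCSB run.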