incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Pycassa vs YCSB results.
Date Tue, 05 Feb 2013 08:38:43 GMT
The first thing I noticed is that your script uses the Python threading library, which is
hampered by the Global Interpreter Lock: http://docs.python.org/2/library/threading.html

Your threads don't really execute Python code in parallel; try using the multiprocessing
library instead.
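
As a rough sketch of what that could look like (not your actual script; split_keys,
read_chunk and run_parallel are hypothetical names, and the hash() call is only a stand-in
for the real read so the example runs without a cluster), each worker process gets its own
interpreter and its own GIL:

```python
import multiprocessing


def split_keys(keys, n_chunks):
    """Split keys into n_chunks roughly equal, contiguous slices."""
    size, extra = divmod(len(keys), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(keys[start:end])
        start = end
    return chunks


def read_chunk(keys):
    """Hypothetical worker; runs in a separate process with its own GIL.

    Assumption: in a real benchmark the worker would open its own client
    connection here (connections should not be shared across processes)
    and issue one read per key.
    """
    completed = 0
    for key in keys:
        hash(key)  # placeholder for the actual read
        completed += 1
    return completed


def run_parallel(keys, n_procs=4):
    """Fan the key list out over n_procs worker processes and sum the counts."""
    pool = multiprocessing.Pool(processes=n_procs)
    try:
        return sum(pool.map(read_chunk, split_keys(keys, n_procs)))
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    # 76896 keys, matching the per-thread query count quoted below.
    keys = ["user%d" % i for i in range(76896)]
    print(run_parallel(keys, n_procs=4))
```

Since you already pin the script per core with taskset, one process per core may be a
reasonable starting point for n_procs, but that is a guess you would need to measure.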


Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 5/02/2013, at 7:15 AM, Pradeep Kumar Mantha <pradeepm66@gmail.com> wrote:

> Hi,
> 
> Could someone please give me any hints as to why the pycassa client (attached) is much
> slower than YCSB?
> Is it attributable to the performance difference between Python and Java, or does the
> pycassa API have some performance limitations?
>
> I don't see any client statements affecting the pycassa performance. Please have a look
> at the simple Python script attached and let me know your suggestions.
> 
> thanks
> pradeep
> 
> On Thu, Jan 31, 2013 at 4:53 PM, Pradeep Kumar Mantha <pradeepm66@gmail.com> wrote:
> 
> 
> On Thu, Jan 31, 2013 at 4:49 PM, Pradeep Kumar Mantha <pradeepm66@gmail.com> wrote:
> Thanks.. Please find the script as attachment.
> 
> Just reiterating:
> It's just a simple Python script which starts 4 threads.
> This script has been scheduled on 8 cores using the taskset Unix command, thus running
> 32 threads/node,
> and then scaled to 16 nodes.
> 
> thanks
> pradeep
> 
> 
> On Thu, Jan 31, 2013 at 4:38 PM, Tyler Hobbs <tyler@datastax.com> wrote:
> Can you provide the python script that you're using?
> 
> (I'm moving this thread to the pycassa mailing list (pycassa-discuss@googlegroups.com),
> which is a better place for this discussion.)
> 
> 
> On Thu, Jan 31, 2013 at 6:25 PM, Pradeep Kumar Mantha <pradeepm66@gmail.com> wrote:
> Hi,
> 
> I am trying to benchmark Cassandra on a cluster of 12 data nodes using 16 clients (each
> client uses 32 threads), with a custom pycassa client and with YCSB.
>
> The maximum throughput I achieved with the pycassa client is about 70k reads/second,
> whereas with YCSB it is ~120k reads/second.
>
> Any thoughts on why I see this huge difference in performance?
> 
> 
> Here is the description of setup.
> 
> Pycassa client (a simple Python script):
> 1. Each pycassa client starts 4 threads, where each thread issues 76896 queries.
> 2. A shell script is used to launch 4 threads per core using the taskset Unix command
> on a single 8-core node (8 * 4 * 76896 queries).
> 3. Another shell script is used to scale the single-node shell script to 16 nodes
> (total queries now: 16 * 8 * 4 * 76896).
>
> I tried to keep the YCSB configuration as similar as possible to my custom pycassa
> benchmarking setup.
> 
> YCSB -
> 
> Launched 16 YCSB clients on 16 nodes, where each client uses 32 threads for execution
> and needs to query 32 * 76896 keys, i.e. 100% reads.
> 
> The dataset is different in each case, but has
> 
> 1. same number of total records.
> 2. same number of fields.
> 3. field length is almost the same.
> 
> Could you please let me know why I see this huge performance difference, and whether
> there is any way I can improve the operations/second using the pycassa client?
> 
> thanks
> pradeep
>  
> 
> 
> 
> -- 
> Tyler Hobbs
> DataStax
> 
> 
> 
> <pycassa_client.py>

