The simple thing to do would be to use the multiprocessing package and eliminate all shared state.
On a multicore box, Python threads can run on different cores and contend with each other for the
GIL.
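
A minimal sketch of that approach, assuming the key list can be split up front and that each worker process opens its own connection. `fetch` is a hypothetical stand-in for the real `cf.get(key)` call and is not part of pycassa; `split_keys`, `worker` and `run_parallel` are names made up for this example:

```python
import multiprocessing


def split_keys(keys, n_workers):
    """Deal keys round-robin into n_workers lists so no state is shared."""
    return [keys[i::n_workers] for i in range(n_workers)]


def fetch(key):
    # Hypothetical stand-in for cf.get(key); a real worker would create
    # its own connection pool and column family inside the process.
    return "value-for-" + key


def worker(chunk):
    """Runs in a child process; returns the number of keys it queried."""
    count = 0
    for key in chunk:
        fetch(key)
        count += 1
    return count


def run_parallel(keys, n_workers=4):
    """One process per chunk, each with its own interpreter and its own GIL."""
    pool = multiprocessing.Pool(processes=n_workers)
    try:
        return pool.map(worker, split_keys(keys, n_workers))
    finally:
        pool.close()
        pool.join()
```

Call `run_parallel(keys)` from under an `if __name__ == "__main__":` guard in the script, so that child processes can import the module safely.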
Cheers
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 5/02/2013, at 11:34 PM, Tim Wintle <timwintle@gmail.com> wrote:
> On Tue, 2013-02-05 at 21:38 +1300, aaron morton wrote:
>> The first thing I noticed is your script uses python threading library, which is
>> hampered by the Global Interpreter Lock http://docs.python.org/2/library/threading.html
>>
>> You don't really have multiple threads running in parallel, try using the multiprocessing
>> library.
>
> Python _should_ release the GIL around IO-bound work, so this is a
> situation where the GIL shouldn't be an issue (It's actually a very good
> use for python's threads as there's no serialization overhead for
> message passing between processes as there would be in most
> multi-process examples)
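
To illustrate the point about IO-bound work, here is a small sketch in which `time.sleep` stands in for a blocking network read. That substitution is an assumption made for illustration; the real script would be waiting on the socket call inside `cf.get`. CPython releases the GIL during both, so the waits can overlap:

```python
import threading
import time


def fake_query(delay):
    # time.sleep releases the GIL, as a blocking socket read would.
    time.sleep(delay)


def timed_threads(n_threads, delay):
    """Start n_threads that each block for `delay` seconds; return wall time."""
    threads = [threading.Thread(target=fake_query, args=(delay,))
               for _ in range(n_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


elapsed = timed_threads(4, 0.2)
print("4 threads, 0.2 s each: %.2f s wall time" % elapsed)
```

If the GIL serialised the waits, the wall time would be close to 0.8 s rather than close to 0.2 s.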
>
>
> A constant factor 2 slowdown really doesn't seem that significant for
> two different implementations, and I would not worry about this unless
> you're talking about thousands of machines..
>
> If you are talking about enough machines that this is real $$$, then I
> do think the python code can be optimised a lot.
>
> I'm talking about language/VM specific optimisations - so I'm assuming
> cpython (the standard /usr/bin/python as in the shebang).
>
> I don't know how much of a difference this will make, but I'd be
> interested in hearing your results:
>
>
> I would start by trying rewriting this:
>
> def start_cassandra_client(Threadname):
>     f=open(Threadname,"w")
>     for key in lines:
>         key=key.strip()
>         st=time.time()
>         f.write(str(cf.get(key))+"\n")
>         et=time.time()
>         f.write("Time taken for a single query is " +
>                 str(round(1000*(et-st),2))+" milli secs\n")
>     f.close()
>
> As something like this:
>
> def start_cassandra_client(Threadname):
>     # Bind globals to local names to avoid repeated global lookups in the loop
>     time_fn = time.time
>     colfam = cf
>     f=open(Threadname,"w")
>     for key in lines:
>         key=key.strip()
>         st=time_fn()
>         f.write(str(colfam.get(key))+"\n")
>         et=time_fn()
>         f.write("Time taken for a single query is " +
>                 str(round(1000*(et-st),2))+" milli secs\n")
>     f.close()
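
The rewrite relies on local-variable lookups being cheaper than global and attribute lookups in CPython. A rough way to check the size of the effect on your own machine, using only the standard library (the two function names are made up for this sketch, and the result will vary by interpreter version):

```python
import time
import timeit


def with_global_lookup(n):
    total = 0.0
    for _ in range(n):
        total += time.time()  # looks up global `time`, then attribute `.time`
    return total


def with_local_binding(n):
    time_fn = time.time  # bound once, then a local lookup per iteration
    total = 0.0
    for _ in range(n):
        total += time_fn()
    return total


global_secs = timeit.timeit(lambda: with_global_lookup(100000), number=5)
local_secs = timeit.timeit(lambda: with_local_binding(100000), number=5)
print("global lookup: %.3f s, local binding: %.3f s" % (global_secs, local_secs))
```

Any saving here is likely to be small next to a network round trip, so it is worth measuring before relying on it.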
>
>
> If you don't consider it cheating compared to the java version, I would
> also move the "key.strip()" call to module initialisation instead of
> doing it once per thread, as there's a lot of function dispatch overhead
> in python.
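
A sketch of what stripping the keys once at module initialisation could look like. The sample data and the helper names `strip_keys` and `keys_for_thread` are made up; how the real script populates `lines` is not shown in this thread:

```python
def strip_keys(raw_lines):
    """Strip every key once, up front, instead of once per thread per key."""
    return [line.strip() for line in raw_lines]


# Module initialisation: done a single time before any thread starts.
raw_lines = ["user1\n", "user2\n", "  user3  \n"]  # made-up sample data
lines = strip_keys(raw_lines)


def keys_for_thread():
    # Each thread now iterates over ready-to-use keys with no strip() call.
    for key in lines:
        yield key
```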
>
>
> I'd also closely compare the IO going on in both versions (the .write
> calls). For example this may be significantly faster:
>
>         st=time_fn()
>         result=str(colfam.get(key))  # fetch first so the timing still covers the query
>         et=time_fn()
>         f.write(result+"\nTime taken for a single query is "
>                 + str(round(1000*(et-st),2))+" milli secs\n")
>
>
> .. I haven't read your java code and I don't know Java IO semantics well
> enough to compare the behaviour of both.
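
One way to keep the single-write idea readable is to build both output lines in a helper and write them in one call. This is only a sketch: `format_result` is a made-up name, and `io.StringIO` stands in for the per-thread output file:

```python
import io


def format_result(value, st, et):
    """Build both output lines as one string so a single write() suffices."""
    millis = round(1000 * (et - st), 2)
    return "%s\nTime taken for a single query is %s milli secs\n" % (value, millis)


out = io.StringIO()  # stands in for the per-thread output file
out.write(format_result({"field0": "abc"}, 0.0, 0.0125))
print(out.getvalue())
```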
>
> Tim
>
>
>
>
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 5/02/2013, at 7:15 AM, Pradeep Kumar Mantha <pradeepm66@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Could someone please give me any hints as to why the pycassa client (attached)
>>> is much slower than YCSB?
>>> Is it attributable to the performance difference between Python and Java,
>>> or does the pycassa API have some performance limitations?
>>>
>>> I don't see any client statements affecting the pycassa performance. Please have
>>> a look at the simple Python script attached and let me know
>>> your suggestions.
>>>
>>> thanks
>>> pradeep
>>>
>>> On Thu, Jan 31, 2013 at 4:53 PM, Pradeep Kumar Mantha <pradeepm66@gmail.com> wrote:
>>>
>>>
>>> On Thu, Jan 31, 2013 at 4:49 PM, Pradeep Kumar Mantha <pradeepm66@gmail.com> wrote:
>>> Thanks.. Please find the script as attachment.
>>>
>>> Just reiterating:
>>> It's just a simple Python script which starts 4 threads.
>>> The script is scheduled on each of 8 cores using the Unix taskset command, thus running
>>> 32 threads/node, and is then scaled to 16 nodes.
>>>
>>> thanks
>>> pradeep
>>>
>>>
>>> On Thu, Jan 31, 2013 at 4:38 PM, Tyler Hobbs <tyler@datastax.com> wrote:
>>> Can you provide the python script that you're using?
>>>
>>> (I'm moving this thread to the pycassa mailing list (pycassa-discuss@googlegroups.com),
>>> which is a better place for this discussion.)
>>>
>>>
>>> On Thu, Jan 31, 2013 at 6:25 PM, Pradeep Kumar Mantha <pradeepm66@gmail.com> wrote:
>>> Hi,
>>>
>>> I am trying to benchmark Cassandra on a 12-data-node cluster using 16 clients
>>> (each client uses 32 threads), with a custom pycassa client and with YCSB.
>>>
>>> I found that the maximum throughput achieved using the pycassa client
>>> is roughly 70k reads/second,
>>> whereas with YCSB it is ~120k reads/second.
>>>
>>> Any thoughts on why I see this huge difference in performance?
>>>
>>>
>>> Here is the description of setup.
>>>
>>> Pycassa client (a simple Python script):
>>> 1. Each pycassa client starts 4 threads, where each thread issues 76896 queries.
>>> 2. A shell script is used to launch 4 threads per core, using the Unix taskset command,
>>>    on a single 8-core node (8 * 4 * 76896 queries).
>>> 3. Another shell script is used to scale the single-node shell script to 16 nodes
>>>    (total queries now: 16 * 8 * 4 * 76896).
>>>
>>> I tried to keep the YCSB configuration as similar as possible to my custom pycassa
>>> benchmarking setup.
>>>
>>> YCSB -
>>>
>>> Launched 16 YCSB clients on 16 nodes, where each client uses 32 threads for execution
>>> and needs to query 32 * 76896 keys, i.e. 100% reads.
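
As a quick sanity check on the figures quoted in this message, the two setups can be multiplied out. The variable names are made up for this sketch; the numbers are the ones stated above:

```python
QUERIES_PER_THREAD = 76896

# pycassa setup: 16 nodes x 8 cores x 4 threads per core
pycassa_total = 16 * 8 * 4 * QUERIES_PER_THREAD

# YCSB setup: 16 clients x 32 threads per client
ycsb_total = 16 * 32 * QUERIES_PER_THREAD

print(pycassa_total, ycsb_total)  # 39370752 39370752

# Reported throughput: roughly 70k vs 120k reads/second
throughput_ratio = round(120000.0 / 70000.0, 2)
print(throughput_ratio)  # 1.71
```

Both setups issue the same total number of queries, so the gap is in throughput only, at a ratio of about 1.7.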
>>>
>>> The dataset is different in each case, but has
>>>
>>> 1. the same total number of records,
>>> 2. the same number of fields,
>>> 3. almost the same field length.
>>>
>>> Could you please let me know why I see this huge performance difference, and
>>> whether there is any way I can improve the operations/second using the pycassa client?
>>>
>>> thanks
>>> pradeep
>>>
>>>
>>>
>>>
>>> --
>>> Tyler Hobbs
>>> DataStax
>>>
>>>
>>>
>>> <pycassa_client.py>
>>
>
>