From: aaron morton
To: user@cassandra.apache.org
Subject: Re: Pycassa vs YCSB results.
Date: Tue, 5 Feb 2013 21:38:43 +1300

The first thing I noticed is that your script uses the Python threading library, which is hampered by the Global Interpreter Lock: http://docs.python.org/2/library/threading.html

You don't really have multiple threads running in parallel; try using the multiprocessing library.
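A minimal sketch of the multiprocessing approach, written in modern Python 3 rather than the Python 2 of the time. `fake_read` and `run_queries` are hypothetical stand-ins for the per-thread query loop in the attached script (which is not reproduced here); in a real benchmark each worker would presumably open its own pycassa connections inside the child process instead of sharing them across processes.

```python
import multiprocessing
import time


def fake_read(key: int) -> int:
    """Hypothetical stand-in for one pycassa read; burns a little CPU instead of hitting Cassandra."""
    return sum(i * i for i in range(key % 50))


def run_queries(worker_id: int, n_queries: int) -> int:
    """Issue n_queries reads over this worker's own key range and return how many completed."""
    # A real worker would open its own pycassa connection pool here, inside the child process.
    completed = 0
    for key in range(worker_id * n_queries, (worker_id + 1) * n_queries):
        fake_read(key)
        completed += 1
    return completed


def run_with_processes(n_workers: int, n_queries: int) -> int:
    """Run each worker in a separate OS process, so each has its own interpreter and GIL."""
    with multiprocessing.Pool(processes=n_workers) as pool:
        counts = pool.starmap(run_queries, [(w, n_queries) for w in range(n_workers)])
    return sum(counts)


if __name__ == "__main__":
    start = time.perf_counter()
    total = run_with_processes(n_workers=4, n_queries=10_000)
    elapsed = time.perf_counter() - start
    print(f"{total} reads in {elapsed:.2f}s ({total / elapsed:.0f} ops/s)")
```

Running the same CPU-bound loop in `threading.Thread`s instead of pool processes would typically show little speed-up, because those threads take turns holding a single GIL.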

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 5/02/2013, at 7:15 AM, Pradeep Kumar Mantha <pradeepm66@gmail.com> wrote:

Hi,

Could someone please give me any hints as to why the pycassa client (attached) is much slower than YCSB?
Is it something to attribute to the performance difference between Python and Java, or does the pycassa API have some performance limitations?

I don't see any client statements that would affect pycassa performance. Please have a look at the simple Python script attached and let me know your suggestions.

thanks
pradeep

On Thu, Jan 31, 2013 at 4:53 PM, Pradeep Kumar Mantha <pradeepm66@gmail.com> wrote:


On Thu, Jan 31, 2013 at 4:49 PM, Pradeep Kumar Mantha <pradeepm66@gmail.com> wrote:
Thanks. Please find the script attached.

Just to reiterate:
It's just a simple Python script that starts 4 threads.
This script is scheduled on 8 cores using the taskset Unix command, giving 32 threads per node,
and is then scaled to 16 nodes.

thanks
pradeep


On Thu, Jan 31, 2013 at 4:38 PM, Tyler Hobbs <tyler@datastax.com> wrote:
Can you provide the python script that you're using?

(I'm moving this thread to the pycassa mailing list (pycassa-discuss@googlegroups.com), which is a better place for this discussion.)


On Thu, Jan 31, 2013 at 6:25 PM, Pradeep Kumar Mantha <pradeepm66@gmail.com> wrote:
Hi,

I am trying to benchmark Cassandra on a 12-data-node cluster using 16 clients (each client uses 32 threads), with both a custom pycassa client and YCSB.

I found that the maximum throughput achieved with the pycassa client is around 70k+ reads/second,
whereas with YCSB it is ~120k reads/second.
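To put the two figures on a per-thread basis, here is a small worked calculation. It assumes all 16 × 32 = 512 client threads are active in both runs and that each thread issues one synchronous read at a time, so by Little's law the mean time per read (including client-side overhead) is threads divided by throughput.

```python
CLIENTS = 16
THREADS_PER_CLIENT = 32
THREADS = CLIENTS * THREADS_PER_CLIENT  # 512 concurrent client threads

PYCASSA_OPS = 70_000   # ~70k reads/second reported for the pycassa client
YCSB_OPS = 120_000     # ~120k reads/second reported for YCSB


def reads_per_thread(ops_per_sec: float, threads: int = THREADS) -> float:
    """Average reads/second each client thread achieves."""
    return ops_per_sec / threads


def mean_read_time_ms(ops_per_sec: float, threads: int = THREADS) -> float:
    """Little's law with one outstanding request per thread: time per read = threads / throughput."""
    return threads / ops_per_sec * 1000


for name, ops in (("pycassa", PYCASSA_OPS), ("YCSB", YCSB_OPS)):
    print(f"{name}: {reads_per_thread(ops):.1f} reads/s per thread, "
          f"~{mean_read_time_ms(ops):.2f} ms per read")
print(f"YCSB / pycassa throughput ratio: {YCSB_OPS / PYCASSA_OPS:.2f}")
```

With the reported numbers this works out to roughly 137 vs 234 reads/s per thread, i.e. about 7.3 ms vs 4.3 ms per read, a ratio of about 1.71.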

Any thoughts on why I see such a large difference in performance?


Here is a description of the setup.

Pycassa client (a simple Python script):
1. Each pycassa client starts 4 threads, and each thread performs 76896 queries.
2. A shell script uses the taskset Unix command to submit 4 threads per core on a single 8-core node (8 * 4 * 76896 queries).
3. Another shell script scales the single-node shell script to 16 nodes (total queries: 16 * 8 * 4 * 76896).
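The per-node layout in steps 1 and 2 (as we read it: 8 copies of the client script, each pinned to one core with taskset and running 4 threads) can also be driven from a single Python launcher. This is a minimal, Linux-oriented sketch that assumes `os.sched_setaffinity` as a stand-in for the taskset shell script; `query_loop`, `client_process` and `pick_cores` are hypothetical names, and `query_loop` is only a placeholder for the pycassa reads in the attached script.

```python
import multiprocessing
import os
import threading

THREADS_PER_PROCESS = 4  # "each pycassa client starts 4 threads"


def query_loop(n_queries: int, results: list, slot: int) -> None:
    """Hypothetical benchmark thread: a real one would issue n_queries pycassa reads."""
    done = 0
    for _ in range(n_queries):
        done += 1  # placeholder for one read
    results[slot] = done


def client_process(core: int, n_queries: int) -> int:
    """One client: pin this process to `core` (like `taskset -c core`), then run its threads."""
    if hasattr(os, "sched_setaffinity"):  # Linux only; elsewhere the pin is skipped
        os.sched_setaffinity(0, {core})
    results = [0] * THREADS_PER_PROCESS
    threads = [threading.Thread(target=query_loop, args=(n_queries, results, i))
               for i in range(THREADS_PER_PROCESS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


def pick_cores(n_processes: int) -> list:
    """Choose one allowed CPU per process, wrapping around if the machine has fewer."""
    if hasattr(os, "sched_getaffinity"):
        allowed = sorted(os.sched_getaffinity(0))
    else:
        allowed = list(range(os.cpu_count() or 1))
    return [allowed[i % len(allowed)] for i in range(n_processes)]


if __name__ == "__main__":
    n_processes = 8  # one client process per core on an 8-core node
    with multiprocessing.Pool(processes=n_processes) as pool:
        counts = pool.starmap(client_process, [(c, 1000) for c in pick_cores(n_processes)])
    print(f"{sum(counts)} reads from {n_processes * THREADS_PER_PROCESS} threads on this node")
```

Because each client is a separate process with its own interpreter, the 8 processes are not serialized by one GIL; only the 4 threads inside each process share one.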

I tried to keep the YCSB configuration as similar as possible to my custom pycassa benchmarking setup.

YCSB:

I launched 16 YCSB clients on 16 nodes, where each client uses 32 threads and needs to query 32 * 76896 keys, i.e. 100% reads.
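As a quick check that the two workloads issue the same number of reads, here is the arithmetic from the two descriptions above, written out in Python.

```python
QUERIES_PER_THREAD = 76896

# pycassa: 16 nodes x 8 cores x 4 threads per core, each thread issuing 76896 queries
pycassa_per_node = 8 * 4 * QUERIES_PER_THREAD
pycassa_total = 16 * pycassa_per_node

# YCSB: 16 clients x 32 threads, each thread issuing 76896 queries
ycsb_per_client = 32 * QUERIES_PER_THREAD
ycsb_total = 16 * ycsb_per_client

print(f"per node/client: {pycassa_per_node:,} vs {ycsb_per_client:,}")
print(f"total: {pycassa_total:,} vs {ycsb_total:,}")
```

Both come to 2,460,672 reads per node and 39,370,752 reads in total, so the throughput gap is not explained by a difference in request count; at ~70k vs ~120k reads/second that is roughly 560 s vs 330 s of run time.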

The dataset is different in each case, but has:

1. the same total number of records.
2. the same number of fields.
3. almost the same field length.

Could you please let me know why I see this huge performance difference, and whether there is any way I can improve operations/second using the pycassa client?

thanks
pradeep
 



--
Tyler Hobbs
DataStax



<pycassa_client.py>
