From: Ryan Svihla
To: user@cassandra.apache.org
Subject: Re: Drivers performance
Date: Fri, 19 Dec 2014 08:17:38 -0600

This is a better question for the Java driver mailing list, but I see a number of problems in your DataStax Java driver code. Even without knowing how Astyanax handles caching of prepared statements, I can tell you the following:
  1. You are re-preparing a statement on _every_ iteration, and these are not cached by the driver. This is not only expensive, it is slower than just using non-prepared statements, so it is a substantial slowdown. Drivers do not necessarily implement this the same way, so the code is not an apples-to-apples comparison. Change your code to prepare _once_ and I bet your numbers improve drastically.
  2. Your pooling options are CRAZY high, and I'm guessing you're running out of resources on the DataStax driver. Again, the code is different, with different tradeoffs from Astyanax; a connection in Thrift is not remotely the same as a connection in the modern native protocol. Just use the default pooling options and I bet your numbers improve greatly (if not, there is something deeply off about your cluster and/or app servers).
  3. A lot of the speedup in the Java driver comes from its async support and how the native protocol handles async. Since you're doing synchronous requests, this is the best case for Thrift performance. However, that still does not explain your gap (in most synchronous cases Thrift is comparable at best, and usually not faster).
  4. I haven't been able to figure out which version of the DataStax driver you're on from looking at the code. This can change performance drastically, as there have been many improvements, especially for Cassandra 2.1.
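
To make the prepare-once advice in the first point concrete, here is a minimal sketch. It does not use the real driver: `FakeSession`, its methods, the query string, and the key are hypothetical stand-ins that only count calls, loosely modeled on the driver's `Session.prepare(...)` and `PreparedStatement.bind(...)`. It shows only where the prepare call should sit relative to the loop, not how the driver behaves internally.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class PrepareOnceSketch {

    /** Hypothetical stand-in for a driver session; it only counts calls. */
    static final class FakeSession {
        final AtomicInteger prepareCalls = new AtomicInteger();
        final AtomicInteger executeCalls = new AtomicInteger();

        /** Stand-in for the round trip that parses and registers a query. */
        String prepare(String cql) {
            prepareCalls.incrementAndGet();
            return cql; // a real driver would return a prepared-statement handle
        }

        /** Stand-in for binding a key and executing the prepared statement. */
        void execute(String prepared, String key) {
            executeCalls.incrementAndGet();
        }
    }

    // Hypothetical query; table and column names are placeholders.
    static final String QUERY = "SELECT * FROM demo_table WHERE key = ?";

    /** Anti-pattern: prepares inside the loop, one extra prepare per read. */
    static void prepareEveryIteration(FakeSession session, int iterations) {
        for (int i = 0; i < iterations; i++) {
            String prepared = session.prepare(QUERY);
            session.execute(prepared, "row-key");
        }
    }

    /** Suggested shape: prepare once, then only bind and execute in the loop. */
    static void prepareOnce(FakeSession session, int iterations) {
        String prepared = session.prepare(QUERY);
        for (int i = 0; i < iterations; i++) {
            session.execute(prepared, "row-key");
        }
    }

    public static void main(String[] args) {
        FakeSession inLoop = new FakeSession();
        prepareEveryIteration(inLoop, 10_000);

        FakeSession hoisted = new FakeSession();
        prepareOnce(hoisted, 10_000);

        System.out.println("prepare calls, in loop: " + inLoop.prepareCalls.get());
        System.out.println("prepare calls, hoisted: " + hoisted.prepareCalls.get());
    }
}
```

Both variants execute the same 10 000 reads; the only difference is 10 000 prepare calls versus one. In a real benchmark the prepared statement would be created before the timed loop starts.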
I suggest you post to the Java driver mailing list for a more in-depth discussion: https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user
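
On the async point above, the difference between waiting for each response and keeping many requests in flight can be sketched with plain `java.util.concurrent`. Everything here is an assumption for illustration: `readAsync` is a hypothetical stand-in that sleeps to mimic latency, and the thread pool only simulates concurrency. The real driver exposes `Session.executeAsync(...)` and multiplexes requests over a connection rather than using a thread per request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncReadSketch {

    /** Hypothetical stand-in for an async read; sleeps to mimic network latency. */
    static CompletableFuture<String> readAsync(ExecutorService io, String key, long latencyMillis) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(latencyMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "row:" + key;
        }, io);
    }

    /** Synchronous style: wait for each response before sending the next request. */
    static List<String> readSequentially(ExecutorService io, int count, long latencyMillis) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            rows.add(readAsync(io, "k" + i, latencyMillis).join());
        }
        return rows;
    }

    /** Asynchronous style: send every request first, then collect the responses. */
    static List<String> readConcurrently(ExecutorService io, int count, long latencyMillis) {
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            pending.add(readAsync(io, "k" + i, latencyMillis));
        }
        List<String> rows = new ArrayList<>();
        for (CompletableFuture<String> future : pending) {
            rows.add(future.join()); // results come back in request order
        }
        return rows;
    }

    public static void main(String[] args) {
        ExecutorService io = Executors.newFixedThreadPool(16);
        try {
            long t0 = System.nanoTime();
            readSequentially(io, 32, 5);
            long t1 = System.nanoTime();
            readConcurrently(io, 32, 5);
            long t2 = System.nanoTime();
            System.out.printf("sequential: %d ms, concurrent: %d ms%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        } finally {
            io.shutdown();
        }
    }
}
```

A benchmark that loads one row at a time in a loop corresponds to `readSequentially`, which is why it cannot show whatever benefit the async path provides.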

On Fri, Dec 19, 2014 at 7:26 AM, Svec, Michal <msvec@netsuite.com> wrote:

Hello,

I am in the middle of evaluating whether we should switch from Astyanax to the DataStax driver. I wrote a simple benchmark that loads the same row by key 10 000 times, and I was surprised by the slowness of the DataStax driver. I uploaded it to GitHub.

https://github.com/michalsvec/astyanax-datastax-benchmark

It was tested against Cassandra 1.2 and 2.1. Testing conditions were naive (localhost, single node, …), but the difference is still huge.

10 000 iterations:

· Astyanax: 2734 ms

· Astyanax prepared: 1997 ms

· Datastax: 10230 ms

Is it really this slow, or am I missing something?

Thank you for any advice.

Michal






--


Ryan Svihla

Solution Architect




