From: Aaron Turner <synfinatic@gmail.com>
Date: Mon, 6 May 2013 10:58:15 -0700
To: cassandra users <user@cassandra.apache.org>
Subject: Re: hector or astyanax

From my experience, your NIC buffers generally aren't the problem (or at least it's easy to tune them to fix it). It's TCP. Simply put, your raw NIC throughput exceeds single-TCP-socket throughput on most modern hardware/OS combinations. This is especially true as latency increases between the two hosts. This is why BitTorrent or "download accelerators" are often faster than just downloading a large file via your browser or FTP client: they're running multiple TCP connections in parallel instead of only one.
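To put rough numbers on the single-socket limit: a loss-free, window-limited TCP flow delivers at most about one window of data per round trip, so its ceiling is roughly window / RTT, and several such flows add up until the NIC line rate is reached. Below is a minimal Python sketch of that arithmetic; the 64 KiB window, the 1 Gbit/s NIC, and the RTT values are assumed for illustration, not measured.

```python
def single_flow_ceiling_bps(window_bytes: int, rtt_s: float) -> float:
    """Rough upper bound for one window-limited TCP flow: one window per round trip."""
    return window_bytes * 8 / rtt_s


def aggregate_ceiling_bps(connections: int, window_bytes: int, rtt_s: float,
                          nic_bps: float) -> float:
    """Parallel flows add up, but never beyond the NIC line rate."""
    return min(connections * single_flow_ceiling_bps(window_bytes, rtt_s), nic_bps)


WINDOW_BYTES = 64 * 1024       # assumed 64 KiB window
NIC_BPS = 1_000_000_000        # assumed 1 Gbit/s NIC

for rtt_ms in (0.2, 1.0, 10.0):
    rtt_s = rtt_ms / 1000
    one = aggregate_ceiling_bps(1, WINDOW_BYTES, rtt_s, NIC_BPS)
    four = aggregate_ceiling_bps(4, WINDOW_BYTES, rtt_s, NIC_BPS)
    print(f"RTT {rtt_ms:>4} ms: 1 connection ~{one / 1e6:6.1f} Mbit/s, "
          f"4 connections ~{four / 1e6:6.1f} Mbit/s")
```

With these assumed numbers a single flow saturates the NIC at 0.2 ms RTT, reaches only about half of it at 1 ms, and manages about 52 Mbit/s at 10 ms, where four parallel flows get roughly four times as far. Real stacks with window scaling and packet loss behave differently, so treat this as a model only.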

TCP is great for reliable, bi-directional, stream-based communication. It's not the best solution for high throughput, though. UDP is much better for that, but then you lose all the features that TCP gives you, and people end up reinventing the wheel (poorly, I might add).

So yeah, I think the answer to the question of "which is faster" is "it depends on your queries".



On Mon, May 6, 2013 at 10:24 AM, Hiller, Dean <Dean.Hiller@nrel.gov> wrote:
You have me thinking more. I wonder whether, in practice, 3 sockets is any faster than 1 socket when doing NIO. If your buffer sizes were small, maybe that would be the case. Usually the NIC buffers are big, so when the selector fires it is reading from 3 buffers for 3 sockets or 1 buffer for one socket. In both cases, all 3 requests are there in the buffers. At any rate, my belief is that it is probably still basically parallel performance on one socket, though I have not tested my theory… My theory is that the real performance bottleneck is the work Cassandra has to do on the reads and such.
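The buffer argument above can be poked at locally. The sketch below uses Python's selectors module as a stand-in for Java NIO's Selector (an assumption; this is not Astyanax or driver code). It writes three small requests either back to back on one socketpair or one each on three socketpairs, then lets a single selector drain whatever is buffered. In both layouts the same bytes are there to read; it says nothing about how fast Cassandra then serves the reads.

```python
import selectors
import socket


def drain(read_ends, expected_bytes):
    """Read from all sockets via ONE selector until expected_bytes have arrived."""
    sel = selectors.DefaultSelector()
    for s in read_ends:
        s.setblocking(False)
        sel.register(s, selectors.EVENT_READ)
    received = b""
    while len(received) < expected_bytes:
        events = sel.select(timeout=1.0)
        if not events:
            break  # nothing more arrived within the timeout
        for key, _ in events:
            received += key.fileobj.recv(65536)
    sel.close()
    return received


requests = [b"req-1;", b"req-2;", b"req-3;"]
total = sum(len(r) for r in requests)

# Layout A: all three requests written back to back on ONE connection.
a_write, a_read = socket.socketpair()
for r in requests:
    a_write.sendall(r)
one_socket = drain([a_read], total)

# Layout B: one request on each of THREE connections.
pairs = [socket.socketpair() for _ in requests]
for (w, _), r in zip(pairs, requests):
    w.sendall(r)
three_sockets = drain([rd for _, rd in pairs], total)

for s in [a_write, a_read] + [s for pair in pairs for s in pair]:
    s.close()

print(one_socket)
print(three_sockets)   # same requests; order across sockets is not guaranteed
```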

What about 20 sockets, then (like when someone has a pool)? Will it be any faster… not really sure, as in the end you are still held up by the real bottleneck of reading from disk on the Cassandra side. We went to 20 threads in one case, using 20 sockets with Astyanax, and received no performance improvement (synchronous calls, but more sockets did not improve our performance). I.e., it may be the case 90% of the time that one socket is just as fast as 10 or 20… I would love to know the truth/answer to that, though.
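The 20-thread result above fits a simple bottleneck model. With synchronous calls each thread keeps one request in flight, and Little's law (in-flight = throughput × latency) gives throughput ≈ in-flight / latency, but never more than the node itself can serve. A minimal Python sketch; the 5 ms latency and the 2,000 reads/s node limit are hypothetical numbers, and the model ignores that latency actually rises once requests start queueing on the server.

```python
def throughput_rps(in_flight: int, latency_s: float, server_capacity_rps: float) -> float:
    """Little's law (in_flight = throughput * latency) solved for throughput,
    capped at what the server itself can sustain."""
    return min(in_flight / latency_s, server_capacity_rps)


LATENCY_S = 0.005          # hypothetical unloaded read latency (5 ms)
NODE_LIMIT_RPS = 2000.0    # hypothetical reads/s the node's disks can sustain

for threads in (1, 3, 10, 20):
    # Synchronous client: each thread (and socket) has exactly one request in flight.
    rps = throughput_rps(threads, LATENCY_S, NODE_LIMIT_RPS)
    print(f"{threads:>2} threads -> ~{rps:.0f} req/s")
```

Past the point where the node saturates (about 10 threads with these numbers), extra sockets only add queueing. That is consistent with the "no improvement" observation, though it does not by itself prove that disk was the cause.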

Later,
Dean

From: Aaron Turner
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, May 6, 2013 10:57 AM
To: cassandra users <user@cassandra.apache.org>
Subject: Re: hector or astyanax

Just because you can batch queries or have the server process them out of order doesn't make it fully "parallel". You're still using a single TCP connection, which is by definition a serial data stream. Basically, if you send a bunch of queries which each return a large amount of data, you've effectively limited your query throughput to that of a single TCP connection. Using Thrift, each query result is returned in its own TCP stream in *parallel*.
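The argument above about large result sets can be illustrated with a toy model. Assume, as that argument does, that each TCP connection tops out at some fixed rate and that the NIC has room for several such flows. On one shared stream, every response finishes only after everything queued ahead of it; with one connection per response, each finishes on its own schedule. The 100 Mbit/s per-connection ceiling and the response sizes are hypothetical.

```python
from itertools import accumulate


def serial_completion_s(sizes_bytes, per_conn_bps):
    """All responses share ONE stream: each completes after everything queued before it."""
    return [sent * 8 / per_conn_bps for sent in accumulate(sizes_bytes)]


def parallel_completion_s(sizes_bytes, per_conn_bps):
    """One connection per response; each flow runs at its own ceiling
    (assumes the NIC has headroom for all flows at once)."""
    return [size * 8 / per_conn_bps for size in sizes_bytes]


sizes = [50_000_000, 1_000, 1_000]   # hypothetical: one 50 MB result, two tiny ones
PER_CONN_BPS = 100_000_000           # hypothetical 100 Mbit/s ceiling per connection

print("one stream:     ", [round(t, 5) for t in serial_completion_s(sizes, PER_CONN_BPS)])
print("one per result: ", [round(t, 5) for t in parallel_completion_s(sizes, PER_CONN_BPS)])
```

A multiplexed protocol that splits big responses into smaller interleaved frames (SPDY does this) avoids the queueing effect, but all frames still share the one TCP flow's ceiling.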

I'm not saying the new API isn't great, doesn't have its place, or couldn't have better performance in certain situations, but generally speaking I would refrain from making general claims without actual benchmarks to back them up. I do completely agree that async interfaces have their place and have certain advantages over multi-threading models, but it's just another tool to be used when appropriate.

Just my .02. :)



On Mon, May 6, 2013 at 5:08 AM, Hiller, Dean <Dean.Hiller@nrel.gov> wrote:
I was under the impression that it is multiple requests over a single connection in PARALLEL, not serial, as the requests have request IDs and so do the responses, so you can send a request while a previous request has no response just yet.
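The request-ID mechanism described above is the core of connection multiplexing: the client tags each request with an ID, keeps a table of pending futures keyed by that ID, and a single reader routes each reply to whichever caller is waiting for it, in whatever order replies arrive. (The CQL native protocol carries such an ID, the stream id, in every frame header.) Below is a minimal asyncio sketch of that pattern; the class and function names are hypothetical, asyncio queues stand in for the socket, and this is not how any particular Cassandra driver is implemented.

```python
import asyncio
import itertools


class MultiplexedConnection:
    """Hypothetical client: many requests in flight on one connection, matched by id."""

    def __init__(self, to_server: asyncio.Queue, from_server: asyncio.Queue):
        self._to_server = to_server
        self._from_server = from_server
        self._ids = itertools.count(1)
        self._pending = {}            # request id -> Future awaiting that reply
        self.arrival_order = []       # ids in the order replies actually arrived
        self._reader = asyncio.create_task(self._read_loop())

    async def request(self, payload):
        req_id = next(self._ids)
        fut = asyncio.get_running_loop().create_future()
        self._pending[req_id] = fut
        # Send immediately; we do not wait for replies to earlier requests.
        await self._to_server.put((req_id, payload))
        return await fut

    async def _read_loop(self):
        while True:
            req_id, result = await self._from_server.get()
            self.arrival_order.append(req_id)
            self._pending.pop(req_id).set_result(result)   # route the reply by its id

    def close(self):
        self._reader.cancel()


async def fake_server(inbox, outbox):
    """Stand-in server: longer payloads take longer, so replies return out of order."""
    running = set()

    async def handle(req_id, payload):
        await asyncio.sleep(0.01 * len(payload))
        await outbox.put((req_id, payload.upper()))

    while True:
        req_id, payload = await inbox.get()
        task = asyncio.create_task(handle(req_id, payload))
        running.add(task)                       # keep a strong reference
        task.add_done_callback(running.discard)


async def main():
    to_server, from_server = asyncio.Queue(), asyncio.Queue()
    server = asyncio.create_task(fake_server(to_server, from_server))
    conn = MultiplexedConnection(to_server, from_server)
    results = await asyncio.gather(conn.request("slow query"), conn.request("fast"))
    conn.close()
    server.cancel()
    return results, conn.arrival_order


results, arrival = asyncio.run(main())
print(results, arrival)
```

Because the slower query is sent first but answered last, the arrival order comes out reversed while each caller still receives its own result.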

I think you do get a big speed advantage from the asynchronous nature, as you do not need to hold up so many threads in your web server while you have outstanding requests being processed. The Thrift async API was not exactly async in the way I suspect the new Java driver is, but I have not verified that (I hope it is).
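The thread-count argument above can be seen in a small asyncio sketch, used here as an analogy for a future-based Java client rather than as a model of the new driver's internals. One hundred simulated queries, each waiting an assumed 50 ms on the "server", are all outstanding at once, yet they run on the single event-loop thread; a blocking, thread-per-request client would need one hundred threads parked for the same effect.

```python
import asyncio
import threading


async def run_concurrent_queries(n, latency_s):
    stats = {"in_flight": 0, "peak": 0}
    threads = set()

    async def fake_query(i):
        # Awaiting parks this coroutine; the single event-loop thread keeps working.
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        threads.add(threading.get_ident())
        await asyncio.sleep(latency_s)          # stands in for waiting on the server
        stats["in_flight"] -= 1
        return i

    replies = await asyncio.gather(*(fake_query(i) for i in range(n)))
    return replies, stats["peak"], len(threads)


replies, peak, thread_count = asyncio.run(run_concurrent_queries(100, 0.05))
print(len(replies), "replies;", peak, "outstanding at peak;", thread_count, "thread")
```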

Dean

From: Aaron Turner <synfinatic@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Sunday, May 5, 2013 5:27 PM
To: cassandra users <user@cassandra.apache.org>
Subject: Re: hector or astyanax



On Sun, May 5, 2013 at 1:09 PM, Derek Williams <derek@fyrie.net> wrote:
The binary protocol is able to multiplex multiple requests using a single connection, which can lead to much better performance (similar to HTTP vs. SPDY). This is without comparing the performance of Thrift vs. the binary protocol, though I assume the binary protocol would be faster since it is specialized for Cassandra requests.
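Multiplexing on one connection needs each frame to say which request it belongs to and how long it is. The sketch below is loosely modeled on the 8-byte frame header of the CQL native protocol v1/v2 (version, flags, a one-byte signed stream id, opcode, 4-byte body length); treat the exact field semantics and the version and opcode values as assumptions and check the protocol spec before relying on them. It shows two requests sharing one byte stream and being separated again purely by length and stream id.

```python
import struct

# Assumed header layout (big-endian): version u8, flags u8, stream id i8,
# opcode u8, body length u32 -> 8 bytes in total.
HEADER = struct.Struct(">BBbBI")
REQUEST_VERSION = 0x01   # assumed version byte for a request frame
QUERY_OPCODE = 0x07      # assumed opcode value, for illustration only


def encode_frame(stream_id: int, opcode: int, body: bytes, flags: int = 0) -> bytes:
    return HEADER.pack(REQUEST_VERSION, flags, stream_id, opcode, len(body)) + body


def split_frames(wire: bytes):
    """Walk a byte stream frame by frame, yielding (stream_id, opcode, body)."""
    offset = 0
    while offset < len(wire):
        _version, _flags, stream_id, opcode, length = HEADER.unpack_from(wire, offset)
        start = offset + HEADER.size
        yield stream_id, opcode, wire[start:start + length]
        offset = start + length


# Two requests written back to back on the same connection.
wire = encode_frame(1, QUERY_OPCODE, b"query A") + encode_frame(2, QUERY_OPCODE, b"query B")
frames = list(split_frames(wire))
print(HEADER.size, frames)
```

A client would then hand each decoded body to whichever caller is waiting on that stream id.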


Curious why you think multiplexing multiple requests over a single connection (serial) is faster than multiple requests over multiple connections (parallel)?

And isn't Thrift a binary protocol?


--
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"


