incubator-cassandra-user mailing list archives

From "Hiller, Dean" <Dean.Hil...@nrel.gov>
Subject Re: hector or astyanax
Date Mon, 06 May 2013 17:24:45 GMT
You have me thinking more.  I wonder whether, in practice, 3 sockets are any faster than 1 socket
when doing NIO.  If your buffer sizes were small, maybe that would be the case.  Usually the NIC
buffers are big, so when the selector fires it is reading either from 3 buffers for 3 sockets or
from 1 buffer for one socket.  In both cases, all 3 requests are already there in the buffers.
At any rate, my belief is that performance on one socket is probably still basically parallel,
though I have not tested my theory.  My theory is that the real bottleneck on performance is the
work Cassandra has to do on the reads and such.

What about 20 sockets then (like someone has a pool)?  Will it be any faster?  I am not really
sure, as in the end you are still held up by the real bottleneck of reading from disk on the
Cassandra side.  We went to 20 threads in one case, using 20 sockets with Astyanax, and received
no performance improvement (the calls were synchronous, but more sockets did not improve our
performance).  I.e., it may be the case that 90% of the time one socket is just as fast as 10
or 20.  I would love to know the truth/answer to that, though.
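
A rough way to picture this theory is a toy model, not a benchmark. It assumes the server can work on only a fixed number of reads at once, and that the client keeps a fixed number of requests in flight (whether over one multiplexed socket or over many sockets makes no difference in the model). The function name total_time_ms and all numbers (60 requests, 4 server slots, 10 ms per read) are made-up assumptions for illustration, not measurements from Cassandra or Astyanax.

```python
import heapq


def total_time_ms(num_requests, client_in_flight, server_slots, service_ms):
    """Virtual time to finish num_requests in a toy client/server model.

    Assumption: a request only makes progress when it holds both a client
    in-flight slot and a server slot, so the usable parallelism is the
    smaller of the two.
    """
    lanes = min(client_in_flight, server_slots)
    free_at = [0] * lanes  # time at which each lane is next free (valid min-heap)
    finish = 0
    for _ in range(num_requests):
        start = heapq.heappop(free_at)   # earliest lane that can take work
        done = start + service_ms        # hypothetical fixed cost per read
        finish = max(finish, done)
        heapq.heappush(free_at, done)
    return finish


if __name__ == "__main__":
    for in_flight in (1, 4, 20):
        print(in_flight, total_time_ms(60, in_flight, server_slots=4, service_ms=10))
```

In this model, going from 1 to 4 requests in flight helps, but going from 4 to 20 changes nothing because the assumed server capacity is already saturated. Whether a real cluster behaves this way depends on where its bottleneck actually is, which only a benchmark can show.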

Later,
Dean


From: Aaron Turner <synfinatic@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, May 6, 2013 10:57 AM
To: cassandra users <user@cassandra.apache.org>
Subject: Re: hector or astyanax

Just because you can batch queries or have the server process them out of order doesn't make
it fully "parallel".  You're still using a single TCP connection which is by definition a
serial data stream.  Basically, if you send a bunch of queries which each return a large amount
of data you've effectively limited your query throughput to a single TCP connection.  Using
Thrift, each query result is returned in its own TCP stream in *parallel*.

Not saying the new API isn't great, doesn't have its place or may have better performance
in certain situations, but generally speaking I would refrain from making general claims without
actual benchmarks to back them up.   I do completely agree that Async interfaces have their
place and have certain advantages over multi-threading models, but it's just another tool
to be used when appropriate.

Just my .02. :)



On Mon, May 6, 2013 at 5:08 AM, Hiller, Dean <Dean.Hiller@nrel.gov> wrote:
I was under the impression that multiple requests on a single connection run in PARALLEL, not
serially, since the requests carry request ids and the responses do as well, so you can send a
request while a previous request has no response yet.
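
The request-id idea can be sketched with a toy connection object. This is only an illustration of matching responses to requests by id; it is not the binary protocol or any real driver's API. The class MultiplexedConnection, its methods, and the fake server delays are all hypothetical.

```python
import asyncio
import itertools


class MultiplexedConnection:
    """Toy stand-in for one connection carrying several in-flight requests."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}  # request id -> future waiting for that response

    async def send(self, query, server_delay_s):
        loop = asyncio.get_running_loop()
        request_id = next(self._ids)
        future = loop.create_future()
        self._pending[request_id] = future
        # Hypothetical server: replies after server_delay_s, echoing the request id.
        loop.call_later(server_delay_s, self._on_response,
                        request_id, f"result of {query}")
        return await future

    def _on_response(self, request_id, payload):
        # The id on the response tells us which caller to wake up.
        self._pending.pop(request_id).set_result((request_id, payload))


async def demo():
    conn = MultiplexedConnection()
    completion_order = []

    async def run(query, delay_s):
        request_id, payload = await conn.send(query, delay_s)
        completion_order.append(request_id)
        return payload

    # The slow request is sent first, the fast one second, on the same connection.
    results = await asyncio.gather(run("slow read", 0.05), run("fast read", 0.01))
    return results, completion_order


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Here the second request completes before the first, even though both share one connection object, because each response is routed by its id rather than by arrival order. The sketch says nothing about how bytes are interleaved on a real TCP stream.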

I think you do get a big speed advantage from the asynchronous nature, as you do not need to
hold up so many threads in your webserver while you have outstanding requests being processed.
The Thrift async client was not exactly async in the way I suspect the new Java driver is, but
I have not verified that (I hope it is).
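
The point about not holding up threads can be shown with a small sketch, assuming a non-blocking client. The function fake_query is a hypothetical stand-in for a driver call and simply sleeps; it does not talk to Cassandra.

```python
import asyncio
import threading


async def fake_query(delay_s):
    # Stand-in for a non-blocking driver call; a real driver would await network I/O here.
    await asyncio.sleep(delay_s)
    return threading.get_ident()


async def run_outstanding(n):
    # Start n requests at once and wait for all of them on the event loop thread.
    thread_ids = await asyncio.gather(*(fake_query(0.01) for _ in range(n)))
    return set(thread_ids)


if __name__ == "__main__":
    # Number of distinct threads that handled 100 outstanding requests.
    print(len(asyncio.run(run_outstanding(100))))
```

All 100 outstanding requests are waited on by a single thread, whereas a synchronous client would need one blocked thread per outstanding request.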

Dean

From: Aaron Turner <synfinatic@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Sunday, May 5, 2013 5:27 PM
To: cassandra users <user@cassandra.apache.org>
Subject: Re: hector or astyanax



On Sun, May 5, 2013 at 1:09 PM, Derek Williams <derek@fyrie.net> wrote:
The binary protocol is able to multiplex multiple requests using a single connection, which
can lead to much better performance (similar to HTTP vs SPDY). This is without comparing the
performance of Thrift vs the binary protocol, where I assume the binary protocol would be
faster since it is specialized for Cassandra requests.


Curious why you think multiplexing multiple requests over a single connection (serial) is
faster than multiple requests over multiple connections (parallel)?

And isn't Thrift a binary protocol?


--
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"



