incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: hector or astyanax
Date Tue, 07 May 2013 08:37:37 GMT
> i want to know which cassandra client is better?
Go with Astyanax or Native Binary; they are both under active development and supported by a
vendor / large implementor.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 7/05/2013, at 7:03 AM, Derek Williams <derek@fyrie.net> wrote:

> Also have to keep in mind that it should be rare to only use a single socket since you
are usually making at least 1 connection per node in the cluster (or local datacenter). There
is also nothing enforcing that a single client cannot open more than 1 connection to a node.
In the end it should come down to which protocol implementation is faster.
> 
> 
> On Mon, May 6, 2013 at 11:58 AM, Aaron Turner <synfinatic@gmail.com> wrote:
> From my experience, your NIC buffers generally aren't the problem (or at least it's easy
to tune them to fix).  It's TCP.  Simply put, your raw NIC throughput > single TCP socket
throughput on most modern hardware/OS combinations.  This is especially true as latency increases
between the two hosts.  This is why BitTorrent or "download accelerators" are often faster
than just downloading a large file via your browser or ftp client: they're running multiple
TCP connections in parallel compared to only one.
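
To put rough numbers on the latency point above: a single TCP connection can have at most one window of unacknowledged data in flight per round trip, so its throughput is bounded by window size divided by round-trip time. The sketch below is only an illustration. The 64 KiB window (i.e. no window scaling) and the RTT values are assumptions, not measurements from this thread, and real stacks with window scaling and congestion control will behave differently.

```python
def max_tcp_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound for one connection: one full window per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes * 8 / rtt_seconds


# Assumption: a fixed 64 KiB window with no window scaling.
WINDOW_BYTES = 64 * 1024

# Assumed round-trip times in milliseconds (illustrative only).
for rtt_ms in (1, 10, 50):
    one_conn = max_tcp_throughput_bps(WINDOW_BYTES, rtt_ms / 1000)
    # N independent connections each get their own window, so the bound scales by N.
    print(f"RTT {rtt_ms:>2} ms: 1 conn <= {one_conn / 1e6:7.1f} Mbit/s, "
          f"4 conns <= {4 * one_conn / 1e6:7.1f} Mbit/s")
```

Under these assumptions the per-connection bound shrinks in proportion to the RTT, which is why several connections in parallel can help on higher-latency links.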
> 
> TCP is great for reliable, bi-directional, stream based communication.  Not the best
solution for high throughput though.  UDP is much better for that, but then you lose all
the features that TCP gives you and so then people end up re-inventing the wheel (poorly I
might add).
> 
> So yeah, I think the answer to the question of "which is faster" is "it depends
on your queries".
> 
> 
> 
> On Mon, May 6, 2013 at 10:24 AM, Hiller, Dean <Dean.Hiller@nrel.gov> wrote:
> You have me thinking more.  I wonder in practice if 3 sockets is any faster than 1 socket
when doing NIO.  If your buffer sizes were small, maybe that would be the case.  Usually the
NIC buffers are big, so when the selector fires it is reading from 3 buffers for 3 sockets
or 1 buffer for one socket.  In both cases, all 3 requests are there in the buffers.  At any
rate, my belief is that it is probably still basically parallel performance on one socket, though
I have not tested my theory.  My theory is that the real bottleneck on performance is the
work Cassandra has to do on the reads and such.
> 
> What about 20 sockets then (like someone has a pool)?  Will it be any faster?  Not really
sure, as in the end you are still held up by the real bottleneck of reading from disk on the
Cassandra side.  We went to 20 threads in one case using 20 sockets with Astyanax and received
no performance improvement (synchronous, but more sockets did not improve our performance).
 I.e., it may be the case that 90% of the time one socket is just as fast as 10/20.  I would love
to know the truth/answer to that though.
> 
> Later,
> Dean
> 
> 
> From: Aaron Turner <synfinatic@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Monday, May 6, 2013 10:57 AM
> To: cassandra users <user@cassandra.apache.org>
> Subject: Re: hector or astyanax
> 
> Just because you can batch queries or have the server process them out of order doesn't
make it fully "parallel".  You're still using a single TCP connection which is by definition
a serial data stream.  Basically, if you send a bunch of queries which each return a large
amount of data you've effectively limited your query throughput to a single TCP connection.
 Using Thrift, each query result is returned in its own TCP stream in *parallel*.
> 
> Not saying the new API isn't great, doesn't have its place or may have better performance
in certain situations, but generally speaking I would refrain from making general claims without
actual benchmarks to back them up.   I do completely agree that Async interfaces have their
place and have certain advantages over multi-threading models, but it's just another tool
to be used when appropriate.
> 
> Just my .02. :)
> 
> 
> 
> On Mon, May 6, 2013 at 5:08 AM, Hiller, Dean <Dean.Hiller@nrel.gov> wrote:
> I was under the impression that it is multiple requests using a single connection in PARALLEL,
not serial, as they have request ids and the responses do as well, so you can send a request
while a previous request has no response just yet.
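
The request-id idea described above can be sketched as follows. This is a toy model, not the actual Cassandra binary protocol or any driver's API: the class `MultiplexedConnection` and its `send` / `on_response` methods are hypothetical names, and the "server" replies are simulated in-process. It only shows how tagging each request with an id lets responses come back in any order on one connection.

```python
import asyncio


class MultiplexedConnection:
    """Toy model (hypothetical) of one connection carrying many in-flight requests."""

    def __init__(self):
        self._next_id = 0
        self._pending = {}   # request id -> Future waiting for that response
        self.sent = []       # frames "written to the wire": (request id, query)

    def send(self, query):
        """Tag the query with a fresh id and return (id, future) without blocking."""
        request_id = self._next_id
        self._next_id += 1
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        self.sent.append((request_id, query))
        return request_id, future

    def on_response(self, request_id, payload):
        """Route a response to whichever caller is waiting on that id."""
        self._pending.pop(request_id).set_result(payload)


async def demo():
    conn = MultiplexedConnection()
    # Two requests are outstanding at once on the same connection.
    id_a, fut_a = conn.send("SELECT a")
    id_b, fut_b = conn.send("SELECT b")
    # Simulated server: the second request is answered first.
    conn.on_response(id_b, "rows for b")
    conn.on_response(id_a, "rows for a")
    # Each caller still receives the response matching its own request.
    return [await fut_a, await fut_b]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Note that this only models request/response matching. The bytes still travel over one TCP stream, so it does not by itself settle the throughput question raised elsewhere in the thread.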
> 
> I think you do get a big speed advantage from the asynchronous nature, as you do not need
to hold up so many threads in your webserver while you have outstanding requests being processed.
 The Thrift async was not exactly async like I suspect the new Java driver is, but I have
not verified that (I hope it is).
> 
> Dean
> 
> From: Aaron Turner <synfinatic@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Sunday, May 5, 2013 5:27 PM
> To: cassandra users <user@cassandra.apache.org>
> Subject: Re: hector or astyanax
> 
> 
> 
> On Sun, May 5, 2013 at 1:09 PM, Derek Williams <derek@fyrie.net> wrote:
> The binary protocol is able to multiplex multiple requests using a single connection,
which can lead to much better performance (similar to HTTP vs SPDY). This is without comparing
the performance of the Thrift and binary protocols; I assume the binary protocol would be
faster since it is specialized for Cassandra requests.
> 
> 
> Curious why you think multiplexing multiple requests over a single connection (serial)
is faster than multiple requests over multiple connections (parallel)?
> 
> And isn't Thrift a binary protocol?
> 
> 
> --
> Aaron Turner
> http://synfin.net/         Twitter: @synfinatic
> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
> Those who would give up essential Liberty, to purchase a little temporary
> Safety, deserve neither Liberty nor Safety.
>     -- Benjamin Franklin
> "carpe diem quam minimum credula postero"
> 
> 
> 
> 
> 
> 
> -- 
> Derek Williams

