incubator-cassandra-user mailing list archives

From Mark Greene <green...@gmail.com>
Subject Re: At what point does the cluster get faster than the individual nodes?
Date Wed, 21 Apr 2010 16:50:18 GMT
Right, it's a similar concept to DB sharding, where you spread the write load
around to different DB servers. That won't necessarily increase the throughput
of any one DB server, but it does increase the throughput of the servers
collectively.

On Wed, Apr 21, 2010 at 12:16 PM, Mike Gallamore <
mike.e.gallamore@googlemail.com> wrote:

>  Some people might be able to answer this better than me. However: with
> quorum consistency you have to communicate with n/2 + 1 nodes, where n is
> the replication factor. So unless you are disk bound, your real expense is
> going to be all those extra network latencies. I'd expect that you'll see a
> relatively flat throughput per thread once you reach the point that you
> aren't disk or CPU bound. That said, the extra nodes mean you should be
> able to handle more threads/connections at the same throughput on each
> thread/connection. So a bigger cluster doesn't necessarily mean a single job
> goes faster, just that you can handle more jobs at the same time.
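
As a rough illustration of the n/2 + 1 rule quoted above, here is a minimal Python sketch. It assumes the division is integer division, and the function name quorum_size is made up for this example; it is not part of any Cassandra client API.

```python
def quorum_size(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM operation: floor(n/2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


if __name__ == "__main__":
    for rf in (1, 2, 3, 5):
        q = quorum_size(rf)
        # rf - q is how many replicas can be down while QUORUM still succeeds.
        print(f"RF={rf}: quorum={q}, tolerates {rf - q} replica(s) down")
```

Under this rule, a replication factor of 2 gives a quorum of 2, so every QUORUM read or write has to wait on both replicas of a key.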
>
> On 04/21/2010 08:28 AM, Mark Jones wrote:
>
>  I’m seeing a cluster of 4 (replication factor=2) to be, overall, barely
> faster than the slowest node in the group.  When I run the 4 nodes
> individually, I see:
>
>
>
> For inserts:
>
> Two nodes @ 12000/second
>
> 1 node @ 9000/second
>
> 1 node @ 7000/second
>
>
>
> For reads:
>
> Abysmal, less than 1000/second (not range slices, individual lookups)  Disk
> util @ 88+%
>
>
>
>
>
> How many nodes are required before you see a net positive gain on inserts
> and reads (QUORUM consistency on both)?
>
> When I use my 2 fastest nodes as a pair, the thruput is around 9000
> inserts/second.
>
>
>
> What is a good to excellent hardware config for Cassandra?  I have separate
> drives for data and commit log and 8GB in 3 machines (all dual core).  My
> fastest insert node has 4GB and a triple core processor.
>
>
>
> I’ve run py_stress, and my C++ code beats it by several 1000 inserts/second
> toward the end of the runs, so I don’t think it is my app, and I’ve removed
> the super columns per some suggestions yesterday.
>
>
>
> When Cassandra is working, it performs well; the problem is that it
> frequently slows down to < 50% of its peaks and occasionally slows down to 0
> inserts/second, which greatly reduces aggregate thruput.
>
>
>
