Are you using virtual machines to run Cassandra? I've found that performance in VMs is crap.

Nicolas Santini


On Thu, Feb 3, 2011 at 11:17 PM, aaron morton <aaron@thelastpickle.com> wrote:
This page has a guide to setting the initial tokens for the nodes:
http://wiki.apache.org/cassandra/Operations#Ring_management
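
For a two-node cluster like the one in this thread, evenly spaced tokens could be computed roughly as in the sketch below. It assumes the RandomPartitioner (token range 0 to 2**127), which nobody in the thread has confirmed, and `initial_tokens` is a hypothetical helper, not part of Cassandra.

```python
# Sketch: evenly spaced initial tokens, ASSUMING RandomPartitioner
# (token space 0 .. 2**127). initial_tokens is a hypothetical helper.

def initial_tokens(node_count):
    """Return one evenly spaced token per node, in ring order."""
    step = 2**127 // node_count
    return [i * step for i in range(node_count)]

for node, token in enumerate(initial_tokens(2)):
    print(f"node {node}: initial token = {token}")
```

Each value would go into the initial token setting of the corresponding node before that node first joins the ring.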

You can also use the bin/nodetool cfstats command or JConsole to check the maximum row size on each node, to see if you have a monster row.

Aaron

On 3/02/2011, at 10:22 PM, abhinav prakash rai wrote:

Hi Peter,

Thanks for your reply.

Our application is multi-threaded, and we are using an 8-core machine. The application uses 4 column families, one of which contains rows that are huge relative to the rows in the other column families.

The balance in the ring is highly skewed. Can you suggest how we can ensure even balancing of the load across the cluster?

The row IDs in one column family are combinations of cell numbers (e.g. 9883240354_9885430354), and the other row IDs look like thread_name_12234, etc.

How can we ensure the data is spread evenly across rows?
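
As a rough illustration of how row keys map to nodes, here is a sketch that assumes the RandomPartitioner, where a key's position on the ring comes from an MD5 hash of the key rather than from the key's text. The hashing below is an approximation of that behaviour, and `key_token`, `owner` and `ownership` are hypothetical helpers, not Cassandra APIs. It counts rows only, so one huge row would still put its bytes on a single node.

```python
import bisect
import hashlib
from collections import Counter

RING_SIZE = 2**127  # assumed RandomPartitioner token space

def key_token(row_key):
    """Approximate token: absolute value of the MD5 digest as a signed integer."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))

def owner(node_tokens, token):
    """Index of the first node whose token is >= the key token, wrapping around."""
    idx = bisect.bisect_left(node_tokens, token)
    return idx % len(node_tokens)

def ownership(node_tokens, keys):
    """Fraction of the given keys that lands on each node (node_tokens sorted)."""
    counts = Counter(owner(node_tokens, key_token(k)) for k in keys)
    return [counts[i] / len(keys) for i in range(len(node_tokens))]

keys = [f"thread_name_{i}" for i in range(10000)]
balanced = [0, RING_SIZE // 2]   # evenly spaced tokens
skewed = [0, RING_SIZE // 20]    # second node owns only a small slice
print("balanced tokens:", ownership(balanced, keys))
print("skewed tokens:  ", ownership(skewed, keys))
```

Under these assumptions, similar-looking keys still scatter across the ring, so the share each node receives is driven by the node tokens rather than by the key format.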

Thanks & Regards,
abhinav



 

On Thu, Feb 3, 2011 at 1:46 PM, Peter Schuller <peter.schuller@infidyne.com> wrote:
> The first time, I ran a single instance of Cassandra and my application on
> one system (16GB RAM and 8 cores), and the time taken was 480 sec.
> When I added one more system (so this time I was running 2 instances
> of Cassandra in a cluster) and ran the application from a single client, I
> found the time taken increased to 1000 sec. I also found that the data
> distribution was very uneven across the two systems (one system held about
> 2.5GB and the other 140MB).
> Is any configuration required when running Cassandra in a cluster, other than
> adding seeds?

For starters:

(1) Are you spreading your data evenly across rows? Row keys
determine where data is placed in the cluster.
(2) Is your ring actually balanced? (Check with nodetool ring; each of
the two nodes should own about 50%.)
(3) Is your test concurrent/multi-threaded? If it is a sequential
workload, an increase in total time would be expected when moving
from local-only traffic to running against remote machines.
Adding machines increases aggregate throughput across multiple
clients; it won't make individual requests faster (except indirectly
of course by avoiding overloaded conditions).
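
To make point (3) concrete, here is a toy sketch of sequential versus concurrent clients. It does not talk to Cassandra at all: `fake_request` is a hypothetical stand-in that just sleeps for an assumed 10 ms round trip, and the worker count of 8 is an arbitrary choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(_):
    """Stand-in for one remote read/write; sleeps to simulate a network round trip."""
    time.sleep(0.01)  # assumed latency, not a measured Cassandra figure

def run_sequential(n):
    """Issue n requests one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for i in range(n):
        fake_request(i)
    return time.perf_counter() - start

def run_concurrent(n, workers):
    """Issue n requests from a pool of worker threads and return the elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fake_request, range(n)))
    return time.perf_counter() - start

print(f"sequential: {run_sequential(40):.2f}s")
print(f"concurrent: {run_concurrent(40, workers=8):.2f}s")
```

Each simulated request takes just as long in both runs; only the total time drops, because the round trips overlap.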


--
/ Peter Schuller



--
Regards,
Abhinav P. Rai