cassandra-user mailing list archives

From Joe Stump <>
Subject Re: using cassandra as a real time DW
Date Fri, 06 Nov 2009 21:46:29 GMT
On Nov 6, 2009, at 2:35 PM, Mark Robson wrote:

> 2009/11/6 Joe Stump <>
> Can you explain what you mean by lack of load balancing?
> Nothing in Cassandra attempts to ensure that your data are equally  
> spread over the different nodes (yet; there are several bugs open to  
> this effect).

That's not true, as far as I understand it. It won't put three copies
on the same node. The key word, I suppose, is "equally".

> If you use the OrderedPartitioner, in all likelihood your data will  
> be very unevenly spread to the point where most of your servers  
> aren't used at all. This obviously doesn't scale.
> The RandomPartitioner is better because the hashing it does causes  
> data to spread out, but the tokens are still chosen randomly so  
> there's no way to guarantee that machines get equal or even
> similar(ish) amounts of data.

We've answered this by creating our own partitioners, which Cassandra
makes pluggable. It took one of our guys about two full days to have
something up and running. Also, for the most part there's no way to
guarantee anything in distributed computing.
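
To make the token point concrete, here is a rough sketch in Python. It is
not Cassandra's code or its partitioner API: the ring size, the MD5-based
key_token() and every function name below are assumptions chosen only for
illustration. It hashes keys onto a ring and counts how many land on each
node, once with randomly chosen node tokens and once with evenly spaced
ones.

```python
import bisect
import hashlib
import random

# Assumed token space for this sketch (not taken from the thread).
RING_SIZE = 2 ** 127


def key_token(key: str) -> int:
    """Hash a key onto the ring (simplified stand-in for a hashing partitioner)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def owner(sorted_tokens, key: str) -> int:
    """Index of the node owning the key: first node token >= key token, wrapping."""
    i = bisect.bisect_left(sorted_tokens, key_token(key))
    return i % len(sorted_tokens)


def load_per_node(tokens, keys):
    """Count how many keys each node owns; counts follow sorted token order."""
    sorted_tokens = sorted(tokens)
    counts = [0] * len(sorted_tokens)
    for key in keys:
        counts[owner(sorted_tokens, key)] += 1
    return counts


def random_tokens(n: int, seed: int):
    """Node tokens picked at random, so the ranges they own differ in size."""
    rng = random.Random(seed)
    return [rng.randrange(RING_SIZE) for _ in range(n)]


def even_tokens(n: int):
    """Node tokens spaced evenly, so every node owns an equal slice of the ring."""
    return [i * RING_SIZE // n for i in range(n)]


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(20000)]
    print("random tokens:", load_per_node(random_tokens(4, seed=1), keys))
    print("even tokens:  ", load_per_node(even_tokens(4), keys))
```

With evenly spaced tokens each node owns the same share of the ring, so the
counts come out close to equal. With random tokens the shares depend on
where the tokens happen to fall. Choosing tokens deliberately, or writing a
partitioner that fits your key distribution, is the kind of control the
pluggable design leaves open.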

I think you're misleading people, though, with the notion that (a)
Cassandra doesn't have load balancing (it does, in many ways) and (b)
it doesn't scale. Digg and Facebook both use it in production, and
while it might not be battle-hardened and fully tested, it's
definitely working well for them under high load.

