cassandra-user mailing list archives

From Michael Greene <michael.gre...@gmail.com>
Subject Re: using cassandra as a real time DW
Date Fri, 06 Nov 2009 22:01:44 GMT
On Fri, Nov 6, 2009 at 3:46 PM, Joe Stump <joe@joestump.net> wrote:
> > Nothing in Cassandra attempts to ensure that your data are equally spread
> > over the different nodes (yet; there are several bugs open to this effect).
>
> That's not true from my understanding. It won't put three copies on the same
> node. The key word, I suppose, is "equally".
Right.  Mark isn't referring to the ReplicationFactor or the
distribution of an individual piece of data.  He's referring to the
potential for a series of 100 million rows to all go to the same
ReplicationFactor count nodes, even if you have a much larger cluster.
If you use the RandomPartitioner together with the various pieces of
bootstrap functionality in 0.5, or good token picking, that solves the
problem.  If you use the OPP (OrderPreservingPartitioner), Cassandra is
only part of the way there on trunk.
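
To make the difference concrete, here is a rough sketch of the two
placement schemes.  It is a toy model, not Cassandra's code: the ring
size, the node tokens, the key format, and all function names are
assumptions made up for illustration, and it only tracks the primary
owner of each key.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

RING_SIZE = 2 ** 127  # assumed size of the hashed token space in this toy model


def md5_token(key):
    """Hash a key to an integer token, as a hashing partitioner would (simplified)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % RING_SIZE


def identity_token(key):
    """Order-preserving model: the key itself is the token."""
    return key


def primary_owner(node_tokens, token):
    """Index of the first node whose token is >= the given token, wrapping around."""
    return bisect_left(node_tokens, token) % len(node_tokens)


def spread(keys, node_tokens, token_fn):
    """Count how many keys land on each node as primary owner."""
    return Counter(primary_owner(node_tokens, token_fn(k)) for k in keys)


# Sequential keys, as you might see in a bulk load (hypothetical key format).
keys = ["row%08d" % i for i in range(10000)]

hashed_nodes = [i * RING_SIZE // 4 for i in range(4)]  # four evenly spaced tokens
ordered_nodes = ["g", "n", "t", "z"]  # four hand-picked string tokens

print("hashed: ", sorted(spread(keys, hashed_nodes, md5_token).items()))
print("ordered:", sorted(spread(keys, ordered_nodes, identity_token).items()))
```

In this model the hashed placement gives each of the four nodes roughly
a quarter of the keys, while the order-preserving placement sends every
"row..." key to the single node whose token range covers that prefix.
With replication the copies would go to the following nodes on the
ring, so the load would still sit on only ReplicationFactor nodes.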

> I think you're misleading people, though, with the notion that a. Cassandra
> doesn't have load balancing (it does, in many ways) and b. It doesn't scale.
If you are able to tune your data/application and Cassandra to each
other, it can scale and balance well; I've been very happy with it
here.  I don't think it is currently usable as a generic data
warehouse though (in addition to the above, the DIY tooling is a huge
drawback for someone looking for a generalized DW).

Michael
