incubator-cassandra-user mailing list archives

From Ian Holsman <>
Subject Re: using cassandra as a real time DW
Date Fri, 06 Nov 2009 23:16:00 GMT
Yes, the reporting tools are another issue, but I don't think there is
anything that can do real time.
This may end up looking more like a trading desk than a standard DW.

Sent from my phone
Ian Holsman - 703 879-3128

On 07/11/2009, at 9:01 AM, Michael Greene <> wrote:

> On Fri, Nov 6, 2009 at 3:46 PM, Joe Stump <> wrote:
>> Nothing in Cassandra attempts to ensure that your data are equally spread
>> over the different nodes (yet; there are several bugs open to this effect).
>> That's not true from my understanding. It won't put three copies on the same
>> node. The key word, I suppose, is "equally".
> Right.  Mark isn't referring to the ReplicationFactor or the
> distribution of an individual piece of data.  He's referring to the
> potential for a series of 100 million rows to all go to the same
> ReplicationFactor count nodes, even if you have a much larger cluster.
> If you use the RandomPartitioner and the various pieces of bootstrap
> functionality in 0.5 or good token picking, this solves the problem.
> If you use the OPP, Cassandra is only part of the way there on trunk.
>> I think you're misleading people, though, with the notion that a. Cassandra
>> doesn't have load balancing (it does, in many ways) and b. It doesn't scale.
> If you are able to tune your data/application and Cassandra to each
> other, it can scale and balance well; I've been very happy with it
> here.  I don't think it is currently usable as a generic data
> warehouse though (in addition to the above, the DIY tooling is a huge
> drawback for someone looking for a generalized DW).
> Michael
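
The point quoted above about partitioners and token picking can be illustrated with a rough sketch. This is not Cassandra code: the function names, the sample keys, and the string tokens in the order-preserving case are all made up for illustration, and taking the MD5 digest modulo 2**127 is a simplification of how the RandomPartitioner derives tokens. It only shows why hashed keys spread over evenly spaced tokens while sequential keys under an order-preserving scheme can pile onto one node.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

# Assumption: a token ring of size 2**127, as used by the RandomPartitioner.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Evenly spaced tokens, one per node ("good token picking")."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def md5_token(key):
    """Hash a key onto the ring (simplified stand-in for the RandomPartitioner)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def identity_token(key):
    """Order-preserving stand-in: the key itself is the token."""
    return key


def owner(token, node_tokens):
    """Index of the first node whose token is >= the given token, wrapping around."""
    return bisect_left(node_tokens, token) % len(node_tokens)


def spread(keys, node_tokens, token_fn):
    """Count how many keys land on each node index."""
    return Counter(owner(token_fn(k), node_tokens) for k in keys)


if __name__ == "__main__":
    keys = ["row%07d" % i for i in range(10000)]  # hypothetical sequential row keys
    print("hashed: ", dict(spread(keys, balanced_tokens(4), md5_token)))
    # Hypothetical string tokens for four nodes under an order-preserving scheme.
    print("ordered:", dict(spread(keys, ["g", "n", "t", "z"], identity_token)))
```

Replication is left out on purpose; with a ReplicationFactor above one, each key would also be copied to the following nodes on the ring, which does not change the imbalance shown in the ordered case.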
