incubator-cassandra-user mailing list archives

From Jonathan Shook <jsh...@gmail.com>
Subject Re: data model and queries.
Date Sun, 23 May 2010 23:21:49 GMT
Every system has its limits. When you say to imagine there are
billions of users without providing any other real data, it limits the
discussion strictly to the hypothetical (and hyperbolic, usually).

The only reasonable answer we could provide would be about the types
of limitations we know about and how they manifest.

Here are the ones I know of off the top of my head, but you'll need to
provide more specific constraints to get a better answer from anybody.
* you must be able to fit a unit of work/transfer in memory, don't
assume streaming support
* you cannot scale the number of subcolumns within a supercolumn; the
whole supercolumn is deserialized to read any one of them
* compaction requires more than 2N storage
* very large or growing datasets require active monitoring for storage headroom
I'm sure there are others that I've forgotten.
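
To make the storage points concrete, here is a rough back-of-envelope
sketch using the figures from the question quoted below. The per-message
size and replication factor are assumptions picked only for illustration,
and the 2x headroom is treated as a floor, so read the result as an order
of magnitude and nothing more.

```python
# Back-of-envelope storage estimate. Every input is an assumption.
USERS = 1_000_000_000        # "billions of users": taking one billion as a floor
MESSAGES_PER_USER = 100_000  # "hundreds of thousands": taking the low end
BYTES_PER_MESSAGE = 100      # assumed average: text + type + rating + overhead
REPLICATION_FACTOR = 3       # assumed; depends on the cluster configuration
COMPACTION_HEADROOM = 2      # floor taken from the compaction point above

raw_bytes = USERS * MESSAGES_PER_USER * BYTES_PER_MESSAGE
replicated_bytes = raw_bytes * REPLICATION_FACTOR
provisioned_bytes = replicated_bytes * COMPACTION_HEADROOM

PB = 10**15  # decimal petabyte
print(f"raw data:         {raw_bytes / PB:.0f} PB")
print(f"with replication: {replicated_bytes / PB:.0f} PB")
print(f"with headroom:    {provisioned_bytes / PB:.0f} PB")
```

Even with these conservative inputs the total lands in the tens of
petabytes, which is why the questions about how you intend to scale
storage matter more than the data model itself.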

If you are going to be storing a virtually unlimited (billions of...)
amount of information, how do you intend to scale your storage?
What are your performance requirements? What is your synchronous
consistency requirement? What is your asynchronous consistency
requirement? What's the nature of the workload? Is it batching loads,
or many fine units of work all the time?

That said, these types of questions should not be unusual for any
large system. I think the gist of the answer is "probably, but there
will be growing pains, as with any other system." One of the benefits
of Cassandra is the ability to make design trade-offs which have a
direct impact on scalability and consistency, which leaves you with
more options when you hit a speed bump. Another is that when there are
speed bumps which are considered a significant problem for more than a
few people, they get some attention. (Thanks, devs).
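
As one illustration of that kind of design trade-off, the two queries in
the question quoted below could be served by writing each message twice,
into rows laid out for each read pattern, instead of filtering or sorting
at read time. What follows is only a toy in-memory model in Python, not
Cassandra client code: the names messages_by_type, messages_by_rating,
insert_message, get_by_type and top_rated are all hypothetical, and a real
layout would depend on the answers to the questions above.

```python
from collections import defaultdict

# Toy stand-ins for two denormalized column families (hypothetical names).
# Row key (user, type) -> {message_id: text}
messages_by_type = defaultdict(dict)
# Row key user -> list of (rating, message_id) pairs
messages_by_rating = defaultdict(list)

def insert_message(user, message_id, text, msg_type, rating):
    """Write the message once per read pattern (write-time denormalization)."""
    messages_by_type[(user, msg_type)][message_id] = text
    messages_by_rating[user].append((rating, message_id))

def get_by_type(user, msg_type):
    """All messages of one type for one user: a single row read."""
    return dict(messages_by_type.get((user, msg_type), {}))

def top_rated(user, count=100):
    """Highest-rated messages first. A store that keeps columns sorted
    by name would return this as a reversed slice; here we sort on read."""
    return sorted(messages_by_rating.get(user, []), reverse=True)[:count]

insert_message("user1", "message1", "hello", "html", 88)
insert_message("user1", "message2", "plain hello", "text", 42)
insert_message("user1", "message3", "hello again", "html", 95)

print(get_by_type("user1", "html"))
print(top_rated("user1", count=2))
```

Because messages are never updated after insertion, the duplicated copies
cannot drift apart, which is what makes this trade of extra storage for
cheaper reads reasonable to consider here.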

On Sun, May 23, 2010 at 5:04 AM, Kartal Guner <kguner@hakia.com> wrote:
> I am trying to find out if Cassandra will fill my needs.
>
>
>
> I have a data model similar to below.
>
>
>
> Users = {                        //ColumnFamily
>     user1 = {                    //Key for Users ColumnFamily
>         message1 = {             //Supercolumn
>             text: hello          //Column
>             type: html           //Column
>             rating: 88           //Column
>         }
>         ...
>         messageN
>     }
>     ...
>     CountryN
> }
>
>
>
> Imagine there can be billions of users and hundreds of thousands of messages
> per user.
>
>
>
> After a message entry it will not be updated.
>
> I want to do queries such as:
>
> * Get all messages for user1 with type = HTML
>
> * Get top 100 messages for user1, order by rating.
>
>
>
>
>
> 1) Is this possible with cassandra?
>
> 2) Do I have the right datamodel? Can it be optimized?
