incubator-cassandra-user mailing list archives

From Janne Jalkanen <Janne.Jalka...@ecyrd.com>
Subject Re: Column Family per User
Date Wed, 18 Apr 2012 19:14:09 GMT

Each CF takes a fair chunk of memory regardless of how much data it has, so this is probably
not a good idea if you have lots of users. Also, using a single CF means that compression
is likely to work better (more redundant data).

However, Cassandra distributes the load across different nodes based on the row key, and
writes scale roughly linearly with the number of nodes. So you need to make sure that
no single row gets overly burdened by writes. (50 million writes/day to a single row would
always go to the same nodes. This is on the order of 600 writes/second/node, which shouldn't
really pose a problem, IMHO.) The main problem is that if a single row gets lots of columns,
it'll start to slow down at some point, and your row caches become less useful, as they cache
the entire row.
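
The "order of 600 writes/second" figure is just the daily volume spread evenly over a day.
A quick sketch of that arithmetic (the function name is made up for illustration, and real
traffic will peak above this average):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_writes_per_second(writes_per_day):
    """Average write rate if the daily volume were spread evenly over the day."""
    return writes_per_day / SECONDS_PER_DAY


# 50 million writes/day, all landing on the replicas of one row.
rate = average_writes_per_second(50_000_000)
print(f"about {rate:.0f} writes/second on average")
```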

Keep your rows suitably sized and you should be fine. To partition the data, you can either
distribute it to a few CFs based on use, or use some other distribution method (like "user:1234:00",
where the "00" is the hour of the day).
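
A minimal sketch of building such keys on the client side, assuming a numeric user id and
UTC hours. The function names are hypothetical and not part of any Cassandra client API;
only the "user:1234:00" key shape comes from the example above.

```python
from datetime import datetime, timezone


def bucketed_row_key(user_id, when):
    """Row key for a write: the user id plus the two-digit UTC hour of the day."""
    hour = when.astimezone(timezone.utc).hour
    return f"user:{user_id}:{hour:02d}"


def row_keys_for_user(user_id):
    """All 24 hourly row keys for one user, for reads that must cover every bucket."""
    return [f"user:{user_id}:{hour:02d}" for hour in range(24)]


# A write made at 00:14 UTC goes to the "00" bucket.
print(bucketed_row_key(1234, datetime(2012, 4, 18, 0, 14, 9, tzinfo=timezone.utc)))
```

With this scheme a user's writes are spread over 24 rows instead of one, and a read of
everything for that user has to fetch all 24 keys.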

(There's a great article by Aaron Morton on how wide rows impact performance at http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/,
but as always, running your own tests to determine the optimal setup is recommended.)

/Janne

On Apr 18, 2012, at 21:20 , Trevor Francis wrote:

> Our application has users that can write in upwards of 50 million records per day. However,
> they all write the same format of records (20 fields…columns). Should I put each user in
> their own column family, even though the column family schema will be the same per user?
> 
> Would this help with dimensioning, if each user is querying their keyspace and only their
> keyspace?
> 
> 
> Trevor Francis
> 
> 

