cassandra-user mailing list archives

From Peter Schuller <peter.schul...@infidyne.com>
Subject Re: Limit on amount of CFs
Date Sun, 13 Feb 2011 11:04:02 GMT
> But when modeling the application I understand so far that ColumnFamily is
> sort of "table with objects". In typical application there are lot of tables
> so why is the mindset set towards having more or less 10 ColumnFamilies?
> Even in this trivial example there are already 7 CFs
> http://www.rackspace.com/cloud/blog/2010/05/12/cassandra-by-example/.
> So what is best practice to create applications using Cassandra? Divide
> application to more parts and create Keyspace for each one of them?

Keyspaces don't really help. You can have 100 column families if you
want, but if you're worried about overhead then whatever overhead you
do get will tend to be indirect in nature. Smaller memtables, more
files on disk, etc.

If you have a legitimate use case for N column families, then that's
the way to do it. It is just that the tendency is towards fewer CF:s;
for data accessed together, the idea is often to put it into a single CF
under the same row key, rather than doing the RDBMS style. Instead of
a fully normalized system with foreign keys, you tend to group data
together and keep them in fewer column families - often with the same
row key across multiple CF:s instead of foreign keys.
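
To make the contrast concrete, here is a minimal sketch in plain Python. It
is not the Cassandra API; dicts stand in for column families, and the names
`users`, `user_stats` and `user42` are made up for illustration only.

```python
# Toy model: each column family is a dict mapping row key -> {column: value}.
# This is NOT a Cassandra client; it only illustrates the layout.
users = {}       # hypothetical CF holding profile data
user_stats = {}  # hypothetical CF holding counters, sharing the same row keys


def insert(cf, row_key, columns):
    """Merge the given columns into the row stored under row_key."""
    cf.setdefault(row_key, {}).update(columns)


def load_user(row_key):
    """Collect a user's data from both CFs using the one shared row key."""
    return {**users.get(row_key, {}), **user_stats.get(row_key, {})}


insert(users, "user42", {"name": "Alice", "email": "alice@example.com"})
insert(user_stats, "user42", {"logins": 17})

profile = load_user("user42")
```

The point is only that both lookups use the same key; there is no separate
id column to join on, as there would be in a normalized RDBMS schema.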

I guess to re-phrase: In Cassandra the idea is that you model your
data after the expected read and write behavior rather than optimizing
for normalization. This tends to mean fewer CF:s rather than lots and
lots of CF:s, but it does depend on use-case. Sometimes it can mean
more CF:s instead of fewer (if you split data into different CF:s to
separate out often read data from seldom read data, or if you
de-normalize to provide multiple materialized views of the same data).
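
As a rough illustration of the "multiple materialized views" case, the
sketch below (again plain Python with dicts standing in for column families,
not real Cassandra calls) writes each record twice, once per read pattern.
The CF names `posts_by_author` and `posts_by_tag` and the sample data are
assumptions invented for the example.

```python
from collections import defaultdict

# Toy stand-ins for two column families; each maps row key -> {column: value}.
posts_by_author = defaultdict(dict)  # hypothetical: row key is the author
posts_by_tag = defaultdict(dict)     # hypothetical: row key is the tag


def write_post(post_id, author, tag, body):
    """Write the same post into both views so each read is a one-row lookup."""
    posts_by_author[author][post_id] = body
    posts_by_tag[tag][post_id] = body


write_post("p1", "alice", "cassandra", "How many CFs?")
write_post("p2", "bob", "cassandra", "Model for your reads.")

# Each expected query is answered from exactly one row in one CF.
by_tag = posts_by_tag["cassandra"]
by_author = posts_by_author["alice"]
```

The cost is an extra write per view; the benefit is that neither read has to
scan or join. That trade-off is what drives the CF count up in this case.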

So I think the right approach is to look at what the correct data
model would be. If that somehow results in an extreme number of CF:s,
then re-evaluate based on the specific use-case. Maybe that is truly
what you need, maybe not. But in any case, the primary concern should
not be any potential hard limit but rather the performance
implications of how data is stored. If after reaching a conclusion the
number of CF:s is high enough that there is a concern that you may hit
some kind of artificial limit or unintended side-effect, one can look
at the situation then.

Suppose there is a hard limit. Suppose there is a piece of code
somewhere that says "you can only have up to 100 column families".
Even if that were the case, it would be pretty useless to respond to
the OP's question with that information. If the use case *truly* calls
for 100+ CF:s, then you have to look at the actual effects of that.
What was the hard limit, and why is it there? Is it just a matter of
removing the hard limit which had no real purpose? The answer to
whether or not the limit is truly "hard" in any given situation is
probably going to be at least as dependent on that situation as it is
on the code that enforces the limit...

(Obviously though it doesn't hurt to know of actual hard artificial
limits, and as I said I'm not aware of any.)

-- 
/ Peter Schuller
