cassandra-user mailing list archives

From Tyler Hobbs <ty...@datastax.com>
Subject Re: Combining all CFs into one big one
Date Sun, 01 May 2011 17:24:23 GMT
When you have a large number of CFs, it's a good idea to consider merging CFs
with highly correlated access patterns and similar structure into one. It is
*not* a good idea to merge all of your CFs into one (unless they all happen
to meet these criteria). Here's why:

Besides big compactions and long repairs that you can't break down into
smaller pieces, the main problem here is that your caching will become much
less efficient. The OS buffer cache will be less effective because rows from
all of the CFs will be interspersed in the SSTables. You will no longer be
able to tune the key or row cache to only cache frequently accessed data.
Both of these will tend to cause a serious increase in latency for your hot
data.
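
To make the caching point concrete, here is a toy simulation in Python. It is
only a sketch under stated assumptions: a simple LRU cache stands in for the
key/row cache, the "hot" and "cold" CFs and their sizes are hypothetical, and
nothing here reflects Cassandra's actual cache implementation. It compares the
hit rate on hot data when the hot CF has its own cache against the case where
one merged CF shares a single cache of the same size with a cold CF that is
scanned once.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache; a stand-in for a key/row cache, not Cassandra's."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def access(self, key):
        """Return True on a hit; on a miss, insert the key and evict the oldest."""
        if key in self.entries:
            self.entries.move_to_end(key)
            return True
        if self.capacity > 0:
            self.entries[key] = True
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)
        return False


def hot_hit_rate(merged, rounds=1000, hot_keys=100, capacity=100):
    """Hit rate for the hot CF's reads under a hypothetical mixed workload."""
    if merged:
        # One big CF: hot and cold rows compete for the same cache.
        hot_cache = cold_cache = LRUCache(capacity)
    else:
        # Separate CFs: cache only the hot CF, disable caching for the cold one.
        hot_cache = LRUCache(capacity)
        cold_cache = LRUCache(0)
    hits = 0
    for i in range(rounds):
        # Hot CF: a small set of keys read over and over.
        if hot_cache.access(("hot", i % hot_keys)):
            hits += 1
        # Cold CF: every key is read exactly once (e.g. a full scan).
        cold_cache.access(("cold", i))
    return hits / rounds


separate_rate = hot_hit_rate(merged=False)
merged_rate = hot_hit_rate(merged=True)
print("separate CFs:", separate_rate)
print("merged CF:   ", merged_rate)
```

In this deliberately adversarial workload the separate-CF setup serves every
hot read from cache once it is warm, while the shared cache is churned by the
cold scan and the hot keys are evicted before they are read again. Real
workloads will be less extreme, but the direction of the effect is the same.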

> Shouldn't these kinds of problems be solved by Cassandra?
>
They are mainly solved by Cassandra's general solution to any performance
problem: the addition of more nodes. There are tickets open to improve
compaction strategies, put bounds on SSTable sizes, etc. (for example,
https://issues.apache.org/jira/browse/CASSANDRA-1608 ), but the addition of
more nodes is a reliable solution to problems of this nature.

On Sun, May 1, 2011 at 7:28 AM, David Boxenhorn <david@taotown.com> wrote:

> Shouldn't these kinds of problems be solved by Cassandra? Isn't there a
> maximum SSTable size?
>
> On Sun, May 1, 2011 at 3:24 PM, shimi <shimi.k@gmail.com> wrote:
>
>> Big sstables, long compactions, in major compaction you will need to have
>> free disk space in the size of all the sstables (which you should have
>> anyway).
>>
>> Shimi
>>
>>
>> On Sun, May 1, 2011 at 2:03 PM, David Boxenhorn <david@taotown.com> wrote:
>>
>>> I'm having problems administering my cluster because I have too many CFs
>>> (~40).
>>>
>>> I'm thinking of combining them all into one big CF. I would prefix the
>>> current CF name to the keys, repeat the CF name in a column, and index the
>>> column (so I can loop over all rows, which I have to do sometimes, for some
>>> CFs).
>>>
>>> Can anyone think of any disadvantages to this approach?
>>>
>>>
>>
>


-- 
Tyler Hobbs
Software Engineer, DataStax <http://datastax.com/>
Maintainer of the pycassa <http://github.com/pycassa/pycassa> Cassandra
Python client library
