cassandra-user mailing list archives

From Tyler Hobbs <>
Subject Re: Merging the rows of two column families(with similar attributes) into one ??
Date Sat, 05 Feb 2011 19:52:45 GMT
> if you have under control parameters like
> memtable_throughput & memtable_operations which are set per column
> family basis then you can directly control & adjust by splitting the
> memory space between two CFs in proportion to what you would do in
> single CF.
> Hence there should be no extra memory consumption for multiple CFs
> that have been split from single one??

Yes, I think you have the right idea here.  There *is* a small amount of
overhead for the extra memtable and for keeping track of a second set of
indexes, bloom filters, sstables, etc.
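
To make the proportional split from the quoted text concrete, here is a
rough sketch in Python.  The function name split_thresholds, the budget
figures, and the write shares are all made up for illustration; this is
plain arithmetic, not a Cassandra API.

```python
def split_thresholds(total_throughput_mb, total_operations, write_share):
    """Split one CF's memtable thresholds across several CFs.

    write_share maps a CF name to its fraction of total writes
    (hypothetical numbers; the fractions must sum to 1).
    """
    if abs(sum(write_share.values()) - 1.0) > 1e-9:
        raise ValueError("write shares must sum to 1")
    return {
        cf: {
            "memtable_throughput": total_throughput_mb * share,
            "memtable_operations": total_operations * share,
        }
        for cf, share in write_share.items()
    }


# Assumed budget for the original single CF: 64 MB / 300000 operations,
# with 75% of writes going to "Hot" and 25% to "Cold".
split = split_thresholds(64, 300000, {"Hot": 0.75, "Cold": 0.25})
print(split)
```

The two per-CF thresholds add back up to the single-CF budget, so the
memtables themselves take no extra space; only the fixed per-CF overhead
mentioned above is added.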

> Regarding the compactions, I think even if they are more the size of
> the SST files to be compacted is smaller as the data has been split
> into two.
> Then more compactions but smaller too!!


> if some CF is written less often as compared to other CFs, then the
> memtable would consume space in the memory until it is flushed, this
> memory space could have been much better used by a CF that's heavily
> written and read. And if you try to make the thresholds for flush
> smaller then more compactions would be needed.

If you merge the two CFs together, then updates to the 'less frequent' rows
will still consume memory, only it will all be within one memtable.
(Memtables grow in size until they are flushed; they don't reserve some set
amount of memory.)  Furthermore, because your memtables will be filled up by
the 'more frequent' rows, the 'less frequent' rows will get fewer
updates/overwrites in memory before each flush, so they will tend to be
spread across a greater number of SSTables.
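
A toy simulation of that last point, in Python.  It is only a sketch under
loud assumptions: the memtable is modelled as a set of row keys flushed
after a fixed number of writes (real flushes also depend on size and time),
and the write pattern and thresholds are arbitrary numbers I picked.

```python
def sstables_touching(writes, flush_after_ops, key):
    """Count the flushed SSTables that contain `key`.

    `writes` is a sequence of row keys in write order.  The memtable is a
    set of keys, flushed after `flush_after_ops` writes (a simplification).
    """
    memtable, ops, count = set(), 0, 0
    for k in writes:
        memtable.add(k)  # overwriting a key already in memory adds no new row
        ops += 1
        if ops >= flush_after_ops:
            count += key in memtable  # this flush produced an SSTable with `key`
            memtable, ops = set(), 0
    # Whatever is still in memory gets flushed eventually.
    count += key in memtable
    return count


# Assumed workload: 9 writes to a 'hot' row for every write to a 'cold' row.
merged_writes = (["hot"] * 9 + ["cold"]) * 100
cold_only_writes = ["cold"] * 100

# One merged CF flushing after 150 writes, versus the cold rows in their own
# CF flushing after 50 writes (the hot CF would get the other 100).
merged = sstables_touching(merged_writes, 150, "cold")
separate = sstables_touching(cold_only_writes, 50, "cold")
print(merged, separate)
```

With these particular numbers the cold row ends up in more SSTables when
the CFs are merged, because the hot writes trigger the flushes.  Different
thresholds will give different counts, so treat it as an illustration
rather than a measurement.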

Tyler Hobbs
Software Engineer, DataStax
Maintainer of the pycassa Cassandra
Python client library
