CFs have a large overhead; keyspaces have little or none.
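One way to see why column families cost more than keyspaces is that every CF gets its own memtable, while a keyspace adds little on its own. Here is a back-of-the-envelope sketch. It assumes each memtable may fill to a hypothetical 64 MB threshold before flushing, and it ignores JVM object overhead and caches. The function name is made up for illustration.

```python
def worst_case_memtable_mb(keyspaces: int, cfs_per_keyspace: int, threshold_mb: int) -> int:
    """Upper bound on live memtable data: one memtable per column family,
    each allowed to fill to threshold_mb before it is flushed (assumed model)."""
    # Keyspaces contribute nothing by themselves; only the total CF count matters.
    return keyspaces * cfs_per_keyspace * threshold_mb


# Hypothetical numbers: 3 keyspaces x 20 CFs, 64 MB threshold (assumed)
print(worst_case_memtable_mb(3, 20, 64))  # 3840
```

The multiplier is the point. Spreading the same 60 CFs over 6 keyspaces instead of 3 gives the same product, but adding CFs grows it linearly.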
In general, write performance is affected by the memtable thresholds (also covered at the link above). Read performance is affected by the size of the Cassandra caches and the OS file cache. Compaction can slow a node; 0.7 handles this better via the dynamic snitch, which steers reads away from replicas that are responding slowly.
Start with conservative / default values, then crank things up.
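The dynamic snitch idea mentioned above is that reads should prefer replicas that have recently been answering quickly. A node that is busy compacting then gets fewer reads. The sketch below is simplified. Its exponential moving average, class name and method names are illustrative assumptions, not Cassandra's actual scoring code, and the IP addresses are hypothetical.

```python
from typing import Dict, List


class LatencyAwareSelector:
    """Toy latency-aware replica ordering (hypothetical; not Cassandra's code)."""

    def __init__(self, alpha: float = 0.3) -> None:
        self.alpha = alpha                      # weight given to the newest sample
        self.scores: Dict[str, float] = {}      # replica -> smoothed read latency (ms)

    def record(self, replica: str, latency_ms: float) -> None:
        # Exponential moving average of observed read latency for this replica.
        old = self.scores.get(replica)
        if old is None:
            self.scores[replica] = latency_ms
        else:
            self.scores[replica] = self.alpha * latency_ms + (1 - self.alpha) * old

    def order(self, replicas: List[str]) -> List[str]:
        # Lowest smoothed latency first; replicas with no samples yet score 0.0,
        # so they are tried early and get measured.
        return sorted(replicas, key=lambda r: self.scores.get(r, 0.0))


sel = LatencyAwareSelector()
sel.record("10.0.0.1", 5.0)
sel.record("10.0.0.2", 40.0)   # e.g. a replica slowed down by compaction
print(sel.order(["10.0.0.2", "10.0.0.1"]))  # ['10.0.0.1', '10.0.0.2']
```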
On 21 Oct, 2010, at 08:42 AM, CassUser CassUser <email@example.com> wrote:
Thanks for the link.
#2 was not meant to be a trick question; it just came out like that :). What I was after is the overhead associated with a large number of keyspaces and column families (I didn't mean empty memtables :). Suppose there are a few keyspaces, each with 20 or so column families, and a percentage of their rows cached. Does this affect write performance for other keyspaces in the cluster?
On Wed, Oct 20, 2010 at 12:01 PM, Edward Capriolo <firstname.lastname@example.org> wrote:
Take a look at:
On Wed, Oct 20, 2010 at 2:47 PM, CassUser CassUser <email@example.com> wrote:
> As I understand it, writes go directly to the commit log. Once a threshold
> has been reached, the data is shipped to a memtable, and then on to an sstable.
> 1. How many memtables are created when a flush happens from a commit log?
> One per CF?
> 2. Is there any space associated with an empty memtable?
> 3. When a flush happens from a memtable to an sstable, does this create a
> single new sstable?
> 4. Should compaction be turned off during a large data load?
1 and 3:
Memtables flush for three reasons: size, time, and number of
operations. There is one memtable per column family. Each memtable
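The three flush triggers and the one-memtable-per-CF rule can be sketched as a toy model. The names `ToyColumnFamily` and `FlushThresholds` and all threshold values are illustrative assumptions, not Cassandra's classes or configuration keys. The model also assumes, as question 3 asks, that a flush turns the whole memtable into a single new sstable.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FlushThresholds:
    # Hypothetical limits; a real node takes these from per-CF configuration.
    max_bytes: int = 64 * 1024 * 1024   # size trigger
    max_ops: int = 300_000              # operation-count trigger
    max_age_s: float = 60 * 60          # time trigger


@dataclass
class Memtable:
    created: float                      # clock reading when this memtable was started
    data: Dict[str, bytes] = field(default_factory=dict)
    size_bytes: int = 0
    ops: int = 0


class ToyColumnFamily:
    """Toy model: one memtable per column family; each flush writes one sstable."""

    def __init__(self, name: str, limits: FlushThresholds,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.name = name
        self.limits = limits
        self.clock = clock
        self.memtable = Memtable(created=clock())
        self.sstables: List[Dict[str, bytes]] = []  # each entry stands in for one sstable

    def should_flush(self) -> bool:
        m, lim = self.memtable, self.limits
        return (m.size_bytes >= lim.max_bytes
                or m.ops >= lim.max_ops
                or self.clock() - m.created >= lim.max_age_s)

    def write(self, key: str, value: bytes) -> None:
        # A real node also appends the mutation to the commit log; omitted here.
        self.memtable.data[key] = value
        self.memtable.size_bytes += len(key) + len(value)
        self.memtable.ops += 1
        # This toy checks the triggers only on write; a server would also
        # check the age trigger periodically for idle column families.
        if self.should_flush():
            self.flush()

    def flush(self) -> None:
        # The whole memtable becomes exactly one new sstable, sorted by key.
        if self.memtable.data:
            self.sstables.append(dict(sorted(self.memtable.data.items())))
        self.memtable = Memtable(created=self.clock())


# One store (and therefore one memtable) per column family.
limits = FlushThresholds(max_bytes=1_000, max_ops=3, max_age_s=3600)
stores = {cf: ToyColumnFamily(cf, limits) for cf in ("Users", "Events")}
for i in range(7):
    stores["Users"].write(f"user{i}", b"x")
print(len(stores["Users"].sstables), len(stores["Events"].sstables))  # 2 0
```

With `max_ops=3`, the "Users" memtable flushes after the 3rd and 6th writes, leaving one write in memory. "Events" never flushes because it received no writes.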
2. Is this a trick question?
4. Should compaction be turned off during a large data load?
You can disable compaction during bulk loads. This can help because
otherwise the same data might be compacted multiple times. However, if
you go too long with compaction turned off, you end up with many
sstables. Rows can then end up fragmented across those sstables, so a read has to merge pieces from several files.
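The fragmentation trade-off can be illustrated with a toy model in which each sstable is a dict of rows and each row is a dict of columns. The function names are hypothetical. Later sstables simply overwrite earlier ones here, whereas Cassandra resolves conflicts by column timestamp. `major_compact` merges everything into one table and does not model how Cassandra actually schedules compactions.

```python
from typing import Dict, List

SSTable = Dict[str, Dict[str, str]]   # row key -> {column name: value}


def sstables_touched(sstables: List[SSTable], row_key: str) -> int:
    """How many sstables a read of row_key must consult (fragmentation)."""
    return sum(1 for t in sstables if row_key in t)


def read_row(sstables: List[SSTable], row_key: str) -> Dict[str, str]:
    # Merge fragments oldest -> newest so later writes win (stand-in for timestamps).
    row: Dict[str, str] = {}
    for t in sstables:
        row.update(t.get(row_key, {}))
    return row


def major_compact(sstables: List[SSTable]) -> List[SSTable]:
    """Merge every sstable into one, so each row lives in a single file again."""
    merged: SSTable = {}
    for t in sstables:
        for key, cols in t.items():
            merged.setdefault(key, {}).update(cols)
    return [merged]


# Bulk load with compaction off: each flush adds one more column of the same row.
flushes = [{"user42": {f"col{i}": "v"}} for i in range(4)]
print(sstables_touched(flushes, "user42"))    # 4
compacted = major_compact(flushes)
print(sstables_touched(compacted, "user42"))  # 1
print(read_row(compacted, "user42") == read_row(flushes, "user42"))  # True
```

Compacting once after the load finishes rewrites the data a single time. Compacting repeatedly during the load would rewrite the same rows over and over.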