For capacity planning it's not worth worrying about whether the MemTables are empty; they will all end up full.

Internal caches may refer to either the Row and Key caches or the BloomFilters; I'm not sure which in this case.

Aaron


On 21 Oct, 2010, at 09:42 AM, CassUser CassUser <cassuser@gmail.com> wrote:

I didn't notice the "number of hot CFs" mentioned below, so that means CFs with data in them. We are sharing a cluster with others, so I'm trying to get an idea of what overhead, if any, there is for empty CFs. What are internal caches?

On Wed, Oct 20, 2010 at 1:17 PM, CassUser CassUser <cassuser@gmail.com> wrote:
Cool, thanks, that helps.

So even if we have defined a column family in storage-conf and it's empty, it has some overhead in Cassandra, and the following rule should apply:

memtable_throughput_in_mb * 3 * number of hot CFs + 1G + internal caches.
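To make the arithmetic in that rule concrete, here is a minimal sketch that just evaluates it. The function name and parameters are hypothetical; it assumes all figures are in MB, reads "1G" as 1024 MB, and takes the factor of 3 from the rule as stated rather than deriving it.

```python
def estimate_heap_mb(memtable_throughput_in_mb, hot_cfs, internal_caches_mb, base_mb=1024):
    """Rough JVM heap estimate (MB) from the rule of thumb:

    memtable_throughput_in_mb * 3 * number of hot CFs + 1G + internal caches

    base_mb stands in for the "1G" term (assumed to mean 1024 MB).
    """
    if memtable_throughput_in_mb < 0 or hot_cfs < 0 or internal_caches_mb < 0:
        raise ValueError("inputs must be non-negative")
    memtables_mb = memtable_throughput_in_mb * 3 * hot_cfs
    return memtables_mb + base_mb + internal_caches_mb


# Illustrative numbers only: 64 MB memtables, 20 hot CFs, 512 MB of caches.
# 64 * 3 * 20 = 3840, then + 1024 + 512
print(estimate_heap_mb(64, 20, 512))  # 5376
```

Note that only hot CFs appear in the memtable term, which is why the number of CFs that actually receive writes matters more than the number defined.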




On Wed, Oct 20, 2010 at 12:53 PM, Aaron Morton <aaron@thelastpickle.com> wrote:
Take a look at the section on JVM Heap size here http://wiki.apache.org/cassandra/MemtableThresholds

CFs have a large overhead; keyspaces have little or none.

In general, write performance will be affected by the memtable thresholds (also on the link above). Read performance will be affected by the size of the Cassandra caches and the OS file caches. Compaction can slow a node; 0.7 handles this better via the dynamic snitch.

Start with conservative / default values, then crank things up. 

Aaron

 
On 21 Oct, 2010, at 08:42 AM, CassUser CassUser <cassuser@gmail.com> wrote:

Thanks for the link. 

#2 was not meant to be a trick question, it just came out like that :). What I was after is the overhead associated with a large number of keyspaces and column families (I didn't mean empty memtables :). Say there are a few keyspaces, each with 20 or so column families and a percentage of rows cached. Does this affect write performance for other keyspaces in the cluster?



On Wed, Oct 20, 2010 at 12:01 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

On Wed, Oct 20, 2010 at 2:47 PM, CassUser CassUser <cassuser@gmail.com> wrote:
> Hey,
>
> As I understand it, writes go directly to the commit log. Once a threshold
> has been reached, the data is shipped to a memtable, and then to an sstable.
>
> 1. How many memtables are created when a flush happens from a commit log?
> One per CF?
>
> 2. Is there any space associated with an empty memtable?
>
> 3. When a flush happens from a memtable to an sstable, does this create a
> single new sstable?
>
> 4. Should compaction be turned off during a large data load?
>
> Thanks.
>

Take a look at:


http://wiki.apache.org/cassandra/MemtableSSTable

1 and 3:
Memtables flush for three reasons: size, time, and number of
operations. There is one memtable per column family. Each memtable
flushes individually.
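A toy model of that answer, as a sketch only: each column family owns one memtable, and that memtable flushes on its own when any of three thresholds (size, operation count, age) is crossed. The class names and the threshold names (max_bytes, max_ops, max_age_seconds) are hypothetical and only loosely mirror settings such as memtable_throughput_in_mb. It assumes, as in Cassandra, that one flush writes one new sstable; it omits the commit log, to which real Cassandra appends every write before applying it to the memtable.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Memtable:
    """Toy in-memory table for one column family (not Cassandra's code)."""
    max_bytes: int          # size threshold (hypothetical name)
    max_ops: int            # operation-count threshold (hypothetical name)
    max_age_seconds: float  # time threshold (hypothetical name)
    rows: dict = field(default_factory=dict)
    size_bytes: int = 0
    ops: int = 0
    created_at: float = field(default_factory=time.monotonic)

    def write(self, key, column, value):
        self.rows.setdefault(key, {})[column] = value
        # Character count as a crude stand-in for serialized size.
        self.size_bytes += len(key) + len(column) + len(value)
        self.ops += 1

    def should_flush(self, now=None):
        """True once any one of the three thresholds is reached."""
        now = time.monotonic() if now is None else now
        return (self.size_bytes >= self.max_bytes
                or self.ops >= self.max_ops
                or now - self.created_at >= self.max_age_seconds)


class ColumnFamilyStore:
    """One memtable per column family; flushed sstables accumulate on 'disk'."""

    def __init__(self, **thresholds):
        self.thresholds = thresholds
        self.memtable = Memtable(**thresholds)
        self.sstables = []

    def write(self, key, column, value):
        self.memtable.write(key, column, value)
        if self.memtable.should_flush():
            self.flush()

    def flush(self):
        # One flush produces one immutable, key-sorted sstable,
        # then a fresh empty memtable takes over.
        rows = self.memtable.rows
        self.sstables.append({k: dict(rows[k]) for k in sorted(rows)})
        self.memtable = Memtable(**self.thresholds)


# Each CF flushes independently of the others.
stores = {
    "Users": ColumnFamilyStore(max_bytes=10**6, max_ops=3, max_age_seconds=3600),
    "Events": ColumnFamilyStore(max_bytes=10**6, max_ops=100, max_age_seconds=3600),
}
for i in range(4):
    stores["Users"].write(f"user{i}", "name", "x")
stores["Events"].write("e1", "type", "click")
print(len(stores["Users"].sstables), len(stores["Events"].sstables))  # 1 0
```

Here the Users memtable hits its operation threshold after three writes and flushes, while the Events memtable is untouched by that flush.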

2. Is this a trick question?

4. Should compaction be turned off during a large data load?
You can disable compaction during bulk loads. This can help because
otherwise the same data might be compacted multiple times. However, if
you go too long with compaction turned off, you end up with many
sstables, and a row's columns can end up fragmented across them.
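To illustrate the fragmentation point: if a row is written across several flushes, its columns end up spread over several sstables, and a read of that row has to merge all of them. This sketch models each sstable as a plain dict of row key to {column: (value, timestamp)}. The function names are hypothetical, and the newest-timestamp-wins merge is a simplification of how Cassandra reconciles columns (it ignores tombstones, bloom filters, and indexes).

```python
def sstables_containing(sstables, key):
    """Count how many sstables hold at least one column for `key`."""
    return sum(1 for table in sstables if key in table)


def read_row(sstables, key):
    """Merge a row's columns across sstables; the newest timestamp wins."""
    merged = {}
    for table in sstables:
        for column, (value, ts) in table.get(key, {}).items():
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    return merged


def compact(sstables):
    """Merge every sstable into a single one, keeping the newest columns."""
    keys = set().union(*sstables)
    return [{key: read_row(sstables, key) for key in sorted(keys)}]


# Three flushes during a bulk load, each touching row "k1".
sstables = [
    {"k1": {"a": ("1", 1)}},
    {"k1": {"b": ("2", 2)}, "k2": {"a": ("x", 2)}},
    {"k1": {"a": ("3", 3)}},
]
print(sstables_containing(sstables, "k1"))   # 3
compacted = compact(sstables)
print(sstables_containing(compacted, "k1"))  # 1
print(read_row(compacted, "k1"))             # {'a': ('3', 3), 'b': ('2', 2)}
```

Before compaction, a read of k1 touches three sstables; afterwards it touches one, and the stale value of column "a" is gone.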