"Does the number of column families still significantly impact the memory footprint? If so, what is the incremental cost of a column family/table?"IMHO there would be little difference in memory use for a node with zero data that had 10 CF's and one that had 100 CF's. When you start putting data in the story changes.
As Alain said, the number of rows can impact memory use. In 1.2+ that's less of an issue, but the index samples are still on heap. In my experience, with a normal heap (4GB to 8GB) this is not an issue until you get into 500+ million rows.
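To get a rough feel for why (back-of-envelope only; the ~50-100 bytes per sample is my guess for modest key sizes plus object overhead, assuming the default index_interval of 128):

    500,000,000 rows / 1 sample per 128 rows  ~= 3.9 million index samples on heap
    3.9 million samples * ~50-100 bytes each  ~= 200-400 MB of heap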
The number of CFs is still used when calculating when to flush to disk: if you have 100 CFs the server will flush to disk more frequently than if you have 10, because it needs to leave room for more memtables to grow. (See the setting below.)
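The knob behind this is in cassandra.yaml (a sketch; the 2048 here is just an illustration, the default when the line is commented out is one third of the heap):

    # cassandra.yaml -- total budget shared by ALL memtables on the node;
    # with more CFs, each memtable gets a smaller slice before a flush is forced
    memtable_total_space_in_mb: 2048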
The best way to get help on this is to provide details on the memory settings, the number of CFs, the total number of rows, and the cache settings. The nodetool commands below are a quick way to gather most of that.
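For example, on the node in question:

    nodetool info     # heap used/total, key cache and row cache sizes
    nodetool cfstats  # per-CF key count estimates, memtable and bloom filter sizes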
Hope that helps.
If you are using 1.2.x, bloom filters are in native memory, so they are not pressuring your heap. How much data do you have per node? If that value is big, the index samples in the heap are consuming a lot of memory, for sure, and they grow as your data per node grows.
Solutions: increase the heap if it is < 8GB, and/or reduce the index sampling by raising index_interval from the default 128 to a bigger value (256-512), and/or wait for 2.0.x which, off the top of my head, should move the index samples into native memory, making heap size independent of the data size per node. (Example settings below.)
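What those two knobs look like (a sketch only; the exact values are examples to tune for your hardware):

    # cassandra.yaml -- sample one row index entry per 256 rows instead of 128,
    # roughly halving the on-heap sample count at a small read-latency cost
    index_interval: 256

    # conf/cassandra-env.sh -- only raise this if the machine has RAM to spare
    MAX_HEAP_SIZE="8G"
    HEAP_NEWSIZE="800M"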
This should alleviate things. Yet these are only guesses since I know almost nothing about your cluster...
Hope this helps somehow.