cassandra-user mailing list archives

From CassUser CassUser <cassu...@gmail.com>
Subject Re: memtable sstable questions (0.6.4)
Date Wed, 20 Oct 2010 20:42:23 GMT
I didn't notice the "number of hot CFs" in the rule below, so that means CFs
with data in them. We are sharing a cluster with others, so I'm trying to get
an idea of what overhead, if any, there is for empty CFs. What are internal
caches?

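As a rough worked example of the heap rule quoted below
(memtable_throughput_in_mb * 3 * number of hot CFs + 1G + internal caches),
here is a minimal Python sketch. The function name, the use of 1024 MB for
"1G", and the example numbers (64 MB throughput, 20 hot CFs, 512 MB of
caches) are illustrative assumptions, not values from the wiki or this
thread. As the rule is written, only the CFs you count as hot enter the
memtable term.

```python
def estimate_heap_mb(memtable_throughput_in_mb: int, hot_cfs: int,
                     internal_caches_mb: int, base_mb: int = 1024) -> int:
    """Rough JVM heap estimate following the rule of thumb quoted in the thread:
    memtable_throughput_in_mb * 3 * number of hot CFs + 1G + internal caches.
    base_mb stands in for the "1G" term (assumed to be 1024 MB).
    """
    # Memtable term: the factor 3 is taken as-is from the quoted rule.
    memtables_mb = memtable_throughput_in_mb * 3 * hot_cfs
    return memtables_mb + base_mb + internal_caches_mb


if __name__ == "__main__":
    # Hypothetical numbers: 64 MB throughput, 20 hot CFs, 512 MB of caches.
    print(estimate_heap_mb(64, 20, 512))  # 5376
```

With these made-up inputs the memtable term alone is 64 * 3 * 20 = 3840 MB,
so it dominates the fixed 1 GB and the cache allowance.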
On Wed, Oct 20, 2010 at 1:17 PM, CassUser CassUser <cassuser@gmail.com> wrote:

> Cool thanks, that helps.
>
> So even if we have defined a column family in storage-conf and it's
> empty, it has some overhead in Cassandra, and the following rule should
> apply:
>
> memtable_throughput_in_mb * 3 * number of hot CFs + 1G + internal caches.
>
>
>
> On Wed, Oct 20, 2010 at 12:53 PM, Aaron Morton <aaron@thelastpickle.com> wrote:
>
>> Take a look at the section on JVM Heap size here
>> http://wiki.apache.org/cassandra/MemtableThresholds
>>
>> CFs have a large overhead; keyspaces have little or none.
>>
>> In general, write performance will be affected by the memtable thresholds
>> (also on the link above). Read performance will be affected by the size of
>> the Cassandra caches and OS file caches. Compaction can slow a node; 0.7
>> handles this better via the dynamic snitch.
>>
>> Start with conservative / default values, then crank things up.
>>
>> Aaron
>>
>> On 21 Oct, 2010, at 08:42 AM, CassUser CassUser <cassuser@gmail.com>
>> wrote:
>>
>> Thanks for the link.
>>
>> #2 was not meant to be a trick question, it just came out like that :).
>> What I was after is the overhead associated with a large number of keyspaces
>> and column families (I didn't mean empty memtables :). Suppose a few keyspaces
>> each have 20 or so column families with a percentage of rows cached. Does
>> this affect write performance to other keyspaces in the cluster?
>>
>>
>>
>> On Wed, Oct 20, 2010 at 12:01 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>>
>>>
>>> On Wed, Oct 20, 2010 at 2:47 PM, CassUser CassUser <cassuser@gmail.com>
>>> wrote:
>>> > Hey,
>>> >
>>> > As I understand it, writes go directly to the commit log. Once a threshold
>>> > has been reached, the data is shipped to a memtable, and again to an sstable.
>>> >
>>> > 1. How many memtables are created when a flush happens from a commit log?
>>> > One per CF?
>>> >
>>> > 2. Is there any space associated with an empty memtable?
>>> >
>>> > 3. When a flush happens from a memtable to an sstable, does this create a
>>> > single new sstable?
>>> >
>>> > 4. Should compaction be turned off during a large data load?
>>> >
>>> > Thanks.
>>> >
>>>
>>> Take a look at:
>>>
>>>
>>> http://wiki.apache.org/cassandra/MemtableSSTable
>>>
>>> 1 and 3
>>> Memtables flush for three reasons: size, time, and number of
>>> operations. There is one memtable per column family. Each memtable
>>> flushes individually.
>>>
>>> 2. Is this a trick question?
>>>
>>> 4. Should compaction be turned off during a large data load?
>>> You can disable compaction during bulk loads. This can help because
>>> otherwise the same data might be compacted multiple times. However, if
>>> you go too long with compaction turned off, you end up with many
>>> sstables, and rows can become fragmented across them.
>>>
>>
>>
>

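Edward's answer to 1 and 3 (one memtable per column family, each flushing on
its own once it hits a size, time, or operation-count threshold) can be
sketched as below. This is a hypothetical Python model, not Cassandra's
actual flush code: the class names, field names, and threshold values are
made up for illustration, and treating an empty memtable as having nothing
to flush is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FlushThresholds:
    # Hypothetical settings for the three triggers: size, operations, time.
    max_size_mb: float
    max_operations: int
    max_age_minutes: float


@dataclass
class MemtableState:
    # Hypothetical per-CF memtable bookkeeping.
    size_mb: float = 0.0
    operations: int = 0
    age_minutes: float = 0.0


def flush_reason(m: MemtableState, t: FlushThresholds) -> Optional[str]:
    """Return why this memtable should flush, or None if it should not."""
    if m.operations == 0:
        # Assumption: an empty memtable has nothing to write to an sstable.
        return None
    if m.size_mb >= t.max_size_mb:
        return "size"
    if m.operations >= t.max_operations:
        return "operations"
    if m.age_minutes >= t.max_age_minutes:
        return "time"
    return None


def memtables_to_flush(memtables: Dict[str, MemtableState],
                       t: FlushThresholds) -> Dict[str, str]:
    """One memtable per column family, each checked independently."""
    result: Dict[str, str] = {}
    for cf, m in memtables.items():
        reason = flush_reason(m, t)
        if reason is not None:
            result[cf] = reason
    return result


if __name__ == "__main__":
    thresholds = FlushThresholds(max_size_mb=64, max_operations=300_000,
                                 max_age_minutes=60)
    memtables = {
        "Users": MemtableState(size_mb=70, operations=1_000, age_minutes=5),
        "Events": MemtableState(size_mb=10, operations=400_000, age_minutes=2),
        "Empty": MemtableState(),
    }
    print(memtables_to_flush(memtables, thresholds))
    # {'Users': 'size', 'Events': 'operations'}
```

With the made-up 64 MB / 300,000-operation / 60-minute settings, the CF
holding 70 MB is flagged for size, the one with 400,000 operations for
operation count, and the untouched CF is left alone; each decision is made
per CF, so one busy CF does not force the others to flush.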