incubator-cassandra-user mailing list archives

From graham sanderson <gra...@vast.com>
Subject Re: Is per-table memory overhead due to SSTables or tables?
Date Sat, 09 Aug 2014 00:40:47 GMT
google ;-)

On Aug 8, 2014, at 7:33 PM, Kevin Burton <burton@spinn3r.com> wrote:

> hm.. as a side note, it's amazing how much cassandra information is locked up in JIRAs… wonder if there's a way to compute automatically the JIRAs with important information.
> 
> 
> On Fri, Aug 8, 2014 at 5:14 PM, graham sanderson <graham@vast.com> wrote:
> See https://issues.apache.org/jira/browse/CASSANDRA-5935
> 
> 2.1 has a radically different implementation that sidesteps this (with off-heap memtables), but if you really want lots of tables now you can do so as a trade-off against GC behavior.
> 
> The problem is not SSTables per se, but more potentially one memtable per CF (and with the slab allocator that can/does cost 1 MB); I am not familiar enough with the code to know when you would have 1 memtable vs. 0 memtables for a CF that isn’t currently actively used.
> 
> Note also https://issues.apache.org/jira/browse/CASSANDRA-6602 and friends; there is definitely a need for efficient discarding of old data in event streams.
> 
> 
> On Aug 8, 2014, at 2:29 PM, Kevin Burton <burton@spinn3r.com> wrote:
> 
>> The "conventional wisdom" says that it's ideal to use only "in the low hundreds" of tables with Cassandra, as each table can use 1MB or so of heap.  So if you have 1000 tables you'd have 1GB of heap used (which is no fun).
>> 
>> But is this an issue with the tables themselves or the SSTables?
>> 
>> I think the root of this is the SSTables as all the arena overhead will be for the SSTables too and more SSTables means more overhead.
>> 
>> So by adding more tables, you end up with more SSTables which means more heap memory.
>> 
>> If I'm correct then this means that Cassandra could benefit from table partitioning, whereby you put all values in a specific region into a specific set of tables.
>> 
>> So if you were storing log data, you could store it in hourly or daily partitions, but view the table as one logical unit.
>> 
>> The benefit here is that you could easily just drop the oldest data.  So if you need to clean up data, you wouldn't have to drop the whole table, just a day's worth of the data.
>> 
>> And since that day is just one SSTable on disk, the drop would be easy.. no tombstones, just delete the whole SSTable.
>> 
>> 
>> 
>> -- 
>> 
>> Founder/CEO Spinn3r.com
>> Location: San Francisco, CA
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> 
>> 

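For a rough feel of the heap arithmetic quoted above ("1MB or so of heap" per table, so 1000 tables is about 1GB), here is a minimal sketch. The 1 MB figure is only the thread's ballpark, and the function name and default are made up for illustration; real per-table overhead depends on the Cassandra version and allocator settings.

```python
# Rule-of-thumb figure from the thread, NOT a measured value (assumption).
PER_TABLE_OVERHEAD_MB = 1.0


def estimated_table_heap_mb(num_tables, per_table_mb=PER_TABLE_OVERHEAD_MB):
    """Approximate heap (in MB) taken by per-table overhead alone."""
    if num_tables < 0:
        raise ValueError("num_tables must be non-negative")
    return num_tables * per_table_mb


if __name__ == "__main__":
    # Compare "low hundreds" of tables against the 1000-table case.
    for n in (100, 300, 1000):
        print(f"{n} tables -> ~{estimated_table_heap_mb(n):.0f} MB of heap")
```

This only multiplies the two numbers from the discussion; it says nothing about whether an idle CF actually holds a memtable, which is the open question raised above.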

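The daily-partition idea quoted above could be approximated on the application side today, at the cost of more tables (and so more of the per-table overhead discussed in this thread). A minimal sketch, assuming a hypothetical naming scheme of one physical table per day; the helper names, the `logs` base name, and the retention numbers are all invented for illustration and are not a Cassandra feature.

```python
from datetime import date, timedelta


def bucket_table_name(base, day):
    """Hypothetical scheme: one physical table per day, e.g. logs_20140808."""
    return f"{base}_{day:%Y%m%d}"


def expired_buckets(base, today, retention_days, lookback_days=30):
    """Names of daily buckets older than the retention window, oldest first.

    Buckets aged 0 .. retention_days - 1 days are kept; buckets aged
    retention_days .. lookback_days days are reported as expired.
    """
    names = []
    for age in range(lookback_days, retention_days - 1, -1):
        names.append(bucket_table_name(base, today - timedelta(days=age)))
    return names


if __name__ == "__main__":
    # Keep 7 days of data and look back 9 days for anything to drop.
    for name in expired_buckets("logs", date(2014, 8, 8), 7, 9):
        print(name)
```

Each name returned would then be dropped as a whole table by the caller, which avoids tombstones in the same way the quoted proposal intends; the sketch only computes names and does not talk to a cluster.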