incubator-cassandra-user mailing list archives

From Kevin Burton <bur...@spinn3r.com>
Subject Re: Is per-table memory overhead due to SSTables or tables?
Date Sat, 09 Aug 2014 00:33:45 GMT
Hm, as a side note, it's amazing how much Cassandra information is locked
up in JIRAs. I wonder if there's a way to automatically identify the JIRAs
with important information.


On Fri, Aug 8, 2014 at 5:14 PM, graham sanderson <graham@vast.com> wrote:

> See https://issues.apache.org/jira/browse/CASSANDRA-5935
>
> 2.1 has a radically different implementation that sidesteps this (with
> off-heap memtables), but if you really want lots of tables now you can do
> so as a trade-off against GC behavior.
>
> The problem is not SSTables per se, but rather that there is potentially
> one memtable per CF (and with the slab allocator that can/does cost 1MB);
> I am not familiar enough with the code to know when you would have one
> memtable vs. none for a CF that isn’t currently actively used.
>
> Note also https://issues.apache.org/jira/browse/CASSANDRA-6602 and
> friends; there is definitely a need for efficient discarding of old data in
> event streams.
>
>
> On Aug 8, 2014, at 2:29 PM, Kevin Burton <burton@spinn3r.com> wrote:
>
> The "conventional wisdom" says that it's best to keep the number of tables
> in Cassandra "in the low hundreds", as each table can use 1MB or so of
> heap.  So if you have 1000 tables you'd have 1GB of heap used (which is
> no fun).
>
> But is this an issue with the tables themselves or the SSTables?
>
> I think the root of this is the SSTables as all the arena overhead will be
> for the SSTables too and more SSTables means more overhead.
>
> So by adding more tables, you end up with more SSTables which means more
> heap memory.
>
> If I'm correct then this means that Cassandra could benefit from table
> partitioning, whereby you put all values in a specific region into a
> specific set of tables.
>
> So if you were storing log data, you could store it in hourly, or daily
> partitions, but view the table as one logical unit.
>
> The benefit here is that you could easily just drop the oldest data.  So
> if you need to clean up data, you wouldn't have to drop the whole table,
> just a day's worth of the data.
>
> And since that day is just one SSTable on disk, the drop would be easy:
> no tombstones, just delete the whole SSTable.
>
>
>
> --
>
> Founder/CEO Spinn3r.com <http://spinn3r.com/>
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
>  … or check out my Google+ profile
> <https://plus.google.com/102718274791889610666/posts>
> <http://spinn3r.com/>
>
>
>
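
The heap arithmetic in the quoted thread can be sketched as follows. This is only a back-of-the-envelope estimate: the 1MB-per-table figure is the rule of thumb quoted above (roughly one slab-allocated memtable per table), not a measured value, and the function name is made up for illustration.

```python
# Rough estimate of heap consumed by per-table memtable overhead.
# ASSUMPTION: ~1 MB per table, the rule of thumb from this thread.
MB = 1024 * 1024
GB = 1024 * MB


def estimated_table_heap_bytes(num_tables, per_table_bytes=MB):
    """Total heap overhead if every table holds one memtable slab."""
    return num_tables * per_table_bytes


if __name__ == "__main__":
    for n in (100, 300, 1000):
        gb = estimated_table_heap_bytes(n) / GB
        print(f"{n} tables: about {gb:.2f} GB of heap")
```

With these assumptions 1000 tables come out just under 1GB, which matches the figure in the original question.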


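The daily-partition idea in the quoted message could be approximated on the client side roughly like this. It is a sketch only: the `logs_YYYYMMDD` naming scheme and both helper functions are hypothetical, and nothing here talks to Cassandra.

```python
from datetime import date, datetime, timedelta, timezone


def bucket_table(prefix, ts):
    """Name of the daily table a timezone-aware timestamp falls into (UTC)."""
    return f"{prefix}_{ts.astimezone(timezone.utc):%Y%m%d}"


def expired_tables(prefix, first_day, today, retention_days):
    """Daily tables that fall outside the retention window, oldest first."""
    cutoff = today - timedelta(days=retention_days)
    names = []
    day = first_day
    while day < cutoff:
        names.append(f"{prefix}_{day:%Y%m%d}")
        day += timedelta(days=1)
    return names


if __name__ == "__main__":
    ts = datetime(2014, 8, 9, 0, 33, 45, tzinfo=timezone.utc)
    print(bucket_table("logs", ts))
    print(expired_tables("logs", date(2014, 8, 1), date(2014, 8, 9), 7))
```

Each expired name would then be removed with a CQL `DROP TABLE`. Note that every daily table is still a separate table, so the per-table memtable overhead discussed above applies to each one.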
-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
