cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] Commented: (CASSANDRA-51) Memory footprint for memtable
Date Wed, 08 Apr 2009 17:34:13 GMT


Jonathan Ellis commented on CASSANDRA-51:

It looks to me like the only place we actually use the map side of things is when doing a
diff during read repair.  This is relatively uncommon, so spending a little CPU on binary
search in the uncommon case definitely seems worth the gains in memory efficiency.  We might
even come out ahead on CPU too, since we wouldn't have to keep the map in sync as well.
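
To make the tradeoff concrete, here is a rough sketch of a container that keeps a single
sorted copy of the columns and answers lookups by binary search. Everything in it
(SortedColumns, Column, put, get, sorted) is a made-up name for illustration, not the
actual EfficientBidiMap code, and it assumes columns are ordered by name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical names throughout; this is not Cassandra's EfficientBidiMap.
final class SortedColumns {
    // Minimal stand-in for a column: a name and a value.
    static final class Column {
        final String name;
        final String value;

        Column(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    // Assumption: columns are ordered by name.
    private static final Comparator<Column> BY_NAME = Comparator.comparing(c -> c.name);

    // The only copy of the data, kept sorted by column name. No companion map.
    private final List<Column> columns = new ArrayList<>();

    // Insert a new column or overwrite an existing one, preserving sort order.
    void put(String name, String value) {
        Column c = new Column(name, value);
        int idx = Collections.binarySearch(columns, c, BY_NAME);
        if (idx >= 0) {
            columns.set(idx, c);        // name already present: replace in place
        } else {
            columns.add(-idx - 1, c);   // decode the insertion point and insert there
        }
    }

    // O(log n) lookup by name via binary search; returns null when absent.
    String get(String name) {
        int idx = Collections.binarySearch(columns, new Column(name, null), BY_NAME);
        return idx >= 0 ? columns.get(idx).value : null;
    }

    // Read-only view in sorted order, as a flush would consume it.
    List<Column> sorted() {
        return Collections.unmodifiableList(columns);
    }
}
```

Note that inserting into an ArrayList shifts elements, so put is O(n) in this sketch; a
tree or skip list would keep both insert and lookup at O(log n) while still storing each
column only once.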

>  Memory footprint for memtable
> ------------------------------
>                 Key: CASSANDRA-51
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>         Environment: all
>            Reporter: Sandeep Tata
> The implementation of EfficientBidiMap (EBM) today stores each column in two places, a map
> and a sorted set. Both data structures store exactly the same values.
> I assume we're storing this twice so that the map can give us O(1) reads while the sorted
> set is important for efficient flush. Is this tradeoff important? Do we want to store the
> data twice to get O(1) reads over O(log(n)) reads from the sorted set? Is the sorted set
> implementation broken? Perhaps we should consider a configuration option that turns off the
> map -- write performance will be slightly improved, read performance will be somewhat worse,
> and the memory footprint will probably be about half. Certainly sounds like a good
> alternative tradeoff.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
