cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] Updated: (CASSANDRA-51) Memory footprint for memtable and versioning semantics
Date Fri, 03 Apr 2009 20:42:12 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-51:
------------------------------------

    Description: 
The implementation of EfficientBidiMap (EBM) today stores each column in two places: a map and
a sorted set. Both data structures store exactly the same values.

I assume we're storing this twice so that the map can give us O(1) reads while the sortedset
is important for efficient flush. Is this tradeoff important? Do we want to store the data
twice to get O(1) reads over O(log(n)) reads from the sortedset? Is the sortedset implementation
broken? Perhaps we should consider a configuration option that turns off the map -- write
performance will be slightly improved, read performance will be somewhat worse, and the memory
footprint will probably be about half. Certainly sounds like a good alternative tradeoff.
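
The tradeoff described above can be illustrated with a minimal sketch. This is
not the EfficientBidiMap code: the class names, methods, and the use of
HashMap/TreeMap are assumptions made for illustration only. It contrasts a
layout that indexes every column twice with one that keeps only the sorted
structure, and it counts index entries rather than bytes, so the "about half"
estimate is only loosely reflected.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ColumnStoreSketch {
    /** Hypothetical dual layout: every column is indexed in a hash map and a sorted map. */
    static class DualIndexColumns {
        private final Map<String, String> byName = new HashMap<>();
        private final TreeMap<String, String> sorted = new TreeMap<>();

        void put(String name, String value) {
            byName.put(name, value); // extra entry kept only for fast point reads
            sorted.put(name, value); // sorted entry used for ordered flush
        }

        String get(String name) { return byName.get(name); } // expected O(1)

        List<String> flushOrder() { return new ArrayList<>(sorted.keySet()); }

        int entriesHeld() { return byName.size() + sorted.size(); }
    }

    /** Hypothetical alternative with the map turned off: one sorted structure only. */
    static class SortedOnlyColumns {
        private final TreeMap<String, String> sorted = new TreeMap<>();

        void put(String name, String value) { sorted.put(name, value); } // one write, not two

        String get(String name) { return sorted.get(name); } // O(log n)

        List<String> flushOrder() { return new ArrayList<>(sorted.keySet()); }

        int entriesHeld() { return sorted.size(); }
    }

    public static void main(String[] args) {
        DualIndexColumns dual = new DualIndexColumns();
        SortedOnlyColumns single = new SortedOnlyColumns();
        for (String name : new String[] {"c", "a", "b"}) {
            dual.put(name, "v-" + name);
            single.put(name, "v-" + name);
        }
        // Both layouts answer reads and flush in the same order;
        // only the number of index entries differs.
        System.out.println(dual.get("a") + " " + single.get("a"));
        System.out.println(dual.flushOrder() + " " + single.flushOrder());
        System.out.println(dual.entriesHeld() + " vs " + single.entriesHeld());
    }
}
```

In this sketch the sorted-only variant does one insertion per write and holds
half as many index entries, at the cost of O(log n) point reads.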


  was:
The implementation of EfficientBidiMap (EBM) today stores each column in two places: a map and
a sorted set. Both data structures store exactly the same values.

I assume we're storing this twice so that the map can give us O(1) reads while the sortedset
is important for efficient flush. Is this tradeoff important? Do we want to store the data
twice to get O(1) reads over O(log(n)) reads from the sortedset? Is the sortedset implementation
broken? Perhaps we should consider a configuration option that turns off the map -- write
performance will be slightly improved, read performance will be somewhat worse, and the memory
footprint will probably be about half. Certainly sounds like a good alternative tradeoff.

The other reason, of course, to store this twice would be if you wanted to store older versions
in the sortedset as well, but we're not doing that today. In fact, we don't have a way to let
the client see a column history at all. But even for an internal API, the column history after
a bunch of inserts is undefined:

insert(key, col=val1, ts1)
insert(key, col=val2, ts2)
insert(key, col=val3, ts3)


Column history for col now depends on whether the memtable got flushed between inserts or
it remained in memory. This is not desirable behavior.
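
The flush dependence described above can be reproduced with a toy model. This
is a sketch under stated assumptions, not Cassandra's memtable: the class, its
methods, and the idea of reading history by scanning flushed tables are all
hypothetical, and the row key and timestamps from the inserts above are left
out for brevity. The in-memory table keeps only the latest value per column,
so older versions survive only if a flush happened to occur between writes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FlushHistorySketch {
    // Hypothetical memtable: one value per column, newer writes overwrite older ones.
    private final Map<String, String> memtable = new HashMap<>();
    // Hypothetical flushed tables, oldest first.
    private final List<Map<String, String>> flushed = new ArrayList<>();

    void insert(String col, String value) {
        memtable.put(col, value); // any earlier in-memory version is lost here
    }

    void flush() {
        if (memtable.isEmpty()) {
            return;
        }
        flushed.add(new HashMap<>(memtable)); // snapshot the current contents
        memtable.clear();
    }

    /** Collects every version of a column that is still stored anywhere. */
    List<String> history(String col) {
        List<String> versions = new ArrayList<>();
        for (Map<String, String> table : flushed) {
            if (table.containsKey(col)) {
                versions.add(table.get(col));
            }
        }
        if (memtable.containsKey(col)) {
            versions.add(memtable.get(col));
        }
        return versions;
    }

    public static void main(String[] args) {
        // Same three inserts, no flush in between.
        FlushHistorySketch noFlush = new FlushHistorySketch();
        noFlush.insert("col", "val1");
        noFlush.insert("col", "val2");
        noFlush.insert("col", "val3");
        System.out.println(noFlush.history("col")); // prints [val3]

        // Same three inserts, with a flush after each of the first two.
        FlushHistorySketch withFlush = new FlushHistorySketch();
        withFlush.insert("col", "val1");
        withFlush.flush();
        withFlush.insert("col", "val2");
        withFlush.flush();
        withFlush.insert("col", "val3");
        System.out.println(withFlush.history("col")); // prints [val1, val2, val3]
    }
}
```

The two runs issue identical writes but report different histories, which is
the undefined behavior the paragraph above points out.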




Modified description to avoid confusing two subjects.

>  Memory footprint for memtable and versioning semantics 
> --------------------------------------------------------
>
>                 Key: CASSANDRA-51
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-51
>             Project: Cassandra
>          Issue Type: Improvement
>         Environment: all
>            Reporter: Sandeep Tata
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

