cassandra-commits mailing list archives

From "Eric Evans (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-51) Memory footprint for memtable
Date Fri, 10 Apr 2009 20:25:14 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697966#action_12697966 ]

Eric Evans commented on CASSANDRA-51:
-------------------------------------

In my test environment I've exhausted the JVM's heap and sent Cassandra into hours-long thrashing,
which typically culminates in an out-of-memory exception and a premature end to the test.
Hence my own motivation for finding an optimization to EBM's memory utilization.

However, whether or not such an optimization is made, there's still bound to be some non-Column
overhead which accumulates as the number of columns increases. The smaller the stored values
are, the larger the share of the heap this overhead is going to consume, despite the fact that it
isn't reported by currentSize_. The trick to preventing an out-of-memory crash would seem
to be the careful tuning of MemtableSizeInMB and MemtableObjectCountInMillions to both the
allocated heap size and the type of data being stored.
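
As a rough illustration of why both settings matter, here is a minimal Java sketch of a two-threshold flush check. It is not the actual Memtable code: the class name, the 64-byte per-column overhead, and the exact comparison semantics are all assumptions made for the example. Only the names MemtableSizeInMB, MemtableObjectCountInMillions and currentSize_ come from the discussion above.

```java
public class MemtableThresholdSketch {
    // Assumed bookkeeping cost per column in bytes (hypothetical; the real
    // figure depends on the JVM and on the containers holding the columns).
    public static final long ASSUMED_OVERHEAD_PER_COLUMN = 64;

    private final long sizeThresholdBytes;
    private final long objectCountThreshold;
    private long currentSize;        // payload bytes only, in the spirit of currentSize_
    private long currentObjectCount; // number of columns stored

    public MemtableThresholdSketch(int memtableSizeInMB, double memtableObjectCountInMillions) {
        this.sizeThresholdBytes = memtableSizeInMB * 1024L * 1024L;
        this.objectCountThreshold = (long) (memtableObjectCountInMillions * 1_000_000);
    }

    // Record one column; only the value's size is added to the reported total.
    public void put(int valueBytes) {
        currentSize += valueBytes;
        currentObjectCount++;
    }

    // Flush when either the reported size or the object count reaches its limit.
    public boolean shouldFlush() {
        return currentSize >= sizeThresholdBytes || currentObjectCount >= objectCountThreshold;
    }

    public long reportedBytes() {
        return currentSize;
    }

    // Reported payload plus the assumed, unreported per-column overhead.
    public long estimatedHeapBytes() {
        return currentSize + currentObjectCount * ASSUMED_OVERHEAD_PER_COLUMN;
    }

    public static void main(String[] args) {
        // 32 MB and 1.0 million are arbitrary example values, not recommendations.
        MemtableThresholdSketch m = new MemtableThresholdSketch(32, 1.0);
        for (int i = 0; i < 500_000; i++) {
            m.put(8); // many small values
        }
        System.out.println("reported bytes:  " + m.reportedBytes());
        System.out.println("estimated heap:  " + m.estimatedHeapBytes());
        System.out.println("flush triggered: " + m.shouldFlush());
    }
}
```

In this example, 500,000 eight-byte values report only 4,000,000 bytes, while the estimate under the assumed overhead is 36,000,000 bytes, which is already more than the 32 MB size threshold even though neither threshold has fired. With small values the object count limit is the one that has to do the work.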

Currently these values aren't well advertised (I didn't know about them until I started digging
around in Memtable), so I propose the attached patch which includes them in the sample configuration,
and supplies a more conservative value for MemtableSizeInMB.


>  Memory footprint for memtable
> ------------------------------
>
>                 Key: CASSANDRA-51
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-51
>             Project: Cassandra
>          Issue Type: Improvement
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Eric Evans
>             Fix For: 0.3
>
>
> The implementation of EfficientBidiMap (EBM) today stores the column in two places, a map
> and a sorted set. Both data structures store exactly the same values.
> I assume we're storing this twice so that the map can give us O(1) reads while the sortedset
> is important for efficient flush. Is this tradeoff important? Do we want to store the data
> twice to get O(1) reads over O(log(n)) reads from sortedset? Is the sortedset implementation
> broken? Perhaps we should consider a configuration option that turns off the map -- write
> performance will be slightly improved, read performance will be somewhat worse, and the memory
> footprint will probably be about half. Certainly sounds like a good alternative tradeoff.
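
The tradeoff described in the issue can be sketched in Java roughly as follows. This is a simplified illustration, not the EfficientBidiMap implementation: the class names are hypothetical, columns are reduced to name/value strings, and `entriesHeld()` counts container entries rather than bytes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class ColumnIndexSketch {

    /** Two structures: a hash map for O(1) reads and a sorted set for ordered flush. */
    public static class DualIndex {
        private final Map<String, String> byName = new HashMap<>();
        private final SortedSet<String> sortedNames = new TreeSet<>();

        public void put(String name, String value) {
            byName.put(name, value);   // write 1
            sortedNames.add(name);     // write 2
        }

        public String get(String name) {
            return byName.get(name);   // expected O(1)
        }

        public List<String> flushOrder() {
            return new ArrayList<>(sortedNames);
        }

        public int entriesHeld() {
            return byName.size() + sortedNames.size();
        }
    }

    /** One sorted structure: O(log n) reads, a single entry per column. */
    public static class SortedOnlyIndex {
        private final NavigableMap<String, String> sorted = new TreeMap<>();

        public void put(String name, String value) {
            sorted.put(name, value);   // single write
        }

        public String get(String name) {
            return sorted.get(name);   // O(log n)
        }

        public List<String> flushOrder() {
            return new ArrayList<>(sorted.keySet());
        }

        public int entriesHeld() {
            return sorted.size();
        }
    }

    public static void main(String[] args) {
        DualIndex dual = new DualIndex();
        SortedOnlyIndex single = new SortedOnlyIndex();
        for (String name : new String[] {"c", "a", "b"}) {
            dual.put(name, name + "-v");
            single.put(name, name + "-v");
        }
        System.out.println("dual flush order:   " + dual.flushOrder());
        System.out.println("single flush order: " + single.flushOrder());
        System.out.println("dual entries:       " + dual.entriesHeld());
        System.out.println("single entries:     " + single.entriesHeld());
    }
}
```

Both variants flush in the same sorted order; the sorted-only variant keeps half as many container entries and does one write per column, at the cost of O(log n) lookups.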

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

