cassandra-commits mailing list archives

From "Vijay (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-5506) Reduce memory consumption of IndexSummary
Date Sun, 28 Apr 2013 05:12:17 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643906#comment-13643906 ]

Vijay edited comment on CASSANDRA-5506 at 4/28/13 5:10 AM:
-----------------------------------------------------------

I have been thinking about moving IndexSummary (IS) off-heap for a while, and I am really happy
to see this ticket... Just wanted to try and add value :)

Instead of storing the long[] and byte[][] in memory, can we store only the indexes/pointers of
the decorated keys in memory, which can then be used to address the off-heap decorated keys
and offsets?

For example: 
During the binary search, we can use offheap indexes.length to find the midpoint in memory,
then reference it back to the offheap ByteBuffer (BB), which will be deserialized as needed
(the Summary effectively becomes a contiguous off-heap location)?
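
A minimal sketch of one possible reading of this idea, not the actual Cassandra implementation:
the keys and positions live in a single direct (off-heap) ByteBuffer, and only an int[] of entry
offsets stays on heap to drive the binary search. The class name, the entry layout
([int keyLength][key bytes][long position]) and the plain unsigned byte-wise key comparison are
all assumptions made for illustration; real DecoratedKeys are ordered by token first.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; names and layout are not taken from Cassandra.
final class OffHeapSummarySketch {
    private final int[] offsets;   // on-heap: start offset of each entry inside 'data'
    private final ByteBuffer data; // off-heap: [int keyLength][key bytes][long position] per entry

    // sortedKeys must already be sorted in unsigned byte order (assumption of this sketch).
    OffHeapSummarySketch(byte[][] sortedKeys, long[] positions) {
        int size = 0;
        for (byte[] key : sortedKeys) {
            size += 4 + key.length + 8;
        }
        data = ByteBuffer.allocateDirect(size);
        offsets = new int[sortedKeys.length];
        for (int i = 0; i < sortedKeys.length; i++) {
            offsets[i] = data.position();
            data.putInt(sortedKeys[i].length).put(sortedKeys[i]).putLong(positions[i]);
        }
    }

    // Compares the key of entry i with target, reading bytes directly from the off-heap buffer.
    private int compareKeyAt(int i, byte[] target) {
        int off = offsets[i];
        int len = data.getInt(off);
        int n = Math.min(len, target.length);
        for (int j = 0; j < n; j++) {
            int a = data.get(off + 4 + j) & 0xFF;
            int b = target[j] & 0xFF;
            if (a != b) {
                return a - b;
            }
        }
        return len - target.length;
    }

    // Reads the index-file position stored after the key of entry i.
    long positionAt(int i) {
        int off = offsets[i];
        return data.getLong(off + 4 + data.getInt(off));
    }

    // Returns the entry index if found, otherwise -(insertionPoint + 1).
    int binarySearch(byte[] target) {
        int lo = 0;
        int hi = offsets.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1; // midpoint comes from the on-heap offsets array
            int cmp = compareKeyAt(mid, target);
            if (cmp < 0) {
                lo = mid + 1;
            } else if (cmp > 0) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -(lo + 1);
    }
}
```

In this layout the per-entry heap cost is a single int, and no key object is materialized during
the search; only the bytes being compared are read from the off-heap buffer.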
                
      was (Author: vijay2win@yahoo.com):
    I have been thinking about moving IS off-heap for a while, I am really happy to see this
ticket... Just wanted to try to add value :)

Instead of storing the long[] and byte[][] in memory, can we store the indexes/pointers of
the decorated key in memory... which will be helpful to address the off-heap decorated key's
and offset?

For example: 
During the binary search, we can use offheap indexes.length to find the midpoint in memory
then reference it back to offheap BB which will be deserialized as needed (Summary effectively
becomes a contiguous off-heap location)?
                  
> Reduce memory consumption of IndexSummary
> -----------------------------------------
>
>                 Key: CASSANDRA-5506
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5506
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Nick Puz
>            Assignee: Jonathan Ellis
>             Fix For: 1.2.5
>
>
> I am evaluating Cassandra for a use case with many tiny rows, which would result in a
node with 1-3TB of storage having billions of rows. Before loading that much data I am hitting
GC issues, and when looking at the heap dump I noticed that 70+% of the memory was used by
IndexSummaries.
> The two major issues seem to be:
> 1) The positions are stored as an ArrayList<Long>, which results in each position
taking 24 bytes (class + flags + 8 byte long). This might make sense when the file is initially
written, but once it has been serialized it would be a lot more memory efficient to just have
a long[] (really an int[] would be fine unless 2GB sstables are allowed).
> 2) The DecoratedKey for a byte[16] key takes 195 bytes -- this is for the overhead of
the ByteBuffer in the key and overhead in the token.
> To somewhat "work around" the problem I have increased index_sample, but with this many
rows that didn't really help and starts to have diminishing returns.
> NOTE: This heap dump was from Linux with a 64-bit Oracle VM.
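
As a small illustration of point 1 in the description above, the boxed positions can be copied
into a primitive array once the summary is complete. This is only a sketch under assumed names
(PositionCompactionSketch and toPrimitive are hypothetical, not Cassandra code): a long[] stores
8 bytes per position plus one array header, instead of one Long object per position.

```java
import java.util.List;

// Hypothetical helper for illustration; not part of Cassandra.
final class PositionCompactionSketch {
    // Copies boxed positions into a primitive array so that no per-entry Long objects remain.
    static long[] toPrimitive(List<Long> positions) {
        long[] out = new long[positions.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = positions.get(i); // auto-unboxing
        }
        return out;
    }
}
```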

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
