cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6271) Replace SnapTree in AtomicSortedColumns
Date Wed, 01 Jan 2014 18:48:50 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859917#comment-13859917
] 

Benedict commented on CASSANDRA-6271:
-------------------------------------

bq. If I start iterating from one "version" of the tree, and some updates are made to the
end of the tree before I finish, I'll see those in my iterator, no?

Not unless I really messed up. The new tree is constructed locally (i.e. completely contained
within the method call), and exposed only when complete. All trees once constructed are completely
immutable, so any iterator formed against an old tree will remain valid and consistent, and
only prevent the unused parts of the old tree from being reclaimed by GC.

> Replace SnapTree in AtomicSortedColumns
> ---------------------------------------
>
>                 Key: CASSANDRA-6271
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6271
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Benedict
>              Labels: performance
>         Attachments: oprate.svg
>
>
> On the write path a huge percentage of time is spent in GC (>50% in my tests, if accounting
for slow down due to parallel marking). SnapTrees are both GC unfriendly due to their structure
and also very expensive to keep around - each column name in AtomicSortedColumns uses >
100 bytes on average (excluding the actual ByteBuffer).
> I suggest using a sorted array; changes are supplied at-once, as opposed to one at a
time, and if < 10% of the keys in the array change (and data equal to < 10% of the size
of the key array) we simply overlay a new array of changes only over the top. Otherwise we
rewrite the array. This method should ensure much less GC overhead, and also save approximately
80% of the current memory overhead.
> TreeMap is similarly difficult object for the GC, and a related task might be to remove
it where not strictly necessary, even though we don't keep them hanging around for long. TreeMapBackedSortedColumns,
for instance, seems to be used in a lot of places where we could simply sort the columns.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Mime
View raw message