cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6271) Replace SnapTree in AtomicSortedColumns
Date Fri, 27 Dec 2013 22:57:51 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857819#comment-13857819
] 

Benedict commented on CASSANDRA-6271:
-------------------------------------

bq. Suspicious that Stack.compareTo ignores forward unless stacks are identical up to min
depth.
This is because if they're identical to a given depth, but a different depth, then one of
them is in a child and another in a parent branch. This is due to a possibly poor decision
(see the TODO in Stack.find()) wherein the iterator position in a branch is overloaded to
cover both 'being in a child', and 'visiting the key that bounds the child', and if you're
iterating forward/backwards this overloading is slightly different (see the Stack.compareTo
comment). 
As a result the meaning of being in the same position at the min depth is different depending
on iteration order. Ideally this wouldn't be the case, but it didn't seem worth rewriting
it to resolve.

bq. Confused about ML.copyFrom – we reset to EMPTY trees, then add elements from other sources.
Is there a better set of names we can use?
We only reset to EMPTY if we're inserting a new node either above or below those that already
exist in the btree we're modifying. Otherwise we're 'copying' any data (or unaffected child
trees) from the tree we're modifying. I agree the nomenclature isn't superb, but I couldn't
think of anything that was definitely clearer. Possibly 'originalNode'.

bq. Unclear when ML.update returns null.
ML.update() returns null when it's successfully applied the update. i.e. a non-null return
is always the ML necessary to apply update() to to make progress on the update(), and by repeatedly
calling update on the result you will eventually succeed. The maybe confusing condition at
the bottom of ML.update() that returns null is to handle the case where we update(POSITIVE_INFINITY)
to complete a batch of updates, which by definition can never find a place to insert in the
tree so effectively results in copying any remaining data from the tree we're modifying and
ascending to the root of the new tree.

bq. What is more efficient about ML.build vs calling ML.update(EMPTY_LEAF)?
Just a lot fewer unnecessary method calls copying empty ranges. Algorithmically they're equivalent.
I haven't benchmarked them, actually, so it's possible it's not dramatically different. My
original design of update() required build, as it couldn't deal with inserting a collection
larger than the tree, but that's no longer a limitation.

bq. Is there a minimum number of keys in a branch or leaf? Are branches/leaves ever combined?
Put another way: are all leaf nodes at the same tree depth?
Normal btree rules apply, so minimum keys for both branches and leaves (except root) is FAN_FACTOR
/ 2, and all leaf nodes are at the same depth (see the isWellFormed child type check ensuring
never a mix of leaf/branch children)


> Replace SnapTree in AtomicSortedColumns
> ---------------------------------------
>
>                 Key: CASSANDRA-6271
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6271
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Benedict
>              Labels: performance
>         Attachments: oprate.svg
>
>
> On the write path a huge percentage of time is spent in GC (>50% in my tests, if accounting
for slow down due to parallel marking). SnapTrees are both GC unfriendly due to their structure
and also very expensive to keep around - each column name in AtomicSortedColumns uses >
100 bytes on average (excluding the actual ByteBuffer).
> I suggest using a sorted array; changes are supplied at-once, as opposed to one at a
time, and if < 10% of the keys in the array change (and data equal to < 10% of the size
of the key array) we simply overlay a new array of changes only over the top. Otherwise we
rewrite the array. This method should ensure much less GC overhead, and also save approximately
80% of the current memory overhead.
> TreeMap is similarly difficult object for the GC, and a related task might be to remove
it where not strictly necessary, even though we don't keep them hanging around for long. TreeMapBackedSortedColumns,
for instance, seems to be used in a lot of places where we could simply sort the columns.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Mime
View raw message