cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Branimir Lambov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9991) Implement efficient btree removal
Date Tue, 24 Nov 2015 13:40:11 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15024528#comment-15024528
] 

Branimir Lambov commented on CASSANDRA-9991:
--------------------------------------------

Thank you for the patch. You are right to question the worth of saving some work at the expense
of code. I'd actually prefer to go the other way and simplify the code more.

It's not obvious why you want to have {{remove}} separated from {{removeFromLeft}} -- I can
see that it is necessary to avoid copying / reorganizing when the key is not in the tree,
but that could be stated in a comment to save some reader the time to figure it out. The same
goes for the need for {{swap}} to walk the whole tree again, a comment would help a lot.

I'd extract [the predecessor loop in {{remove}}|https://github.com/apache/cassandra/compare/trunk...blambov:haaawk/9991#diff-fd77b44ecefe258afe8334dd2b4b63b7R42]
to a method in {{BTree}}. Finding last element (using {{findByIndex(tree, size-1)}}) is also
done in a few other places, you can point them to that method as well.

The ordering in [{{removeFromLeaf}}|https://github.com/apache/cassandra/compare/trunk...blambov:haaawk/9991#diff-fd77b44ecefe258afe8334dd2b4b63b7R82]
is odd: We first look for the easy case, then switch to the most difficult one, than go back
to an easier one (also repeating a check). Why not check if left is large enough first, if
not check if right is large enough, if not do the join?

{{removeLeft / removeRight}} are very unreadable. Perhaps extract {{copyRemoving/copyInserting}}
methods and call them?

[The newNextNode size calculation|https://github.com/apache/cassandra/compare/trunk...blambov:haaawk/9991#diff-fd77b44ecefe258afe8334dd2b4b63b7R211]
in {{removeLeft/Right}} looks suspect; it only works because {{FAN_FACTOR}} is a power of
2. I think it would be much clearer if you enforced the invariant by using something like
{{(nextKeyEnd + 1) | 1}}.

[{{copyBranchWithKeyAndChildRemoved}}|https://github.com/apache/cassandra/compare/trunk...blambov:haaawk/9991#diff-fd77b44ecefe258afe8334dd2b4b63b7R279]
is only used with (and seems to only make sense with) {{keyIndex == childIndex}}, you can
remove the other parameter.

{{System.arraycopy}} can deal with length 0, remove the unnecessary checks [in {{copyKeys/Children}}|https://github.com/apache/cassandra/compare/trunk...blambov:haaawk/9991#diff-fd77b44ecefe258afe8334dd2b4b63b7R330].

The test coverage is great, but I'd like to have a more exhaustive random test as well --
something like removing everything in random order from a large {{BTree}}. I also noticed
{{isWellFormed}} does not check branch node size, could you add that too?

Going further, the {{BTree}} infrastructure includes methods that deal in element indices.
You can use {{findIndex}} instead of the code in {{remove}} and then do the removal walk using
indices, which will replace a few of the binary searches in {{removeFromLeaf}} with integer
ones. Since you are then certain that you need to remove an element, it would be easy to integrate
the predecessor swap into the removal pass.

I'm curious if you tried the recursive deletion with bottom-up reorganization alternative?
That looks like it would be simpler if we are okay with a few extra allocations and copying,
which we may be as it shouldn't happen too often. What do you think?

> Implement efficient btree removal
> ---------------------------------
>
>                 Key: CASSANDRA-9991
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9991
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Benedict
>              Labels: patch
>             Fix For: 3.x
>
>         Attachments: trunk-9991.txt
>
>
> Currently removal is implemented as a reconstruction by filtering and iterator over the
original btree. This could be much more efficient, editing just the necessary nodes. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message