# cassandra-commits mailing list archives

##### Site index · List index
Message view
Top
From "Branimir Lambov (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-8915) Improve MergeIterator performance
Date Wed, 03 Jun 2015 14:40:38 GMT
```
[ https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569392#comment-14569392
]

Branimir Lambov edited comment on CASSANDRA-8915 at 6/3/15 2:40 PM:
--------------------------------------------------------------------

I wasn't completely sure of the correctness of the flagging approach and I wrote the following
proof to make sure I'm not missing something:
{panel}
In the discussion below a set of ep-heaps over some space of elements is defined by two functions,
{{ρ(N)}} which stands for "parent of N", and {{σ(N)}} which stands for "equals parent".
We will use 'element' and 'node' interchangeably as each element E also defines a node using
the ρ function with parent {{ρ(E)}} and children {{\{x: ρ(X) ≡ E\}}}. The operator =
as used below denotes element equality (not identity), distinct elements are allowed to be
equal.

We define "ep-heap leading to M" to be a binary tree of nodes where each node N satisfies
{{ρ(N) ≤ N & σ(N) ↔ ρ(N) = N}} and the root has M as its parent. Full ep-heaps
are ep-heaps leading to ┬, where '┬' is assumed to be a special value strictly smaller
than any other (thus {{σ(R)}} is always false for the root of a full ep-heap). For simplicity
we will take all heaps to end at a special element called '⊥' whose value is assumed to
be greater than any other value.

The critical step here is adding a new value {{C > M}} to combine two existing well-formed
ep-heaps with roots P and R leading to the same M into a bigger single well-formed ep-heap
leading to M. Without loss of generality we can take {{P ≤ R}} (otherwise we can swap the
places of P and R in the discussion below).

We distinguish three cases:
# {{C < P}} (necessarily {{¬σ(P)}}): We place C at the top by setting {{ρ(P), ρ(R)
:= C}}, {{ρ(C) := M}} and leaving {{σ(P), σ(R) := false}} (Note that {{σ(R)}} cannot be
true as that contradicts with {{C > M}}). {{σ(C) := false}}. The ep-heap rooted at C is
# {{C = P}} (necessarily {{¬σ(P)}}): We place C at the top, setting {{ρ(P), ρ(R) := C}},
{{ρ(C) := M}}, {{σ(P) := true}} and {{σ(R) := (P = R)}}. {{σ(C) := false}}. The heap rooted
at C is well-formed as {{C = P & σ(P)}} as well as {{C = P ≤ R}} and {{C = R}} if and
only if {{σ(R)}}.
# {{C > P}} ({{σ(P)}} can be true or false): Place P at the top: set {{ρ(R) := P}} and
if {{¬σ(P)}}, set {{σ(R) := (P = R)}}, otherwise leave {{σ(R)}} unchanged; use the algorithm
recursively to merge the two ep-subheaps leading to P at its children and the new value {{C
> P}} into an ep-heap leading to P. Let its root be Q. The necessary conditions for an
element of an ep-heap hold for Q. They are also true for R as {{P ≤ R}} and either {{σ(P)}}
in which case {{P = M}} and {{σ(R)}} if and only if {{R = M = P}}, or {{¬σ(P)}} and {{σ(R)}}
is explicitly set to be true if and only if {{P = R}}. We also have {{ρ(P) = M}} as well
as {{P = M}} if and only if {{σ(P)}}, thus the constructed ep-heap rooted at P leads to M.

The recursion always ends for finite heaps as we must reach the first case due to the ordering
of ⊥.

For performance the order of comparing elements and identifying the case to use can be changed
to sequentially checking:
- {{σ(P)}}: apply pt. 3 for P and return (P must be ≤ R). No update of {{σ(P)}}/{{σ(R)}}
necessary.
- {{σ(R)}}: apply pt. 3 for R and return (R must now be < P). No update of {{σ(P)}}/{{σ(R)}}
necessary.
- compare P and R.
- If {{R < P}}, perform the next steps with P and R swapped.
- compare C and P.
- if {{C < P}} apply pt. 1.
- if {{C = P}} apply pt. 2.
- if {{C > P}} apply pt. 3.

Once an ep-heap with root R is formed, it trivially follows that for each element E in the
heap {{E = R}} if and only if σ is true for E and the chain of parents of E leading to but
not including the root. To consume the equal elements we can thus walk the sub-tree formed
by the children which have σ set.

To advance the elements equal to the root, we can:
- Process the elements in the deepest level of the σ sub-tree in any order. For each of them:
-- Advance the element. Let the new value be C.
-- The children P, R of the element must have {{¬σ(P)}}, {{¬σ(R)}}. Overwrite {{ρ(P)}}
and {{ρ(R)}} with ┬*. P and R now form ep-heaps leading to ┬.
-- Use the algorithm to combine P, R and C into a new ep-heap leading to ┬.
-- Let its root be Q. Because the heap leads to ┬ we have {{¬σ(Q)}}.
- Continue with the next deepest level of the hierarchy until the top is reached.

\* The algorithm never actually uses the values of the parent function ρ, only the fact that
σ is set correctly with respect to that parent. This means this step does not need to be
actually performed.
{panel}

Updated [the branch|https://github.com/apache/cassandra/compare/trunk...blambov:8915-mergeiterator]
to use this solution and cleaned the code a bit.

was (Author: blambov):
I wasn't completely sure of the correctness of the flagging approach and I wrote the following
proof to make sure I'm not missing something:
{panel}
In the discussion below ρ(N) stands for parent of N and σ(N) stands for the 'equalParent'
predicate, i.e. σ(N) ↔ ρ(N) = N.

We define "ep-heap leading to M" to be a binary tree of nodes where each node N satisfies
ρ(N) ≤ N & σ(N) ↔ ρ(N) = N and the root has M as its parent. Full ep-heaps are
ep-heaps leading to top, where 'top' is assumed to be a special value strictly smaller than
any other (thus σ(R) is always false for the root of a full ep-heap). For simplicity we will
take all heaps to end at a special node called 'bottom' whose value is assumed to be greater
than any other value.

The critical step here is adding a new value C > M to combine two existing well-formed
ep-heaps P and R leading to the same M into a bigger single well-formed ep-heap leading to
M. Without loss of generality we can take P ≤ R (otherwise we can swap the places of P and
R in the discussion below).

We distinguish three cases:
# C < P (necessarily ¬σ(P)): We place C at the top by setting ρ(P), ρ(R) := C, ρ(C)
:= M and leaving σ(P), σ(R) := false (Note that σ(R) cannot be true as that contradicts
with C > M). σ(C) := false. The ep-heap rooted at C is well-formed leading to M.
# C = P (necessarily ¬σ(P)): We place C at the top, setting ρ(P), ρ(R) := C, ρ(C) :=
M, σ(P) := true and σ(R) := (P = R). σ(C) := false. The heap rooted at C is well-formed
as C = P & σ(P) as well as C = P ≤ R & C = R if and only if σ(R).
# C > P (σ(P) can be true or false): Set ρ(R) := P. If ¬σ(P), set σ(R) := (P = R),
otherwise leave σ(R) unchanged. Use the algorithm recursively to turn the two ep-subheaps
leading to P at its children and the new value C > P into an ep-heap leading to P. Let
its root be Q. The necessary conditions for an element of an ep-heap hold for Q. They are
also true for R as P ≤ R and either σ(P) in which case P = M and σ(R) if and only if R
= M = P, or ¬σ(P) and σ(R) is explicitly set to be true if and only if P = R. We also have
ρ(P) = M as well as P = M if and only if σ(P), thus the constructed ep-heap rooted at P

The recursion always ends for finite heaps as we must reach the first case due to the ordering
of bottom.

For performance the order of comparing elements and identifying the case to use can be changed
to sequentially checking:
- σ(P): apply pt. 3 for P and return (P must be ≤ R). No update of σ(P)/σ(R) necessary.
- σ(R): apply pt. 3 for R and return (R must now be < P). No update of σ(P)/σ(R) necessary.
- compare P and R.
- If R < P, perform the next steps with P and R swapped.
- compare C and P.
- if C < P apply pt. 1.
- if C = P apply pt. 2.
- if C > P apply pt. 3.

Once an ep-heap with root R is formed, it trivially follows that for each element E in the
heap E = R if and only if σ is true for E and the chain of parents of E leading to but not
including the root. To consume the equal elements we can thus walk the sub-tree formed by
the children which have σ set.

To advance the elements equal to the root, we can:
- Process the elements in the deepest level of the σ sub-tree in any order. For each of them:
-- Advance the element. Let the new value be C.
-- The children P, R of the element must have ¬σ(P), ¬σ(R). Overwrite ρ(P) and ρ(R)
with top*. P and R now form ep-heaps leading to top.
-- Use the algorithm to combine P, R and C into a new ep-heap leading to top.
-- Let its root be Q. Because the heap leads to top we have ¬σ(Q).
- Continue with the next deepest level of the hierarchy until the top is reached.

\* The algorithm never actually uses the values of the parent function ρ, only the fact that
σ is set correctly with respect to that parent. This means this step does not need to be
actually performed.
{panel}

Updated [the branch|https://github.com/apache/cassandra/compare/trunk...blambov:8915-mergeiterator]
to use this solution and cleaned the code a bit.

> Improve MergeIterator performance
> ---------------------------------
>
>                 Key: CASSANDRA-8915
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Branimir Lambov
>            Assignee: Branimir Lambov
>            Priority: Minor
>              Labels: compaction, performance
>             Fix For: 3.x
>
>
> The implementation of {{MergeIterator}} uses a priority queue and applies a pair of {{poll}}+{{add}}
operations for every item in the resulting sequence. This is quite inefficient as {{poll}}
necessarily applies at least {{log N}} comparisons (up to {{2log N}}), and {{add}} often requires
another {{log N}}, for example in the case where the inputs largely don't overlap (where {{N}}
is the number of iterators being merged).
> This can easily be replaced with a simple custom structure that can perform replacement
of the top of the queue in a single step, which will very often complete after a couple of
comparisons and in the worst case scenarios will match the complexity of the current implementation.
> This should significantly improve merge performance for iterators with limited overlap
(e.g. levelled compaction).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

```
Mime
View raw message