# cassandra-commits mailing list archives

##### Site index · List index
Message view
Top
From "Branimir Lambov (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-8915) Improve MergeIterator performance
Date Wed, 03 Jun 2015 15:00:39 GMT
```
[ https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569392#comment-14569392
]

Branimir Lambov edited comment on CASSANDRA-8915 at 6/3/15 3:00 PM:
--------------------------------------------------------------------

I wasn't completely sure of the correctness of the flagging approach and I wrote the following
proof to make sure I'm not missing something:
{panel}
In the discussion below a set of ep-heaps over some space of elements is defined by two functions,
{{ρ(N)}} which stands for "parent of N", and {{σ(N)}} which stands for "equals parent".
We will use 'element' and 'node' interchangeably as each element E also defines a node using
the ρ function with parent {{ρ(E)}} and children {{\{x: ρ(X) ≡ E\}}}. The operator =
as used below denotes element equality (not identity), distinct elements are allowed to be
equal.

We define "ep-heap leading to M" to be a binary tree of nodes where each node N satisfies
{{ρ(N) ≤ N & σ(N) ↔ ρ(N) = N}} and the root has M as its parent. Full ep-heaps
are ep-heaps leading to ┬, where '┬' is assumed to be a special value strictly smaller
than any other (thus {{σ(R)}} is always false for the root of a full ep-heap). For simplicity
we will take all heaps to end at a special element called '⊥' whose value is assumed to
be greater than any other value.

The critical step here is adding a new value {{C > M}} to combine two existing well-formed
ep-heaps with roots P and R leading to the same M into a bigger single well-formed ep-heap
leading to M. Without loss of generality we can take {{P ≤ R}} (otherwise we can swap the
places of P and R in the discussion below).

We distinguish three cases:
# {{C < P}} (necessarily {{¬σ(P), ¬σ(R)}}): We place C at the top by setting {{ρ(P),
ρ(R) := C}}, {{ρ(C) := M}} and {{σ(C) := false}} and leaving {{σ(P), σ(R) := false}}.
The ep-heap rooted at C is well-formed leading to M.
# {{C = P}} (necessarily {{¬σ(P), ¬σ(R)}}): We place C at the top, setting {{ρ(P), ρ(R)
:= C}}, {{ρ(C) := M}}; {{σ(P) := true}}, {{σ(R) := (P = R)}} and {{σ(C) := false}}. The
heap rooted at C is well-formed as {{C = P & σ(P)}} as well as {{C = P ≤ R}} and {{C
= R}} if and only if {{σ(R)}}.
# {{C > P}} ({{σ(P), σ(P)}} can be true or false): Place P at the top: set {{ρ(R) :=
P}} and if {{¬σ(P)}}, set {{σ(R) := (P = R)}}, otherwise leave {{σ(R)}} unchanged; use
the algorithm recursively to merge the two ep-subheaps leading to P at its children and the
new value {{C > P}} into an ep-heap leading to P. Let its root be Q. The necessary conditions
for an element of an ep-heap hold for Q. They are also true for R as {{P ≤ R}} and either
{{σ(P)}} in which case {{P = M}} and {{σ(R)}} if and only if {{R = M = P}}, or {{¬σ(P)}}
and {{σ(R)}} is explicitly set to be true if and only if {{P = R}}. We also have {{ρ(P)
= M}} as well as {{P = M}} if and only if {{σ(P)}}, thus the constructed ep-heap rooted at

The recursion always ends for finite heaps as we must reach the first case due to the ordering
of ⊥.

For performance the order of comparing elements and identifying the case to use can be changed
to sequentially checking:
- {{σ(P)}}: apply pt. 3 for P and return (P must be ≤ R). No update of {{σ(P)}}/{{σ(R)}}
necessary.
- {{σ(R)}}: apply pt. 3 for R and return (R must now be < P). No update of {{σ(P)}}/{{σ(R)}}
necessary.
- compare P and R.
- If {{R < P}}, perform the next steps with P and R swapped.
- compare C and P.
- if {{C < P}} apply pt. 1.
- if {{C = P}} apply pt. 2.
- if {{C > P}} apply pt. 3.

Once an ep-heap with root R is formed, it trivially follows that for each element E in the
heap {{E = R}} if and only if σ is true for E and the chain of parents of E leading to but
not including the root. To consume the equal elements we can thus walk the sub-tree formed
by the children which have σ set.

To advance the elements equal to the root, we can:
- Process the elements in the deepest level of the σ sub-tree in any order. For each of them:
-- Advance the element. Let the new value be C.
-- The children P, R of the element must have {{¬σ(P)}}, {{¬σ(R)}}. Overwrite {{ρ(P)}}
and {{ρ(R)}} with ┬*. P and R now form ep-heaps leading to ┬.
-- Use the algorithm to combine P, R and C into a new ep-heap leading to ┬.
-- Let its root be Q. Because the heap leads to ┬ we have {{¬σ(Q)}}.
- Continue with the next deepest level of the hierarchy until the top is reached.

\* The algorithm never actually uses the values of the parent function ρ, only the fact that
σ is set correctly with respect to that parent. This means this step does not need to be
actually performed.
{panel}

Updated [the branch|https://github.com/apache/cassandra/compare/trunk...blambov:8915-mergeiterator]
to use this solution and cleaned the code a bit.

was (Author: blambov):
I wasn't completely sure of the correctness of the flagging approach and I wrote the following
proof to make sure I'm not missing something:
{panel}
In the discussion below a set of ep-heaps over some space of elements is defined by two functions,
{{ρ(N)}} which stands for "parent of N", and {{σ(N)}} which stands for "equals parent".
We will use 'element' and 'node' interchangeably as each element E also defines a node using
the ρ function with parent {{ρ(E)}} and children {{\{x: ρ(X) ≡ E\}}}. The operator =
as used below denotes element equality (not identity), distinct elements are allowed to be
equal.

We define "ep-heap leading to M" to be a binary tree of nodes where each node N satisfies
{{ρ(N) ≤ N & σ(N) ↔ ρ(N) = N}} and the root has M as its parent. Full ep-heaps
are ep-heaps leading to ┬, where '┬' is assumed to be a special value strictly smaller
than any other (thus {{σ(R)}} is always false for the root of a full ep-heap). For simplicity
we will take all heaps to end at a special element called '⊥' whose value is assumed to
be greater than any other value.

The critical step here is adding a new value {{C > M}} to combine two existing well-formed
ep-heaps with roots P and R leading to the same M into a bigger single well-formed ep-heap
leading to M. Without loss of generality we can take {{P ≤ R}} (otherwise we can swap the
places of P and R in the discussion below).

We distinguish three cases:
# {{C < P}} (necessarily {{¬σ(P)}}): We place C at the top by setting {{ρ(P), ρ(R)
:= C}}, {{ρ(C) := M}} and leaving {{σ(P), σ(R) := false}} (Note that {{σ(R)}} cannot be
true as that contradicts with {{C > M}}). {{σ(C) := false}}. The ep-heap rooted at C is
# {{C = P}} (necessarily {{¬σ(P)}}): We place C at the top, setting {{ρ(P), ρ(R) := C}},
{{ρ(C) := M}}, {{σ(P) := true}} and {{σ(R) := (P = R)}}. {{σ(C) := false}}. The heap rooted
at C is well-formed as {{C = P & σ(P)}} as well as {{C = P ≤ R}} and {{C = R}} if and
only if {{σ(R)}}.
# {{C > P}} ({{σ(P)}} can be true or false): Place P at the top: set {{ρ(R) := P}} and
if {{¬σ(P)}}, set {{σ(R) := (P = R)}}, otherwise leave {{σ(R)}} unchanged; use the algorithm
recursively to merge the two ep-subheaps leading to P at its children and the new value {{C
> P}} into an ep-heap leading to P. Let its root be Q. The necessary conditions for an
element of an ep-heap hold for Q. They are also true for R as {{P ≤ R}} and either {{σ(P)}}
in which case {{P = M}} and {{σ(R)}} if and only if {{R = M = P}}, or {{¬σ(P)}} and {{σ(R)}}
is explicitly set to be true if and only if {{P = R}}. We also have {{ρ(P) = M}} as well
as {{P = M}} if and only if {{σ(P)}}, thus the constructed ep-heap rooted at P leads to M.

The recursion always ends for finite heaps as we must reach the first case due to the ordering
of ⊥.

For performance the order of comparing elements and identifying the case to use can be changed
to sequentially checking:
- {{σ(P)}}: apply pt. 3 for P and return (P must be ≤ R). No update of {{σ(P)}}/{{σ(R)}}
necessary.
- {{σ(R)}}: apply pt. 3 for R and return (R must now be < P). No update of {{σ(P)}}/{{σ(R)}}
necessary.
- compare P and R.
- If {{R < P}}, perform the next steps with P and R swapped.
- compare C and P.
- if {{C < P}} apply pt. 1.
- if {{C = P}} apply pt. 2.
- if {{C > P}} apply pt. 3.

Once an ep-heap with root R is formed, it trivially follows that for each element E in the
heap {{E = R}} if and only if σ is true for E and the chain of parents of E leading to but
not including the root. To consume the equal elements we can thus walk the sub-tree formed
by the children which have σ set.

To advance the elements equal to the root, we can:
- Process the elements in the deepest level of the σ sub-tree in any order. For each of them:
-- Advance the element. Let the new value be C.
-- The children P, R of the element must have {{¬σ(P)}}, {{¬σ(R)}}. Overwrite {{ρ(P)}}
and {{ρ(R)}} with ┬*. P and R now form ep-heaps leading to ┬.
-- Use the algorithm to combine P, R and C into a new ep-heap leading to ┬.
-- Let its root be Q. Because the heap leads to ┬ we have {{¬σ(Q)}}.
- Continue with the next deepest level of the hierarchy until the top is reached.

\* The algorithm never actually uses the values of the parent function ρ, only the fact that
σ is set correctly with respect to that parent. This means this step does not need to be
actually performed.
{panel}

Updated [the branch|https://github.com/apache/cassandra/compare/trunk...blambov:8915-mergeiterator]
to use this solution and cleaned the code a bit.

> Improve MergeIterator performance
> ---------------------------------
>
>                 Key: CASSANDRA-8915
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Branimir Lambov
>            Assignee: Branimir Lambov
>            Priority: Minor
>              Labels: compaction, performance
>             Fix For: 3.x
>
>
> The implementation of {{MergeIterator}} uses a priority queue and applies a pair of {{poll}}+{{add}}
operations for every item in the resulting sequence. This is quite inefficient as {{poll}}
necessarily applies at least {{log N}} comparisons (up to {{2log N}}), and {{add}} often requires
another {{log N}}, for example in the case where the inputs largely don't overlap (where {{N}}
is the number of iterators being merged).
> This can easily be replaced with a simple custom structure that can perform replacement
of the top of the queue in a single step, which will very often complete after a couple of
comparisons and in the worst case scenarios will match the complexity of the current implementation.
> This should significantly improve merge performance for iterators with limited overlap
(e.g. levelled compaction).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

```
Mime
View raw message