[ https://issues.apache.org/jira/browse/CASSANDRA8915?page=com.atlassian.jira.plugin.system.issuetabpanels:commenttabpanel&focusedCommentId=14569392#comment14569392
]
Branimir Lambov edited comment on CASSANDRA8915 at 6/3/15 3:00 PM:

I wasn't completely sure of the correctness of the flagging approach and I wrote the following
proof to make sure I'm not missing something:
{panel}
In the discussion below a set of epheaps over some space of elements is defined by two functions,
{{ρ(N)}} which stands for "parent of N", and {{σ(N)}} which stands for "equals parent".
We will use 'element' and 'node' interchangeably as each element E also defines a node using
the ρ function with parent {{ρ(E)}} and children {{\{x: ρ(X) ≡ E\}}}. The operator =
as used below denotes element equality (not identity), distinct elements are allowed to be
equal.
We define "epheap leading to M" to be a binary tree of nodes where each node N satisfies
{{ρ(N) ≤ N & σ(N) ↔ ρ(N) = N}} and the root has M as its parent. Full epheaps
are epheaps leading to ┬, where '┬' is assumed to be a special value strictly smaller
than any other (thus {{σ(R)}} is always false for the root of a full epheap). For simplicity
we will take all heaps to end at a special element called '⊥' whose value is assumed to
be greater than any other value.
The critical step here is adding a new value {{C > M}} to combine two existing wellformed
epheaps with roots P and R leading to the same M into a bigger single wellformed epheap
leading to M. Without loss of generality we can take {{P ≤ R}} (otherwise we can swap the
places of P and R in the discussion below).
We distinguish three cases:
# {{C < P}} (necessarily {{¬σ(P), ¬σ(R)}}): We place C at the top by setting {{ρ(P),
ρ(R) := C}}, {{ρ(C) := M}} and {{σ(C) := false}} and leaving {{σ(P), σ(R) := false}}.
The epheap rooted at C is wellformed leading to M.
# {{C = P}} (necessarily {{¬σ(P), ¬σ(R)}}): We place C at the top, setting {{ρ(P), ρ(R)
:= C}}, {{ρ(C) := M}}; {{σ(P) := true}}, {{σ(R) := (P = R)}} and {{σ(C) := false}}. The
heap rooted at C is wellformed as {{C = P & σ(P)}} as well as {{C = P ≤ R}} and {{C
= R}} if and only if {{σ(R)}}.
# {{C > P}} ({{σ(P), σ(P)}} can be true or false): Place P at the top: set {{ρ(R) :=
P}} and if {{¬σ(P)}}, set {{σ(R) := (P = R)}}, otherwise leave {{σ(R)}} unchanged; use
the algorithm recursively to merge the two epsubheaps leading to P at its children and the
new value {{C > P}} into an epheap leading to P. Let its root be Q. The necessary conditions
for an element of an epheap hold for Q. They are also true for R as {{P ≤ R}} and either
{{σ(P)}} in which case {{P = M}} and {{σ(R)}} if and only if {{R = M = P}}, or {{¬σ(P)}}
and {{σ(R)}} is explicitly set to be true if and only if {{P = R}}. We also have {{ρ(P)
= M}} as well as {{P = M}} if and only if {{σ(P)}}, thus the constructed epheap rooted at
P leads to M.
The recursion always ends for finite heaps as we must reach the first case due to the ordering
of ⊥.
For performance the order of comparing elements and identifying the case to use can be changed
to sequentially checking:
 {{σ(P)}}: apply pt. 3 for P and return (P must be ≤ R). No update of {{σ(P)}}/{{σ(R)}}
necessary.
 {{σ(R)}}: apply pt. 3 for R and return (R must now be < P). No update of {{σ(P)}}/{{σ(R)}}
necessary.
 compare P and R.
 If {{R < P}}, perform the next steps with P and R swapped.
 compare C and P.
 if {{C < P}} apply pt. 1.
 if {{C = P}} apply pt. 2.
 if {{C > P}} apply pt. 3.
Once an epheap with root R is formed, it trivially follows that for each element E in the
heap {{E = R}} if and only if σ is true for E and the chain of parents of E leading to but
not including the root. To consume the equal elements we can thus walk the subtree formed
by the children which have σ set.
To advance the elements equal to the root, we can:
 Process the elements in the deepest level of the σ subtree in any order. For each of them:
 Advance the element. Let the new value be C.
 The children P, R of the element must have {{¬σ(P)}}, {{¬σ(R)}}. Overwrite {{ρ(P)}}
and {{ρ(R)}} with ┬*. P and R now form epheaps leading to ┬.
 Use the algorithm to combine P, R and C into a new epheap leading to ┬.
 Let its root be Q. Because the heap leads to ┬ we have {{¬σ(Q)}}.
 Continue with the next deepest level of the hierarchy until the top is reached.
\* The algorithm never actually uses the values of the parent function ρ, only the fact that
σ is set correctly with respect to that parent. This means this step does not need to be
actually performed.
{panel}
Updated [the branchhttps://github.com/apache/cassandra/compare/trunk...blambov:8915mergeiterator]
to use this solution and cleaned the code a bit.
was (Author: blambov):
I wasn't completely sure of the correctness of the flagging approach and I wrote the following
proof to make sure I'm not missing something:
{panel}
In the discussion below a set of epheaps over some space of elements is defined by two functions,
{{ρ(N)}} which stands for "parent of N", and {{σ(N)}} which stands for "equals parent".
We will use 'element' and 'node' interchangeably as each element E also defines a node using
the ρ function with parent {{ρ(E)}} and children {{\{x: ρ(X) ≡ E\}}}. The operator =
as used below denotes element equality (not identity), distinct elements are allowed to be
equal.
We define "epheap leading to M" to be a binary tree of nodes where each node N satisfies
{{ρ(N) ≤ N & σ(N) ↔ ρ(N) = N}} and the root has M as its parent. Full epheaps
are epheaps leading to ┬, where '┬' is assumed to be a special value strictly smaller
than any other (thus {{σ(R)}} is always false for the root of a full epheap). For simplicity
we will take all heaps to end at a special element called '⊥' whose value is assumed to
be greater than any other value.
The critical step here is adding a new value {{C > M}} to combine two existing wellformed
epheaps with roots P and R leading to the same M into a bigger single wellformed epheap
leading to M. Without loss of generality we can take {{P ≤ R}} (otherwise we can swap the
places of P and R in the discussion below).
We distinguish three cases:
# {{C < P}} (necessarily {{¬σ(P)}}): We place C at the top by setting {{ρ(P), ρ(R)
:= C}}, {{ρ(C) := M}} and leaving {{σ(P), σ(R) := false}} (Note that {{σ(R)}} cannot be
true as that contradicts with {{C > M}}). {{σ(C) := false}}. The epheap rooted at C is
wellformed leading to M.
# {{C = P}} (necessarily {{¬σ(P)}}): We place C at the top, setting {{ρ(P), ρ(R) := C}},
{{ρ(C) := M}}, {{σ(P) := true}} and {{σ(R) := (P = R)}}. {{σ(C) := false}}. The heap rooted
at C is wellformed as {{C = P & σ(P)}} as well as {{C = P ≤ R}} and {{C = R}} if and
only if {{σ(R)}}.
# {{C > P}} ({{σ(P)}} can be true or false): Place P at the top: set {{ρ(R) := P}} and
if {{¬σ(P)}}, set {{σ(R) := (P = R)}}, otherwise leave {{σ(R)}} unchanged; use the algorithm
recursively to merge the two epsubheaps leading to P at its children and the new value {{C
> P}} into an epheap leading to P. Let its root be Q. The necessary conditions for an
element of an epheap hold for Q. They are also true for R as {{P ≤ R}} and either {{σ(P)}}
in which case {{P = M}} and {{σ(R)}} if and only if {{R = M = P}}, or {{¬σ(P)}} and {{σ(R)}}
is explicitly set to be true if and only if {{P = R}}. We also have {{ρ(P) = M}} as well
as {{P = M}} if and only if {{σ(P)}}, thus the constructed epheap rooted at P leads to M.
The recursion always ends for finite heaps as we must reach the first case due to the ordering
of ⊥.
For performance the order of comparing elements and identifying the case to use can be changed
to sequentially checking:
 {{σ(P)}}: apply pt. 3 for P and return (P must be ≤ R). No update of {{σ(P)}}/{{σ(R)}}
necessary.
 {{σ(R)}}: apply pt. 3 for R and return (R must now be < P). No update of {{σ(P)}}/{{σ(R)}}
necessary.
 compare P and R.
 If {{R < P}}, perform the next steps with P and R swapped.
 compare C and P.
 if {{C < P}} apply pt. 1.
 if {{C = P}} apply pt. 2.
 if {{C > P}} apply pt. 3.
Once an epheap with root R is formed, it trivially follows that for each element E in the
heap {{E = R}} if and only if σ is true for E and the chain of parents of E leading to but
not including the root. To consume the equal elements we can thus walk the subtree formed
by the children which have σ set.
To advance the elements equal to the root, we can:
 Process the elements in the deepest level of the σ subtree in any order. For each of them:
 Advance the element. Let the new value be C.
 The children P, R of the element must have {{¬σ(P)}}, {{¬σ(R)}}. Overwrite {{ρ(P)}}
and {{ρ(R)}} with ┬*. P and R now form epheaps leading to ┬.
 Use the algorithm to combine P, R and C into a new epheap leading to ┬.
 Let its root be Q. Because the heap leads to ┬ we have {{¬σ(Q)}}.
 Continue with the next deepest level of the hierarchy until the top is reached.
\* The algorithm never actually uses the values of the parent function ρ, only the fact that
σ is set correctly with respect to that parent. This means this step does not need to be
actually performed.
{panel}
Updated [the branchhttps://github.com/apache/cassandra/compare/trunk...blambov:8915mergeiterator]
to use this solution and cleaned the code a bit.
> Improve MergeIterator performance
> 
>
> Key: CASSANDRA8915
> URL: https://issues.apache.org/jira/browse/CASSANDRA8915
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Branimir Lambov
> Assignee: Branimir Lambov
> Priority: Minor
> Labels: compaction, performance
> Fix For: 3.x
>
>
> The implementation of {{MergeIterator}} uses a priority queue and applies a pair of {{poll}}+{{add}}
operations for every item in the resulting sequence. This is quite inefficient as {{poll}}
necessarily applies at least {{log N}} comparisons (up to {{2log N}}), and {{add}} often requires
another {{log N}}, for example in the case where the inputs largely don't overlap (where {{N}}
is the number of iterators being merged).
> This can easily be replaced with a simple custom structure that can perform replacement
of the top of the queue in a single step, which will very often complete after a couple of
comparisons and in the worst case scenarios will match the complexity of the current implementation.
> This should significantly improve merge performance for iterators with limited overlap
(e.g. levelled compaction).

This message was sent by Atlassian JIRA
(v6.3.4#6332)
