[ https://issues.apache.org/jira/browse/CASSANDRA8915?page=com.atlassian.jira.plugin.system.issuetabpanels:commenttabpanel&focusedCommentId=14569392#comment14569392
]
Branimir Lambov edited comment on CASSANDRA8915 at 6/3/15 2:40 PM:

I wasn't completely sure of the correctness of the flagging approach and I wrote the following
proof to make sure I'm not missing something:
{panel}
In the discussion below a set of epheaps over some space of elements is defined by two functions,
{{ρ(N)}} which stands for "parent of N", and {{σ(N)}} which stands for "equals parent".
We will use 'element' and 'node' interchangeably as each element E also defines a node using
the ρ function with parent {{ρ(E)}} and children {{\{x: ρ(X) ≡ E\}}}. The operator =
as used below denotes element equality (not identity), distinct elements are allowed to be
equal.
We define "epheap leading to M" to be a binary tree of nodes where each node N satisfies
{{ρ(N) ≤ N & σ(N) ↔ ρ(N) = N}} and the root has M as its parent. Full epheaps
are epheaps leading to ┬, where '┬' is assumed to be a special value strictly smaller
than any other (thus {{σ(R)}} is always false for the root of a full epheap). For simplicity
we will take all heaps to end at a special element called '⊥' whose value is assumed to
be greater than any other value.
The critical step here is adding a new value {{C > M}} to combine two existing wellformed
epheaps with roots P and R leading to the same M into a bigger single wellformed epheap
leading to M. Without loss of generality we can take {{P ≤ R}} (otherwise we can swap the
places of P and R in the discussion below).
We distinguish three cases:
# {{C < P}} (necessarily {{¬σ(P)}}): We place C at the top by setting {{ρ(P), ρ(R)
:= C}}, {{ρ(C) := M}} and leaving {{σ(P), σ(R) := false}} (Note that {{σ(R)}} cannot be
true as that contradicts with {{C > M}}). {{σ(C) := false}}. The epheap rooted at C is
wellformed leading to M.
# {{C = P}} (necessarily {{¬σ(P)}}): We place C at the top, setting {{ρ(P), ρ(R) := C}},
{{ρ(C) := M}}, {{σ(P) := true}} and {{σ(R) := (P = R)}}. {{σ(C) := false}}. The heap rooted
at C is wellformed as {{C = P & σ(P)}} as well as {{C = P ≤ R}} and {{C = R}} if and
only if {{σ(R)}}.
# {{C > P}} ({{σ(P)}} can be true or false): Place P at the top: set {{ρ(R) := P}} and
if {{¬σ(P)}}, set {{σ(R) := (P = R)}}, otherwise leave {{σ(R)}} unchanged; use the algorithm
recursively to merge the two epsubheaps leading to P at its children and the new value {{C
> P}} into an epheap leading to P. Let its root be Q. The necessary conditions for an
element of an epheap hold for Q. They are also true for R as {{P ≤ R}} and either {{σ(P)}}
in which case {{P = M}} and {{σ(R)}} if and only if {{R = M = P}}, or {{¬σ(P)}} and {{σ(R)}}
is explicitly set to be true if and only if {{P = R}}. We also have {{ρ(P) = M}} as well
as {{P = M}} if and only if {{σ(P)}}, thus the constructed epheap rooted at P leads to M.
The recursion always ends for finite heaps as we must reach the first case due to the ordering
of ⊥.
For performance the order of comparing elements and identifying the case to use can be changed
to sequentially checking:
 {{σ(P)}}: apply pt. 3 for P and return (P must be ≤ R). No update of {{σ(P)}}/{{σ(R)}}
necessary.
 {{σ(R)}}: apply pt. 3 for R and return (R must now be < P). No update of {{σ(P)}}/{{σ(R)}}
necessary.
 compare P and R.
 If {{R < P}}, perform the next steps with P and R swapped.
 compare C and P.
 if {{C < P}} apply pt. 1.
 if {{C = P}} apply pt. 2.
 if {{C > P}} apply pt. 3.
Once an epheap with root R is formed, it trivially follows that for each element E in the
heap {{E = R}} if and only if σ is true for E and the chain of parents of E leading to but
not including the root. To consume the equal elements we can thus walk the subtree formed
by the children which have σ set.
To advance the elements equal to the root, we can:
 Process the elements in the deepest level of the σ subtree in any order. For each of them:
 Advance the element. Let the new value be C.
 The children P, R of the element must have {{¬σ(P)}}, {{¬σ(R)}}. Overwrite {{ρ(P)}}
and {{ρ(R)}} with ┬*. P and R now form epheaps leading to ┬.
 Use the algorithm to combine P, R and C into a new epheap leading to ┬.
 Let its root be Q. Because the heap leads to ┬ we have {{¬σ(Q)}}.
 Continue with the next deepest level of the hierarchy until the top is reached.
\* The algorithm never actually uses the values of the parent function ρ, only the fact that
σ is set correctly with respect to that parent. This means this step does not need to be
actually performed.
{panel}
Updated [the branchhttps://github.com/apache/cassandra/compare/trunk...blambov:8915mergeiterator]
to use this solution and cleaned the code a bit.
was (Author: blambov):
I wasn't completely sure of the correctness of the flagging approach and I wrote the following
proof to make sure I'm not missing something:
{panel}
In the discussion below ρ(N) stands for parent of N and σ(N) stands for the 'equalParent'
predicate, i.e. σ(N) ↔ ρ(N) = N.
We define "epheap leading to M" to be a binary tree of nodes where each node N satisfies
ρ(N) ≤ N & σ(N) ↔ ρ(N) = N and the root has M as its parent. Full epheaps are
epheaps leading to top, where 'top' is assumed to be a special value strictly smaller than
any other (thus σ(R) is always false for the root of a full epheap). For simplicity we will
take all heaps to end at a special node called 'bottom' whose value is assumed to be greater
than any other value.
The critical step here is adding a new value C > M to combine two existing wellformed
epheaps P and R leading to the same M into a bigger single wellformed epheap leading to
M. Without loss of generality we can take P ≤ R (otherwise we can swap the places of P and
R in the discussion below).
We distinguish three cases:
# C < P (necessarily ¬σ(P)): We place C at the top by setting ρ(P), ρ(R) := C, ρ(C)
:= M and leaving σ(P), σ(R) := false (Note that σ(R) cannot be true as that contradicts
with C > M). σ(C) := false. The epheap rooted at C is wellformed leading to M.
# C = P (necessarily ¬σ(P)): We place C at the top, setting ρ(P), ρ(R) := C, ρ(C) :=
M, σ(P) := true and σ(R) := (P = R). σ(C) := false. The heap rooted at C is wellformed
as C = P & σ(P) as well as C = P ≤ R & C = R if and only if σ(R).
# C > P (σ(P) can be true or false): Set ρ(R) := P. If ¬σ(P), set σ(R) := (P = R),
otherwise leave σ(R) unchanged. Use the algorithm recursively to turn the two epsubheaps
leading to P at its children and the new value C > P into an epheap leading to P. Let
its root be Q. The necessary conditions for an element of an epheap hold for Q. They are
also true for R as P ≤ R and either σ(P) in which case P = M and σ(R) if and only if R
= M = P, or ¬σ(P) and σ(R) is explicitly set to be true if and only if P = R. We also have
ρ(P) = M as well as P = M if and only if σ(P), thus the constructed epheap rooted at P
leads to M.
The recursion always ends for finite heaps as we must reach the first case due to the ordering
of bottom.
For performance the order of comparing elements and identifying the case to use can be changed
to sequentially checking:
 σ(P): apply pt. 3 for P and return (P must be ≤ R). No update of σ(P)/σ(R) necessary.
 σ(R): apply pt. 3 for R and return (R must now be < P). No update of σ(P)/σ(R) necessary.
 compare P and R.
 If R < P, perform the next steps with P and R swapped.
 compare C and P.
 if C < P apply pt. 1.
 if C = P apply pt. 2.
 if C > P apply pt. 3.
Once an epheap with root R is formed, it trivially follows that for each element E in the
heap E = R if and only if σ is true for E and the chain of parents of E leading to but not
including the root. To consume the equal elements we can thus walk the subtree formed by
the children which have σ set.
To advance the elements equal to the root, we can:
 Process the elements in the deepest level of the σ subtree in any order. For each of them:
 Advance the element. Let the new value be C.
 The children P, R of the element must have ¬σ(P), ¬σ(R). Overwrite ρ(P) and ρ(R)
with top*. P and R now form epheaps leading to top.
 Use the algorithm to combine P, R and C into a new epheap leading to top.
 Let its root be Q. Because the heap leads to top we have ¬σ(Q).
 Continue with the next deepest level of the hierarchy until the top is reached.
\* The algorithm never actually uses the values of the parent function ρ, only the fact that
σ is set correctly with respect to that parent. This means this step does not need to be
actually performed.
{panel}
Updated [the branchhttps://github.com/apache/cassandra/compare/trunk...blambov:8915mergeiterator]
to use this solution and cleaned the code a bit.
> Improve MergeIterator performance
> 
>
> Key: CASSANDRA8915
> URL: https://issues.apache.org/jira/browse/CASSANDRA8915
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Branimir Lambov
> Assignee: Branimir Lambov
> Priority: Minor
> Labels: compaction, performance
> Fix For: 3.x
>
>
> The implementation of {{MergeIterator}} uses a priority queue and applies a pair of {{poll}}+{{add}}
operations for every item in the resulting sequence. This is quite inefficient as {{poll}}
necessarily applies at least {{log N}} comparisons (up to {{2log N}}), and {{add}} often requires
another {{log N}}, for example in the case where the inputs largely don't overlap (where {{N}}
is the number of iterators being merged).
> This can easily be replaced with a simple custom structure that can perform replacement
of the top of the queue in a single step, which will very often complete after a couple of
comparisons and in the worst case scenarios will match the complexity of the current implementation.
> This should significantly improve merge performance for iterators with limited overlap
(e.g. levelled compaction).

This message was sent by Atlassian JIRA
(v6.3.4#6332)
