Return-Path: X-Original-To: apmail-cassandra-commits-archive@www.apache.org Delivered-To: apmail-cassandra-commits-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 3489B18236 for ; Wed, 3 Jun 2015 15:00:40 +0000 (UTC) Received: (qmail 92566 invoked by uid 500); 3 Jun 2015 15:00:40 -0000 Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org Received: (qmail 92531 invoked by uid 500); 3 Jun 2015 15:00:40 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 92446 invoked by uid 99); 3 Jun 2015 15:00:39 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 03 Jun 2015 15:00:39 +0000 Date: Wed, 3 Jun 2015 15:00:39 +0000 (UTC) From: "Branimir Lambov (JIRA)" To: commits@cassandra.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Comment Edited] (CASSANDRA-8915) Improve MergeIterator performance MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/CASSANDRA-8915?page=3Dcom.atlas= sian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D= 14569392#comment-14569392 ]=20 Branimir Lambov edited comment on CASSANDRA-8915 at 6/3/15 3:00 PM: -------------------------------------------------------------------- I wasn't completely sure of the correctness of the flagging approach and I = wrote the following proof to make sure I'm not missing something: {panel} In the discussion below a set of ep-heaps over some space of elements is de= fined by two functions, {{=CF=81(N)}} which stands for "parent of N", and {= {=CF=83(N)}} which stands for "equals parent". We will use 'element' and 'n= ode' interchangeably as each element E also defines a node using the =CF=81= function with parent {{=CF=81(E)}} and children {{\{x: =CF=81(X) =E2=89=A1= E\}}}. The operator =3D as used below denotes element equality (not identi= ty), distinct elements are allowed to be equal. We define "ep-heap leading to M" to be a binary tree of nodes where each no= de N satisfies {{=CF=81(N) =E2=89=A4 N & =CF=83(N) =E2=86=94 =CF=81(N) =3D = N}} and the root has M as its parent. Full ep-heaps are ep-heaps leading to= =E2=94=AC, where '=E2=94=AC' is assumed to be a special value strictly sma= ller than any other (thus {{=CF=83(R)}} is always false for the root of a f= ull ep-heap). For simplicity we will take all heaps to end at a special ele= ment called '=E2=8A=A5' whose value is assumed to be greater than any other= value. The critical step here is adding a new value {{C > M}} to combine two exist= ing well-formed ep-heaps with roots P and R leading to the same M into a bi= gger single well-formed ep-heap leading to M. Without loss of generality we= can take {{P =E2=89=A4 R}} (otherwise we can swap the places of P and R in= the discussion below). We distinguish three cases: # {{C < P}} (necessarily {{=C2=AC=CF=83(P), =C2=AC=CF=83(R)}}): We place C = at the top by setting {{=CF=81(P), =CF=81(R) :=3D C}}, {{=CF=81(C) :=3D M}}= and {{=CF=83(C) :=3D false}} and leaving {{=CF=83(P), =CF=83(R) :=3D false= }}. The ep-heap rooted at C is well-formed leading to M. # {{C =3D P}} (necessarily {{=C2=AC=CF=83(P), =C2=AC=CF=83(R)}}): We place = C at the top, setting {{=CF=81(P), =CF=81(R) :=3D C}}, {{=CF=81(C) :=3D M}}= ; {{=CF=83(P) :=3D true}}, {{=CF=83(R) :=3D (P =3D R)}} and {{=CF=83(C) := =3D false}}. The heap rooted at C is well-formed as {{C =3D P & =CF=83(P)}}= as well as {{C =3D P =E2=89=A4 R}} and {{C =3D R}} if and only if {{=CF=83= (R)}}. # {{C > P}} ({{=CF=83(P), =CF=83(P)}} can be true or false): Place P at the= top: set {{=CF=81(R) :=3D P}} and if {{=C2=AC=CF=83(P)}}, set {{=CF=83(R) = :=3D (P =3D R)}}, otherwise leave {{=CF=83(R)}} unchanged; use the algorith= m recursively to merge the two ep-subheaps leading to P at its children and= the new value {{C > P}} into an ep-heap leading to P. Let its root be Q. T= he necessary conditions for an element of an ep-heap hold for Q. They are a= lso true for R as {{P =E2=89=A4 R}} and either {{=CF=83(P)}} in which case = {{P =3D M}} and {{=CF=83(R)}} if and only if {{R =3D M =3D P}}, or {{=C2=AC= =CF=83(P)}} and {{=CF=83(R)}} is explicitly set to be true if and only if {= {P =3D R}}. We also have {{=CF=81(P) =3D M}} as well as {{P =3D M}} if and = only if {{=CF=83(P)}}, thus the constructed ep-heap rooted at P leads to M. The recursion always ends for finite heaps as we must reach the first case = due to the ordering of =E2=8A=A5. For performance the order of comparing elements and identifying the case to= use can be changed to sequentially checking: - {{=CF=83(P)}}: apply pt. 3 for P and return (P must be =E2=89=A4 R). No u= pdate of {{=CF=83(P)}}/{{=CF=83(R)}} necessary. - {{=CF=83(R)}}: apply pt. 3 for R and return (R must now be < P). No updat= e of {{=CF=83(P)}}/{{=CF=83(R)}} necessary. - compare P and R. - If {{R < P}}, perform the next steps with P and R swapped. - compare C and P. - if {{C < P}} apply pt. 1. - if {{C =3D P}} apply pt. 2. - if {{C > P}} apply pt. 3. Once an ep-heap with root R is formed, it trivially follows that for each e= lement E in the heap {{E =3D R}} if and only if =CF=83 is true for E and th= e chain of parents of E leading to but not including the root. To consume t= he equal elements we can thus walk the sub-tree formed by the children whic= h have =CF=83 set. To advance the elements equal to the root, we can: - Process the elements in the deepest level of the =CF=83 sub-tree in any o= rder. For each of them: -- Advance the element. Let the new value be C. -- The children P, R of the element must have {{=C2=AC=CF=83(P)}}, {{=C2= =AC=CF=83(R)}}. Overwrite {{=CF=81(P)}} and {{=CF=81(R)}} with =E2=94=AC*. = P and R now form ep-heaps leading to =E2=94=AC. -- Use the algorithm to combine P, R and C into a new ep-heap leading to = =E2=94=AC. -- Let its root be Q. Because the heap leads to =E2=94=AC we have {{=C2= =AC=CF=83(Q)}}. - Continue with the next deepest level of the hierarchy until the top is re= ached. \* The algorithm never actually uses the values of the parent function =CF= =81, only the fact that =CF=83 is set correctly with respect to that parent= . This means this step does not need to be actually performed. {panel} Updated [the branch|https://github.com/apache/cassandra/compare/trunk...bla= mbov:8915-mergeiterator] to use this solution and cleaned the code a bit. was (Author: blambov): I wasn't completely sure of the correctness of the flagging approach and I = wrote the following proof to make sure I'm not missing something: {panel} In the discussion below a set of ep-heaps over some space of elements is de= fined by two functions, {{=CF=81(N)}} which stands for "parent of N", and {= {=CF=83(N)}} which stands for "equals parent". We will use 'element' and 'n= ode' interchangeably as each element E also defines a node using the =CF=81= function with parent {{=CF=81(E)}} and children {{\{x: =CF=81(X) =E2=89=A1= E\}}}. The operator =3D as used below denotes element equality (not identi= ty), distinct elements are allowed to be equal. We define "ep-heap leading to M" to be a binary tree of nodes where each no= de N satisfies {{=CF=81(N) =E2=89=A4 N & =CF=83(N) =E2=86=94 =CF=81(N) =3D = N}} and the root has M as its parent. Full ep-heaps are ep-heaps leading to= =E2=94=AC, where '=E2=94=AC' is assumed to be a special value strictly sma= ller than any other (thus {{=CF=83(R)}} is always false for the root of a f= ull ep-heap). For simplicity we will take all heaps to end at a special ele= ment called '=E2=8A=A5' whose value is assumed to be greater than any other= value. The critical step here is adding a new value {{C > M}} to combine two exist= ing well-formed ep-heaps with roots P and R leading to the same M into a bi= gger single well-formed ep-heap leading to M. Without loss of generality we= can take {{P =E2=89=A4 R}} (otherwise we can swap the places of P and R in= the discussion below). We distinguish three cases: # {{C < P}} (necessarily {{=C2=AC=CF=83(P)}}): We place C at the top by set= ting {{=CF=81(P), =CF=81(R) :=3D C}}, {{=CF=81(C) :=3D M}} and leaving {{= =CF=83(P), =CF=83(R) :=3D false}} (Note that {{=CF=83(R)}} cannot be true a= s that contradicts with {{C > M}}). {{=CF=83(C) :=3D false}}. The ep-heap r= ooted at C is well-formed leading to M. # {{C =3D P}} (necessarily {{=C2=AC=CF=83(P)}}): We place C at the top, set= ting {{=CF=81(P), =CF=81(R) :=3D C}}, {{=CF=81(C) :=3D M}}, {{=CF=83(P) := =3D true}} and {{=CF=83(R) :=3D (P =3D R)}}. {{=CF=83(C) :=3D false}}. The = heap rooted at C is well-formed as {{C =3D P & =CF=83(P)}} as well as {{C = =3D P =E2=89=A4 R}} and {{C =3D R}} if and only if {{=CF=83(R)}}. # {{C > P}} ({{=CF=83(P)}} can be true or false): Place P at the top: set {= {=CF=81(R) :=3D P}} and if {{=C2=AC=CF=83(P)}}, set {{=CF=83(R) :=3D (P =3D= R)}}, otherwise leave {{=CF=83(R)}} unchanged; use the algorithm recursive= ly to merge the two ep-subheaps leading to P at its children and the new va= lue {{C > P}} into an ep-heap leading to P. Let its root be Q. The necessar= y conditions for an element of an ep-heap hold for Q. They are also true fo= r R as {{P =E2=89=A4 R}} and either {{=CF=83(P)}} in which case {{P =3D M}}= and {{=CF=83(R)}} if and only if {{R =3D M =3D P}}, or {{=C2=AC=CF=83(P)}}= and {{=CF=83(R)}} is explicitly set to be true if and only if {{P =3D R}}.= We also have {{=CF=81(P) =3D M}} as well as {{P =3D M}} if and only if {{= =CF=83(P)}}, thus the constructed ep-heap rooted at P leads to M. The recursion always ends for finite heaps as we must reach the first case = due to the ordering of =E2=8A=A5. For performance the order of comparing elements and identifying the case to= use can be changed to sequentially checking: - {{=CF=83(P)}}: apply pt. 3 for P and return (P must be =E2=89=A4 R). No u= pdate of {{=CF=83(P)}}/{{=CF=83(R)}} necessary. - {{=CF=83(R)}}: apply pt. 3 for R and return (R must now be < P). No updat= e of {{=CF=83(P)}}/{{=CF=83(R)}} necessary. - compare P and R. - If {{R < P}}, perform the next steps with P and R swapped. - compare C and P. - if {{C < P}} apply pt. 1. - if {{C =3D P}} apply pt. 2. - if {{C > P}} apply pt. 3. Once an ep-heap with root R is formed, it trivially follows that for each e= lement E in the heap {{E =3D R}} if and only if =CF=83 is true for E and th= e chain of parents of E leading to but not including the root. To consume t= he equal elements we can thus walk the sub-tree formed by the children whic= h have =CF=83 set. To advance the elements equal to the root, we can: - Process the elements in the deepest level of the =CF=83 sub-tree in any o= rder. For each of them: -- Advance the element. Let the new value be C. -- The children P, R of the element must have {{=C2=AC=CF=83(P)}}, {{=C2= =AC=CF=83(R)}}. Overwrite {{=CF=81(P)}} and {{=CF=81(R)}} with =E2=94=AC*. = P and R now form ep-heaps leading to =E2=94=AC. -- Use the algorithm to combine P, R and C into a new ep-heap leading to = =E2=94=AC. -- Let its root be Q. Because the heap leads to =E2=94=AC we have {{=C2= =AC=CF=83(Q)}}. - Continue with the next deepest level of the hierarchy until the top is re= ached. \* The algorithm never actually uses the values of the parent function =CF= =81, only the fact that =CF=83 is set correctly with respect to that parent= . This means this step does not need to be actually performed. {panel} Updated [the branch|https://github.com/apache/cassandra/compare/trunk...bla= mbov:8915-mergeiterator] to use this solution and cleaned the code a bit. > Improve MergeIterator performance > --------------------------------- > > Key: CASSANDRA-8915 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8915 > Project: Cassandra > Issue Type: Improvement > Reporter: Branimir Lambov > Assignee: Branimir Lambov > Priority: Minor > Labels: compaction, performance > Fix For: 3.x > > > The implementation of {{MergeIterator}} uses a priority queue and applies= a pair of {{poll}}+{{add}} operations for every item in the resulting sequ= ence. This is quite inefficient as {{poll}} necessarily applies at least {{= log N}} comparisons (up to {{2log N}}), and {{add}} often requires another = {{log N}}, for example in the case where the inputs largely don't overlap (= where {{N}} is the number of iterators being merged). > This can easily be replaced with a simple custom structure that can perfo= rm replacement of the top of the queue in a single step, which will very of= ten complete after a couple of comparisons and in the worst case scenarios = will match the complexity of the current implementation. > This should significantly improve merge performance for iterators with li= mited overlap (e.g. levelled compaction). -- This message was sent by Atlassian JIRA (v6.3.4#6332)