Return-Path: X-Original-To: apmail-commons-issues-archive@minotaur.apache.org Delivered-To: apmail-commons-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id ECBD7EF6E for ; Tue, 26 Feb 2013 20:20:12 +0000 (UTC) Received: (qmail 58157 invoked by uid 500); 26 Feb 2013 20:20:12 -0000 Delivered-To: apmail-commons-issues-archive@commons.apache.org Received: (qmail 58073 invoked by uid 500); 26 Feb 2013 20:20:12 -0000 Mailing-List: contact issues-help@commons.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: issues@commons.apache.org Delivered-To: mailing list issues@commons.apache.org Received: (qmail 58064 invoked by uid 99); 26 Feb 2013 20:20:12 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 26 Feb 2013 20:20:12 +0000 Date: Tue, 26 Feb 2013 20:20:12 +0000 (UTC) From: "Jeffrey Barnes (JIRA)" To: issues@commons.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (COLLECTIONS-433) TreeList.addAll() complexity MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/COLLECTIONS-433?page=3Dcom.atl= assian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Barnes updated COLLECTIONS-433: --------------------------------------- Attachment: COLLECTIONS-433.patch I have developed a patch that greatly improves the performance of {{TreeLis= t.addAll(Collection)}}. The theoretical complexity is improved from _O_(_n_= =C2=A0log=C2=A0_m_) to _O_(_n_=C2=A0+=C2=A0log=C2=A0_m_), where _m_ is the = size of the {{TreeList}} and _n_ is the size of the collection to be added,= and performance tests show a significant practical gain. The Stack Overflow question that Adrian Nistor linked is informative but no= t the ideal algorithm to use here, as it handles the case where there are t= wo arbitrary (sorted) AVL trees that must be merged into one (sorted) AVL t= ree. Because the elements of the two AVL trees may need to be interleaved i= n arbitrary ways, the best we can do in that situation is to flatten the tr= ees into lists, merge them, and construct a new AVL tree, as the Stack Over= flow answer says. The situation here, however, is somewhat different, because we know that al= l the elements of the new tree come to the right of the original tree. (The= AVL tree that backs {{TreeList}} is keyed by list index, not by element va= lue.) Thus, a more on-point Stack Overflow question is [this one|http://sta= ckoverflow.com/questions/2037212/concatenating-merging-joining-two-avl-tree= s], which deals with the question of merging/concatenating two AVL trees wh= en the elements of the first tree are known to have smaller keys than those= of the second tree. In this case, we can merge the trees in logarithmic ti= me! The algorithm, outlined by user meriton on Stack Overflow, is: # Determine the height of both trees. Assume the right tree is taller. (The= other case is symmetric.) # Remove the max element, _x_, from the left tree. # In the right tree, navigate left until you reach the subtree, _s_, that i= s no taller than the left tree. # Replace that subtree with a new subtree whose root is _x_, whose left sub= tree is the left tree, and whose right subtree is _s_. # Rebalance. This is a destructive operation, so to satisfy the contract for {{addAll(Co= llection)}}, we have to first copy the argument to a new tree, which takes = _O_(_n_). But the good news is we can do this in _O_(_n_) for any collectio= n, not just a {{TreeList}}! So now {{addAll(Collection)}} has complexity _O= _(_n_=C2=A0+=C2=A0log=C2=A0_m_) for _any_ collection, regardless of whether= it is a {{TreeList}}. To accomplish linear conversion of collections to trees, I also reimplement= ed the constructor {{TreeList(Collection)}}, which previously just invoked = {{addAll}}. The algorithm for converting a sorted list to an AVL tree in _O= _(_n_) time is given [here|http://leetcode.com/2010/11/convert-sorted-list-= to-balanced-binary.html]. Here are results of some performance tests where I used {{TreeList.addAll}}= to merge two collections of size _n_ for varying _n_ (averaged over 50 run= s): ||n||old||new|| |160000|0.190|0.007| |320000|0.407|0.015| |480000|0.615|0.108| |640000|0.853|0.142| |800000|1.075|0.140| |960000|1.315|0.144| |1120000|1.617|0.158| |1280000|1.871|0.248| |1440000|2.109|0.258| |1600000|2.316|0.173| |1760000|2.559|0.288| |1920000|2.852|0.292| I have tested this patch rather extensively to ensure its correctness. I su= bjected it to a test battery in which I randomly applied hundreds of thousa= nds of arbitrary operations to various {{TreeLists}}, checking list invaria= nts and contents after each test, so I am pretty confident in the correctne= ss of this new code. I encountered an unrelated bug in {{TreeListIterator}} which I have also fi= xed as part of this patch. Use of the {{remove}} operation could, for certa= in tree structures, cause the iterator to enter a bad state and return inco= rrect results. I include it with this patch because it was necessary to get= a unit test to pass. (My changes to {{addAll}} changed the structure of a = tree constructed in a unit test in a way that happened to elicit the bad be= havior.) This change is minor compared with the substantial changes to {{ad= dAll}} and the {{TreeList(Collection)}} constructor, but let me know if you= 'd like me to say more on this point. Incidentally, I have not touched {{addAll(int,=C2=A0Collection)}}, the over= load of {{addAll}} that adds the contents of a collection at a specified in= dex, rather than at the end of the list. I believe that could be done effic= iently too, but it would require a different algorithm. This is my first contribution, so I hope I did things right. Please be gent= le! :) =20 > TreeList.addAll() complexity > ---------------------------- > > Key: COLLECTIONS-433 > URL: https://issues.apache.org/jira/browse/COLLECTIONS-43= 3 > Project: Commons Collections > Issue Type: Improvement > Affects Versions: 3.2.1 > Reporter: Adrian Nistor > Attachments: COLLECTIONS-433.patch > > > "TreeList.addAll(Collection coll)" has a higher complexity than > necessary when "coll" is a "TreeList" object (because "addAll" just > adds one element at a time). This can be done in just O(N) as > described for example here: > http://stackoverflow.com/questions/4458489/merging-2-diferent-avl-trees > Are there any plans to improve this? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrato= rs For more information on JIRA, see: http://www.atlassian.com/software/jira