commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jeffrey Barnes (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (COLLECTIONS-433) TreeList.addAll() complexity
Date Tue, 26 Feb 2013 20:20:12 GMT

     [ https://issues.apache.org/jira/browse/COLLECTIONS-433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jeffrey Barnes updated COLLECTIONS-433:
---------------------------------------

    Attachment: COLLECTIONS-433.patch

I have developed a patch that greatly improves the performance of {{TreeList.addAll(Collection)}}.
The theoretical complexity is improved from _O_(_n_ log _m_) to _O_(_n_ + log _m_), where
_m_ is the size of the {{TreeList}} and _n_ is the size of the collection to be added, and
performance tests show a significant practical gain.

The Stack Overflow question that Adrian Nistor linked is informative but not the ideal algorithm
to use here, as it handles the case where there are two arbitrary (sorted) AVL trees that
must be merged into one (sorted) AVL tree. Because the elements of the two AVL trees may need
to be interleaved in arbitrary ways, the best we can do in that situation is to flatten the
trees into lists, merge them, and construct a new AVL tree, as the Stack Overflow answer says.

The situation here, however, is somewhat different, because we know that all the elements
of the new tree come to the right of the original tree. (The AVL tree that backs {{TreeList}}
is keyed by list index, not by element value.) Thus, a more on-point Stack Overflow question
is [this one|http://stackoverflow.com/questions/2037212/concatenating-merging-joining-two-avl-trees],
which deals with the question of merging/concatenating two AVL trees when the elements of
the first tree are known to have smaller keys than those of the second tree. In this case,
we can merge the trees in logarithmic time! The algorithm, outlined by user meriton on Stack
Overflow, is:
# Determine the height of both trees. Assume the right tree is taller. (The other case is
symmetric.)
# Remove the max element, _x_, from the left tree.
# In the right tree, navigate left until you reach the subtree, _s_, that is no taller than
the left tree.
# Replace that subtree with a new subtree whose root is _x_, whose left subtree is the left
tree, and whose right subtree is _s_.
# Rebalance.

This is a destructive operation, so to satisfy the contract for {{addAll(Collection)}}, we
have to first copy the argument to a new tree, which takes _O_(_n_). But the good news is
we can do this in _O_(_n_) for any collection, not just a {{TreeList}}! So now {{addAll(Collection)}}
has complexity _O_(_n_ + log _m_) for _any_ collection, regardless of whether it is a {{TreeList}}.

To accomplish linear conversion of collections to trees, I also reimplemented the constructor
{{TreeList(Collection)}}, which previously just invoked {{addAll}}. The algorithm for converting
a sorted list to an AVL tree in _O_(_n_) time is given [here|http://leetcode.com/2010/11/convert-sorted-list-to-balanced-binary.html].

Here are results of some performance tests where I used {{TreeList.addAll}} to merge two collections
of size _n_ for varying _n_ (averaged over 50 runs):

||n||old||new||
|160000|0.190|0.007|
|320000|0.407|0.015|
|480000|0.615|0.108|
|640000|0.853|0.142|
|800000|1.075|0.140|
|960000|1.315|0.144|
|1120000|1.617|0.158|
|1280000|1.871|0.248|
|1440000|2.109|0.258|
|1600000|2.316|0.173|
|1760000|2.559|0.288|
|1920000|2.852|0.292|

I have tested this patch rather extensively to ensure its correctness. I subjected it to a
test battery in which I randomly applied hundreds of thousands of arbitrary operations to
various {{TreeLists}}, checking list invariants and contents after each test, so I am pretty
confident in the correctness of this new code.

I encountered an unrelated bug in {{TreeListIterator}} which I have also fixed as part of
this patch. Use of the {{remove}} operation could, for certain tree structures, cause the
iterator to enter a bad state and return incorrect results. I include it with this patch because
it was necessary to get a unit test to pass. (My changes to {{addAll}} changed the structure
of a tree constructed in a unit test in a way that happened to elicit the bad behavior.) This
change is minor compared with the substantial changes to {{addAll}} and the {{TreeList(Collection)}}
constructor, but let me know if you'd like me to say more on this point.

Incidentally, I have not touched {{addAll(int, Collection)}}, the overload of {{addAll}}
that adds the contents of a collection at a specified index, rather than at the end of the
list. I believe that could be done efficiently too, but it would require a different algorithm.

This is my first contribution, so I hope I did things right. Please be gentle! :)
                
> TreeList.addAll() complexity
> ----------------------------
>
>                 Key: COLLECTIONS-433
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-433
>             Project: Commons Collections
>          Issue Type: Improvement
>    Affects Versions: 3.2.1
>            Reporter: Adrian Nistor
>         Attachments: COLLECTIONS-433.patch
>
>
> "TreeList.addAll(Collection coll)" has a higher complexity than
> necessary when "coll" is a "TreeList" object (because "addAll" just
> adds one element at a time).  This can be done in just O(N) as
> described for example here:
> http://stackoverflow.com/questions/4458489/merging-2-diferent-avl-trees
> Are there any plans to improve this?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message