commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas Neidhart (JIRA)" <>
Subject [jira] [Resolved] (COLLECTIONS-406) ListUtils.subtract is very slow
Date Sun, 03 Jun 2012 10:27:23 GMT


Thomas Neidhart resolved COLLECTIONS-406.

       Resolution: Fixed
    Fix Version/s: 4.0

Thanks for reporting this issue and providing a patch.

The patch has been applied in r1345644 together with unit tests for the subtract method (was
missing before).
> ListUtils.subtract is very slow 
> --------------------------------
>                 Key: COLLECTIONS-406
>                 URL:
>             Project: Commons Collections
>          Issue Type: Bug
>    Affects Versions: 3.2.1
>         Environment: java 1.6.0_24
> Ubuntu 11.10
>            Reporter: Adrian Nistor
>             Fix For: 4.0
>         Attachments:, patch.diff
> Hi,
> ListUtils.subtract is very slow when subtracting two large lists.  The
> root cause of this problem is similar to the root cause of the
> previously fixed COLLECTIONS-328 in ListUtils, i.e., quadratic
> complexity instead of linear complexity.
> I am encountering this problem in version 3.2.1 and also in revision
> 1342815 (May 25th 2012).  I have attached a test that exposes this
> problem and a simple patch.  On my machine, for the attached test,
> this patch provides a 95X speedup.
> To run the test, just do:
> $ java Test
> Currently, the code for ListUtils.subtract is:
> final ArrayList<E> result = new ArrayList<E>(list1);
> for (E e : list2) {
>     result.remove(e);
> }
> return result;
> which is quadratic, because result.remove(e) has linear complexity.
> The attached patch makes the remove() be constant complexity by
> removing from an org.apache.commons.collections.bag.HashBag.  I use
> HashBag and not HashSet because ListUtils.subtract needs to respect
> the cardinality when there are repeated objects in the list.
> As mentioned in COLLECTIONS-328 discussion, for small lists, there is
> some overhead for creating the HashBag.  This can be fixed with a
> threshold, but I did not implement it in my patch because the
> COLLECTIONS-328 patch does not implement it.
> Unlike the patch for COLLECTIONS-328, my patch does not choose the
> list to iterate over based on size, because of the cardinality
> requirement in subtract.  This means the code could be made even
> faster if we could use something like a LinkedHashBag but neither
> Apache Collections nor standard Java libraries have such a class.
> Even so, this patch is still a lot faster than the original version.
> Is this truly a bug, or am I missing something here?  If so, can you
> please confirm if the patch is correct?
> Thanks,
> Adrian

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:!default.jspa
For more information on JIRA, see:


View raw message