commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Adrian Nistor (JIRA)" <j...@apache.org>
Subject [jira] [Created] (COLLECTIONS-406) ListUtils.subtract is very slow
Date Sun, 27 May 2012 19:54:22 GMT
Adrian Nistor created COLLECTIONS-406:
-----------------------------------------

             Summary: ListUtils.subtract is very slow 
                 Key: COLLECTIONS-406
                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-406
             Project: Commons Collections
          Issue Type: Bug
          Components: Core
    Affects Versions: 3.2.1
         Environment: java 1.6.0_24
Ubuntu 11.10
            Reporter: Adrian Nistor
         Attachments: Test.java, patch.diff

Hi,

ListUtils.subtract is very slow when subtracting two large lists.  The
root cause of this problem is similar to the root cause of the
previously fixed COLLECTIONS-328 in ListUtils, i.e., quadratic
complexity instead of linear complexity.

I am encountering this problem in version 3.2.1 and also in revision
1342815 (May 25th 2012).  I have attached a test that exposes this
problem and a simple patch.  On my machine, for the attached test,
this patch provides a 95X speedup.

To run the test, just do:

$ java Test

Currently, the code for ListUtils.subtract is:

final ArrayList<E> result = new ArrayList<E>(list1);
for (E e : list2) {
    result.remove(e);
}
return result;

which is quadratic, because result.remove(e) has linear complexity.

The attached patch makes the remove() be constant complexity by
removing from an org.apache.commons.collections.bag.HashBag.  I use
HashBag and not HashSet because ListUtils.subtract needs to respect
the cardinality when there are repeated objects in the list.

As mentioned in COLLECTIONS-328 discussion, for small lists, there is
some overhead for creating the HashBag.  This can be fixed with a
threshold, but I did not implement it in my patch because the
COLLECTIONS-328 patch does not implement it.

Unlike the patch for COLLECTIONS-328, my patch does not choose the
list to iterate over based on size, because of the cardinality
requirement in subtract.  This means the code could be made even
faster if we could use something like a LinkedHashBag but neither
Apache Collections nor standard Java libraries have such a class.
Even so, this patch is still a lot faster than the original version.

Is this truly a bug, or am I missing something here?  If so, can you
please confirm if the patch is correct?

Thanks,

Adrian




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message