commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas Neidhart (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COLLECTIONS-427) performance problem in SetUniqueList.retainAll()
Date Sun, 25 Jan 2015 10:40:34 GMT

    [ https://issues.apache.org/jira/browse/COLLECTIONS-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291035#comment-14291035
] 

Thomas Neidhart commented on COLLECTIONS-427:
---------------------------------------------

The fix is technically correct but conceptually wrong.

It tries to improve the worst-case scenario for a well-known method at the expense of additional
space complexity.
It many cases the additional temporary set is unneeded, especially when the provided collection
has a fast contains method or the collection is small.

A user can always solve the performance problem him/herself by putting the elements in a set
and provide this as parameter to retainAll.

The described problem also applies to almost *all* collection types as the default implementation
of retainAll is provided in AbstractCollection and is unchanged for all the collections I
know of. If you want to press forward your fix, I suggest to move it to the openjdk project
and discuss it there, actually I would be interested on their opinions on this issue.

> performance problem in SetUniqueList.retainAll()
> ------------------------------------------------
>
>                 Key: COLLECTIONS-427
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-427
>             Project: Commons Collections
>          Issue Type: Bug
>    Affects Versions: 3.2.1
>         Environment: java 1.6.0_24
> Ubuntu 11.10
>            Reporter: Mert Guldur
>             Fix For: 4.0-alpha1, 4.0
>
>         Attachments: Test.java, patch.diff
>
>
> I am encountering a performance problem in SetUniqueList.retainAll().
> It appears in version 3.2.1 and also in revision 1365132.  I attached
> a test that exposes this problem and a patch that fixes it.  On my
> machine, for this test, the patch provides a 621X speedup.
> To run the test, just do:
> $ java Test
> The output for the un-patched version is:
> Time is 6215
> The output for the patched version is:
> Time is 10
> There are two problems here.  First, "SetUniqueList.retainAll()"
> should have similar implementation with the current implementation of
> "ListOrderedSet.retainAll()", which is more optimized.  Second, even
> "ListOrderedSet.retainAll()" has a performance problem, which was
> reported and explained in detail in COLLECTIONS-426.
> The attached patch has two parts.  The first part (the first loop) is
> inspired from COLLECTIONS-426.  The second part (everything after the
> first loop) is in fact the current implementation of
> "ListOrderedSet.retainAll()", with some minor changes to adapt it for
> the current code.  Overall, the attached patch is very similar to the
> COLLECTIONS-426 patch.
> I will rehash some of the information from COLLECTIONS-426 (which
> describes "ListOrderedSet.retainAll()") for the current
> "SetUniqueList.retainAll()".
> The current code for "SetUniqueList.retainAll()" is:
> {code:java|borderStyle=solid}
> public boolean retainAll(Collection<?> coll) {
>     boolean result = super.retainAll(coll);
>     set.retainAll(coll);
>     return result;
> }
> {code} 
> where both "super.retainAll(coll)" and "set.retainAll(coll)" can have
> quadratic complexity, e.g., if "coll" is a List.  Both these calls to
> "retainAll" are in fact calls to
> "java.util.AbstractCollection.retainAll()", which has the code:
> {code:java|borderStyle=solid}
> public boolean retainAll(Collection<?> c) {
>     boolean modified = false;
>     Iterator<E> e = iterator();
>     while (e.hasNext()) {
>         if (!c.contains(e.next())) {
>             e.remove();
>             modified = true;
>         }
>     }
>     return modified;
> }
> {code} 
> which iterates over "this" and calls "contains()" on "c".  Mapping
> this code back to "SetUniqueList.retainAll()" means that the code
> iterates over "this" and "set" and calls "contains()" on "coll".  If
> "coll" has slow "contains()" (e.g., if "coll" is a list), then
> "SetUniqueList.retainAll()" has quadratic complexity.
> The patch iterates over "coll" and calls "contains()" on "set", which
> we know is fast, because "set" is a Set.  For a more detailed
> discussion of the patch and the problem, see the current
> implementation of "ListOrderedSet.retainAll()", the discussion for
> COLLECTIONS-426, and the patch for COLLECTIONS-426.
> Is this a bug, or am I misunderstanding the intended behavior?  If so,
> can you please confirm if the patch is correct?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message