commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas Neidhart (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COLLECTIONS-427) performance problem in SetUniqueList.retainAll()
Date Sun, 25 Jan 2015 12:59:34 GMT

    [ https://issues.apache.org/jira/browse/COLLECTIONS-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291086#comment-14291086
] 

Thomas Neidhart commented on COLLECTIONS-427:
---------------------------------------------

Again, users need to know what they are doing and be aware of the performance constraints
of the collection types they are using.
I agree that the implemented algorithm of retainAll is not explained in detail (the original
Collection#retainAll method has just a reference to contains(), but does not describe the
performance implications), and we will add a note as in other cases, but this is not a problem
limited to commons-collections but to the collection framework of java in general.

That's why I suggested to move the discussion to openjdk. In fact, if your proposed fix would
be implemented in AbstractCollection the problem would go away for almost all collection types
and users would be safe (or is that so?).

On a personal note: I strongly doubt that the described problem is a "real" one. A quick search
on stackoverflow with "retainAll slow" lists two posts with both suggesting to call it with
a set.

> performance problem in SetUniqueList.retainAll()
> ------------------------------------------------
>
>                 Key: COLLECTIONS-427
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-427
>             Project: Commons Collections
>          Issue Type: Bug
>    Affects Versions: 3.2.1
>         Environment: java 1.6.0_24
> Ubuntu 11.10
>            Reporter: Mert Guldur
>             Fix For: 4.0-alpha1, 4.0
>
>         Attachments: Test.java, patch.diff
>
>
> I am encountering a performance problem in SetUniqueList.retainAll().
> It appears in version 3.2.1 and also in revision 1365132.  I attached
> a test that exposes this problem and a patch that fixes it.  On my
> machine, for this test, the patch provides a 621X speedup.
> To run the test, just do:
> $ java Test
> The output for the un-patched version is:
> Time is 6215
> The output for the patched version is:
> Time is 10
> There are two problems here.  First, "SetUniqueList.retainAll()"
> should have similar implementation with the current implementation of
> "ListOrderedSet.retainAll()", which is more optimized.  Second, even
> "ListOrderedSet.retainAll()" has a performance problem, which was
> reported and explained in detail in COLLECTIONS-426.
> The attached patch has two parts.  The first part (the first loop) is
> inspired from COLLECTIONS-426.  The second part (everything after the
> first loop) is in fact the current implementation of
> "ListOrderedSet.retainAll()", with some minor changes to adapt it for
> the current code.  Overall, the attached patch is very similar to the
> COLLECTIONS-426 patch.
> I will rehash some of the information from COLLECTIONS-426 (which
> describes "ListOrderedSet.retainAll()") for the current
> "SetUniqueList.retainAll()".
> The current code for "SetUniqueList.retainAll()" is:
> {code:java|borderStyle=solid}
> public boolean retainAll(Collection<?> coll) {
>     boolean result = super.retainAll(coll);
>     set.retainAll(coll);
>     return result;
> }
> {code} 
> where both "super.retainAll(coll)" and "set.retainAll(coll)" can have
> quadratic complexity, e.g., if "coll" is a List.  Both these calls to
> "retainAll" are in fact calls to
> "java.util.AbstractCollection.retainAll()", which has the code:
> {code:java|borderStyle=solid}
> public boolean retainAll(Collection<?> c) {
>     boolean modified = false;
>     Iterator<E> e = iterator();
>     while (e.hasNext()) {
>         if (!c.contains(e.next())) {
>             e.remove();
>             modified = true;
>         }
>     }
>     return modified;
> }
> {code} 
> which iterates over "this" and calls "contains()" on "c".  Mapping
> this code back to "SetUniqueList.retainAll()" means that the code
> iterates over "this" and "set" and calls "contains()" on "coll".  If
> "coll" has slow "contains()" (e.g., if "coll" is a list), then
> "SetUniqueList.retainAll()" has quadratic complexity.
> The patch iterates over "coll" and calls "contains()" on "set", which
> we know is fast, because "set" is a Set.  For a more detailed
> discussion of the patch and the problem, see the current
> implementation of "ListOrderedSet.retainAll()", the discussion for
> COLLECTIONS-426, and the patch for COLLECTIONS-426.
> Is this a bug, or am I misunderstanding the intended behavior?  If so,
> can you please confirm if the patch is correct?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message