commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jilles van Gurp (JIRA)" <j...@apache.org>
Subject [jira] Commented: (COLLECTIONS-328) ListUtils.intersect is slow
Date Sat, 06 Jun 2009 06:14:07 GMT

    [ https://issues.apache.org/jira/browse/COLLECTIONS-328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716834#action_12716834
] 

Jilles van Gurp commented on COLLECTIONS-328:
---------------------------------------------

It was so slow that I essentially did more or less what I describe above. I left the original
running while I was doing this and it was still running by the time I deployed my fix. I aborted
it and restarted it and then it finished in seconds. The big question is indeed what to do
with small lists. For big lists there's no question about what is the right approach. Also
you might wonder about the memory impact (modest, I would say). Otherwise, HashSet.contains
and List.contains should produce same results in this case so this should not affect semantics.
If list size is a concern, you could maybe come up with a threshold for deciding on which
implementation to use.

 

> ListUtils.intersect is slow
> ---------------------------
>
>                 Key: COLLECTIONS-328
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-328
>             Project: Commons Collections
>          Issue Type: Improvement
>            Reporter: Jilles van Gurp
>
> ListUtils.intersect is quite slow and can be improved by using a HashSet for the contains
operation which cuts the complexity from n^2 to n. I ran into this by intersecting two lists
with a few hundred thousand elements.
> current:
> public static List intersection(final List list1, final List list2) {
>         final ArrayList result = new ArrayList();
>         final Iterator iterator = list2.iterator();
>         while (iterator.hasNext()) {
>             final Object o = iterator.next();
>             if (list1.contains(o)) {
>                 result.add(o);
>             }
>         }
>         return result;
> Basically would work by inserting list1 into a HashSet like this:
> Set objs = new HashSet();
> objs.addAll(list1);
> and then instead of list1.contains do a objs.contains
> Further performance can be gained by picking the smallest list for this with a simple
size comparison.
> BTW what is this method supposed to do with duplicate entries in lists? Semantics are
not really clear here as opposed to set intersection.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message