commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas Neidhart (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COLLECTIONS-548) Performance issue in CompositeCollection::retainAll
Date Sat, 24 Jan 2015 15:48:34 GMT

    [ https://issues.apache.org/jira/browse/COLLECTIONS-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290679#comment-14290679
] 

Thomas Neidhart commented on COLLECTIONS-548:
---------------------------------------------

The documentation clearly states what the method is doing:

{noformat}
     * This implementation calls <code>retainAll()</code> on each collection.
{noformat}

The retainAll() method of the Collection interface is also well-known and by default (see
AbstractCollection) calls contains() on the provided collection. Thus users should be aware
of this by now (2015). If a user is really calling retainAll() with a huge list, it's probably
better to put the elements in a set and provide this as an argument to retainAll().

My whole point is that there's no use in providing uber-collection types that have an optimal
runtime-complexity in all cases but with the trade-off of additional memory requirements.
Users have to chose and use proper collection types for their use case.

> Performance issue in CompositeCollection::retainAll
> ---------------------------------------------------
>
>                 Key: COLLECTIONS-548
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-548
>             Project: Commons Collections
>          Issue Type: Bug
>          Components: Collection
>    Affects Versions: 4.0
>         Environment: Ubuntu 14.04
>            Reporter: Oswaldo Olivo
>            Priority: Minor
>              Labels: performance
>
> There seems to be a performance problem with the function retainAll of 
> CompositeCollection. This is analogous to https://issues.apache.org/jira/browse/COLLECTIONS-534
.
> The following is the code of the function:
> {code}
>  
>     /**
>      * Retains all the elements in the specified collection in this composite collection,
>      * removing all others.
>      * <p>
>      * This implementation calls <code>retainAll()</code> on each collection.
>      *
>      * @param coll  the collection to remove
>      * @return true if the collection was modified
>      * @throws UnsupportedOperationException if retainAll is unsupported
>      */
>     public boolean retainAll(final Collection<?> coll) {
>         boolean changed = false;
>         for (final Collection<E> item : all) {
>             changed |= item.retainAll(coll);
>         }
>         return changed;
>     }
> {code}
> The performance problem occurs when the underlying collections in the current collection
have a slow retainAll method. Whenever we're relying on Collection::retainAll, slow cases
tend to occur when the parameter collection has a slow contains method.
> The following test harness shows the performance degradation between 
> using a list and using a set as a parameter, across different collection sizes.
> {code}
>  public static void 	compositeCollectionRetainAllTest(boolean original) {
> 	int N=500000;
> 	ArrayList<Integer> firstArrayList=new ArrayList<Integer>();
> 	ArrayList<Integer> secondArrayList=new ArrayList<Integer>();
> 	for(int i=0;i<N;++i) {
> 	    firstArrayList.add(new Integer(i));
> 	    secondArrayList.add(new Integer(N-1-i));
> 	    
> 	}
> 	CompositeCollection col = new CompositeCollection(firstArrayList);
> 	col.retainAll(original ? secondArrayList : (new HashSet<Integer>(secondArrayList)));
> 	
>     }
> {code}
> In the following table "Original" corresponds to the time taken by 
> the existing implementation of CompositeCollection::retainAll, "Repaired" to the time
taken by the function invoked with the parameter converted into a set, and "Speed-up" to the
speed-up obtained after the repair.
> N		Original(s)	Repaired(s)	Speed-up(X)
> 10	         1.04		1.04		        1.00
> 100	         1.04		1.05		        0.99
> 1000	1.06		        1.06		        1.00
> 10000	1.12		        1.10		        1.01
> 100000	9.34		        1.15		        8.12
> 500000	> 300		1.29		        > 232.55
> If it's unacceptable to convert the parameter into a set before calling 
> retainsAll, a solution would be to include a warning to the user in the documentation
that the parameter should have a fast contains method when possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message