commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas Neidhart (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (COLLECTIONS-547) Performance issue in SetUtils::isEqualSet
Date Thu, 22 Jan 2015 09:18:35 GMT

     [ https://issues.apache.org/jira/browse/COLLECTIONS-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Thomas Neidhart resolved COLLECTIONS-547.
-----------------------------------------
    Resolution: Not a Problem

The javadoc of the method clearly states the purpose and function of this method.

It should only be used when the first parameter is a Set or set-like data structure with a
fast contains implementation. In other cases this method should not be used.

Delegating this method to CollectionUtils.containsAll() is no good as it has an additional
memory overhead which is not needed when used properly.

> Performance issue in SetUtils::isEqualSet
> -----------------------------------------
>
>                 Key: COLLECTIONS-547
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-547
>             Project: Commons Collections
>          Issue Type: Bug
>          Components: Set
>    Affects Versions: 4.1
>         Environment: Ubuntu 14.04
>            Reporter: Oswaldo Olivo
>            Priority: Minor
>              Labels: perfomance
>
> There seems to be a performance problem with the function isEqualSet of 
> SetUtils when the first parameter is of a collection type that has a slow containsAll/contains
method.
> The following is the code of the function:
> {code}
>     /**
>      * Tests two sets for equality as per the <code>equals()</code> contract
>      * in {@link java.util.Set#equals(java.lang.Object)}.
>      * <p>
>      * This method is useful for implementing <code>Set</code> when you cannot
>      * extend AbstractSet. The method takes Collection instances to enable other
>      * collection types to use the Set implementation algorithm.
>      * <p>
>      * The relevant text (slightly paraphrased as this is a static method) is:
>      * <blockquote>
>      * <p>Two sets are considered equal if they have
>      * the same size, and every member of the first set is contained in
>      * the second. This ensures that the <tt>equals</tt> method works
>      * properly across different implementations of the <tt>Set</tt>
>      * interface.</p>
>      *
>      * <p>
>      * This implementation first checks if the two sets are the same object:
>      * if so it returns <tt>true</tt>.  Then, it checks if the two sets are
>      * identical in size; if not, it returns false. If so, it returns
>      * <tt>a.containsAll((Collection) b)</tt>.</p>
>      * </blockquote>
>      *
>      * @see java.util.Set
>      * @param set1  the first set, may be null
>      * @param set2  the second set, may be null
>      * @return whether the sets are equal by value comparison
>      */
>     public static boolean isEqualSet(final Collection<?> set1, final Collection<?>
set2) {
>         if (set1 == set2) {
>             return true;
>         }
>         if (set1 == null || set2 == null || set1.size() != set2.size()) {
>             return false;
>         }
>         return set1.containsAll(set2);
>     }
> {code}
> The problem is that in the last return statement, the function relies on the 
> containsAll method of the class of the set1, which can be any type of collection.
> The following test harness shows the performance degradation between 
> using a list and using a set as a first parameter, across different collection sizes.
> {code}
>     public static void 	setUtilsisEqualSetTest(boolean original) {
> 	int N=500000;
> 	ArrayList<Integer> firstArrayList=new ArrayList<Integer>();
> 	ArrayList<Integer> secondArrayList=new ArrayList<Integer>();
> 	for(int i=0;i<N;++i) {
> 	    firstArrayList.add(new Integer(i));
> 	    secondArrayList.add(new Integer((N-1)-i));
> 	    
> 	}
> 	SetUtils.isEqualSet(original?firstArrayList:(new HashSet<Integer>(firstArrayList)),secondArrayList);
> 	
> {code}
> In the following table "Original" corresponds to the time taken by 
> the existing implementation of SetUtils::isEqualSet, "Repaired" to the time 
> taken by the function invoked with the first parameter converted into a 
> set, and "Speed-up" to the speed-up obtained after the repair.
>        N	Original(s)	Repaired(s)	Speed-up(X)
>       10	          1.01		    1.02		     0.99
>      100	          1.02		    1.02		     1
>     1000	          1.04		    1.04		     1
>    10000	          1.15		    1.09		     1.05
>   100000	          9.33		    1.11		     8.40
>   500000	        > 300	            1.26		     > 238.09
> One way to deal with this would be to call the CollectionUtils::containsAll method instead
of Collection::containsAll, since it has a linear time implementation instead of quadratic,
and can handle the types of isEqualSet.
> Another solution would be to include a warning to the user in the documentation that
the first parameter should have a fast containment method when possible.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message