Return-Path: X-Original-To: apmail-commons-issues-archive@minotaur.apache.org Delivered-To: apmail-commons-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 4044E17BA8 for ; Sun, 25 Jan 2015 12:52:36 +0000 (UTC) Received: (qmail 41229 invoked by uid 500); 25 Jan 2015 12:52:34 -0000 Delivered-To: apmail-commons-issues-archive@commons.apache.org Received: (qmail 41137 invoked by uid 500); 25 Jan 2015 12:52:34 -0000 Mailing-List: contact issues-help@commons.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: issues@commons.apache.org Delivered-To: mailing list issues@commons.apache.org Received: (qmail 41126 invoked by uid 99); 25 Jan 2015 12:52:34 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 25 Jan 2015 12:52:34 +0000 Date: Sun, 25 Jan 2015 12:52:34 +0000 (UTC) From: "Adrian Nistor (JIRA)" To: issues@commons.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (COLLECTIONS-427) performance problem in SetUniqueList.retainAll() MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/COLLECTIONS-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291085#comment-14291085 ] Adrian Nistor commented on COLLECTIONS-427: ------------------------------------------- Hi Sebb, > at least users who want to trade memory for speed can do so. Ok. Let's at least write this down in the method Javadoc, like in COLLECTIONS-415 and COLLECTIONS-417? Best, Adrian > performance problem in SetUniqueList.retainAll() > ------------------------------------------------ > > Key: COLLECTIONS-427 > URL: https://issues.apache.org/jira/browse/COLLECTIONS-427 > Project: Commons Collections > Issue Type: Bug > Affects Versions: 3.2.1 > Environment: java 1.6.0_24 > Ubuntu 11.10 > Reporter: Mert Guldur > Fix For: 4.0-alpha1, 4.0 > > Attachments: Test.java, patch.diff > > > I am encountering a performance problem in SetUniqueList.retainAll(). > It appears in version 3.2.1 and also in revision 1365132. I attached > a test that exposes this problem and a patch that fixes it. On my > machine, for this test, the patch provides a 621X speedup. > To run the test, just do: > $ java Test > The output for the un-patched version is: > Time is 6215 > The output for the patched version is: > Time is 10 > There are two problems here. First, "SetUniqueList.retainAll()" > should have similar implementation with the current implementation of > "ListOrderedSet.retainAll()", which is more optimized. Second, even > "ListOrderedSet.retainAll()" has a performance problem, which was > reported and explained in detail in COLLECTIONS-426. > The attached patch has two parts. The first part (the first loop) is > inspired from COLLECTIONS-426. The second part (everything after the > first loop) is in fact the current implementation of > "ListOrderedSet.retainAll()", with some minor changes to adapt it for > the current code. Overall, the attached patch is very similar to the > COLLECTIONS-426 patch. > I will rehash some of the information from COLLECTIONS-426 (which > describes "ListOrderedSet.retainAll()") for the current > "SetUniqueList.retainAll()". > The current code for "SetUniqueList.retainAll()" is: > {code:java|borderStyle=solid} > public boolean retainAll(Collection coll) { > boolean result = super.retainAll(coll); > set.retainAll(coll); > return result; > } > {code} > where both "super.retainAll(coll)" and "set.retainAll(coll)" can have > quadratic complexity, e.g., if "coll" is a List. Both these calls to > "retainAll" are in fact calls to > "java.util.AbstractCollection.retainAll()", which has the code: > {code:java|borderStyle=solid} > public boolean retainAll(Collection c) { > boolean modified = false; > Iterator e = iterator(); > while (e.hasNext()) { > if (!c.contains(e.next())) { > e.remove(); > modified = true; > } > } > return modified; > } > {code} > which iterates over "this" and calls "contains()" on "c". Mapping > this code back to "SetUniqueList.retainAll()" means that the code > iterates over "this" and "set" and calls "contains()" on "coll". If > "coll" has slow "contains()" (e.g., if "coll" is a list), then > "SetUniqueList.retainAll()" has quadratic complexity. > The patch iterates over "coll" and calls "contains()" on "set", which > we know is fast, because "set" is a Set. For a more detailed > discussion of the patch and the problem, see the current > implementation of "ListOrderedSet.retainAll()", the discussion for > COLLECTIONS-426, and the patch for COLLECTIONS-426. > Is this a bug, or am I misunderstanding the intended behavior? If so, > can you please confirm if the patch is correct? -- This message was sent by Atlassian JIRA (v6.3.4#6332)