commons-issues mailing list archives

From "Sebb (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LANG-839) ArrayUtils removeElements methods use unnecessary HashSet
Date Tue, 09 Oct 2012 15:16:03 GMT

    [ https://issues.apache.org/jira/browse/LANG-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472455#comment-13472455 ]

Sebb commented on LANG-839:
---------------------------

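Further testing shows that the approach used in the original patch (a version of
removeAll() that processes the BitSet directly) is generally faster than converting the
BitSet to int[] and then calling the original removeAll() method.

In the figures below, Ratio is the bitset figure as a percentage of the extract figure,
so lower values favor the direct BitSet version.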

Win XP:

Ratio=92% array=100 count=1 extract=10314998 bitset=9571607
Ratio=68% array=100 count=10 extract=14912510 bitset=10283430
Ratio=15% array=100 count=50 extract=50206660 bitset=8017779
Ratio=8% array=100 count=100 extract=92868228 bitset=7494807
Ratio=86% array=1000 count=10 extract=42377453 bitset=36508272
Ratio=28% array=1000 count=100 extract=124472803 bitset=35606481
Ratio=4% array=1000 count=500 extract=570030828 bitset=24349463
Ratio=1% array=1000 count=1000 extract=1099601765 bitset=12346262

Continuum:

Ratio=76% array=100 count=1 extract=2948847 bitset=2257111
Ratio=32% array=100 count=10 extract=4860676 bitset=1589708
Ratio=6% array=100 count=50 extract=17143953 bitset=1160451
Ratio=1% array=100 count=100 extract=29390021 bitset=449595
Ratio=87% array=1000 count=10 extract=16487025 bitset=14461313
Ratio=30% array=1000 count=100 extract=42920962 bitset=13228312
Ratio=4% array=1000 count=500 extract=199373015 bitset=9112329
Ratio=0% array=1000 count=1000 extract=387091985 bitset=1126133
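
For illustration, the two approaches compared above might look roughly like the following
Java sketch. This is not the code from LANG-839.patch or the benchmark that produced the
figures; the class and method names (BitSetRemoveSketch, removeViaBitSet, toIndexArray,
removeViaIndexArray) are hypothetical, and the sketch assumes int[] input with every set
bit inside the array bounds.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch; not the actual ArrayUtils implementation.
class BitSetRemoveSketch {

    /** "bitset" style: walk the BitSet directly while copying the kept runs. */
    public static int[] removeViaBitSet(int[] array, BitSet indices) {
        // Count only the set bits that fall inside the array.
        int removals = 0;
        for (int i = indices.nextSetBit(0); i >= 0 && i < array.length; i = indices.nextSetBit(i + 1)) {
            removals++;
        }
        int[] result = new int[array.length - removals];
        int srcIndex = 0;
        int destIndex = 0;
        for (int set = indices.nextSetBit(0); set >= 0 && set < array.length; set = indices.nextSetBit(set + 1)) {
            int count = set - srcIndex;              // length of the run to keep
            System.arraycopy(array, srcIndex, result, destIndex, count);
            destIndex += count;
            srcIndex = set + 1;                      // skip the removed element
        }
        // Copy whatever remains after the last removed index.
        System.arraycopy(array, srcIndex, result, destIndex, array.length - srcIndex);
        return result;
    }

    /** "extract" style, step 1: convert the BitSet to a sorted int[] of indexes. */
    public static int[] toIndexArray(BitSet indices) {
        int[] out = new int[indices.cardinality()];
        int n = 0;
        for (int i = indices.nextSetBit(0); i >= 0; i = indices.nextSetBit(i + 1)) {
            out[n++] = i;
        }
        return out;
    }

    /** "extract" style, step 2: remove by a sorted, unique, in-range index array. */
    public static int[] removeViaIndexArray(int[] array, int[] sortedIndexes) {
        int[] result = new int[array.length - sortedIndexes.length];
        int srcIndex = 0;
        int destIndex = 0;
        for (int index : sortedIndexes) {
            int count = index - srcIndex;
            System.arraycopy(array, srcIndex, result, destIndex, count);
            destIndex += count;
            srcIndex = index + 1;
        }
        System.arraycopy(array, srcIndex, result, destIndex, array.length - srcIndex);
        return result;
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30, 40, 50};
        BitSet bits = new BitSet();
        bits.set(1);
        bits.set(3);
        System.out.println(Arrays.toString(removeViaBitSet(data, bits)));
        System.out.println(Arrays.toString(removeViaIndexArray(data, toIndexArray(bits))));
    }
}
```

The extract variant allocates and fills an extra int[] before any copying starts, and that
array grows with the number of indexes being removed.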

                
> ArrayUtils removeElements methods use unnecessary HashSet
> ---------------------------------------------------------
>
>                 Key: LANG-839
>                 URL: https://issues.apache.org/jira/browse/LANG-839
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.*
>    Affects Versions: 3.1
>            Reporter: Sebb
>            Priority: Minor
>             Fix For: 3.2
>
>         Attachments: LANG-839.patch
>
>
> The removeElements() methods use a HashSet to collect the indexes that need removing.
> This requires creating Integer objects for each index, and the HashSet then has to be
> converted into an int[] array.
> It would be more efficient to store the entries in an actual int[] array.
> The maximum size of this is the length of the values array (or the length of the input
> array if that is shorter).
> The array must be truncated before calling the private removeAll() method; this can be
> done with Arrays.copyOf(x[], length).
> However, if the arrays are very large, and most of the values do not appear in the input,
> this might result in using more memory than the HashSet implementation.
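
A minimal sketch of the int[] idea described in the issue, for the int[] case only. It is
an illustration under assumptions, not the attached patch: the class name
IndexArrayRemoveSketch and the helper removeAll(int[], int[]) are hypothetical stand-ins
(the real removeAll() is private to ArrayUtils), and the occurrence counting with a
HashMap is an assumption about how duplicate values are handled.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the code in LANG-839.patch.
class IndexArrayRemoveSketch {

    public static int[] removeElements(int[] array, int... values) {
        if (array.length == 0 || values.length == 0) {
            return array.clone();
        }
        // Count how many occurrences of each value should be removed.
        Map<Integer, Integer> occurrences = new HashMap<>();
        for (int v : values) {
            occurrences.merge(v, 1, Integer::sum);
        }
        // At most min(array.length, values.length) indexes can be collected.
        int[] toRemove = new int[Math.min(array.length, values.length)];
        int count = 0;
        for (Map.Entry<Integer, Integer> e : occurrences.entrySet()) {
            int value = e.getKey();
            int remaining = e.getValue();
            for (int i = 0; i < array.length && remaining > 0; i++) {
                if (array[i] == value) {
                    toRemove[count++] = i;   // plain int, no Integer boxing per index
                    remaining--;
                }
            }
        }
        // Truncate to the indexes actually found, then sort for the copy loop.
        int[] indexes = Arrays.copyOf(toRemove, count);
        Arrays.sort(indexes);
        return removeAll(array, indexes);
    }

    /** Stand-in for the private removeAll(): expects sorted, unique, in-range indexes. */
    static int[] removeAll(int[] array, int[] sortedIndexes) {
        int[] result = new int[array.length - sortedIndexes.length];
        int srcIndex = 0;
        int destIndex = 0;
        for (int index : sortedIndexes) {
            int run = index - srcIndex;
            System.arraycopy(array, srcIndex, result, destIndex, run);
            destIndex += run;
            srcIndex = index + 1;
        }
        System.arraycopy(array, srcIndex, result, destIndex, array.length - srcIndex);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(removeElements(new int[] {1, 2, 3, 2, 1}, 2, 1, 2)));
    }
}
```

The toRemove buffer is sized up front from the array lengths, which is where the memory
concern in the last point of the description comes from: it is allocated in full even when
few of the values are actually present in the input.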

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
