commons-issues mailing list archives

From "Sebb (JIRA)" <>
Subject [jira] [Commented] (LANG-839) ArrayUtils removeElements methods use unnecessary HashSet
Date Tue, 09 Oct 2012 15:16:03 GMT


Sebb commented on LANG-839:

Further testing shows that the approach used in the original patch (i.e. a version
of removeAll() that processes the BitSet directly) is generally faster than converting
the BitSet to int[] and then calling the original removeAll() method.

Win XP (Ratio = bitset / extract, as a percentage; lower favours the direct BitSet version):

Ratio=92% array=100 count=1 extract=10314998 bitset=9571607
Ratio=68% array=100 count=10 extract=14912510 bitset=10283430
Ratio=15% array=100 count=50 extract=50206660 bitset=8017779
Ratio=8% array=100 count=100 extract=92868228 bitset=7494807
Ratio=86% array=1000 count=10 extract=42377453 bitset=36508272
Ratio=28% array=1000 count=100 extract=124472803 bitset=35606481
Ratio=4% array=1000 count=500 extract=570030828 bitset=24349463
Ratio=1% array=1000 count=1000 extract=1099601765 bitset=12346262


Ratio=76% array=100 count=1 extract=2948847 bitset=2257111
Ratio=32% array=100 count=10 extract=4860676 bitset=1589708
Ratio=6% array=100 count=50 extract=17143953 bitset=1160451
Ratio=1% array=100 count=100 extract=29390021 bitset=449595
Ratio=87% array=1000 count=10 extract=16487025 bitset=14461313
Ratio=30% array=1000 count=100 extract=42920962 bitset=13228312
Ratio=4% array=1000 count=500 extract=199373015 bitset=9112329
Ratio=0% array=1000 count=1000 extract=387091985 bitset=1126133
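
For readers who have not opened the patch, the idea of processing the BitSet directly can be sketched roughly as below. This is an illustration only, not the code from LANG-839.patch: the class name BitSetRemoveSketch and the int[] signature are assumptions made for the example, and it assumes every set bit is a valid index into the array. It copies the runs of elements that lie between removed indexes, so no intermediate int[] of indexes is built.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch class; not part of Commons Lang.
class BitSetRemoveSketch {

    // Returns a copy of array without the elements whose indexes are set in toRemove.
    // Assumption: every set bit in toRemove is less than array.length.
    static int[] removeAll(int[] array, BitSet toRemove) {
        int removals = toRemove.cardinality();
        int[] result = new int[array.length - removals];
        int srcIndex = 0;   // next position to read from array
        int destIndex = 0;  // next position to write in result
        int set;
        while ((set = toRemove.nextSetBit(srcIndex)) != -1) {
            // Copy the run of kept elements that precedes the next removed index.
            int count = set - srcIndex;
            if (count > 0) {
                System.arraycopy(array, srcIndex, result, destIndex, count);
                destIndex += count;
            }
            // Skip the whole run of consecutive removed indexes.
            srcIndex = toRemove.nextClearBit(set);
        }
        // Copy whatever remains after the last removed index.
        int count = array.length - srcIndex;
        if (count > 0) {
            System.arraycopy(array, srcIndex, result, destIndex, count);
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet toRemove = new BitSet();
        toRemove.set(1);
        toRemove.set(3);
        int[] out = removeAll(new int[] {10, 20, 30, 40, 50}, toRemove);
        System.out.println(Arrays.toString(out)); // prints [10, 30, 50]
    }
}
```

Because adjacent removed indexes are skipped in one step with nextClearBit(), the amount of work shrinks as more elements are removed, which is consistent with the ratios improving as count grows in the figures above.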

> ArrayUtils removeElements methods use unnecessary HashSet
> ---------------------------------------------------------
>                 Key: LANG-839
>                 URL:
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.*
>    Affects Versions: 3.1
>            Reporter: Sebb
>            Priority: Minor
>             Fix For: 3.2
>         Attachments: LANG-839.patch
> The removeElements() methods use a HashSet to collect the indexes that need removing.
> This requires creating Integer objects for each index, and the HashSet then has to be converted into an int[] array.
> It would be more efficient to store the entries in an actual int[] array.
> The maximum size of this is the length of the values array (or the length of the input array if that is shorter).
> The array must be truncated before calling the private removeAll() method; this can be done with Arrays.copyOf(x[], length).
> However, if the arrays are very large, and most of the values do not appear in the input, this might result in using more memory than the HashSet implementation.
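
The int[] proposal in the description above could look something like the following. It is a minimal sketch under stated assumptions, not the Commons Lang implementation: the class IndexArraySketch and the method indexesToRemove are hypothetical names, and the linear search is chosen for clarity rather than speed. The buffer is sized to the smaller of the two lengths, as the description suggests, and Arrays.copyOf truncates it to the number of indexes actually found.

```java
import java.util.Arrays;

// Hypothetical sketch class; not part of Commons Lang.
class IndexArraySketch {

    // Collects, for each requested value, the index of one not-yet-claimed
    // occurrence in array. Indexes are returned in the order they were found.
    static int[] indexesToRemove(int[] array, int... values) {
        // At most one index per value, and each index is used at most once.
        int[] found = new int[Math.min(array.length, values.length)];
        int count = 0;
        boolean[] claimed = new boolean[array.length];
        for (int value : values) {
            for (int i = 0; i < array.length; i++) {
                if (!claimed[i] && array[i] == value) {
                    claimed[i] = true;
                    found[count++] = i; // primitive int, no Integer boxing
                    break;
                }
            }
        }
        // Truncate to the entries actually filled in.
        return Arrays.copyOf(found, count);
    }

    public static void main(String[] args) {
        int[] idx = indexesToRemove(new int[] {1, 2, 3, 2, 4}, 2, 2, 5);
        System.out.println(Arrays.toString(idx)); // prints [1, 3]
    }
}
```

The memory caveat in the description is visible here: the found buffer is allocated at full size up front even when few of the values occur in the array.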

