commons-issues mailing list archives

From "Sebb (JIRA)" <>
Subject [jira] [Commented] (LANG-839) ArrayUtils removeElements methods use unnecessary HashSet
Date Tue, 09 Oct 2012 01:20:03 GMT


Sebb commented on LANG-839:

BitSet is considerably faster on Win XP:

Ratio=38% count=0 hash=610134 bits=234388
Ratio=46% count=5 hash=1809448 bits=837536
Ratio=54% count=10 hash=2840584 bits=1536229
Ratio=38% count=200 hash=72772936 bits=28216994
Ratio=37% count=50 hash=17909539 bits=6729347
Ratio=39% count=100 hash=35617096 bits=13972166
Ratio=40% count=1000 hash=339097567 bits=138882176
Ratio=42% count=2000 hash=650113632 bits=278152949

and Continuum:

Ratio=10% count=0 hash=1164164 bits=126956
Ratio=15% count=5 hash=1433866 bits=228518
Ratio=18% count=10 hash=1911315 bits=355922
Ratio=17% count=200 hash=31370106 bits=5439748
Ratio=18% count=50 hash=6947508 bits=1271146
Ratio=18% count=100 hash=13671526 bits=2555063
Ratio=15% count=1000 hash=154243712 bits=24577725
Ratio=10% count=2000 hash=411835139 bits=43056221
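
For readers who want to see the kind of BitSet-based index collection being timed above, here is a minimal sketch. It is not the attached patch and not the Commons Lang code: the class name BitSetRemoveSketch and the helper removeElements are hypothetical, and it assumes that each entry in values removes at most one occurrence from the array.

```java
import java.util.Arrays;
import java.util.BitSet;

public class BitSetRemoveSketch {

    /**
     * Hypothetical helper (an illustration, not the Commons Lang code):
     * removes one occurrence of each entry of values from array.
     */
    public static int[] removeElements(int[] array, int... values) {
        // One bit per array position; a set bit means "remove this index".
        BitSet toRemove = new BitSet(array.length);
        for (int v : values) {
            // Mark the first occurrence of v that is not already marked.
            for (int i = 0; i < array.length; i++) {
                if (array[i] == v && !toRemove.get(i)) {
                    toRemove.set(i);
                    break;
                }
            }
        }
        // Copy every position whose bit is still clear.
        int[] result = new int[array.length - toRemove.cardinality()];
        int dest = 0;
        for (int i = toRemove.nextClearBit(0); i < array.length;
                i = toRemove.nextClearBit(i + 1)) {
            result[dest++] = array[i];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] out = removeElements(new int[] {1, 2, 3, 2, 4}, 2, 4, 9);
        System.out.println(Arrays.toString(out)); // prints [1, 3, 2]
    }
}
```

No Integer objects are created here, and the marked indexes come back in ascending order without a separate conversion step, which is one plausible reason for the lower timings.
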

> ArrayUtils removeElements methods use unnecessary HashSet
> ---------------------------------------------------------
>                 Key: LANG-839
>                 URL:
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.*
>    Affects Versions: 3.1
>            Reporter: Sebb
>            Priority: Minor
>         Attachments: LANG-839.patch
> The removeElements() methods use a HashSet to collect the indexes that need removing.
> This requires creating Integer objects for each index, and the HashSet then has to be converted into an int[] array.
> It would be more efficient to store the entries in an actual int[] array.
> The maximum size of this is the length of the values array (or the length of the input array if that is shorter).
> The array must be truncated before calling the private removeAll() method; this can be done with Arrays.copyOf(x[], length).
> However, if the arrays are very large, and most of the values do not appear in the input, this might result in using more memory than the HashSet implementation.
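
The int[] approach described in the issue could look roughly like the sketch below. It is an illustration under stated assumptions, not the attached patch: the class IntArrayIndexSketch and the helpers indexesToRemove and contains are hypothetical names, and the linear duplicate check is a simplification chosen to keep the example short.

```java
import java.util.Arrays;

public class IntArrayIndexSketch {

    /**
     * Hypothetical helper: collects the indexes to remove into an int[]
     * sized min(array.length, values.length), then truncates it with
     * Arrays.copyOf so only the filled part is returned.
     */
    public static int[] indexesToRemove(int[] array, int... values) {
        int[] indexes = new int[Math.min(array.length, values.length)];
        int count = 0;
        for (int v : values) {
            // Record the first occurrence of v that is not already recorded.
            for (int i = 0; i < array.length; i++) {
                if (array[i] == v && !contains(indexes, count, i)) {
                    indexes[count++] = i;
                    break;
                }
            }
        }
        // Truncate to the number of indexes actually found.
        return Arrays.copyOf(indexes, count);
    }

    /** Linear scan over the first count entries (a simplification). */
    private static boolean contains(int[] indexes, int count, int index) {
        for (int k = 0; k < count; k++) {
            if (indexes[k] == index) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] idx = indexesToRemove(new int[] {1, 2, 3, 2, 4}, 2, 2, 9);
        System.out.println(Arrays.toString(idx)); // prints [1, 3]
    }
}
```

The working array is allocated at its maximum possible size up front, which illustrates the memory concern raised above: when few of the values are present, most of that allocation goes unused until the copy truncates it.
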

