harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aleksey Shipilev" <aleksey.shipi...@gmail.com>
Subject Re: [classlib][luni][performance] Improvements in Collections
Date Fri, 18 Apr 2008 19:10:43 GMT
Ok, Nathan, let's stop for a while.
Am I right that the problem is different-style implementation of
basically the same thing in all three HashMaps? Am I right that we
haven't any disagreements on the performance impact of such the
optimization?

I can elaborate on these issues if there are still misunderstandings :)

Thanks,
Aleksey.

On Fri, Apr 18, 2008 at 11:06 PM, Nathan Beyer <ndbeyer@apache.org> wrote:
> On Fri, Apr 18, 2008 at 1:53 PM, Sergey Salishev <
>
> sergey.i.salishev@gmail.com> wrote:
>
>
> > Nathan,
>  >
>  > The "load factor" does't impact the capacity growth scale. It only governs
>  > the persentage of the free space reserved inside the current capacity. So
>  > if
>  > the initial capacity is X the capacity growth scale would be 2X 4X 8X...
>  > with all load factor values.
>
>
>  Correct, it determines when the capacity should increase.
>
>
>
>  >
>  >
>  > The changes proposed by Alexey for WeakHashMap are already applied to
>  > HashMap almost a year ago.
>
>
>  That is not what I understand. The majority of the patch has nothing to do
>  with the capacity calculation. Additionally, it's not the same; the method
>  names are different, the algorithm is slightly different, etc. If this
>  capacity calculation is good for all three HashMaps, then put the exact same
>  code in all three HashMaps. I'll vote for consistency and clarity
>  (java.util.HashMap's 'calculateCapacity' method is much clearer).
>
>  -Nathan
>
>
>  >
>  >
>  > Thanks.
>  > Sergey.
>
> >
>  > On Fri, Apr 18, 2008 at 10:41 PM, Nathan Beyer <ndbeyer@apache.org<https://mail.google.com/mail?view=cm&tf=0&to=ndbeyer@apache.org>>
>  > wrote:
>  >
>  > > On Fri, Apr 18, 2008 at 12:57 PM, Sergey Salishev <
>
> > > sergey.i.salishev@gmail.com<https://mail.google.com/mail?view=cm&tf=0&to=sergey.i.salishev@gmail.com>>
>  > wrote:
>  > >
>  > > > Nathan
>  > > >
>
>
> > > > I agree with  you. This can be a problem. In my practice the biggest
>  > > > problems were with 1-2 sized hash maps taking 16 size arrays. Probably
>  > > the
>  > > > best place to implement % optimization would bi CPU:)
>  > > > We can try the different approach. The HashMap growth is governed only
>  > > by
>  > > > initial capacity so in the HashMap constructor one can say is 2^k or
>  > > not.
>  > >
>  > >
>  > > What about 'load factor'? That governs the growth as well.
>  > >
>  > > -Nathan
>  > >
>  > >
>  > > >
>  > > >
>  > > > We can write something like
>  > > >
>  > > > <code>
>  > > > final boolean powOf2 = isPowOf2(capacity);
>  > > >
>  > > > final int mod(int index, int length) {
>  > > >     return powOf2? index & (length - 1) : index % length;
>  > > > }
>  > > > <code>
>  > > >
>  > > > In theory the common path for the workload should be specialized by
>  > JIT.
>  > > >
>  > > > Ooops. When looking into HashMap code I've noticed someone already did
>  > > > this.
>  > > > https://issues.apache.org/jira/browse/HARMONY-4064
>  > > > HashMap effectively does the rounding to the neares 2^k.
>  > > >
>  > > > Thanks.
>  > > > Sergey.
>  > > > On Fri, Apr 18, 2008 at 8:55 PM, Nathan Beyer <ndbeyer@apache.org<https://mail.google.com/mail?view=cm&tf=0&to=ndbeyer@apache.org>
>  > <
>  > > https://mail.google.com/mail?view=cm&tf=0&to=ndbeyer@apache.org>>
>  > > > wrote:
>  > > >
>  > > > > I know that people will notice; I have personal experience with
>  > > systems
>  > > > > that
>  > > > > included custom Map implementations were written because HashMaps
>  > grew
>  > > > too
>  > > > > large for small data sets (less than 2000, actually) and wasted
a
>  > lot
>  > > of
>  > > > > memory for the unnecessary capacity. Even the use of the capacity
>  > and
>  > > > load
>  > > > > factor didn't provide enough compensation in these cases.
>  > > > >
>  > > > > -Nathan
>  > > > >
>  > > > > On Fri, Apr 18, 2008 at 11:46 AM, Sergey Salishev <
>  > > > > sergey.i.salishev@gmail.com<https://mail.google.com/mail?view=cm&tf=0&to=sergey.i.salishev@gmail.com>
>  > <
>  > > https://mail.google.com/mail?view=cm&tf=0&to=sergey.i.salishev@gmail.com
>  > >>
>  > >  > wrote:
>  > > > >
>  > > > > > Nathan,
>  > > > > >
>  > > > > > I don't think anyone will notice the hash map size rounding.
It
>  > can
>  > > > lead
>  > > > > > to
>  > > > > > some memory overhead in very rare cases the user creates hash
map
>  > > with
>  > > > > the
>  > > > > > exact size. But in the most common case where the map is created
>  > > with
>  > > > > > default size the rounding will not change the behavior at all
as
>  > > it's
>  > > > in
>  > > > > > agreement with the standard 2x growth policy. On the other
hand
>  > size
>  > > > > > rounding gives substantial performance boost on all gets.
>  > > > > >
>  > > > > > Thanks.
>  > > > > > Sergey.
>  > > > > >
>  > > > > >
>  > > > > > On Fri, Apr 18, 2008 at 8:13 PM, Nathan Beyer <ndbeyer@apache.org<https://mail.google.com/mail?view=cm&tf=0&to=ndbeyer@apache.org>
>  > <
>  > > https://mail.google.com/mail?view=cm&tf=0&to=ndbeyer@apache.org>
>  > > > <
>  > > > > https://mail.google.com/mail?view=cm&tf=0&to=ndbeyer@apache.org>>
>  > > > > > wrote:
>  > > > > >
>  > > > > > > https://issues.apache.org/jira/browse/HARMONY-5718
>  > > > > > >
>  > > > > > > Again, I don't agree with the capacity rounding in the
patch
>  > > > attached
>  > > > > to
>  > > > > > > this issue. I do like the change to the internal data
structure;
>  > > use
>  > > > > two
>  > > > > > > arrays for key/value instead a single array. It makes
the code
>  > > > easier
>  > > > > to
>  > > > > > > read.
>  > > > > > >
>  > > > > > > -Nathan
>  > > > > > >
>  > > > > > > On Fri, Apr 18, 2008 at 1:50 AM, Aleksey Shipilev <
>  > > > > > > aleksey.shipilev@gmail.com<https://mail.google.com/mail?view=cm&tf=0&to=aleksey.shipilev@gmail.com>
>  > <
>  > > https://mail.google.com/mail?view=cm&tf=0&to=aleksey.shipilev@gmail.com>
>  > > > <
>  > > > >
>  > > https://mail.google.com/mail?view=cm&tf=0&to=aleksey.shipilev@gmail.com
>  > > > >>
>  > > > >  > wrote:
>  > > > > > >
>  > > > > > >  > Colleagues,
>  > > > > > > >
>  > > > > > > > I had recently filed two JIRAs with improvements
in
>  > Collections,
>  > > > > > > > giving up to +30-40% to serialization benchmarks.
Presumably
>  > > they
>  > > > > will
>  > > > > > > > boost the performance across the all users since
the
>  > > optimization
>  > > > is
>  > > > > > > > pretty general:
>  > > > > > > > https://issues.apache.org/jira/browse/HARMONY-5761
>  > > > > > > > https://issues.apache.org/jira/browse/HARMONY-5718
>  > > > > > > >
>  > > > > > > > Would some classlib guru (Tim, Nathan, Tony?) review
and
>  > commit
>  > > > > them?
>  > > > > > > >
>  > > > > > > > Thanks,
>  > > > > > > > Aleksey.
>  > > > > > > >
>  > > > > > >
>  > > > > >
>  > > > >
>  > > >
>  > >
>  >
>

Mime
View raw message