harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tim Ellison <t.p.elli...@gmail.com>
Subject Re: [classlib] Some further study on hashcode and HashMap
Date Wed, 04 Jul 2007 14:20:59 GMT
Xiao-Feng Li wrote:
> On 7/4/07, Tim Ellison <t.p.ellison@gmail.com> wrote:
>> Xiao-Feng Li wrote:
>> <big snip>
>> I just wanted to pick up on this...
>> > (Note if we use a formula for hashing, it will be computed every time
>> > it's accessed, we need consider the overhead of the computation.
>> Not each time it is accessed, just when it is first stored, right?
> Tim, This is one of the design decisions made for DRLVM hashcode
> implementation. Based on a common assumption that hashcode is not
> frequently accessed (only a small ratio of objects are accessed for
> hashcode, and only small ratio of applications use hashcode),  we
> usually don't want a constant storage for hashcode. Instead, we only
> allocate the extra storage when the hashcode is accessed AND the
> object is moved; Otherwise, we directly return the object address for
> hashcode. Since most objects die before they are moved in a collection
> (this is another common assumption we make for objects life spans),
> most hashcodes will be accessed before they are stored.

Ah, ok - thanks.  Hopefully we have not tipped the balance too far if we
require DRLVM to perform some computation on the address before
returning it.

> So in this design of hashcode, we have some assumptions and make perf
> optimizations based on them. It's unclear if the design is the best in
> general, since common workloads don't have extensive hashcode
> accesses, hence cannot really exhibit the design differences. (Btw,
> Dacapo has some apps using hashcode heavily. )

Agreed, we will probably only notice these things as we scale up to
large numbers of entries in a hashed collection.

There are other algorithmic things we can do to improve HashMap too if
we are prepared to bet on 'typical usage'.  For example, new entries are
always added to the head of the linked list for a given bucket.  If we
add them in their natural order (so the link list is ordered by hashcode
value), then in general you would only have to search half the length of
the link list to determine that the key does not exist.  It would be a
win with get's for absent keys, but a loss if there are more put's than


View raw message