commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael A. Smith" <>
Subject RE: [Commons-Avalon:collections] code integration
Date Thu, 20 Jun 2002 17:27:52 GMT
On Thu, 20 Jun 2002, Berin Loritsch wrote:
> > From: Michael A. Smith [] 
> > 
> > no double-checked-locking-esque problems...
> > 
> > rather than having each bucket a linked list, make each bucket a 
> > resizable hashtable.  The number of buckets changes from some 
> > non-power 
> > of two to be a power of two. 
> I would like to see a performance testcase on that.
> I word of warning about the power of two issue.
> Because the default behavior for a java.lang.Object's hashCode()
> method is to return its memory address, or memory index (can't
> remember which) the hashCode() manipulator that is in BucketMap
> will do a modulus to lower the hashCode to fit within one of
> the buckets.
> The concept is rather simplistic I realize, but one observed
> pattern during testing is that if the number of buckets were
> a power of two then the put() operations would place the
> new entry in one of two locations.  As a result the hashcode
> operation was flawed.

Good point.  Although I think that's more of a deficiency of the default 
hashCode method than it is with BucketMap.  There's no reason that the 
mask needs to be the least significant bits of the hashcode though.  you 
could right shift a few bits before applying the mask and get a better 

> By forcing the size of the BucketMap to be an odd number (and
> consequently never a power of two) I was able to spread the entries
> across all the buckets with minimal effort.
> The performance increase of doing so was an order of magnitude.
> Consider having 100 threads all contending for the same map.
> If all the entries were in two of the buckets, we not only lost
> the benefit of fine grained locking, but we lost the benefit of
> hashing.
> With an odd number of buckets, the typical multiple of 8 hashCode
> (on 32 bit machines) was spread more evenly.  Thus gaining both
> enhancements.

the odd-numbered sized bucked would still be used for the sub-buckets, 
so even if the low order bits were multiples of 8, it shouldn't matter.

> > Compute the hashcode of the key, and with an appropriate mask 
> > (i.e. the fixed size - 1) to determine the proper bucket.  
> > synchronize on the bucket.  Mod the computed hash code by 
> > that particular buckets current 
> > sub-bucket size to find the linked list that should contain the key.  
> > iterate the linked list to find the element.  When adding, determine 
> > current load on the bucket's hashtable (probably 
> > pre-computed/cached for 
> > with each add), and determine whether expansion is necessary.  This 
> > expansion expands the one top-level bucket.  
> I hope you realize that a hashtable is synchronized...

I'm not sure what you mean here.

> A hashcode really shouldn't change once the entry is placed in the
> map.  Otherwise what will happen is that either the lower buckets
> will have an abundance of entries, or you have the cost of rehashing
> all the entries.

Yes, you'd have the cost of rehashing all the entries.  This cost can be 
amortized over all the adds necessary to force the resizing (I mentioned 
that in my original email on the additional costs required for 
implementing the resizing behavior).  

> If the number of buckets changes, the hash calculation on the incomming
> key will change the value--very much likely pointing to the wrong
> bucket.

no.  The hashcode doesn't change.  just which bucket the hashcode maps 

> Of course I could be misunderstanding what you are getting at as
> well.

yeah, I wish I had time to code it up and show it to you in action.  
I've talked about the implementation numerous times (mostly in 
interviews believe it or not), but haven't actually sat down to 
implement it.  You'll understand perfectly if I showed you code.  :)

> > > I think BucketMap is optimal for the case when you know 
> > upfront that 
> > > you have a maximum number of possible mappings at any time.
> > 
> > The other option is to make sure that this is a well documented 
> > limitation.  We could always add a DynamicBucketMap that adds the 
> > expansion capabilities at the cost I described before.
> That would probably be the best option.  It fits with the two
> Buffer classes I wrote.  The FixedSizeBuffer is used when you
> know up front how many entries will ever be in the buffer.
> The VariableSizeBuffer will dynamically resize itself when
> needed.

The more I think about it, the more I think I like this idea of a
(Dynamic|Variable)BucketMap to augment the BucketMap rather than replace


To unsubscribe, e-mail:   <>
For additional commands, e-mail: <>

View raw message