commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Berin Loritsch" <blorit...@apache.org>
Subject RE: [Commons-Avalon:collections] code integration
Date Thu, 20 Jun 2002 16:44:01 GMT
> From: Michael A. Smith [mailto:mas@apache.org] 
> 
> no double-checked-locking-esque problems...
> 
> rather than having each bucket a linked list, make each bucket a 
> resizable hashtable.  The number of buckets changes from some 
> non-power 
> of two to be a power of two. 

I would like to see a performance testcase on that.

I word of warning about the power of two issue.

Because the default behavior for a java.lang.Object's hashCode()
method is to return its memory address, or memory index (can't
remember which) the hashCode() manipulator that is in BucketMap
will do a modulus to lower the hashCode to fit within one of
the buckets.

The concept is rather simplistic I realize, but one observed
pattern during testing is that if the number of buckets were
a power of two then the put() operations would place the
new entry in one of two locations.  As a result the hashcode
operation was flawed.

By forcing the size of the BucketMap to be an odd number (and
consequently never a power of two) I was able to spread the entries
across all the buckets with minimal effort.

The performance increase of doing so was an order of magnitude.

Consider having 100 threads all contending for the same map.
If all the entries were in two of the buckets, we not only lost
the benefit of fine grained locking, but we lost the benefit of
hashing.

With an odd number of buckets, the typical multiple of 8 hashCode
(on 32 bit machines) was spread more evenly.  Thus gaining both
enhancements.



> Compute the hashcode of the key, and with an appropriate mask 
> (i.e. the fixed size - 1) to determine the proper bucket.  
> synchronize on the bucket.  Mod the computed hash code by 
> that particular buckets current 
> sub-bucket size to find the linked list that should contain the key.  
> iterate the linked list to find the element.  When adding, determine 
> current load on the bucket's hashtable (probably 
> pre-computed/cached for 
> with each add), and determine whether expansion is necessary.  This 
> expansion expands the one top-level bucket.  

I hope you realize that a hashtable is synchronized...

A hashcode really shouldn't change once the entry is placed in the
map.  Otherwise what will happen is that either the lower buckets
will have an abundance of entries, or you have the cost of rehashing
all the entries.

If the number of buckets changes, the hash calculation on the incomming
key will change the value--very much likely pointing to the wrong
bucket.

Of course I could be misunderstanding what you are getting at as
well.

> > I think BucketMap is optimal for the case when you know 
> upfront that 
> > you have a maximum number of possible mappings at any time.
> 
> The other option is to make sure that this is a well documented 
> limitation.  We could always add a DynamicBucketMap that adds the 
> expansion capabilities at the cost I described before.

That would probably be the best option.  It fits with the two
Buffer classes I wrote.  The FixedSizeBuffer is used when you
know up front how many entries will ever be in the buffer.
The VariableSizeBuffer will dynamically resize itself when
needed.

Both are basically a circular buffer implementation, and they
can benefit from the optimizations available if they do not
implement the Container interface.  They are used in my Event
package which is also fairly low-level.

The Event package is inspired by Matt Welsh's SEDA architecture.
They have a cleaned up API, and handle event queues rather well.
The Event package (and the MPool package) is not as low level
as the Collections/Concurrent packages, but are very effective.

However, those are another topic entirely...


--
To unsubscribe, e-mail:   <mailto:commons-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:commons-dev-help@jakarta.apache.org>


Mime
View raw message