cocoon-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Berin Loritsch" <blorit...@apache.org>
Subject RE: Antwort: RE: [RT]: Calculating the cache key
Date Wed, 29 May 2002 13:00:44 GMT
> From: volker.schmitt@basf-it-services.com 
> 
> Berlin Loritsch wrote:
> 
> <snip>
> >There are two things you need to know about the default 
> hashCode() for 
> >this to work:
> >
> >1) Default hash values are the address of the object--meaning that
> >   they are all aligned on a power of 2 ( typically every 4 or 8
> >   bytes depending on 32 or 64 bit machines ).
> >
> >2) Very regular hash values (like the default) will heavily weight
> >   themselves to a particular buckets in the hash table (I found this
> >   out creating the new BucketMap in Avalon collections).
> 
> >3) You must ensure that the resultant hashCode is not an even number,
> >   this will help ensure a more even distribution of 
> hashvalues.  (this
> >   is what the String hashCode tries to do).
> >
> >4) The hashing algorithm must be quick--but it can be cashed for a
> >   quick access.  That way if multiple tests on the hashCode occur,
> >   you can save an expensive recalculation phase for each code.
> 
> sorry, but I don't understand why the resultant hashCode 
> should not an even number. My understanding of a hashCode 
> funktion is, that it should generate homogeneous values.


Theoretically yes.  However it is mathematically difficult to devise a
well performing maximally disperse algorithm based on a known value.
Especially when that value series has a very regular pattern.

By forcing an odd value, we take advantage of simple division to ensure
that our entries are more evenly spread in a hashmap.  A typical
division
of a Hashmap maps the hashvalues to an array of "buckets"  Each "bucket"
is a linked list of entries with equivalent hashcodes.

It is impractical to have several hashmaps with as many entries as would
fill all of your memory (2^64 entries would have an entry filling every
address of memory).  Therefore, hashmaps have a smaller set of entries
and
map the hash entries to the smaller array.  The mapping is usually done
with a simple mod operation.

e = h mod n

Where
e is the entry in the hash
h is the hashcode
n is the number of entries in the hash

Lets say that the hashmap has 32 buckets.  By experimentation, and then
by calculation, I found that 2 of the 32 buckets were used.  The split
between the two was fairly even, however 30 buckets were not touched.

When we are looking for an entry the hash gets us to the bucket, and
then
we have to iterate through the entries until we find the actual value we
want.

By changing either the hash value to be an odd number, the number of
buckets to be an odd number, or both we affectively spread our results
across the smaller hash values more evenly.

So in practice you either want all your hash values to be odd, or you
want
to make sure there is an equal oportunity for even and odd numbers.  You
need to make sure that dispersion is fairly even....

Perhaps a hashCode testcase would be beneficial: one that would first
test for the percentage of odd/even, and then one that would test the
dispersion of hash entries based on a smaller set of hash values.


---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org


Mime
View raw message