cocoon-dev mailing list archives

From "Carsten Ziegeler" <>
Subject RE: [RT]: Calculating the cache key
Date Fri, 24 May 2002 07:30:22 GMT

Geoff Howard wrote:
> Yes, sorry looking back my comment wasn't clear.  What I was catching on to
> was that each element of the array stored an integer which seems to be
> storing two different values - one in its 2 most significant bytes for the
> component, one in its two least significant bytes for the unique key within
> that component space - in your example a filename.
> So, in the example above, the integer stored in p[1] is 0xfb021431 and is an
> XSLTTransformer key.  The integer stored in p[2] is 0xfb02542e, also an
> XSLTTransformer key, but with a different resource.  If I understood you
> right,
> 	p[1]>>>16 = p[2]>>>16 = 0x0000fb02
> which is a unique hash meaning "XSLTTransformer" component. And
> 	p[1] & 0xFFFF = 0x00001431
> which is a unique hash meaning "layout.xsl".
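
For reference, the packing Geoff describes can be sketched in a few lines of Java. This is only an illustration of the bit layout from the quoted example; the class name PackedKey and its helper methods are hypothetical and not part of Cocoon.

```java
public class PackedKey {

    // Combine a 16-bit component hash (upper half) and a 16-bit
    // resource hash (lower half) into a single int.
    static int pack(int componentHash, int resourceHash) {
        return (componentHash << 16) | (resourceHash & 0xFFFF);
    }

    // Upper 16 bits: identifies the component (e.g. "XSLTTransformer").
    static int component(int key) {
        return key >>> 16;
    }

    // Lower 16 bits: identifies the resource within that component.
    static int resource(int key) {
        return key & 0xFFFF;
    }

    public static void main(String[] args) {
        int p1 = 0xfb021431; // values taken from the quoted example
        int p2 = 0xfb02542e;
        System.out.println(Integer.toHexString(component(p1))); // fb02
        System.out.println(Integer.toHexString(component(p2))); // fb02
        System.out.println(Integer.toHexString(resource(p1)));  // 1431
        System.out.println(Integer.toHexString(resource(p2)));  // 542e
    }
}
```

Note that each half only has 2^16 = 65536 possible values, which is relevant to the collision concern below.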

The longer I think about the key generation, the more I feel that we cannot
use hashed values, at least as long as we use a "long" for the hash value.

Why? The answer is simple: a long has 2^64 possible values. Often the
information that is hashed is a file name or a URI. Now, if we
only count lower case characters for the file name, we have 26
possibilities for each position, so there are 26^[length of information]
possible values, and this very quickly becomes far more than the
hashed value can represent. So there must be collisions!
And a cache whose keys can collide is not usable.
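
To put a number on "very quickly", here is a small sketch that finds the shortest string length at which the number of possible inputs exceeds 2^64. The class and method names are made up for this illustration.

```java
import java.math.BigInteger;

public class KeySpace {

    // Smallest length n for which alphabetSize^n is larger than 2^64,
    // i.e. the point where distinct inputs must share a 64-bit hash.
    static int firstOverflowLength(int alphabetSize) {
        BigInteger limit = BigInteger.ONE.shiftLeft(64); // 2^64
        BigInteger base = BigInteger.valueOf(alphabetSize);
        BigInteger count = BigInteger.ONE;
        int n = 0;
        while (count.compareTo(limit) <= 0) {
            count = count.multiply(base);
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(firstOverflowLength(26)); // 14 (lower case letters)
        System.out.println(firstOverflowLength(10)); // 20 (digits only)
    }
}
```

So with lower case letters alone, names of 14 characters already outnumber the values of a long. This only shows that collisions must exist in principle; it says nothing about how likely one is for a given hash function and set of keys.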

To come to a conclusion I did a simple performance test:
- Creating 10000 random keys (strings) where each character is a digit
between 0 and 9
- Hashing the 10000 values
- Putting the strings into the cache (a HashMap)
- Putting the hashed values into another cache (a HashMap)
- Searching for all values in the string cache
- Searching for all values in the hashed cache (this requires of course
rehashing the values)
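
The steps above can be sketched roughly as follows. This is not the original test code: the 64-bit hash used here (FNV-1a) is an assumption, since the message does not say which hash function was used, and the key length range, the seed and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class KeyBenchmark {

    // Assumed hash: 64-bit FNV-1a over the string's chars.
    static long hash64(String s) {
        long h = 0xcbf29ce484222325L;
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // Random keys made of digits 0-9, each 1..maxLength characters long.
    static List<String> randomDigitKeys(int count, int maxLength, long seed) {
        Random rnd = new Random(seed);
        List<String> keys = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            int len = 1 + rnd.nextInt(maxLength);
            StringBuilder sb = new StringBuilder(len);
            for (int j = 0; j < len; j++) {
                sb.append((char) ('0' + rnd.nextInt(10)));
            }
            keys.add(sb.toString());
        }
        return keys;
    }

    static long millisSince(long startNanos) {
        return (System.nanoTime() - startNanos) / 1_000_000;
    }

    public static void main(String[] args) {
        List<String> keys = randomDigitKeys(10000, 337, 42L);

        long t = System.nanoTime();
        long[] hashes = new long[keys.size()];
        for (int i = 0; i < hashes.length; i++) hashes[i] = hash64(keys.get(i));
        System.out.println("Hashed: " + millisSince(t) + " ms");

        t = System.nanoTime();
        Map<String, Object> byString = new HashMap<>();
        for (String k : keys) byString.put(k, k);
        System.out.println("String cache created: " + millisSince(t) + " ms");

        t = System.nanoTime();
        Map<Long, Object> byHash = new HashMap<>();
        for (int i = 0; i < hashes.length; i++) byHash.put(hashes[i], keys.get(i));
        System.out.println("Long cache created: " + millisSince(t) + " ms");

        t = System.nanoTime();
        int hits = 0;
        for (String k : keys) if (byString.get(k) != null) hits++;
        System.out.println("String lookups: " + millisSince(t) + " ms, hits " + hits);

        // Lookups by hash must recompute the hash from the string first.
        t = System.nanoTime();
        hits = 0;
        for (String k : keys) if (byHash.get(hash64(k)) != null) hits++;
        System.out.println("Hashed lookups: " + millisSince(t) + " ms, hits " + hits);
    }
}
```

A single pass like this is sensitive to JIT warm-up and garbage collection, so the timings should be read as rough indications only.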

And this is the result:
Start key generation
Created 10000 strings with average length 169
Start hashing
Hashed: 120 ms
Creating cache with Strings
Created: 60 ms
Creating cache with longs
Created: 30 ms
Testing strings
Strings: 10 ms
Testing hashed strings
Hashed strings: 131 ms

So, looking up the hashed information is slower, as the values have to be
rehashed, while putting strings into the cache is slower.
From the test above it seems that using strings is still a little bit faster
than using hashed values.
Or did I overlook something?
Could we extend this test in any way?

