rse 00/07/14 12:55:50
Modified: src/lib/apr/lib apr_hash.c
Log:
Clarify the rumor around this stuff a little bit more by cut & pasting a
little bit of explanation I wrote in the source of one of my forthcoming
libraries.
In short: The trick of the multiplier 33 is not that it is magic, it is
more because it can be easily replaced with a shift+add function. So,
let's actually do it here or the whole trick would be null.
Revision Changes Path
1.8 +31 4 apache2.0/src/lib/apr/lib/apr_hash.c
Index: apr_hash.c
===================================================================
RCS file: /home/cvs/apache2.0/src/lib/apr/lib/apr_hash.c,v
retrieving revision 1.7
retrieving revision 1.8
diff u r1.7 r1.8
 apr_hash.c 2000/07/13 20:58:47 1.7
+++ apr_hash.c 2000/07/14 19:55:50 1.8
@@ 219,13 +219,40 @@
klen = strlen(key) + 1;
/*
 * This hash function is used by perl 5; RSE attributes it to DJB.
 * (See MessageID: <19991013131827.A17702@engelschall.com>
 * in the newhttpd archive for October 1999.)
+ * This is Daniel J. Bernstein's popular `times 33' hash function
+ * as posted by him years ago on comp.lang.c. It basically uses a
+ * function like ``hash(i) = hash(i1) * 33 + str[i]''. This is one
+ * of the best known hash functions for strings. Because it is both
+ * computed very fast and distributes very well.
+ *
+ * The magic of number 33, i.e. why it works better than many other
+ * constants, prime or not, has never been adequately explained by
+ * anyone. So I try an explanation: if one experimentally tests all
+ * multipliers between 1 and 256 (as I did while writing a lowlevel
+ * data structure library some time ago) one detects that even
+ * numbers are not useable at all. The remaining 128 odd numbers
+ * (except for the number 1) work more or less all equally well.
+ * They all distribute in an acceptable way and this way fill a hash
+ * table with an average percent of approx. 86%.
+ *
+ * If one compares the Chi/2 values of the variants (see
+ * Bob Jenkins ``Hashing Frequently Asked Questions'' at
+ * http://burtleburtle.net/bob/hash/hashfaq.html for a description
+ * of Chi/2), the number 33 not even has the best value. But the
+ * number 33 and a few other equally good numbers like 17, 31, 63,
+ * 127 and 129 have nevertheless a great advantage to the remaining
+ * numbers in the large set of possible multipliers: their multiply
+ * operation can be replaced by a faster operation based on just one
+ * shift plus either a single addition or subtraction operation. And
+ * because a hash function has to both distribute good _and_ has to
+ * be very fast to compute, those few numbers should be preferred
+ * and seems to be the reason why Daniel J. Bernstein also preferred
+ * it.
+ *  Ralf S. Engelschall <rse@engelschall.com>
*/
hash = 0;
for (p = key, i = klen; i; i, p++)
 hash = hash * 33 + *p;
+ hash = ((hash << 5) + hash) + *p;
/* scan linked list */
for (hep = &ht>array[hash & ht>max], he = *hep;
