Joe Orton writes:
> RSE has done the maths on why *33 is a magic string hashing function...
> this from his "str" library in str_hash.c:
Actually, the same huge comment also appears right above the hash
computation in the current apr_hash.c.
(So the patch to change that computation should change or delete the
comment too, not that it matters at the moment since said patch hasn't
been applied.)
-K
> /*
> ** Str - String Library
> ** Copyright (c) 1999-2000 Ralf S. Engelschall
> **
> ** This file is part of Str, a string handling and manipulation
> ** library which can be found at http://www.engelschall.com/sw/str/.
> **
> ** Permission to use, copy, modify, and distribute this software for
> ** any purpose with or without fee is hereby granted, provided that
> ** the above copyright notice and this permission notice appear in all
> ** copies.
> **
> ** THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
> ** WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
> ** MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
> ** IN NO EVENT SHALL THE AUTHORS AND COPYRIGHT HOLDERS AND THEIR
> ** CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> ** SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> ** LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
> ** USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
> ** ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
> ** OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
> ** OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> ** SUCH DAMAGE.
> **
> ** str_hash.c: hashing functions
> */
>
> #include "str_p.h"
>
> /*
> * DJBX33A (Daniel J. Bernstein, Times 33 with Addition)
> *
> * This is Daniel J. Bernstein's popular `times 33' hash function as
> * posted by him years ago on comp.lang.c. It basically uses a function
> * like ``hash(i) = hash(i-1) * 33 + string[i]''. This is one of the
> * best hashing functions for strings. Because it is both computed very
> * fast and distributes very well.
> *
> * The magic of the number 33, i.e. why it works better than many other
> * constants, prime or not, has never been adequately explained by
> * anyone. So I try an own RSE-explanation: if one experimentally tests
> * all multipliers between 1 and 256 (as I did it) one detects that
> * even numbers are not useable at all. The remaining 128 odd numbers
> * (except for the number 1) work more or less all equally well. They
> * all distribute in an acceptable way and this way fill a hash table
> * with an average percent of approx. 86%.
> *
> * If one compares the Chi/2 values resulting of the various
> * multipliers, the 33 not even has the best value. But the 33 and a
> * few other equally good values like 17, 31, 63, 127 and 129 have
> * nevertheless a great advantage over the remaining values in the large
> * set of possible multipliers: their multiply operation can be replaced
> * by a faster operation based on just one bit-wise shift plus either a
> * single addition or subtraction operation. And because a hash function
> * has to both distribute good and has to be very fast to compute, those
> * few values should be preferred and seems to be also the reason why
> * Daniel J. Bernstein also preferred it.
> */
>
> ...