apr-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Karl Fogel <kfo...@galois.collab.net>
Subject Re: Hash?
Date Tue, 02 Jan 2001 16:37:31 GMT
Joe Orton <joe@light.plus.com> writes:
> RSE has done the maths on why *33 is a magic string hashing function...
> this from his "str" library in str_hash.c:

Actually, the same huge comment also appears right above the hash
computation in the current apr_hash.c.

(So the patch to change that computation should change or delete the
comment too, not that it matters at the moment since said patch hasn't
been applied.)

-K

> /*
> **  Str - String Library
> **  Copyright (c) 1999-2000 Ralf S. Engelschall <rse@engelschall.com>
> **
> **  This file is part of Str, a string handling and manipulation 
> **  library which can be found at http://www.engelschall.com/sw/str/.
> **
> **  Permission to use, copy, modify, and distribute this software for
> **  any purpose with or without fee is hereby granted, provided that
> **  the above copyright notice and this permission notice appear in all
> **  copies.
> **
> **  THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
> **  WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
> **  MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
> **  IN NO EVENT SHALL THE AUTHORS AND COPYRIGHT HOLDERS AND THEIR
> **  CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> **  SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> **  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
> **  USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
> **  ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
> **  OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
> **  OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> **  SUCH DAMAGE.
> **
> **  str_hash.c: hashing functions 
> */
> 
> #include "str_p.h"
> 
> /*
>  * DJBX33A (Daniel J. Bernstein, Times 33 with Addition)
>  *
>  * This is Daniel J. Bernstein's popular `times 33' hash function as
>  * posted by him years ago on comp.lang.c. It basically uses a function
>  * like ``hash(i) = hash(i-1) * 33 + string[i]''. This is one of the
>  * best hashing functions for strings. Because it is both computed very
>  * fast and distributes very well.
>  *
>  * The magic of the number 33, i.e. why it works better than many other
>  * constants, prime or not, has never been adequately explained by
>  * anyone. So I try an own RSE-explanation: if one experimentally tests
>  * all multipliers between 1 and 256 (as I did it) one detects that
>  * even numbers are not useable at all. The remaining 128 odd numbers
>  * (except for the number 1) work more or less all equally well. They
>  * all distribute in an acceptable way and this way fill a hash table
>  * with an average percent of approx. 86%. 
>  *
>  * If one compares the Chi/2 values resulting of the various
>  * multipliers, the 33 not even has the best value. But the 33 and a
>  * few other equally good values like 17, 31, 63, 127 and 129 have
>  * nevertheless a great advantage over the remaining values in the large
>  * set of possible multipliers: their multiply operation can be replaced
>  * by a faster operation based on just one bit-wise shift plus either a
>  * single addition or subtraction operation. And because a hash function
>  * has to both distribute good and has to be very fast to compute, those
>  * few values should be preferred and seems to be also the reason why
>  * Daniel J. Bernstein also preferred it.
>  */
> 
> ...

Mime
View raw message