apr-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Darroch <chr...@pearsoncmg.com>
Subject Re: Hash collision vectors in APR?
Date Tue, 10 Jan 2012 23:29:16 GMT
Bojan Smojver wrote:

> On Fri, 2012-01-06 at 10:52 +1100, Bojan Smojver wrote:
>> +    if (apr_generate_random_bytes(
>> +            (unsigned char *)&ht->seed, sizeof(ht->seed)) != APR_SUCCESS)
>> +        return NULL;
> Chris,
> What I mean is, instead of return NULL, we could have ht->seed =
> (unsigned int) ht. Not as random as /dev/urandom, but probably better
> than zero.

   Yes, probably ... I hesitate because I'm not a security expert and
I suppose there could be ways someone might know (a) you don't have
a working apr_generate_random_bytes() and (b) the address of ht struct.

   The patch might also need to account for the !APR_HAS_RANDOM case,
now that I look again.

   The other question, I suppose, is whether there's a hit to performance
from apr_generate_random_bytes() call.  Without having tested explicitly,
it looks like the default case for modern Linux is APR_HAS_RANDOM=1 and
DEV_RANDOM=/dev/random, with /dev/random blocking when there's no entropy.
That might, I think, cause apr_hash_make() to block unexpectedly for

   The main thing is just that apr_hash_make() really can't return NULL,
and probably shouldn't start blocking unexpectedly, either.  Alas,
too many things assume it succeeds, and succeeds quickly.

   A quick study of what the Perl folks did since 2003, after being
called out on the hash-collision-prediction issue, suggests they
use a quickly generated seed value -- see Perl_seed() in util.c,
which feeds Perl_get_hash_seed(), which is used to set the PL_rehash_seed
value in perl.c and then used in each hash calculation in
PERL_HASH_INTERNAL() in hv.h.  That's sort of all beside the point, but
it's interesting to see the stripped-down seed function, which seems
to boil down to (on most platforms):

#define   SEED_C1       1000003
#define   SEED_C2       3
#define   SEED_C3       269
#define   SEED_C4       73819
#define   SEED_C5       26107

Time_t when;
U32 u;

u = (U32)SEED_C1 * when.tv_sec + (U32)SEED_C2 * when.tv_usec;
u += SEED_C3 * (U32)PerlProc_getpid();
u += SEED_C4 * (U32)PTR2UV(PL_stack_sp);
u += SEED_C5 * (U32)PTR2UV(&when);
return u;

   So there's a timestamp, a process ID, and two pointers converted
to integers, "spread out" by the various constant multiplier factors.
The code comments suggest this is not especially finely-tuned logic.
See http://perl5.git.perl.org/ for the source.  I don't know if any
of that is useful for APR, but there it is.

   Reading through Ben Laurie's paper on the apr_random_* functions
(http://www.apache-ssl.org/randomness.pdf) and the code I'd like to
say I fully grokked how to use it and whether it's sufficient, but
that would be untrue.  :-)  It seems like the only the process ID is
used, perhaps, unless one mixes in one's own entropy?

   So I'm less than certain what to suggest here ... maybe someone
with greater confidence in the subject can advise on whether a
"fast, simple" seed like the Perl one would be considered reasonable
for this particular problem, namely, making it harder to guess hash
collisions in a context where a large set of externally-provided
values are being used as keys.


GPG Key ID: 088335A9
GPG Key Fingerprint: 86CD 3297 7493 75BC F820  6715 F54F E648 0883 35A9

View raw message