apr-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Neil Conway <...@cs.berkeley.edu>
Subject Re: apr_hash: Memory consumption
Date Tue, 15 Sep 2009 20:31:08 GMT
On Mon, Sep 14, 2009 at 7:43 PM, Bing Swen <bswen@pku.edu.cn> wrote:
> We'd love to see such a patch. we've been hampered by this problem for years.

Attached is a patch that implements the scheme I suggested earlier
(allocating the bucket array in a subpool). The patch is against the
1.3 branch.

Unfortunately, the APR hash table API doesn't provide any means to
report errors, so we can't easily propagate apr_pool_create() failures
back to the client program. Presuming changing the APR hash API isn't
an option for 1.3.x / 1.4.x, is there a better solution than just
ignoring apr_pool_create() failures?

I've also attached a trivial benchmark program that performs 10
million insertions. Vanilla APR 1.3 branch (r813588):

./testhashperf  0.42s user 0.10s system 96% cpu 0.537 total
./testhashperf  0.42s user 0.09s system 97% cpu 0.526 total
./testhashperf  0.42s user 0.09s system 97% cpu 0.527 total
./testhashperf  0.42s user 0.09s system 97% cpu 0.526 total
./testhashperf  0.42s user 0.10s system 98% cpu 0.526 total
./testhashperf  0.42s user 0.10s system 98% cpu 0.527 total

With the patch applied:

./testhashperf  0.39s user 0.10s system 97% cpu 0.501 total
./testhashperf  0.39s user 0.10s system 98% cpu 0.501 total
./testhashperf  0.39s user 0.09s system 97% cpu 0.502 total
./testhashperf  0.39s user 0.10s system 97% cpu 0.501 total
./testhashperf  0.39s user 0.10s system 98% cpu 0.501 total
./testhashperf  0.39s user 0.10s system 97% cpu 0.501 total

(OSX 10.6, gcc 4.2.1, "-O2")

> The easiest remedy we've been using is to add a new "API" function like this:
>
> APR_DECLARE(apr_hash_t *) apr_hash_make_ex(apr_pool_t *pool, unsigned int init_size)

Yeah, I added something similar to my local copy in the meanwhile.
When the hash table size is predictable, this avoids the excess memory
consumption, but can also improve performance significantly: I've
found expand_array() to be quite expensive when a hash table grows
from empty to a few million entries. Such an API might be worth
considering for APR 1.4 or 2.0.

> Also, we'd like very much to see the possiblity whether apr_hash_t could be extended
to make use of the "move-to-front" technique that is presented by the following papers:
>
>        "In-memory hash tables for accumulating text vocabularies", J. Zobel, S. Heinz,
and H.E. Williams, Information Processing Letters, 80(6):271-277, December 2001. (http://www.cs.rmit.edu.au/~jz/fulltext/ipl01.pdf)
>
>        "Cache-conscious collision resolution in string hash tables", N. Askitis and
J. Zobel, Proceedings of the SPIRE String Processing and Information Retrieval Symposium,
November 2005, pp. 91-102. LNCS 3772.  (http://www.cs.rmit.edu.au/~jz/fulltext/spire05.pdf)

Seems like it would be easy to do, but I wonder if it is worth
including in APR: this technique is only sensible when the hash
workload has significant temporal locality.

Neil

Mime
View raw message