apr-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julian Foad <julianf...@btopenworld.com>
Subject [PATCH] APR hash table memory use
Date Tue, 02 Nov 2004 23:22:24 GMT
The Subversion project encountered a usage pattern of APR hash tables that 
caused many megabytes of memory to be occupied by a small hash table after a 
sequence of many thousands of (insert, delete) pairs.  They have subsequently 
changed the usage pattern to avoid this.

The cause of the problem is that every time apr_hash_set(..., non-NULL) adds a 
new entry, it allocates some new pool memory for that entry and links it into 
one of its linked lists.  When apr_hash_set(..., NULL) deletes an entry, it 
unlinks it but never re-uses that memory.

So this is an inefficiency in APR's implementation of hash tables.  It is not 
exactly a "leak", since it is contained with a pool, but it is unbounded memory 
use caused by repeated simple actions - not a good property for a container to 
have.

It would be possible to fix this in APR if people thought it worthwhile, e.g. 
by using a "free-list".  (Maintain, in the hash table, a list of free entries, 
initially empty.  When deleting an entry, move it to the free-list.  When a new 
entry is wanted, take one from the free-list if available, else allocate it 
from the pool.)  This would make the memory used by a hash table proportional 
to the maximum number of entries that it had ever held simultaneously, rather 
than the number of insertions that had ever been performed.

In fact, using a free-list seems to be quite easy, with virtually no speed 
penalty.  A patch is attached, prepared in the manner of a Subversion patch.

(Please copy replies to dev@subversion.tigris.org, as I am not subscribed to 
dev@apr.apache.org)

- Julian


Mime
View raw message