Mailing-List: contact dev-help@httpd.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@httpd.apache.org
From: "Mladen Turk" <mturk@mappingsoft.com>
To: <dev@httpd.apache.org>
Cc: "Brian Pane" <bpane@pacbell.net>
Subject: RE: [PATCH] apr_table WAS: What to do about tables?
Date: Fri, 9 Nov 2001 09:05:57 -0800
Message-ID: <000e01c16940$d09873c0$5f00000a@armada>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="US-ASCII"
Content-Transfer-Encoding: 7bit
In-Reply-To: <3BEB6BC0.5060005@pacbell.net>
Importance: Normal


> -----Original Message-----
> From: Brian Pane [mailto:bpane@pacbell.net]
> Sent: 8. studeni 2001 21:38
> To: dev@httpd.apache.org
> Subject: Re: [PATCH] apr_table WAS: What to do about tables?
> 
> Mladen Turk wrote:
> 
> [...]
> 
> >I've treated the apr_table as database table, having columns and
rows.
> >When you think of apr_table that way, you can use the same technique
> >that the databases use to quickly access particular row, using some
sort
> >of indexing.
> >
> Also, why add another hash table implementation?  Apr_hash_t won't
> quite work here, because it doesn't have efficient support for
> case-insensitive string keys, but I'd rather add that support
> into apr_hash_t than add another version of hash tables to APR.
>
The current apr hash table is inappropriate for such a design and too
much overhead. It stores the key/value data pairs by itself.
If you look deeply in my code it's not a true hash table, but rather an
array of indexes that are accessed using hashing scheme.

> Based on my really quick reading of your new version of apr_tables.c,
> the code looks reasonable.  My main worry is that the overhead of
> the hash table maintenance, including the function call overhead and
> the need to reindex every time the table is expanded, may offset the
> performance benefits of hashing.  Do you have any benchmark data
> to compare the performance of this implementation with the current
> apr_table_t?

				old apr_table	my implementation
(microseconds)

add    10000 keys		   40.057			 80.113
get    10000 keys       7.220.166	      	 60.084

The keys are in the form KEY0...KEY9999

So I agree that you have almost double time to insert the keys, but
getting values from table shows 120 times performance gain.

There are number of ways to further optimize my code, for example, the
simple table scan would beet any indexing technique for a small number
of items. Other would be to optimize reindexing, or expressively do an
indexing after substantial table update (for example after reading
headers or mime types).

MT.