Return-Path: Delivered-To: apmail-apr-dev-archive@apr.apache.org Received: (qmail 36978 invoked by uid 500); 15 Nov 2001 01:52:22 -0000 Mailing-List: contact dev-help@apr.apache.org; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: Delivered-To: mailing list dev@apr.apache.org Received: (qmail 36967 invoked from network); 15 Nov 2001 01:52:21 -0000 Message-ID: <3BF31FDB.7080506@cnet.com> Date: Wed, 14 Nov 2001 17:52:27 -0800 From: Brian Pane User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.5) Gecko/20011011 X-Accept-Language: en-us MIME-Version: 1.0 To: dev@apr.apache.org Subject: Re: What to do about tables? References: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Rating: daedalus.apache.org 1.6.2 0/1000/N dean gaudet wrote: [...] >you can always have an elt array and then use some other method to index >into the elt array... > >but that sucks :) > Agreed. What I've found so far is that it's difficult to actually improve the speed of tables if you have to update both the elt array and an additional index. Thus the approach that I'm currently studying is a more modest change: * Keep the linear array of elements, and the O(n)-time lookups * Add a checksum to each element. When doing an O(n) scan, skip the strcasecmp on each element if the checksum stored in the element doesn't match the checksum of the key for which you're searching. This is a win if and only if computing the checksum is extremely fast. What I'm working with now is basically: /* compute an apr_uint32_t checksum... */ #define COMPUTE_CHECKSUM(key, cksum) \ { \ const char *k = key; \ cksum = (*k++ & 0xdf); \ cksum <<= 8; \ if (*k) { \ cksum |= (*k & 0xdf); \ k++; \ } \ cksum <<= 8; \ if (*k) { \ cksum |= (*k & 0xdf); \ k++; \ } \ cksum <<= 8; \ if (*k) { \ cksum |= (*k & 0xdf); \ k++; \ } \ } This basically packs the first 4 bytes of a string into an int in case-normalized form, so that we can do the equivalent of a partial strcasecmp with a single integer comparison--and skip the real strcasecmp if this comparison fails. (And yes, the "0xdf" needs to be #defined because EBCDIC requires a different mask.) I'll post a patch if the profiling data shows an improvement with this code. --Brian