Mailing-List: contact dev-help@apr.apache.org; run by ezmlm
Precedence: bulk
Message-ID: <3BF31FDB.7080506@cnet.com>
Date: Wed, 14 Nov 2001 17:52:27 -0800
From: Brian Pane <brian.pane@cnet.com>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US;
 rv:0.9.5) Gecko/20011011
MIME-Version: 1.0
To: dev@apr.apache.org
Subject: Re: What to do about tables?
References: <Pine.LNX.4.33.0111141703360.6452-100000@twinlark.arctic.org>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

dean gaudet wrote:
[...]

>you can always have an elt array and then use some other method to index
>into the elt array...
>
>but that sucks :)
>

Agreed. What I've found so far is that it's difficult to actually
improve the speed of tables if you have to update both the elt
array and an additional index.

Thus the approach that I'm currently studying is a more modest
change:

  * Keep the linear array of elements, and the O(n)-time lookups
  * Add a checksum to each element.  When doing an O(n) scan,
    skip the strcasecmp on each element if the checksum stored
    in the element doesn't match the checksum of the key for which
    you're searching.

This is a win if and only if computing the checksum is extremely
fast.  What I'm working with now is basically:

/* compute an apr_uint32_t checksum... */
#define COMPUTE_CHECKSUM(key, cksum) \
{  \
    const char *k = key; \
    cksum = (*k++ & 0xdf); \
    cksum <<= 8; \
    if (*k) { \
        cksum |= (*k & 0xdf); \
        k++; \
    } \
    cksum <<= 8; \
    if (*k) { \
        cksum |= (*k & 0xdf); \
        k++; \
    } \
    cksum <<= 8; \
    if (*k) { \
        cksum |= (*k & 0xdf); \
        k++; \
    } \
}
This basically packs the first 4 bytes of a string into
an int in case-normalized form, so that we can do the
equivalent of a partial strcasecmp with a single integer
comparison--and skip the real strcasecmp if this comparison
fails.

(And yes, the "0xdf" needs to be #defined because EBCDIC
requires a different mask.)

I'll post a patch if the profiling data shows an improvement
with this code.

--Brian