httpd-dev mailing list archives

From Greg Stein
Subject DBM stuff (was: cvs commit: apache-2.0 STATUS)
Date Thu, 20 Apr 2000 04:19:56 GMT
On Wed, 19 Apr 2000, Tony Finch wrote:
> Greg Stein wrote:
> >For a hash table implementation, I would recommend extracting the relevant
> >code from Python.
> Algorithms are easy to come by (but most of the implementations that I
> have seen recently are shite -- perl's and python's would be OK), but
> for an implementation that fits in well with the rest of Apache (or
> rather APR) it would have to be basically rewritten from scratch --
> there's not much to a hash table, and once you have changed the
> function prototypes and the allocation library there's not much of the
> original left.

Oh, there is enough. When the hash algorithm starts talking about stuff
like "GF(2^n)-{0}" then I get confoozled :-)  Python's hash table uses
open addressing (rather than chaining) and uses that mathematical stuff to
sequence through the items without repetition and without performing an
integer modulus (% operator). My hash tables always used twin primes and
the % operator, so this was pretty neat to see :-)
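
A minimal sketch of the probing scheme described above, for illustration
only. It is not Python's source: the function names are hypothetical, and
the example polynomial 11 (x^3 + x + 1, primitive over GF(2)) is an
assumption chosen for an 8-slot table. The increment is doubled each step
and folded back with an XOR, so it cycles through every nonzero value below
the table size, and no % operator is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: advance the probe increment one step through the
 * nonzero elements of GF(2^n).  `mask` is table_size - 1 (table_size is a
 * power of two); `poly` is a primitive polynomial of degree n, so its top
 * bit has the value table_size. */
unsigned long gf_next_incr(unsigned long incr, unsigned long mask,
                           unsigned long poly)
{
    incr <<= 1;
    if (incr > mask)
        incr ^= poly;           /* the XOR also clears the overflowed bit */
    return incr;
}

/* Hypothetical helper: record the order in which slots are visited when
 * probing from `start` with initial increment `incr` (nonzero, <= mask).
 * `order` must have room for mask + 1 entries.  Returns the number of
 * slots visited. */
size_t gf_probe_order(unsigned long start, unsigned long incr,
                      unsigned long mask, unsigned long poly,
                      unsigned long *order)
{
    size_t n = 0;
    unsigned long first = incr;

    assert(incr != 0 && incr <= mask);

    order[n++] = start & mask;                  /* the home slot */
    do {
        order[n++] = (start + incr) & mask;     /* masking, not modulus */
        incr = gf_next_incr(incr, mask, poly);
    } while (incr != first);                    /* full cycle completed */

    return n;
}
```

Calling gf_probe_order(0, 1, 7, 11, order) on an 8-slot table visits each
slot exactly once. The twin-primes approach gets the same no-repeat property
from a prime table size, at the cost of a % on every probe.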

But you are right: once you change the items that you store into the hash
(i.e. something other than PyObjects) and the allocation policy, then it
looks quite different. But that basic algorithm is quite nifty...

> >However, you mix the hash table discussion with something about DBs...
> >Were you talking about a general hash table, or about hash table indexes
> >into a file-based database?
> Berkeley DB can be used in a mode where it's just an in-memory hash
> table. It can obviously be used for many more interesting things by
> the server.

Okay. Well, this thread started with a discussion about DBM code. While it
seems fine to use DBM code for in-memory hash tables, it does seem a bit
weird. But I see your point.

> >The old Berkeley DB wouldn't be too bad. It does have a compatible
> >license.  (IMO, endian-independent is probably moot -- who transports
> >binary dbs?)
> From the point of view of an ISP that doesn't want to be tied to a
> platform because of the data files used by our customers, endian-
> independence is a big win.

Fair enough. I just figured that a little tool could export/import across
the platforms, in which case endian-ness isn't a big deal.
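
A rough sketch of what such an export/import tool would rely on, assuming
(hypothetically) that the dump format stores 32-bit values in big-endian
byte order. The function names are invented for illustration and belong to
no DBM library. Building the value from individual bytes gives the same
result on any host, whatever its native byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store `v` into `buf` most-significant byte first. */
void put_u32_be(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
}

/* Hypothetical helper: read back a value written by put_u32_be(). */
uint32_t get_u32_be(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |
            (uint32_t)buf[3];
}

/* Usage: a value survives the round trip regardless of host endianness. */
void u32_be_self_check(void)
{
    unsigned char buf[4];

    put_u32_be(buf, 0x12345678UL);
    assert(buf[0] == 0x12 && buf[3] == 0x78);
    assert(get_u32_be(buf) == 0x12345678UL);
}
```

A database whose on-disk format is already endian-independent does this
conversion on every access, which is what removes the need for a separate
export step.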

But hey: what matters is having a default. A person can always choose
something else.

> >The size limit is a big win over SDBM. Have the bugs been worked out
> >of DB? I seem to recall that it corrupts the file every now and then.
> I haven't heard of that. Do you have any authoritative references?

Not at all. Like I said: it *seems* that I recall that. I'll see if I can
track down something. [just sent a mail off]

> >SDBM is tiny, lightweight, and public domain. The latter point means that
> >we can slap an ASF copyright and license on the thing, and maintain and
> >improve it at will (e.g. APR-ize the thing).
> But then we will be less able to make use of improvements made by the
> perl guys.

There aren't any improvements. SDBM is ten years(!) old.

The only changes are for incremental portability (i.e. no bug fixes). I've
folded in Perl's changes and a few that Ralf made for mod_ssl. Switching
to APR would help with portability and further distance the code from the
subtle platform variances that people patch the thing for.

Heck. I already cleaned it up a bit. You should have seen some of the old
crap in there. For example: doing loop unrolling to copy a buffer rather
than relying on memmove() to be fast. And the old headers that were
included! Oi!

> >I don't recall if DB has file locking or not. The SDBM solution that I'd
> >be checking in (per that STATUS item) does.
> Berkeley DB 2 does, but that doesn't have a compatible license.

Well, the file locking is quite important for an application like Apache.
I'm still in favor of using SDBM with the locking changes that I put in.
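
For readers unfamiliar with what file locking involves here, a minimal
sketch using POSIX advisory locks via fcntl(). This is not the SDBM locking
change mentioned above, which is not shown in this message; the helper name
whole_file_lock is hypothetical, and locking the entire file rather than a
byte range is an assumption made to keep the example short.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: take or release an advisory lock on the whole file.
 * `type` is F_RDLCK (shared), F_WRLCK (exclusive) or F_UNLCK (release).
 * Blocks until the lock is granted.  Returns 0 on success, -1 on error
 * with errno set. */
int whole_file_lock(int fd, short type)
{
    struct flock fl = {0};
    int rc;

    assert(fd >= 0);

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "through end of file" */

    do {
        rc = fcntl(fd, F_SETLKW, &fl);
    } while (rc == -1 && errno == EINTR);   /* retry if a signal interrupts */

    return rc;
}
```

A writer would call whole_file_lock(fd, F_WRLCK) before a store and
whole_file_lock(fd, F_UNLCK) afterwards, while readers take F_RDLCK. The
locks are advisory, so they only protect against processes that also ask
for them.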

I'm not opposed to DB 1.x, but somebody should do the locking code. I'll
get SDBM into the code base, along with the DBM APIs and selection logic.

> >SDBM makes best sense as a fallback. In all cases, GDBM is going to be the
> >"best" answer, short of a full SQL database.
> No, Berkeley DB is better than GDBM because of the endian issue and
> licence; TTBOMK in other respects it is equal.

GDBM is endian-independent, too. I seem to recall other differences, but
can't be more authoritative.

However, GDBM's license totally blows: *full* GPL. I had to yank
dependence on it out of mod_dav because of that. (it is now the user's
option to link against GDBM)


Greg Stein,
