httpd-dev mailing list archives

From Dean Gaudet <dgau...@arctic.org>
Subject sequence numbers
Date Fri, 15 Aug 1997 01:58:53 GMT
Suppose we had a globally unique sequence number S for each hit.
There are a bunch of uses for S ... for example suppose you've got
some data that you log and archive forever, and some which you log but
throw away weekly.  If you log S in both logs you can do a join(1) on
the two logs to join the permanent records with the temporary records.
I'll ignore the uses of S in cookies, but you can imagine them ;) S can
be passed to CGIs so that they can do things such as log, make database
entries, or stick into hidden form fields to provide state between hits.
It's just generally useful.

So how about an implementation which satisfies the constraint that we don't
want children to have to talk with each other to generate sequence numbers,
and we certainly don't want a cluster of webservers to have to communicate
between themselves to generate it.  We want it cheap.  We don't want the
admin to have to configure it either.

Consider the following structure:

typedef struct {
    time_t stamp;
    unsigned int ip_addr;
    unsigned short child_num;
    unsigned short counter;
} seqno_t;

Keep a per-process global seqno_t cur_seqno.

Globally initialize the ip_addr field with the first ip address in
gethostbyname (gethostname()).  This is essentially the physical address
of the machine... regardless of what ServerName/etc are set to.

During child init do this:

    cur_seqno.stamp = time(0);
    cur_seqno.child_num = child_num;
    cur_seqno.counter = 0;

On each hit, calculate a sequence number for the hit as follows:

    r->seqno = cur_seqno;
    if (++cur_seqno.counter == 0) {
        cur_seqno.stamp = time(0);
    }
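
The pieces above can be put together roughly as follows. This is only a sketch: the function names seqno_child_init and seqno_next are made up for illustration, ip_addr is assumed to have been resolved already, and the assert documents the assumption that unsigned short is 16 bits.

```c
#include <assert.h>
#include <time.h>

typedef struct {
    time_t stamp;             /* time the current counter run started */
    unsigned int ip_addr;     /* first address of the host */
    unsigned short child_num; /* slot number of this child */
    unsigned short counter;   /* per-child hit counter */
} seqno_t;

static seqno_t cur_seqno;     /* per-process state */

/* Hypothetical name: called once in each child after fork. */
static void seqno_child_init(unsigned int ip_addr, unsigned short child_num)
{
    /* the wrap test in seqno_next assumes a 16-bit counter */
    assert(sizeof(cur_seqno.counter) == 2);
    cur_seqno.stamp = time(0);
    cur_seqno.ip_addr = ip_addr;
    cur_seqno.child_num = child_num;
    cur_seqno.counter = 0;
}

/* Hypothetical name: returns the seqno for this hit and advances the state. */
static seqno_t seqno_next(void)
{
    seqno_t hit = cur_seqno;
    if (++cur_seqno.counter == 0) {
        /* counter wrapped to 0: take a fresh stamp for the next run */
        cur_seqno.stamp = time(0);
    }
    return hit;
}
```

Two consecutive hits in one child then differ only in the counter field, and no child ever has to look at another child's state.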

I claim that r->seqno is "unique enough" to be considered a unique
identifier that distinguishes a hit from all other hits, including
possibly hits against a cluster of web servers all serving the same
website.

Obviously if we're somehow able to serve > 65536 requests in a single
second by a single child then uniqueness breaks down.  It also requires
that the physical addresses of the machines in a cluster differ ... but,
well, that's essentially a requirement of building the cluster in the
first place.

To assist against badness caused by a server's clock going backwards
in time the counter can be seeded with a random 16-bit number instead
of 0 (a well maintained system won't have this problem ... even across
daylight savings, the system's clock is supposed to be slowed down
rather than jumped).  This is also good protection against multiple
server restarts in a one second interval.
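
A sketch of the seeded variant, under a few assumptions: the helper names are hypothetical, rand() stands in for whatever random source is actually available (it is not a strong one), and the seed is built one byte at a time because RAND_MAX is only guaranteed to be at least 32767. The struct is repeated so the fragment compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

typedef struct {
    time_t stamp;
    unsigned int ip_addr;
    unsigned short child_num;
    unsigned short counter;
} seqno_t;

/* Hypothetical helper: draw a 16-bit seed from two calls to rand(). */
static unsigned short seqno_random_seed(void)
{
    unsigned int hi = (unsigned int)(rand() / (RAND_MAX / 256 + 1));
    unsigned int lo = (unsigned int)(rand() / (RAND_MAX / 256 + 1));
    return (unsigned short)((hi << 8) | lo);
}

/* Start the counter at seed instead of 0. */
static void seqno_init_seeded(seqno_t *s, unsigned short child_num,
                              unsigned short seed)
{
    assert(s != NULL);
    s->stamp = time(0);
    s->ip_addr = 0;           /* assumed to be filled in elsewhere */
    s->child_num = child_num;
    s->counter = seed;
}

/* Child init with a random seed; mixing in the pid is an assumption so
 * that children forked in the same second draw different seeds. */
static void seqno_child_init_random(seqno_t *s, unsigned short child_num)
{
    srand((unsigned int)time(0) ^ (unsigned int)getpid());
    seqno_init_seeded(s, child_num, seqno_random_seed());
}

/* Same per-hit step as before: the stamp is refreshed when the counter
 * wraps to 0, which now happens 65536 - seed hits after init. */
static seqno_t seqno_next_from(seqno_t *s)
{
    seqno_t hit = *s;
    if (++s->counter == 0)
        s->stamp = time(0);
    return hit;
}
```

With a seed the first wrap comes early, but that only costs an extra time(0) call; the counter values used before and after the wrap still don't overlap until 65536 hits have been served.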

seqno_t also has the property that it can adapt if there's a future
need to add more "uniqueness" to the mix.  It can adapt in such a
way that previous seqnos stored in log files and possibly archived,
or in databases, etc. are still distinct from the new sequence numbers.
You just ensure that the new seqno_t also has a time(0) stamp at the
front of it, and have a one second downtime.

seqno_t is a 12-byte quantity (assuming time_t is 4 bytes), which
encodes to 16 bytes with a uuencode-style encoding of 6 bits per
character.  When encoding, rather than using the base64 character set
[0-9a-zA-Z+/] we should use [0-9a-zA-Z@-] since + and / have special
meanings in URLs
but @ and - don't.  This gives a good representation for logging, and
for passing to CGIs.
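
Here is roughly what the encoding could look like. The names seqno_pack and seqno_encode are hypothetical, and two things are assumptions rather than part of the proposal: the order of characters within the alphabet, and packing the fields big-endian into exactly 12 bytes (the stamp truncated to 32 bits) rather than dumping the struct, whose size depends on padding and on sizeof(time_t).

```c
#include <assert.h>
#include <string.h>

/* 64 characters: digits, lowercase, uppercase, then '@' and '-'. */
static const char seqno_alphabet[] =
    "0123456789abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ@-";

/* Pack the four fields into 12 bytes, most significant byte first. */
static void seqno_pack(unsigned long stamp, unsigned long ip_addr,
                       unsigned short child_num, unsigned short counter,
                       unsigned char out[12])
{
    int i;
    for (i = 0; i < 4; i++) {
        out[i]     = (unsigned char)((stamp   >> (24 - 8 * i)) & 0xff);
        out[4 + i] = (unsigned char)((ip_addr >> (24 - 8 * i)) & 0xff);
    }
    out[8]  = (unsigned char)((child_num >> 8) & 0xff);
    out[9]  = (unsigned char)(child_num & 0xff);
    out[10] = (unsigned char)((counter >> 8) & 0xff);
    out[11] = (unsigned char)(counter & 0xff);
}

/* Encode 12 raw bytes as 16 characters plus a terminating NUL:
 * each group of 3 bytes (24 bits) becomes 4 characters of 6 bits. */
static void seqno_encode(const unsigned char in[12], char out[17])
{
    int i, o = 0;

    assert(strlen(seqno_alphabet) == 64);
    for (i = 0; i < 12; i += 3) {
        unsigned long v = ((unsigned long)in[i] << 16)
                        | ((unsigned long)in[i + 1] << 8)
                        | (unsigned long)in[i + 2];
        out[o++] = seqno_alphabet[(v >> 18) & 0x3f];
        out[o++] = seqno_alphabet[(v >> 12) & 0x3f];
        out[o++] = seqno_alphabet[(v >> 6) & 0x3f];
        out[o++] = seqno_alphabet[v & 0x3f];
    }
    out[o] = '\0';
}
```

Since 12 is a multiple of 3 there is no padding to worry about, and the result is always exactly 16 characters.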

So ... can I get a few +1s on the concept?

Dean

