httpd-dev mailing list archives

From "Konstantin Chuguev" <Chug...@Clickstream.com>
Subject "Better" mod_unique_id
Date Mon, 28 Apr 2008 14:41:44 GMT
Hi,

I'm developing a solution that generates unique IDs for requests to
websites that are not only clustered but also geographically
dispersed. This implies the following:
- the website's virtual host section on each Apache server has the
same ServerName, which DNS maps to different IP addresses using
various methods (geo-proximity, round-robin, etc.);
- the virtual host's IP address is normally, but not necessarily, *;
- the actual IP address Apache listens on for this virtual host is
normally, but not necessarily, an intranet address (behind a load
balancer).

After analysing the format of the ID generated by mod_unique_id and
reading the module's source code, I suspect that the module has
serious flaws when used in my situation.
No offence to the authors; I'm sure the module serves its purpose
well for the majority of its users. But since it doesn't seem to do
so in my case, I thought I'd better ask whether anyone knows why.

I understand that the module is relatively old and was likely ported
from a pre-2.0 version, when the APR library didn't exist yet, which
might explain its design. I'd be glad if someone could either confirm
this or explain why it was done this way.

Now to the point of my question. The unique_id_rec structure, which
holds the binary representation of the unique ID, consists of the
following fields:
     unsigned int stamp;
     unsigned int in_addr;
     unsigned int pid;
     unsigned short counter;
     unsigned int thread_index;
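To make the sizes concrete, here is a sketch that mirrors the fields listed above and adds them up. It assumes the common sizes of 4 bytes for unsigned int and 2 bytes for unsigned short, and it assumes the fields are encoded back to back without struct padding; the helper names unique_id_payload_size() and base64_len() are made up for illustration.

```c
#include <stddef.h>

/* Mirror of the fields listed above (a sketch, not the module's header). */
typedef struct {
    unsigned int stamp;
    unsigned int in_addr;
    unsigned int pid;
    unsigned short counter;
    unsigned int thread_index;
} unique_id_rec;

/* Hypothetical helper: sum of the field sizes, i.e. the bytes that would be
 * encoded if the fields are written back to back. This differs from
 * sizeof(unique_id_rec), which also counts padding. */
static size_t unique_id_payload_size(void)
{
    unique_id_rec r;
    return sizeof r.stamp + sizeof r.in_addr + sizeof r.pid
         + sizeof r.counter + sizeof r.thread_index;
}

/* Hypothetical helper: length of the padded base64 encoding of n bytes. */
static size_t base64_len(size_t n)
{
    return 4 * ((n + 2) / 3);
}
```

Under those assumptions the payload is 18 bytes, which base64-encodes to 24 characters, whereas sizeof(unique_id_rec) is typically 20 because of the 2 padding bytes after counter.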

1. Why use an unsigned int timestamp when there is apr_time_t, which
is 64-bit and appears to have 1-microsecond resolution? Granted, the
unsigned short counter helps when more than one request arrives for
the same IP address / PID / thread within a second, but I can still
hardly see this as a better design.
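A sketch of what the 32-bit stamp keeps and what it discards. usec_time_t is a hypothetical stand-in for apr_time_t (microseconds since the epoch) so the example does not need APR; stamp_seconds() and next_counter() are illustrative names, not mod_unique_id functions.

```c
#include <stdint.h>

/* Hypothetical stand-in for apr_time_t: microseconds since the Unix epoch. */
typedef int64_t usec_time_t;

#define USEC_PER_SEC INT64_C(1000000)

/* Current scheme (sketch): only whole seconds survive, truncated to 32 bits. */
static uint32_t stamp_seconds(usec_time_t t)
{
    return (uint32_t)(t / USEC_PER_SEC);
}

/* The 16-bit counter then separates requests that share a second;
 * it wraps to 0 after 65535. */
static uint16_t next_counter(uint16_t counter)
{
    return (uint16_t)(counter + 1);
}
```

Two requests 999,998 microseconds apart get the same 32-bit stamp and must be told apart by the counter; the full 64-bit value would already distinguish them.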

2. Why use unsigned int pid plus unsigned int thread_index when there
is long r->connection->id? thread_index is in fact produced by
htonl((unsigned int)r->connection->id), and the MPMs seem to ensure
that the child_id is already included there! Although it is only 4
bytes long on many platforms, compared to the 8-byte pid/thread_index
combination, it is still guaranteed to be unique among all worker
threads of the Apache server on the system. And I don't think this
particular field needs converting to network byte order.
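If the MPM derives the connection ID from the scoreboard slot, as the worker MPM appears to do with child_slot * thread_limit + thread_slot (an assumption worth checking against the MPM source), the child number can be recovered from the ID. The helpers below are hypothetical and only illustrate that mapping. Note that such an ID identifies a slot, so it is unique among threads running at the same time rather than over the server's lifetime.

```c
/* Hypothetical helpers modelled on an assumed MPM scheme:
 *   conn_id = child_slot * thread_limit + thread_slot,
 * with 0 <= thread_slot < thread_limit. */
static long id_from_child_thread(long child_slot, long thread_slot, long thread_limit)
{
    return child_slot * thread_limit + thread_slot;
}

/* Recover the child (process) slot from a connection ID. */
static long child_from_id(long conn_id, long thread_limit)
{
    return conn_id / thread_limit;
}

/* Recover the thread slot within that child. */
static long thread_from_id(long conn_id, long thread_limit)
{
    return conn_id % thread_limit;
}
```

For example, with thread_limit 64, child slot 2 and thread slot 5 map to ID 133, and 133 maps back to (2, 5).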

3. Storing the server-side IPv4 address in unsigned int in_addr works
well only within a single cluster on an IPv4 network. What if only
IPv6 is used in the intranet? What if several dispersed clusters with
exactly the same intranet IP addressing scheme serve the same website?
Please correct me if I'm wrong, but I think the following fields would
identify the serving website more reliably:
- union {struct in_addr, struct in6_addr} local_ip_addr: the IP  
address of the local side of the HTTP connection;
- union {struct in_addr, struct in6_addr} dns_ip_addr: one (any?) of
the IP addresses mapped to the website's domain name in DNS. This
field can be omitted if local_ip_addr is a public address.
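A minimal sketch of filling such an address field from either family, using plain BSD socket structures so it compiles without APR (inside httpd the local address would presumably come from the connection's apr_sockaddr_t instead). copy_ip_bytes() is a hypothetical helper; it copies the bytes in network order, exactly as stored in the socket address.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: copy the raw address bytes of an IPv4 or IPv6
 * socket address into out. Returns the number of bytes written:
 * 4 for AF_INET, 16 for AF_INET6, 0 for any other family. */
static size_t copy_ip_bytes(const struct sockaddr *sa, unsigned char out[16])
{
    switch (sa->sa_family) {
    case AF_INET: {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;
        memcpy(out, &sin->sin_addr, sizeof sin->sin_addr);
        return sizeof sin->sin_addr;
    }
    case AF_INET6: {
        const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;
        memcpy(out, &sin6->sin6_addr, sizeof sin6->sin6_addr);
        return sizeof sin6->sin6_addr;
    }
    default:
        return 0; /* unsupported family: nothing copied */
    }
}
```

The returned length tells the caller how many of the 16 bytes are meaningful, which is what lets the field shrink to 4 bytes for IPv4.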

Does anyone see any flaws in a design that uses the following
structure?
	apr_time_t stamp;	// 8 bytes, converted to network byte order
	long connection_id;	// size depends on architecture (normally 4 or 8 bytes); doesn't need htonl
	union {struct in_addr, struct in6_addr} local_ip_addr;	// 4 to 16 bytes
	[union {struct in_addr, struct in6_addr} dns_ip_addr;]	// 0 to 16 bytes
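One way to serialize the proposed structure, sketched under assumptions: int64_t stands in for apr_time_t, the address arguments are raw 4- or 16-byte buffers, and put_be64() and pack_proposed_id() are hypothetical helpers. Since htonl() only handles 32 bits, the stamp is written byte by byte in big-endian order; connection_id is copied in host byte order, as proposed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write a 64-bit value in network (big-endian) order. */
static void put_be64(unsigned char *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}

/* Hypothetical helper: pack the proposed fields back to back.
 * local_addr/dns_addr hold 4 or 16 raw address bytes; dns_len may be 0
 * when the local address is public. out must hold at least
 * 8 + sizeof(long) + 16 + 16 bytes. Returns the number of bytes written. */
static size_t pack_proposed_id(unsigned char *out, int64_t stamp_usec,
                               long connection_id,
                               const unsigned char *local_addr, size_t local_len,
                               const unsigned char *dns_addr, size_t dns_len)
{
    size_t n = 0;

    put_be64(out + n, (uint64_t)stamp_usec);
    n += 8;
    memcpy(out + n, &connection_id, sizeof connection_id); /* host byte order */
    n += sizeof connection_id;
    memcpy(out + n, local_addr, local_len);
    n += local_len;
    if (dns_len > 0) { /* optional DNS-mapped address */
        memcpy(out + n, dns_addr, dns_len);
        n += dns_len;
    }
    return n;
}
```

Because the width of long and both address lengths vary, the packed ID ranges from 16 bytes (32-bit long, IPv4, no DNS address) to 48 bytes (64-bit long, two IPv6 addresses).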

Comments and suggestions are appreciated.

Konstantin Chuguev
Software Developer

Clickstream Technologies PLC, 58 Davies Street, London, W1K 5JF,  
Registered in England No. 3774129


