httpd-dev mailing list archives

From Ben Laurie
Subject Re: Forensic Logging
Date Tue, 30 Dec 2003 18:52:07 GMT
Colm MacCarthaigh wrote:

> On Tue, Dec 30, 2003 at 11:49:37AM +0000, Ben Laurie wrote:
>>>Could the forensic_id be tied in with mod_unique_id? It seems confusing
>>>to have two different methods to generate unique id's for requests. Also
>>>with unique_id, I can see it being useful to make CGI's aware of their
>>>"tracking code" via the environment variable. That way a developer can
>>>use the same id to track ingress, processing and egress.
>>Well, it would be possible to make it use the unique ID if present. I'm 
>>not in favour of requiring it, though, because it appears to add a good 
>>deal of unnecessary overhead.
> I realise that having the value of getpid() and time() to hand is useful
> for forensic purposes, but a getpid():time():next_id++ will result in
> duplicates across even small clusters.

Ah, I see :-) Does mod_unique_id handle that?

> It's not unusual to be dealing
> with many millions of requests per day in a single logfile. From a cursory
> check here; across 4 boxes, with a total of 17,000 httpd processes,
> only 3,000 pids are unique. With about 80 requests/sec, that gives me a
> probability of about 1/30625 of a request going to two different machines
> but getting the same pid within one second. Unless I'm reading it wrong,
> the bounds of next_id is more or less a function of MaxRequestsPerChild,
> in my example - it's set to 20, so I can expect a mess-up once every
> 612,500 requests, that's a bit of a pain :/

Well, the most obvious answer is to prepend a box id, which could either 
be done when I generate the logs or when you collate them.
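
A minimal sketch of the box-id idea, assuming the pid:time:counter layout discussed above. The names `forensic_id`, `with_box_id` and `collate`, the box labels, and the sample values are all hypothetical and are not taken from the module itself:

```python
def forensic_id(pid, timestamp, counter):
    """Build an ID from the three per-process values discussed above."""
    return f"{pid:x}:{timestamp:x}:{counter:x}"


def with_box_id(box_id, fid):
    """Prepend a per-machine label so IDs stay distinct across a cluster."""
    return f"{box_id}:{fid}"


def collate(logs):
    """Merge per-box logs, prefixing each ID with its box label.

    `logs` maps a box label to a list of (forensic_id, entry) pairs.
    """
    merged = {}
    for box_id, entries in logs.items():
        for fid, entry in entries:
            merged[with_box_id(box_id, fid)] = entry
    return merged


# Two boxes can hand out the same pid in the same second (arbitrary values).
same = forensic_id(4242, 1072810327, 7)
merged = collate({
    "web1": [(same, "GET /a")],
    "web2": [(same, "GET /b")],
})
```

The prefix can be applied either at generation time or at collation time, as the paragraph above says; the sketch does it at collation time, so neither record overwrites the other in `merged`.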

> But more than that, it still seems confusing to have two different methods
> of achieving the same task. If mod_unique_id is too much overhead, then
> it needs to be rewritten. To my mind, both modules need to generate
> reliable unique id's for request tracking purposes. Now either there's
> a good way of doing that, or there's not - but having two different
> methods and defining two different levels of uniqueness doesn't make
> sense to me. I have mod_unique_id turned on for my servers, and don't
> notice much overhead. MTA's like exim, postfix and so on have even more
> complicated means of generating unique message id's, and they achieve
> excellent throughput.
> Though if mod_unique_id can be used if present that'll solve any
> problems I'd have :)

I can easily do that in 2.0 - I can call a "give me a unique ID" hook, 
and if mod_unique_id is present, it can give me its ID. I could also do it 
by making sure mod_unique_id is run first if present and fishing the ID 
out of the environment, though that's a bit tacky.
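
A rough sketch of the hook approach, written in Python rather than against the real 2.0 hook API. The registry, `register_unique_id_hook`, `request_id`, and the dict-shaped request are assumptions made for illustration. `UNIQUE_ID` is the environment variable mod_unique_id sets, and the provider below only stands in for that module:

```python
import itertools
import os
import time

_unique_id_hooks = []          # hypothetical "give me a unique ID" hook list
_counter = itertools.count()   # per-process request counter


def register_unique_id_hook(fn):
    """Let an optional module offer its own ID generator."""
    _unique_id_hooks.append(fn)


def fallback_id():
    """Cheap pid:time:counter ID, used when no provider answers."""
    return f"{os.getpid():x}:{int(time.time()):x}:{next(_counter):x}"


def request_id(request):
    """Ask each registered provider in turn; fall back if none has an ID."""
    for hook in _unique_id_hooks:
        uid = hook(request)
        if uid is not None:
            return uid
    return fallback_id()


def unique_id_from_env(request):
    """Stand-in provider: return the ID already placed in the environment."""
    return request["env"].get("UNIQUE_ID")


register_unique_id_hook(unique_id_from_env)
```

With a provider registered, `request_id` returns the provider's ID when one is present and the cheap local ID otherwise, so the logger itself never has to know which module supplied it.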

>>>Or at least, could a host-specific part be added to the forensic id? 
>>>A lot of people collate logs (myself included ;) from clusters or whatever 
>>>and this would make life much easier there.
>>Hmmm. You should only be looking at requests that didn't complete, and 
>>since it includes the whole header, the host is in there anyway.
> The headers aren't host-specific in a cluster, since typically each
> node is configured to answer for the same hostname. mod_unique_id
> uses apr_gethostname and the ip address of the node to get around this
> problem :)

I had the wrong end of the stick.

> Actually that reminds me, these days mod_unique_id's algorithm isn't
> clever enough for some systems which use L4 switching or anycast
> balancing. I have an experimental patch here somewhere which can help
> fix that; I must submit it.

I'd advocate making the unique bit configurable; surely that would fix it 
in all cases?




"There is no limit to what a man can do or how far he can go if he
doesn't mind who gets the credit." - Robert Woodruff
