httpd-dev mailing list archives

From Marc Slemko <ma...@worldgate.com>
Subject making logresolve faster
Date Sun, 27 Jul 1997 05:38:38 GMT
While the best answer to the question of how to make logresolve faster is
"cp /bin/cat logresolve", some people have an odd idea that they need
reverse lookups for the people in suits or because they like having fun
looking at hostnames.

With that idea in mind, I looked at logresolve.  It took 370 seconds to
handle a 10000-line logfile on a P120 with 96 MB of RAM and a decent net
connection.  My named cache was cleared, of course, between tests.

I then fiddled with resolver options so it timed out after 1 second (only
resulted in missing lookups for 75 hosts; haven't tried increasing it to 2
or 3 to see the tradeoffs), didn't retry, and used persistent TCP
connections instead of UDP.  That took it down to 212 seconds. 

I then decided that since it was obviously network limited, I should try
multiple lookups at once.  I wrote a driver program that forked a bunch of
logresolves and passed one line of the logfile to the first, the next to
the second, next to the third, etc.  Very simplistic model.  Note that it
doesn't always result in a logfile that is in exactly the right order.  It
also doesn't do some things it should, like keeping the history hash in the
parent process and having the child processes just do lookups.  Running 60
children at once, I got the time down to 20 seconds.  At this point I was
getting CPU limited, so increasing the number of child processes wouldn't
help.  From 370 seconds to 20 seconds (about 18x) is a decent speedup.
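
To make the distribution scheme concrete, here is a minimal sketch of the
round-robin dealing described above.  It is not the actual driver program;
the function name deal_round_robin is made up for illustration, and the
forking and piping to the logresolve children is left out.

```python
def deal_round_robin(lines, n_children):
    """Deal log lines out like cards: line 0 goes to child 0,
    line 1 to child 1, and so on, wrapping around after the last child."""
    buckets = [[] for _ in range(n_children)]
    for i, line in enumerate(lines):
        buckets[i % n_children].append(line)
    return buckets


if __name__ == "__main__":
    # Hypothetical log lines; only the position of each line matters here.
    sample = ["line0", "line1", "line2", "line3", "line4"]
    for child, bucket in enumerate(deal_round_robin(sample, 3)):
        print(child, bucket)
```

Because each child finishes its lines at its own pace, simply merging the
children's output as it arrives is what produces the out-of-order logfile.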

Would be interesting to compare a threaded approach, but I can't do
multiple simultaneous threaded lookups.  It could also be optimized a bit
more.  logresolve was running on the same machine as the name server.

Note that my current implementation is nasty on the cache; if I have 60
child processes, and have a bunch of entries for the same host one after
another, then they will be distributed to different child processes with
different caches so it will ask the nameserver again.  Fixing this could
help cut down the CPU usage.
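
One possible fix, sketched below, is to pick the child from a hash of the
address instead of from the line number, so that every line for a given
host lands on the same child and hits that child's cache.  This is only an
illustration: the names child_for and deal_by_host are hypothetical, and it
assumes the address is the first space-delimited field of each line.

```python
import zlib


def child_for(line, n_children):
    """Pick a child from a stable hash of the line's first field.
    Assumption: the first space-delimited field is the client address."""
    addr = line.split(" ", 1)[0]
    return zlib.crc32(addr.encode("ascii", "replace")) % n_children


def deal_by_host(lines, n_children):
    """Group lines so that all lines for one address share one child."""
    buckets = [[] for _ in range(n_children)]
    for line in lines:
        buckets[child_for(line, n_children)].append(line)
    return buckets
```

The tradeoff is that a run of lines from a single busy host now queues up
behind one child rather than being spread across all of them.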

Does anyone have any other suggestions for speeding things up?  I should give
it a try with logresolve v2, which implements things like an on-disk DBM
cache of results.  For that one I would really have to shift that code
into the parent, which takes effort, although it could also eliminate
the out-of-order behavior without being too expensive.
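
A rough sketch of what moving the cache into the parent could look like
follows.  It is not logresolve v2 and has no on-disk DBM cache; the names
resolve_in_order and reverse_lookup are hypothetical, and a thread pool
stands in for the forked children purely to keep the example short.  The
parent collects each distinct address once, the workers do nothing but
lookups, and the parent rewrites the lines in their original order.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def reverse_lookup(addr):
    """Return the hostname for addr, or addr itself if the lookup fails."""
    try:
        return socket.gethostbyaddr(addr)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return addr


def resolve_in_order(lines, lookup=reverse_lookup, workers=60):
    """Resolve the first field of each line, one lookup per distinct
    address, and return the rewritten lines in their original order."""
    addrs = [line.split(" ", 1)[0] for line in lines]
    unique = list(dict.fromkeys(addrs))  # de-duplicate, keep first-seen order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The parent-side cache: address -> name, filled by the workers.
        names = dict(zip(unique, pool.map(lookup, unique)))
    return [names[addr] + line[len(addr):] for line, addr in zip(lines, addrs)]
```

Since the parent holds the only cache, a repeated host is never looked up
twice, and the output order no longer depends on which worker finishes
first.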

