www-apache-bugdb mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Brian Moore <...@cmc.net>
Subject Re: mod_proxy/374: mod_proxy(?) seems to alarm(0) somewhere
Date Tue, 15 Apr 1997 20:05:55 GMT
I went to play with one of my demon-children (300k seconds on her now) and
looked some at the code.  

(gdb) print (char *) timeout_name
$7 = 0x5c480 "proxy send body"

So, at least I know where it thinks the last timeout was set.  Will take me a
while to grok proxy_send_fb, but that seems to be where the problem is.

Looking through the logs, I see at about the same time that this child was last
seen fetching stuff from www.cdrom.com:
[Fri Apr 11 14:59:16 1997] proxy send body timed out for 192.168.1.32
[Fri Apr 11 15:01:50 1997] proxy send body timed out for 192.168.1.32
[Fri Apr 11 15:02:10 1997] proxy send body timed out for 192.168.1.32

It looks like the timeout was received but send_body never saw it.

I won't profess to be an expert on Apache's timeouts, but it looks like there
should be either the timeout_req flag set here (it's not).  It looks like
current_conn->aborted should be set, though gdb won't let me check.

All the while's in proxy_send_fb do check that... but there is a while loop in
saferead that doesn't and I don't see an easy way to make it get to the
connection structure.

Would changing that to a hard_timeout fix it?

> You may want to double check any soft_timeout()s.  If any of them go off,
> there will be no further timeouts around and the code is expected to
> properly check connection->aborted before doing anything.  A quick glance
> doesn't show any such obvious things, but it is possible.  Temporarily
> replacing all soft_timeout()s with hard_timeout()s would tell if this is
> the problem, although it could break some things; haven't really looked at
> the code... 
> 
> On Sun, 13 Apr 1997, Brian Moore wrote:
> 
> > Nope, it's a http transfer (but a large one).  Not sure what it is,
> > though: it seems to be that alarm(0) is getting called [which in my
> > looking at the code is a bad thing to do] somewhere.  The client request
> > on this is on the other side of a packet filtering router, but at 10mbps,
> > so it shouldn't be a client timeout.
> > 
> > Since it got through the flush-the-buffer stuff in saferead(), I think
> > it's not the speed, just the dropped alarm.  See I printed out the value
> > of alarms_blocked, which in theory should mean it's not blocked. :)  I've
> > left a couple of these children running (there were three transferring
> > files via http from www.cdrom.com.  The specific URLs (though I doubt it
> > matters) GET http://www.cdrom.com/pub/quake/quakec/weapons/mini20.zip
> > and 
> > GET http://www.cdrom.com/pub/quake/quakec/weapons/pnc1_02b.zip
> > I killed the third demon-child -- those two are still running.
> > 
> > Since we have about 50 machines on the far side of this router using
> > illegal IP's, it may be hard to spot: they do a good amount of web/ftp
> > access and it all runs through Apache, so it's a rare occurence.
> > 
> > When tracking down the missing unblocks, I did insert some code to
> > whine... something like:
> > 
> >     alarm_save = alarm(0);
> >     if (!alarm_save )
> >         syslog(LOG_DAEMON | LOG_EMERG, "saferead, no alarm! %p",
> >         getpid()); 
> >     else
> >         alarm(alarm_save);
> > 
> > When that code was in place in 1.2b7 I did see a bunch of times saferead
> > was getting called with no alarm, which shouldn't happen (though I will
> > confess I don't know what alarm() returns if say 1/2 a second is
> > remaining on Solaris).
> > 
> > I'll see if I can find who was downloading quake this week and if they
> > did anything like abort it or anything.
> > 
> > On Sun, 13 Apr 1997, Chuck Murcko wrote:
> > 
> > > Thanks for the report, Brian. It looks like a large file transfer is
> > > indeed punching through a soft timeout. I assume these are FTP
> > > transfers? I can duplicate your environment, so I should see the
> > > problem when I test for it.
> > > 
> > > Brian Moore wrote:
> > > > 
> > > > >Number:         374
> > > > >Category:       mod_proxy
> > > > >Synopsis:       mod_proxy(?) seems to alarm(0) somewhere
> > > > >Confidential:   no
> > > > >Severity:       serious
> > > > >Priority:       medium
> > > > >Responsible:    apache (Apache HTTP Project)
> > > > >State:          open
> > > > >Class:          sw-bug
> > > > >Submitter-Id:   apache
> > > > >Arrival-Date:   Sun Apr 13 12:00:01 1997
> > > > >Originator:     bem@cmc.net
> > > > >Organization:
> > > > apache
> > > > >Release:        1.2b8
> > > > >Environment:
> > > > Solaris 2.5, all recommended patches, gcc 2.7.2
> > > > >Description:
> > > > Looks like there's one other problem in mod_proxy with alarms being
> > > > turned off (not blocked via the block_alarms() call, but alarm(0)'d
> > > > for some reason).  I'm guessing on the module involved, since the
> > > > three dead children this morning were all doing proxy stuff.
> > > > 
> > > > The backtrace of a child that's been waiting for 110k seconds:
> > > > #0  0xef67792c in _read ()
> > > > #1  0x29364 in saferead ()
> > > > #2  0x29480 in bread ()
> > > > #3  0x488b0 in proxy_send_fb ()
> > > > #4  0x47e78 in proxy_http_handler ()
> > > > #5  0x432c0 in proxy_handler ()
> > > > #6  0x1f040 in invoke_handler ()
> > > > #7  0x21dc0 in process_request_internal ()
> > > > #8  0x21df4 in process_request ()
> > > > #9  0x1bf30 in child_main ()
> > > > #10 0x1c0cc in make_child ()
> > > > #11 0x1c8c8 in standalone_main ()
> > > > #12 0x1cb88 in main ()
> > > > (gdb) up
> > > > #1  0x29364 in saferead ()
> > > > (gdb) print alarms_blocked
> > > > $1 = 0
> > > > 
> > > > So this seems to be something calling alarm(0) somewhere instead of a
> > > > 'logical' alarms-off via the official mechanism.
> > > > 
> > > > >How-To-Repeat:
> > > > Not sure: virtually all of our proxy users are on a 10Mbps ethernet
> > > > but behind a firewall.  This usage may or may not be relevant.  The
> > > > children I found dead this morning were fetching files from cdrom.com
> > > > via http, so it should be normal the only odd thing is that these were
> > > > quake files so they were no doubt huge. >Fix:
> > > > Will be looking at the code myself this week
> > > > >Audit-Trail:
> > > > >Unformatted:
> > > 
> > > -- 
> > > chuck
> > > Chuck Murcko
> > > The Topsail Group, West Chester PA USA
> > > chuck@topsail.org
> > > 
> > > 
> > 
> 


Mime
View raw message