apr-bugs mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 48029] httpd-2.2.14 hangs in port_getn
Date Mon, 26 Oct 2009 17:40:00 GMT
https://issues.apache.org/bugzilla/show_bug.cgi?id=48029

--- Comment #15 from Jeff Trawick <trawick@apache.org> 2009-10-26 10:39:51 UTC ---
>First off: Compiling with ac_cv_func_port_create=no makes everything run fine,
>it seems. That is a strong hint that the problem is located somewhere near
>the port interface, either implementation or usage.

>Is the port thing restricted to solaris? Is it worth investigating more? Is it
>an option to globally disable ports for everyone?

A little background:

apr_pollset_poll() is what APR provides to apps like httpd, and that works
cross-platform.  The "Event Port" stuff applies only to Solaris.  If Event
Ports were disabled, it wouldn't affect other platforms and it wouldn't affect
all applications on Solaris since plain poll() would be used.

Meanwhile, I found a better solution than disabling Event Ports, which I've
committed.  This is it:

http://svn.apache.org/viewvc/apr/apr/branches/1.3.x/poll/unix/port.c?r1=807269&r2=829803

>However, I cannot reproduce your useport example on any of the machines I use
>to trigger the buggy behavior here. Can you prove it returns correct values
>when trussed? If no, that should be a completely different issue - if yes, it
>might be related, though.

You get "rc 0 nget 0" displayed on a machine with the problem?  Ouch.

Yes, truss or dbx makes that simple testcase run clean for me on Solaris
10/x86-32.

$ ./useport-32
rc -17349963 nget 0
$ truss ./useport-32 2>&1 | tail -5
brk(0x08062908)                                 = 0
fstat64(1, 0x08046FA0)                          = 0
rc 0 nget 0
write(1, " r c   0   n g e t   0\n", 12)        = 12
_exit(0)
$

Similarities with your observations:

a. got worse with APR 1.3.9

The huge negative retcode from port_getn() wouldn't cause a problem with APR
1.3.8, since 1.3.8 checked specifically for "rc == -1" instead of "rc < 0".

(The "got worse" idea assumes you had a different problem with 1.3.8, possibly
the one that 1.3.9 corrected, which was very intermittent.)
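
To illustrate item (a), here is a minimal C sketch, not APR's actual code, of
how the two checks treat the same return value.  The function names are made
up for illustration; the value -17349963 is the one printed by the useport-32
run above.

```c
/* Hypothetical helpers for illustration only; these are not APR functions. */

/* 1.3.8-style check as described above: only -1 counts as a failure. */
int failed_if_minus_one(int rc)
{
    return rc == -1;
}

/* 1.3.9-style check as described above: any negative value counts. */
int failed_if_negative(int rc)
{
    return rc < 0;
}
```

With rc = -17349963 the first check reports no failure and the second reports
one, which would be consistent with the problem getting worse in 1.3.9.  Both
checks agree for rc = -1 and rc = 0.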

b. problem doesn't occur under observation via truss

c. unexpected EAGAIN failure

apr_pollset_poll() would grab whatever was in errno when it thought port_getn()
failed on these cases where port_getn() didn't really fail and didn't set
errno.  Since we're doing I/O with the CGI and recently did I/O with the
client, EAGAIN is a likely errno value to pick up incorrectly.
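
A small simulation of that stale-errno effect, assuming nothing about the real
port.c beyond what is described here.  All three functions are hypothetical
stand-ins, and the bogus return value is borrowed from the useport-32 output
above.

```c
#include <errno.h>

/* Stand-in for earlier non-blocking I/O that would have blocked:
 * it fails with -1 and leaves EAGAIN in errno. */
int fake_nonblocking_read(void)
{
    errno = EAGAIN;
    return -1;
}

/* Stand-in for a port_getn() call that did not really fail: it returns
 * a bogus negative number and does not touch errno. */
int fake_port_getn_bogus(void)
{
    return -17349963;
}

/* Caller that treats any negative return as failure and reports errno.
 * Whatever errno held before the call is what gets reported. */
int poll_status_with_negative_check(void)
{
    int rc = fake_port_getn_bogus();

    if (rc < 0) {
        return errno;   /* stale value left behind by an earlier call */
    }
    return 0;
}
```

Calling fake_nonblocking_read() and then poll_status_with_negative_check()
yields EAGAIN even though the second call never set errno.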

d. a case where port_getn() returns a bogus negative number matches
cgi_bucket_read()'s usage

A case with the bad retcode is when port_getn() is called with a 0 timeout to
find out immediately if an event is ready.  (If an event is actually ready, it
won't return a bad retcode.)

That's the kind of call made when cgi_bucket_read() is called the first time by
the content-length filter; only when cgi_bucket_read() returns EAGAIN does the
content-length filter tell it to wait until data is available.
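
The "check with a 0 timeout first, block only after EAGAIN" pattern can be
sketched with plain POSIX poll() rather than Event Ports.  This is not the
httpd or APR code; wait_readable() and its block parameter are hypothetical
names used only for this example.

```c
#include <errno.h>
#include <poll.h>

/* Returns 0 if fd is readable, EAGAIN if nothing is ready on a
 * non-blocking check, or another errno value on a real failure.
 * block == 0: 0 timeout, return immediately.
 * block != 0: wait until data is available. */
int wait_readable(int fd, int block)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    do {
        rc = poll(&pfd, 1, block ? -1 : 0);
    } while (rc == -1 && errno == EINTR);   /* retry if interrupted */

    if (rc == -1) {
        return errno;    /* real failure: errno was set by poll() */
    }
    if (rc == 0) {
        return EAGAIN;   /* nothing ready within the 0 timeout */
    }
    return 0;
}
```

A caller would try wait_readable(fd, 0) first and call wait_readable(fd, 1)
only after getting EAGAIN.  Note that the failure test is "rc == -1", the only
value for which errno is meaningful.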

--/--

I hope you're able to try the tiny patch pointed to above in this comment in
place of earlier attempts.


