httpd-bugs mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 39079] thread eats 100% CPU, presumably spinning in futex
Date Tue, 06 Oct 2009 23:38:50 GMT
https://issues.apache.org/bugzilla/show_bug.cgi?id=39079

Patrick Higgins <patrick133t@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW

--- Comment #29 from Patrick Higgins <patrick133t@yahoo.com> 2009-10-06 16:38:42 PDT ---
I'm also seeing a problem that I believe is the same as this one. It may be
related to bug 44402, but I am not sure, and I have not been able to apply that
patch for fear of losing support from our enterprise distribution vendor. The
later comments by Henrik K and Huo Mingyu about the problem persisting in
2.2.11 make me think it may be separate from 44402, but since they did not post
additional information, I cannot tell whether they are seeing this problem or a
different one.

I am running the Red Hat httpd-2.2.3-22.el5 version on two servers that jointly
serve around 8 million hits per day. I see around 15 segfaults per day, and so
far I've seen two processes enter an endless loop in the two weeks I have been
using the worker MPM.

I am using the following modules:

LoadModule alias_module modules/mod_alias.so
LoadModule authz_host_module modules/mod_authz_host.so
LoadModule deflate_module modules/mod_deflate.so
LoadModule expires_module modules/mod_expires.so
LoadModule headers_module modules/mod_headers.so
LoadModule jk_module modules/mod_jk.so
LoadModule log_config_module modules/mod_log_config.so
LoadModule mime_module modules/mod_mime.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule proxy_module modules/mod_proxy.so
LoadModule rewrite_module modules/mod_rewrite.so
LoadModule security2_module modules/mod_security2.so
LoadModule setenvif_module modules/mod_setenvif.so
LoadModule ssl_module modules/mod_ssl.so
LoadModule status_module modules/mod_status.so
LoadModule unique_id_module modules/mod_unique_id.so

I have not used mod_log_forensic to get better reporting on the segfaults, but
the two hung processes (three looping threads total) have all been serving
requests in a space that is reverse proxied through mod_proxy_http with rewrite
rules.

The unusual modules of note that I am loading are mod_security2 and mod_jk.
Both of those were compiled in-house, and are versions 2.5.9 and 1.2.28,
respectively.

I have not captured any core files yet, but I have taken stack traces from the
two processes in the endless loop using "gstack <pid>". For each process I took
three traces, pausing a few seconds between each. The complete traces follow
(the processes really had only 3 and 4 threads, respectively), along with
unified diffs showing what changed between the three traces I took:
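The capture amounts to roughly the following (the commands are reconstructed,
not copied from my shell history; the ps fallback is my addition for systems
where gstack, which ships with gdb on RHEL, is not installed):

```shell
pid=${1:-6558}   # PID of the spinning worker (6558 is from the week-2 trace)

# One snapshot of every thread's stack. If gstack is unavailable, fall
# back to the kernel wait channel from ps, which is far coarser but
# needs no debugger.
snap() {
    gstack "$pid" 2>/dev/null \
        || ps -o pid,stat,wchan= -p "$pid" 2>/dev/null \
        || echo "process $pid not found"
}

for i in 1 2 3; do
    snap > "trace.$pid.$i"
    sleep 2
done

# Identical diffs mean the thread is pinned to one spot; small address
# deltas in frame #0 (as seen below) mean it is cycling inside one function.
diff -u "trace.$pid.1" "trace.$pid.2" || true
diff -u "trace.$pid.1" "trace.$pid.3" || true
```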

Week 2 trace:

Thread 3 (Thread 0x428fc940 (LWP 6559)):
#0  0x00002ae823ee85f2 in select () from /lib64/libc.so.6
#1  0x00002ae8237f46d5 in apr_sleep () from /usr/lib64/libapr-1.so.0
#2  0x00002ae8278dcd6f in jk_watchdog_func ()
#3  0x00002ae823a02367 in start_thread () from /lib64/libpthread.so.0
#4  0x00002ae823eef0ad in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x4fb11940 (LWP 6581)):
#0  0x00002ae821f1da9f in ap_core_input_filter () from /proc/6558/exe
#1  0x00002ae821f2d90a in ap_http_filter () from /proc/6558/exe
#2  0x00002ae827f4300a in ?? ()
#3  0x00002ae827f44f16 in ?? ()
#4  0x00002ae827d31cb1 in proxy_run_scheme_handler ()
#5  0x00002ae827d357f3 in ?? ()
#6  0x00002ae821f1e38a in ap_run_handler () from /proc/6558/exe
#7  0x00002ae821f21802 in ap_invoke_handler () from /proc/6558/exe
#8  0x00002ae821f2bfc8 in ap_process_request () from /proc/6558/exe
#9  0x00002ae821f29200 in ?? () from /proc/6558/exe
#10 0x00002ae821f255f2 in ap_run_process_connection () from /proc/6558/exe
#11 0x00002ae821f30a37 in ?? () from /proc/6558/exe
#12 0x00002ae823a02367 in start_thread () from /lib64/libpthread.so.0
#13 0x00002ae823eef0ad in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2ae826528560 (LWP 6558)):
#0  0x00002ae823a03655 in pthread_join () from /lib64/libpthread.so.0
#1  0x00002ae8237f3d85 in apr_thread_join () from /usr/lib64/libapr-1.so.0
#2  0x00002ae821f301b0 in ?? () from /proc/6558/exe
#3  0x00002ae821f30d79 in ?? () from /proc/6558/exe
#4  0x00002ae821f30ef5 in ?? () from /proc/6558/exe
#5  0x00002ae821f3173b in ap_mpm_run () from /proc/6558/exe
#6  0x00002ae821f0b7e0 in main () from /proc/6558/exe

Week 2 diff:

--- 6558.1    2009-10-06 14:37:51.000000000 -0600
+++ 6558.2    2009-10-06 14:37:51.000000000 -0600
@@ -5,7 +5,7 @@
 #3  0x00002ae823a02367 in start_thread () from /lib64/libpthread.so.0
 #4  0x00002ae823eef0ad in clone () from /lib64/libc.so.6
 Thread 2 (Thread 0x4fb11940 (LWP 6581)):
-#0  0x00002ae821f1da9f in ap_core_input_filter () from /proc/6558/exe
+#0  0x00002ae821f1daab in ap_core_input_filter () from /proc/6558/exe
 #1  0x00002ae821f2d90a in ap_http_filter () from /proc/6558/exe
 #2  0x00002ae827f4300a in ?? ()
 #3  0x00002ae827f44f16 in ?? ()

The third trace was identical to the second.

Week 1 trace:

Thread 4 (Thread 0x4142a940 (LWP 7213)):
#0  0x00002b593d3ea5f2 in select () from /lib64/libc.so.6
#1  0x00002b593ccf66d5 in apr_sleep () from /usr/lib64/libapr-1.so.0
#2  0x00002b5940dded6f in jk_watchdog_func ()
#3  0x00002b593cf04367 in start_thread () from /lib64/libpthread.so.0
#4  0x00002b593d3f10ad in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x4282c940 (LWP 7215)):
#0  0x00002b593cceca80 in apr_allocator_free () from /usr/lib64/libapr-1.so.0
#1  0x00002b593bf29c57 in ?? () from /usr/lib64/libaprutil-1.so.0
#2  0x00002b593bf2a7c0 in apr_brigade_cleanup ()
#3  0x00002b593b41f307 in ap_core_output_filter () from /proc/7212/exe
#4  0x00002b593b42df0d in ap_process_request () from /proc/7212/exe
#5  0x00002b593b42b200 in ?? () from /proc/7212/exe
#6  0x00002b593b4275f2 in ap_run_process_connection () from /proc/7212/exe
#7  0x00002b593b432a37 in ?? () from /proc/7212/exe
#8  0x00002b593cf04367 in start_thread () from /lib64/libpthread.so.0
#9  0x00002b593d3f10ad in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x43c2e940 (LWP 7217)):
#0  0x00002b593bf2a92e in apr_brigade_length ()
#1  0x00002b593b42f77e in ap_http_filter () from /proc/7212/exe
#2  0x00002b594144500a in ?? ()
#3  0x00002b5941446f16 in ?? ()
#4  0x00002b5941233cb1 in proxy_run_scheme_handler ()
#5  0x00002b59412377f3 in ?? ()
#6  0x00002b593b42038a in ap_run_handler () from /proc/7212/exe
#7  0x00002b593b423802 in ap_invoke_handler () from /proc/7212/exe
#8  0x00002b593b42dfc8 in ap_process_request () from /proc/7212/exe
#9  0x00002b593b42b200 in ?? () from /proc/7212/exe
#10 0x00002b593b4275f2 in ap_run_process_connection () from /proc/7212/exe
#11 0x00002b593b432a37 in ?? () from /proc/7212/exe
#12 0x00002b593cf04367 in start_thread () from /lib64/libpthread.so.0
#13 0x00002b593d3f10ad in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2b593fa2a560 (LWP 7212)):
#0  0x00002b593cf05655 in pthread_join () from /lib64/libpthread.so.0
#1  0x00002b593ccf5d85 in apr_thread_join () from /usr/lib64/libapr-1.so.0
#2  0x00002b593b4321b0 in ?? () from /proc/7212/exe
#3  0x00002b593b432d79 in ?? () from /proc/7212/exe
#4  0x00002b593b432ef5 in ?? () from /proc/7212/exe
#5  0x00002b593b43373b in ap_mpm_run () from /proc/7212/exe
#6  0x00002b593b40d7e0 in main () from /proc/7212/exe

Week 1 diff 1:

--- 7212.1    2009-09-24 14:22:42.000000000 -0600
+++ 7212.2    2009-09-24 14:25:13.000000000 -0600
@@ -5,7 +5,7 @@
 #3  0x00002b593cf04367 in start_thread () from /lib64/libpthread.so.0
 #4  0x00002b593d3f10ad in clone () from /lib64/libc.so.6
 Thread 3 (Thread 0x4282c940 (LWP 7215)):
-#0  0x00002b593cceca80 in apr_allocator_free () from /usr/lib64/libapr-1.so.0
+#0  0x00002b593cceca84 in apr_allocator_free () from /usr/lib64/libapr-1.so.0
 #1  0x00002b593bf29c57 in ?? () from /usr/lib64/libaprutil-1.so.0
 #2  0x00002b593bf2a7c0 in apr_brigade_cleanup ()
 #3  0x00002b593b41f307 in ap_core_output_filter () from /proc/7212/exe
@@ -16,7 +16,7 @@
 #8  0x00002b593cf04367 in start_thread () from /lib64/libpthread.so.0
 #9  0x00002b593d3f10ad in clone () from /lib64/libc.so.6
 Thread 2 (Thread 0x43c2e940 (LWP 7217)):
-#0  0x00002b593bf2a92e in apr_brigade_length ()
+#0  0x00002b593bf2a935 in apr_brigade_length ()
 #1  0x00002b593b42f77e in ap_http_filter () from /proc/7212/exe
 #2  0x00002b594144500a in ?? ()
 #3  0x00002b5941446f16 in ?? ()

Week 1 diff 2:

--- 7212.1    2009-09-24 14:22:42.000000000 -0600
+++ 7212.3    2009-09-24 14:25:22.000000000 -0600
@@ -5,7 +5,7 @@
 #3  0x00002b593cf04367 in start_thread () from /lib64/libpthread.so.0
 #4  0x00002b593d3f10ad in clone () from /lib64/libc.so.6
 Thread 3 (Thread 0x4282c940 (LWP 7215)):
-#0  0x00002b593cceca80 in apr_allocator_free () from /usr/lib64/libapr-1.so.0
+#0  0x00002b593cceca8c in apr_allocator_free () from /usr/lib64/libapr-1.so.0
 #1  0x00002b593bf29c57 in ?? () from /usr/lib64/libaprutil-1.so.0
 #2  0x00002b593bf2a7c0 in apr_brigade_cleanup ()
 #3  0x00002b593b41f307 in ap_core_output_filter () from /proc/7212/exe
@@ -16,7 +16,7 @@
 #8  0x00002b593cf04367 in start_thread () from /lib64/libpthread.so.0
 #9  0x00002b593d3f10ad in clone () from /lib64/libc.so.6
 Thread 2 (Thread 0x43c2e940 (LWP 7217)):
-#0  0x00002b593bf2a92e in apr_brigade_length ()
+#0  0x00002b593bf2a924 in apr_brigade_length ()
 #1  0x00002b593b42f77e in ap_http_filter () from /proc/7212/exe
 #2  0x00002b594144500a in ?? ()
 #3  0x00002b5941446f16 in ?? ()

From a quick look at the code, it looks like the bucket brigade functions
and/or macros are looping endlessly. I don't understand that data structure, so
I don't know how this might happen. I can try to inspect it the next time we
get a process in an endless loop to see what state it is in.

I was hoping that someone might already know what state the data structure
would need to be in to cause an endless loop, and solving the problem would be
a matter of determining how it's getting into that state.

I can gather more information as needed, but making changes to these servers is
slow and requires fighting bureaucracy. It is unlikely that I will be able to
test anything other than small patches to the Red Hat httpd.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org

