tomcat-users mailing list archives

From Chris Boyce <Chris.Bo...@cdw.com>
Subject RE: Abandoned apache children with mod_jk
Date Sat, 22 Jun 2013 21:18:19 GMT
Thank you so much for the reply.  Here are a couple of examples, as I'm not completely sure
if my symptoms match, though the pstacks do look very similar to my untrained eye:


Here is a two-day-old child:

27743:  /usr/local/apache2/bin/httpd -k start
-----------------  lwp# 1 / thread# 1  --------------------
 ff00a42c lwp_wait (3, ffbff804)
 ff001e88 _thrp_join (3, 0, ffbff86c, 1, ff0b2780, ffbff804) + 38
 ff214544 apr_thread_join (ffbff8ec, 32eea8, 7, 0, dc328, b15e0) + c
 0008c43c join_workers (0, fe3aa8, 8bfcc, 32ec30, 0, 1) + ec
 0008c790 child_main (2, 8b31c, 0, feee2a40, ff0b2840, ff0b2780) + 270
 0008c970 make_child (c7800, 2, 0, c8800, c7000, c8400) + 128
 0008d1b4 ap_mpm_run (fe4100f8, e, 0, 1, 27, 1) + 754
 000343c0 main     (d6218, d8190, ffbffc54, c7800, c7800, 0) + 79c
 00033754 _start   (0, 0, 0, 0, 0, 0) + 5c
-----------------  lwp# 3 / thread# 3  --------------------
 ff0058d4 lwp_park (0, 0, 0)
 fefff6e8 cond_wait_queue (32ecc8, 32ec98, 0, 0, 0, 0) + 4c
 fefffd30 cond_wait (32ecc8, 32ec98, 0, 0, fe460a40, 0) + 10
 fefffd6c pthread_cond_wait (32ecc8, 32ec98, 0, 0, 32ec98, 0) + 8
 0008e674 ap_queue_pop (32ec78, fe30bf1c, fe30bf18, 4, 0, 32ee40) + 64
 0008be1c worker_thread (32eea8, 2, fe460a40, c8400, c8400, 0) + 10c
 ff21440c dummy_worker (32eea8, 0, 0, fe460a40, ff214400, 1) + c
 ff005850 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 4 / thread# 4  --------------------
 ff0058d4 lwp_park (0, 0, 0)
 fefff6e8 cond_wait_queue (32ecc8, 32ec98, 0, 0, 0, 0) + 4c
 fefffd30 cond_wait (32ecc8, 32ec98, 0, 0, fe461240, 11692d8) + 10
 fefffd6c pthread_cond_wait (32ecc8, 32ec98, 0, 0, 32ec98, 0) + 8
 0008e674 ap_queue_pop (32ec78, fe20bf1c, fe20bf18, 0, 0, 32ee40) + 64
 0008be1c worker_thread (32eec8, 2, fe461240, c8400, c8400, 4) + 10c
 ff21440c dummy_worker (32eec8, 0, 0, fe461240, ff214400, 1) + c
 ff005850 _lwp_start (0, 0, 0, 0, 0, 0)

...and several more in lwp_park.



And here's another one that's a day old, but looks different (including lots of jk references):

7934:   /usr/local/apache2/bin/httpd -k start
-----------------  lwp# 1 / thread# 1  --------------------
 ff00a42c lwp_wait (6, ffbff80c)
 ff001e88 _thrp_join (6, 0, ffbff874, 1, ff0b2780, ffbff80c) + 38
 ff214544 apr_thread_join (ffbff8f4, 28e228, 2, 0, 1, b1600) + c
 0008c43c join_workers (c, 3c5f38, 8bfcc, 28df50, 0, 1) + ec
 0008c790 child_main (0, 8b31c, 0, feee2a40, ff0b2840, ff0b2780) + 270
 0008c970 make_child (c7800, 0, 0, c8800, c7000, c8400) + 128
 0008d1b4 ap_mpm_run (fe4100f8, e, 0, 1, 26, 1) + 754
 000343c0 main     (d6218, d8190, ffbffc5c, c7800, c7800, 0) + 79c
 00033754 _start   (0, 0, 0, 0, 0, 0) + 5c
-----------------  lwp# 6 / thread# 6  --------------------
 ff00a14c read     (15, fe00a908, 4)
 fe4a87dc jk_tcp_socket_recvfull (15, fe00a908, 4, 2e4bf8, 510, 4ec) + 74
 fe4c3088 ajp_connection_tcp_get_message (35f130, 35f168, 2e4bf8, 361188, 2000, 2064) + 44
 fe4c5588 ajp_get_reply (361168, fe00bb50, 2e4bf8, 35f130, fe00aa70, 2028) + 9c
 fe4c9304 ajp_service (361168, fe00bb50, 2e4bf8, fe00ab38, 1, c00) + 22b8
 fe4a1234 jk_handler (23c, 35e740, 3f4390, 1, 13, 3544c8) + 9e4
 00047534 ap_run_handler (3f40a0, 0, 11, 3e7028, 3f5a08, 0) + 3c
 000479c0 ap_invoke_handler (3f40a0, 9d000, 3f40a0, 0, fe410028, 0) + c0
 00073aa4 ap_process_request (3f40a0, 3, 4, 3f40a0, c8420, 21d8d8) + 160
 00070b34 ap_process_http_connection (3d52e8, 3d5038, 3d5038, 3, c8420, 211980) + 10c
 0004dce8 ap_run_process_connection (3d52e8, 3d5038, 3d5038, 3, 3d52e0, 3d7068) + 3c
 0008bf1c worker_thread (28e228, 0, fe462240, c8400, c8400, c) + 20c
 ff21440c dummy_worker (28e228, 0, 0, fe462240, ff214400, 1) + c
 ff005850 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 7 / thread# 7  --------------------
 ff214400 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 8 / thread# 8  --------------------
 ff214400 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 9 / thread# 9  --------------------
 ff214400 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 10 / thread# 10  --------------------
 ff214400 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 11 / thread# 11  --------------------
 ff214400 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 12 / thread# 12  --------------------
 ff214400 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 13 / thread# 13  --------------------
 ff214400 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 14 / thread# 14  --------------------
 ff214400 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **

...and so on...
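
To make the difference between the two dumps easier to see, the per-thread states can be tallied with a short script. This is only a rough sketch: `classify_pstack`, `classify_thread`, and the state labels are hypothetical names made up for illustration, and the frame names it matches (`apr_thread_join`, `ap_queue_pop`, `jk_tcp_socket_recvfull`, the "zombie" marker) are simply the ones that appear in the output above. It assumes the pstack text has been saved or captured as a string.

```python
import re
from collections import Counter

# Header line that pstack prints before each thread, e.g.
# "-----------------  lwp# 6 / thread# 6  --------------------"
THREAD_HEADER = re.compile(r"^-+\s+lwp#\s*(\d+)\s*/\s*thread#\s*(\d+)\s+-+\s*$")


def classify_thread(frames):
    """Return a rough label for one thread, based only on names seen in the dumps above."""
    text = "\n".join(frames)
    if "zombie" in text:
        return "zombie (exited, not joined)"
    if "jk_tcp_socket_recvfull" in text:
        return "blocked in mod_jk read"
    if "ap_queue_pop" in text:
        return "idle worker waiting for work"
    if "apr_thread_join" in text:
        return "main thread joining workers"
    return "other"


def classify_pstack(output):
    """Tally thread states in the text of one pstack dump."""
    counts = Counter()
    frames = None  # stays None until the first thread header is seen
    for line in output.splitlines():
        if THREAD_HEADER.match(line):
            if frames is not None:
                counts[classify_thread(frames)] += 1
            frames = []
        elif frames is not None and line.strip():
            frames.append(line.strip())
    if frames is not None:
        counts[classify_thread(frames)] += 1
    return counts


# Shortened excerpt of the second dump above, used only as sample input.
sample = """\
7934:   /usr/local/apache2/bin/httpd -k start
-----------------  lwp# 1 / thread# 1  --------------------
 ff00a42c lwp_wait (6, ffbff80c)
 ff214544 apr_thread_join (ffbff8f4, 28e228, 2, 0, 1, b1600) + c
-----------------  lwp# 6 / thread# 6  --------------------
 ff00a14c read     (15, fe00a908, 4)
 fe4a87dc jk_tcp_socket_recvfull (15, fe00a908, 4, 2e4bf8, 510, 4ec) + 74
-----------------  lwp# 7 / thread# 7  --------------------
 ff214400 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
"""

for state, n in classify_pstack(sample).items():
    print(f"{n:3d}  {state}")
```

Run against the full dumps, a tally like this would presumably show the first child as all idle workers and the second as one thread blocked in a mod_jk read with the rest already exited, which is the distinction described above. The labels are a reading of the stack frames, not a diagnosis.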



If anyone has the time to confirm that my case is a match, I'd be very grateful. Either way,
this patch looks promising!

Thank you VERY MUCH!


-Chris





-----Original Message-----
From: Rainer Jung [mailto:rainer.jung@kippdata.de] 
Sent: Saturday, June 22, 2013 12:31 PM
To: users@tomcat.apache.org
Subject: Re: Abandoned apache children with mod_jk

On 21.06.2013 19:47, Chris Boyce wrote:
> Hello,
> 
> I'm running apache 2.2.24 (worker MPM) with mod_jk 1.2.37 under Solaris 11, compiled
> as follows (from config.log):
> 
> --with-included-apr --with-mpm=worker --enable-so --enable-rewrite 
> --enable-headers --enable-proxy --enable-proxy-http --enable-expires 
> --enable-nonportable-atomics=yes --disable-include --disable-autoindex 
> --disable-imap --disable-userdir CC=/usr/sfw/bin/gcc
> 
> We are running Tomcat 7.0.32.
> 
> Since moving to Solaris 11 I'm noticing over time that apache children are getting left
> in an idle state (and usually not showing up on the scoreboard at all) when doing graceful
> restarts.  If I do a hard restart, the error_log notes that the process had to be forcibly
> killed:
> 
> [Wed May 15 11:41:24 2013] [warn] child process 10057 still did not exit, sending a SIGTERM
> [Wed May 15 11:41:26 2013] [error] child process 10057 still did not exit, sending a SIGKILL
> 
> If I let apache go unchecked, it will eventually stop passing traffic completely and
> a hard restart is required.  Example ps output looks like this:
> 
> nobody 24429 20925   0 11:43:59 ?           0:02 /usr/local/apache2/bin/httpd -k start
> nobody  9750 20925   0 23:59:02 ?           0:00 /usr/local/apache2/bin/httpd -k start
> nobody 20925  2440   0   May 15 ?           3:07 /usr/local/apache2/bin/httpd -k start
> nobody 24689 20925   0 11:47:52 ?           0:00 /usr/local/apache2/bin/httpd -k start
> nobody 24628 20925   0 11:46:18 ?           0:01 /usr/local/apache2/bin/httpd -k start
> nobody 24428 20925   0 11:43:39 ?           0:02 /usr/local/apache2/bin/httpd -k start
> 
> Note PID 9750 is lingering, doing nothing according to pfiles and truss, and its timestamp
> coincides with the last graceful restart (log rotation).  Two main differences between this
> web server and ones that are working include:
> 
> a) This is Solaris 11 (vs. Solaris 10)
> b) I have hardened apache by putting it in a Solaris 11 zone, and I'm starting apache
> as the "nobody" user with the net_privaddr privilege so it can function as the parent process.
> It talks to Tomcat on another zone and everything works great (other than the problem described
> here).
> 
> Apache has permission to write to /logs, and /logs/apache2 is where I set these:
> 
> JkLogFile /logs/apache2/mod_jk.log
> JkShmFile /logs/apache2/jk-runtime-status
> 
> And this:
> PidFile /logs/apache2/run/httpd.pid
> 
> 
> Can anyone think of a reason why children are not being recycled or getting stranded
> like this over successive graceful restarts?  We do use multiple listeners, so I don't know
> if I'm dealing with a locking/mutex/serialization type of issue.  I'm not a C programmer.
> There seems to be little info out there for Solaris platforms that's recent.
> 
> I'd be happy to post more info if needed.  I appreciate your time.

What does "pstack" show for such an abandoned child?

Maybe another occurrence of
https://issues.apache.org/bugzilla/show_bug.cgi?id=49504.

Regards,

Rainer


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

