From: Georgi Petrov
To: users@httpd.apache.org
Date: Wed, 11 Sep 2013 07:54:52 +0300
Subject: [users@httpd] Apache 2.2.15 + mod_fcgid 2.3.7 (CentOS 6.4) graceful restarts, no leftover processes, but errors both in browser and error log

Hi all,

More than 9 months ago I discovered a problem with graceful restarts on a default Virtualmin installation using the default execution mode (mod_fcgid), but only recently did I have the time to dig deeper and experiment. Since Virtualmin uses Apache + mod_fcgid by default, the experiments will probably lead to the same results on any Apache 2.2 + mod_fcgid 2.3.7 installation. This is not the widely known problem with leftover processes that never get killed on a graceful restart. This is something else: the processes get forcefully killed way too soon and the output never reaches the browser. Please test it on your setup and report back the result.

What is the setup:

CentOS 6.4 x86_64 minimal installation

Virtualmin 4.02.gpl GPL installed by the automatic .sh script, all default settings (you can skip this, the problem is probably not Virtualmin related)

mod_fcgid.x86_64 2.3.7-1.el6 from the Virtualmin repo (others should work too)
httpd.x86_64 1:2.2.15-29.el6.vm.1 from the Virtualmin repo (others should work too)

php 5.3.3 from the official repo

Single virtual domain, running under the default FCGId execution mode, with 90 sec PHP execution time and fcgid IO wait.

Single test.php file containing

<?php
// Print one number per second, so the request stays busy for about 30 seconds.
for ($i = 1; $i <= 30; $i++) {
    echo $i."\n";
    sleep(1);
}
?>

What is the error:

Run the script via a browser, then do a graceful restart of Apache (service httpd graceful). After around 12 seconds you will see a "No data received" error in your browser (Chrome) and the following in the Apache error log:

(22)Invalid argument: mod_fcgid: can't lock process table in pid 25570

(the pid number will be different, of course)

Further experiments show that this script gets forcefully killed before it finishes.

If you reduce the time the script executes to 5 seconds ($i <= 4), you'll get the same result, this time after 5 seconds.

Further experiments show that this shorter run completes, but you still get the errors both in the browser and in the error log.

Try it and post your result.

Dig:

It is probably a problem in mod_fcgid.

I tweaked the experiment by adding a file write to the script, which shows which run completes and which gets killed before that. I got the result above.

Add this inside the loop:
file_put_contents("test.txt", "test run for: ".$i." seconds");

So why 12 seconds, and where is this set? After some time I discovered that increasing FcgidErrorScanInterval to 60 lets the second process complete (but you still get the errors).
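
For anyone reproducing this, the change described above can be sketched as a config fragment. This is only an illustration: the value 60 comes from the experiment above, while the placement (global server config rather than inside a virtual host), the <IfModule> wrapper, and the remark about the default are assumptions drawn from general mod_fcgid usage and should be checked against the documentation for your installed version.

```apache
# Sketch only: assumed to belong in the global server config,
# not inside a <VirtualHost> block.
<IfModule mod_fcgid.c>
    # Value used in the experiment above. The default is believed to be
    # 3 seconds (assumption, verify against your mod_fcgid docs).
    FcgidErrorScanInterval 60
</IfModule>
```

After changing it, restart Apache fully once so the new value is in effect, then repeat the graceful-restart test.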

If you check the code of mod_fcgid, in fcgid_pm_main.c, the graceful restart should be performed by the function kill_all_subprocess(), but apparently scan_errorlist() is also executed even though there is a check for procmgr_must_exit().

The error in the log, "can't lock process table in pid 25570", probably means that some information about the process (the mutex) is destroyed immediately upon the graceful restart, so we will never get the result back.

Even if we get around the early termination of the processes by increasing FcgidErrorScanInterval, the second problem is actually bigger: all your users are going to see this error.

Do you get the same errors, and do you have any idea how to fix mod_fcgid?

Thanks for your time, testing and commenting!

Georgi Petrov