Mailing-List: contact users-help@httpd.apache.org; run by ezmlm
Precedence: bulk
Reply-To: users@httpd.apache.org
Received-SPF: pass (nike.apache.org: domain of
 georgi.petrov@seocreativebrain.com designates 91.196.124.206 as permitted
 sender)
MIME-Version: 1.0
Date: Wed, 11 Sep 2013 07:54:52 +0300
Message-ID: 
 <CABvmQMWf_fiA4XErcLDxGCLBf1WM89P361EVKXpaMy5Zy9Xp3g@mail.gmail.com>
From: Georgi Petrov <georgi.petrov@seocreativebrain.com>
To: users@httpd.apache.org
Content-Type: multipart/alternative; boundary=001a1135faf2fb950904e6146bdf
Subject: [users@httpd] Apache 2.2.15 + mod_fcgid 2.3.7 (CentOS 6.4) graceful
 restarts, no
 leftover processes, but errors both in browser and error log

--001a1135faf2fb950904e6146bdf
Content-Type: text/plain; charset=ISO-8859-1

Hi all,

More than 9 months ago I discovered a problem with graceful restarts on a
default Virtualmin installation with the default execution mode
(mod_fcgid), but only recently did I have the time to dig deeper and
experiment. Since Virtualmin uses Apache + mod_fcgid by default, the
experiments will probably lead to the same results on any Apache 2.2 +
mod_fcgid 2.3.7 installation. This is not the widely known problem with
leftover processes that never get killed on a graceful restart. This is
something else: the processes get forcefully killed way too soon and the
output never reaches the browser. Please test it on your setup and report
back the result.

What is the setup:

CentOS 6.4 x86_64 minimal installation

Virtualmin 4.02.gpl GPL installed by the automatic .sh script, all default
settings (you can skip this, the problem is probably not virtualmin related)

mod_fcgid.x86_64 2.3.7-1.el6 from the virtualmin repo (others should work
too)
httpd.x86_64 1:2.2.15-29.el6.vm.1 from the virtualmin repo (others should
work too)

php 5.3.3 from the official repo

Single virtual domain, running under the default FCGId execution mode, with
90 sec php execution time and fcgid IO wait.

Single test.php file containing

<?php
for($i = 1; $i <= 30; $i++) {
    echo $i."\n";
    sleep(1);
}
?>

What is the error:

Run the script in a browser, then do a graceful restart of Apache
(service httpd graceful). After around 12 seconds you will see a "No data
received" error in your browser (Chrome) and the following in the Apache
error log:

(22)Invalid argument: mod_fcgid: can't lock process table in pid 25570

(the pid number will be different of course)
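
If you would rather script the test than click through a browser, here is
a minimal sketch of how it could be done from a shell. It is only an
illustration: the URL, the 3 second delay and the function names verdict
and repro are my assumptions, not part of the setup above, and it must run
as root because it calls service httpd graceful.

```shell
#!/bin/sh
# Sketch: reproduce the truncated response during a graceful restart.
# URL, EXPECTED and DELAY are assumptions; adjust them for your host.
URL="${URL:-http://localhost/test.php}"   # hypothetical location of test.php
EXPECTED="${EXPECTED:-30}"                # lines the unmodified script prints
DELAY="${DELAY:-3}"                       # seconds to wait before the restart

# verdict GOT EXPECTED: report whether the full output reached the client.
verdict() {
    if [ "$1" -ge "$2" ]; then
        echo "complete: $1/$2 lines"
    else
        echo "truncated: $1/$2 lines"
    fi
}

# repro: request the script, restart Apache gracefully mid-request,
# then count the numeric lines that actually arrived.
repro() {
    out=$(mktemp) || return 1
    curl -s -N "$URL" > "$out" &
    curl_pid=$!
    sleep "$DELAY"
    service httpd graceful
    wait "$curl_pid"
    got=$(grep -c '^[0-9][0-9]*$' "$out")
    rm -f "$out"
    verdict "$got" "$EXPECTED"
}
```

Saved as, say, repro.sh (a made-up name), it can be run with
". ./repro.sh && repro". If your setup behaves like mine, I would expect it
to report a truncated response.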

Further experiments show that this script gets forcefully killed before
ending.

If you reduce the time the script executes to 5 seconds ($i <= 4), you'll
get the same result, this time after 5 seconds.

Further experiments show that this shorter script does run to completion,
but you still get the errors both in the browser and in the error log.

Try it and post your result.

Dig:

It is probably a problem in mod_fcgid.

I tweaked the experiment by adding a file write to the script, which shows
which run completes and which gets killed before that. That is how I got
the results above.

Add this inside the loop:
file_put_contents("test.txt", "test run for: ".$i." seconds");

So why 12 seconds, and where is this set? After some time I discovered that
increasing FcgidErrorScanInterval to 60 lets the second process complete
(but you still get the errors).
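
For anyone who wants to try the same change, here is a small sketch that
prints the config fragment. The function name fcgid_scan_conf and the
target file name used below are my assumptions for a stock CentOS 6 httpd
layout. The directive itself is the one named above, and as far as I know
it is only valid in the global server config, not inside a virtual host.

```shell
#!/bin/sh
# Sketch: print an Apache config fragment that raises the mod_fcgid
# error scan interval. The interval defaults to 60 seconds, the value
# used in the experiment above.
fcgid_scan_conf() {
    interval="${1:-60}"
    printf '<IfModule mod_fcgid.c>\n'
    printf '    FcgidErrorScanInterval %s\n' "$interval"
    printf '</IfModule>\n'
}
```

For example, as root: "fcgid_scan_conf 60 >
/etc/httpd/conf.d/zz-fcgid-scan.conf", then "service httpd configtest" and
a restart. The zz- prefix is there on the assumption that conf.d files are
read in alphabetical order, so the fragment is seen after the file that
loads mod_fcgid.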

If you check the mod_fcgid code in fcgid_pm_main.c, the graceful restart
should be handled by the function kill_all_subprocess(), but apparently
scan_errorlist() is also executed, even though there is a check for
procmgr_must_exit().

The error in the log "can't lock process table in pid 25570" probably means
that some information about the process is destroyed immediately upon the
graceful restart (the mutex), so we will never get the result back.

Even if we get around the early termination of the processes by increasing
FcgidErrorScanInterval, the second problem is actually bigger: all your
users are going to see this error.

Do you get the same errors, and do you have any idea how to fix mod_fcgid?

Thanks for your time, testing and commenting!

Georgi Petrov

--001a1135faf2fb950904e6146bdf--