httpd-dev mailing list archives

From Yehezkel Horowitz <>
Subject RE: Improve Apache performance on high load (prefork MPM) with multiple Accept mutexes (Patch attached)
Date Mon, 26 Oct 2015 08:45:58 GMT
Any chance someone could take a short look and provide me feedback (of any kind)?

I know your focus is on 2.4 and trunk, but there are still many 2.2 servers out there...

Patch attached again for your convenience...

Yehezkel Horowitz
Check Point Software Technologies Ltd.
From: Yehezkel Horowitz
Sent: Monday, October 19, 2015 6:14 PM
Subject: Improve Apache performance on high load (prefork MPM) with multiple Accept mutexes
(Patch attached)

Hello Apache gurus.

I was working on a project that used Apache 2.2.x with the prefork MPM (using flock as the mutex
method) on a Linux machine with 20 cores, and ran into the following problem.

Under load, when the number of Apache child processes grew beyond some point (~3000 processes),
Apache stopped accepting incoming connections in reasonable time (connections were visible in netstat stuck in SYN_RECV).

I found a document about Apache Performance Tuning [1], which suggests an idea for improving
performance:
"Another solution that has been considered but never implemented is to partially serialize
the loop -- that is, let in a certain number of processes. This would only be of interest
on multiprocessor boxes where it's possible that multiple children could run simultaneously,
and the serialization actually doesn't take advantage of the full bandwidth. This is a possible
area of future investigation, but priority remains low because highly parallel web servers
are not the norm."

I wrote a small patch (aligned to 2.2.31) that implements this idea: create 4 mutexes and
spread the child processes across them (by getpid() % mutex_number).

So at any given time, 4 idle child processes are expected [2] to wait in the select loop.
Once a new connection arrives, 4 processes are woken by the OS: one will succeed in accepting the
socket (and will release its mutex) and 3 will return to the select loop.

This solved my specific problem and allowed me to get more load on the machine.

My questions to this forum are:

1.       Do you think this is a good implementation of the suggested idea?

2.       Any pitfalls I missed?

3.       Would you consider accepting this patch to the project?
If so, could you guide me on what else needs to be done for acceptance?
I know there is configuration & documentation work needed - I'll work on it once the
patch is approved...

4.       Do you think '4' is a good default for the number of mutexes? What considerations
should drive the default?

5.       Is such an implementation relevant for other MPMs (worker/event)?

Any other feedback is welcome.

[1] The Apache Performance Tuning document, "Accept Serialization - Multiple
Sockets" section.
[2] There is no guarantee that exactly 4 processes will wait, since all processes with "getpid()
% mutex_number == 0" (for example) might be busy at a given moment. But this sounds to me like a fair limitation.

Note: flock gave me the best results, but it still appears to have O(n^2) complexity (where 'n'
is the number of processes waiting on the mutex), so reducing the number of processes waiting on each mutex
gives a quadratic improvement.


Yehezkel Horowitz
Check Point Software Technologies Ltd.
