httpd-dev mailing list archives

From: "William A. Rowe Jr." <wmr...@gmail.com>
Subject: Re: [PATCH ASF bugzilla# 55897] prefork_mpm patch with SO_REUSEPORT support
Date: Thu, 06 Mar 2014 05:57:35 GMT

Yingqi,

as one of the 'Windows folks' here, your idea is very intriguing, and
I'm sorry that other issues have distracted me from giving it the
attention it deserves.

If you want to truly re-architect the MPM, by all means, propose it as
another MPM module.  If it isn't adopted here, please don't hesitate
to offer it to interested users as separate source (although I hope we
find a way to adopt it).

The idea of different MPMs was that they were swappable.  MPM foo
isn't MPM bar.  E.g., worker, prefork, event each have their own tree.
Likewise, there is nothing stopping us from having 2 or 3 MPMs on
Windows, and there is nothing stopping us from stating that a
particular MPM has a prerequisite of Linux 3.1 kernels or Windows
2008+.

The Windows build system hasn't been so flexible, but this can be
remediated with CMake, as folks have spent many hours to accomplish.
I understand you are probably relying on functions authored entirely
for the winnt_mpm, and we can re-factor those on trunk out to the
os/win32/ directory so that MPMs may share them.

The definition of the word "prefork" is a single-threaded process which
handles a request.  Please don't misuse the phrase, and without
reviewing your code, I'll presume that is what you meant.

I don't doubt your benchmarking results, but please note that only
Windows Server OSes can actually be used to perform any benchmarks.
Any 'desktop' enterprise, professional or home editions are
deliberately hobbled, and IMHO the project should make no
accommodation for vendor stupidity.

In terms of benchmarking, I don't know how you measured, but if you
can peg a machine at 95% total utilization yet httpd shows itself
consuming only 70% or 60%, that means it is kernel-bound.  That is
usually a good thing, that the app is operating optimally and is only
constrained by the architecture.

I think I understand where you are going with reuseport.  That doesn't
equate to the Unix OSes... they can distribute the already opened
listener to an unlimited number of forks.  On Windows, we also
distribute the listener through a write/stdin channel to the child
process.  What doesn't work well is for parallel Windows children to
share certain resources such as the error log, access log, etc.  But we
can contend with that issue.  What we can't contend with is what 3rd
party modules have chosen to do, and almost any patch you offer is not
going to be suitable for binary compatibility with 3rd party httpd 2.4
modules compiled for Windows, so your patch presented for the 2.4
branch is rejected.

That said, we should endeavor to solve this for 2.6 (or 3.0 or
whatever we call the 'next httpd').  We are all out of fresh ideas, so
proposals such as yours are a welcome sight!!!

Finally, please do have patience; large patches require time for us to
digest, and we have limited amounts of that resource.  As I mentioned,
adding a whole new MPM directory to trunk, alone, should meet very
little resistance for any architectures.

Thank you for your posts, and please do not feel ignored.  There are a
handful of people active and we all have many details to attend to.

Yours,

Bill

On Fri, Jan 24, 2014 at 5:25 PM, Lu, Yingqi <yingqi.lu@intel.com> wrote:
> Dear All,
>
>
>
> Our analysis of the Apache httpd 2.4.7 prefork MPM, on 32- and 64-thread
> Intel Xeon 2600 series systems, using an open source three-tier social
> networking web server workload, revealed performance scaling issues.  In the
> current software, a single listen statement (listen 80) provides better
> scalability due to un-serialized accept. However, when the system is under
> very high load, this can lead to a large number of child processes stuck in
> D state. On the other hand, the serialized accept approach cannot scale with
> the high load either. In our analysis, a 32-thread system, with 2 listen
> statements specified, could scale to just 70% utilization, and a 64-thread
> system, with a single listen statement specified (listen 80, 4 network
> interfaces), could scale to only 60% utilization.
>
>
>
> Based on those findings, we created a prototype patch for the prefork MPM
> which improves performance and thread utilization. In Linux kernel 3.9 and
> newer, SO_REUSEPORT is available. This feature allows multiple sockets to
> listen on the same IP:port and automatically distributes incoming
> connections among them. We use this feature to create multiple duplicated
> listener records of the original one and partition the child processes into
> buckets. Each bucket listens to 1 IP:port. In the case of an old kernel
> which does not have SO_REUSEPORT enabled, we modified the "multiple listen
> statement case" by creating 1 listen record for each listen statement and
> partitioning the child processes into different buckets. Each bucket listens
> to 1 IP:port.
>
>
>
> Quick tests of the patch, running the same workload, demonstrated a 22%
> throughput increase on the 32-thread system with 2 listen statements (Linux
> kernel 3.10.4). With the older kernel (Linux kernel 3.8.8, without
> SO_REUSEPORT), a 10% performance gain was measured. With the single listen
> statement (listen 80) configuration, we observed over 2X performance
> improvement on modern dual-socket Intel platforms (Linux kernel 3.10.4). We
> also observed a large reduction in response time, in addition to the
> throughput improvement gained in our tests [1].
>
>
>
> Following the feedback from the bugzilla website where we originally
> submitted the patch, we removed the dependency on the APR change to simplify
> the patch testing process. Thanks to Jeff Trawick for his good suggestion!
> We are also actively working on extending the patch to the worker and event
> MPMs, as a next step. Meanwhile, we would like to gather comments from all
> of you on the current prefork patch. Please take some time to test it and
> let us know how it works in your environment.
>
>
>
> This is our first patch to the Apache community. Please help us review it
> and let us know if there is anything we might revise to improve it. Your
> feedback is very much appreciated.
>
>
>
> Configuration:
>
> <IfModule prefork.c>
>     ListenBacklog 105384
>     ServerLimit 105000
>     MaxClients 1024
>     MaxRequestsPerChild 0
>     StartServers 64
>     MinSpareServers 8
>     MaxSpareServers 16
> </IfModule>
>
>
>
> [1] Software and workloads used in performance tests may have been optimized
> for performance only on Intel microprocessors. Performance tests, such as
> SYSmark and MobileMark, are measured using specific computer systems,
> components, software, operations and functions. Any change to any of those
> factors may cause the results to vary. You should consult other information
> and performance tests to assist you in fully evaluating your contemplated
> purchases, including the performance of that product when combined with
> other products.
>
>
>
> Thanks,
>
> Yingqi
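
The bucket scheme described in the quoted message can be illustrated with a
small sketch. This is not the patch itself (which is C code inside the prefork
MPM); it is a minimal Python illustration, assuming a Linux kernel with
SO_REUSEPORT support. The helper names open_reuseport_listeners and
bucket_for_child, and the simple modulo assignment of children to buckets,
are hypothetical and chosen only for this example.

```python
import socket


def open_reuseport_listeners(host, port, num_buckets, backlog=128):
    """Open num_buckets listening sockets that all share one host:port.

    Hypothetical helper: each returned socket stands in for one
    duplicated listener record (one per bucket).
    """
    listeners = []
    for _ in range(num_buckets):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # SO_REUSEPORT has to be set on every socket before bind(),
        # otherwise the second bind() to the same port fails.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        sock.bind((host, port))
        sock.listen(backlog)
        listeners.append(sock)
        # If port 0 was requested, reuse the kernel-chosen port for
        # the remaining sockets so that all of them share it.
        port = sock.getsockname()[1]
    return listeners


def bucket_for_child(child_index, num_buckets):
    """Assign a child process slot to a bucket (assumed modulo scheme)."""
    return child_index % num_buckets
```

In a prefork-style server, each forked child would then call accept() only on
the listener of its own bucket, for example
listeners[bucket_for_child(slot, len(listeners))], so that children in
different buckets do not contend for the same accept queue.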
