httpd-dev mailing list archives

From Rainer Jung <rainer.j...@kippdata.de>
Subject Re: Events, Destruction and Locking
Date Wed, 08 Jul 2009 19:35:20 GMT
On 08.07.2009 15:55, Paul Querna wrote:
> On Wed, Jul 8, 2009 at 3:05 AM, Graham
> Dumpleton<graham.dumpleton@gmail.com> wrote:
>> 2009/7/8 Graham Leggett <minfrin@sharp.fm>:
>>> Paul Querna wrote:
>>>
>>>> It breaks the 1:1 connection mapping to thread (or process) model
>>>> which is critical to low memory footprint, with thousands of
>>>> connections. Maybe I'm just insane, but the servers taking
>>>> market share, like lighttpd, nginx, etc., all use this model.
>>>>
>>>> It also prevents all variations of the Slowloris stupidity, because it's
>>>> damn hard to overwhelm the actual connection processing if it's all
>>>> async, and doesn't block a worker.
>>> But as you've pointed out, it makes our heads bleed, and locks slow us down.
>>>
>>> At the lowest level, the event loop should be completely async, and be
>>> capable of supporting an arbitrary (probably very high) number of
>>> concurrent connections.
>>>
>>> If one connection slows or stops (deliberately or otherwise), it won't
>>> block any other connections on the same event loop, which will continue
>>> as normal.
>> But which for a multiprocess web server screws up if you then have a
>> blocking type model for an application running on top. Specifically,
>> the greedy nature of accepting connections may mean a process accepts
>> more connections than it has high level threads to handle. If the
>> high level threads end up blocking, then any accepted connections for
>> the blocking high level application, for which request headers are
>> still being read, or are pending, will be blocked as well even though
>> another server process may be idle. In the current Apache model a
>> process will only accept a connection if it knows it is able to process
>> it at that time. If a process doesn't have the threads available, then
>> a different process would pick it up instead. I have previously
>> commented how this causes problems with nginx for potentially blocking
>> applications running in nginx worker processes. See:
>>
>>  http://blog.dscpl.com.au/2009/05/blocking-requests-and-nginx-version-of.html
>>
>> To prevent this you are forced to run an event driven system for
>> everything, and blocking type applications can't be run in the same
>> process. Thus, anything like that has to be shoved out into a separate
>> process. FASTCGI was mentioned for that, but frankly I believe
>> FASTCGI is getting a bit crufty these days. It perhaps really needs to
>> be modernised, with the byte protocol layout simplified to get rid of
>> these varying size length indicator bytes. This may have been
>> warranted when networks were slower and amount of body data being
>> passed around less, but I can't see that that extra complexity is
>> warranted any more. FASTCGI also can't handle things like end to end
>> 100-continue processing and perhaps has other problems as well in
>> respect of handling logging outside of request context etc etc.
>>
>> So, I personally would really love to see a good review of FASTCGI,
>> AJP and any other similar/pertinent protocols done to distill what in
>> these modern times is required and would be a better mechanism. The
>> implementations of FASTCGI could also perhaps be modernised. Of
>> course, even though FASTCGI may not be the most elegant of systems,
>> it is probably too entrenched to get rid of. The only way perhaps might
>> be if an improved version formed the basis of any internal
>> communications for a completely restructured internal model for Apache
>> 3.0 based on serf which had segregation between processes handling
>> static files and applications, with user separation etc etc.
> 
> TBH, I think the best way to modernize FastCGI or AJP is to just proxy
> HTTP over a daemon socket, then you solve all the protocol issues...
> and just treat it like another reverse proxy.  The part we really need
> to write is the backend process manager, to spawn/kill more of these
> workers.

Though there is one nice feature in the AJP protocol: since it knows
it's serving via a reverse proxy, the back end patches some
communication data as if it were the front end. So if the context on the
back end asks for port, protocol, host name etc., it automatically gets
data that looks like the front end's. That way cookies,
self-referencing links etc. work right.

Most of that can be simulated by appropriate configuration with HTTP too
(yes, there are a lot of proxy options for this), but in AJP it's
automatic. Some parts are not configurable right now, e.g. the
client IP. You always have to introduce something that's aware e.g. of
the X-Forwarded-For header. Another example would be whether the
communication to the reverse proxy was via https. You can transport all
that info via custom headers, but the backend usually doesn't know how to
handle it.
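
To illustrate the kind of "something that's aware" of those headers, here
is a minimal sketch of a back-end shim written as Python WSGI middleware.
It is only an assumption about how one might do it, not what AJP or any
existing module does: the name forwarded_fixup and the trusted_proxies
parameter are made up for the example, and X-Forwarded-Proto and
X-Forwarded-Host are conventions the front end would have to be
configured to send.

```python
def forwarded_fixup(app, trusted_proxies=frozenset({"127.0.0.1"})):
    """Wrap a WSGI app so it sees the front end's view of the request.

    Hypothetical helper: the header names are conventions, not something
    every reverse proxy sends by default.
    """
    def wrapper(environ, start_response):
        # Only believe the headers if the direct peer is a known proxy;
        # otherwise any client could spoof its own address or scheme.
        if environ.get("REMOTE_ADDR") in trusted_proxies:
            xff = environ.get("HTTP_X_FORWARDED_FOR")
            if xff:
                # The right-most entry is the one our own proxy appended.
                environ["REMOTE_ADDR"] = xff.split(",")[-1].strip()
            proto = environ.get("HTTP_X_FORWARDED_PROTO")
            if proto in ("http", "https"):
                # Lets the application build https:// self-references.
                environ["wsgi.url_scheme"] = proto
            host = environ.get("HTTP_X_FORWARDED_HOST")
            if host:
                environ["HTTP_HOST"] = host.split(",")[-1].strip()
        return app(environ, start_response)
    return wrapper
```

Wrapping the application object once (app = forwarded_fixup(app)) is
enough; requests that do not arrive from a trusted proxy are passed
through untouched.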

Regards,

Rainer
