httpd-dev mailing list archives

From Dean Gaudet <dgau...@arctic.org>
Subject Re: work in progress: mpm-3.tar.gz (fwd)
Date Sat, 19 Jun 1999 23:34:18 GMT
On Sun, 20 Jun 1999, Zeev Suraski wrote:

> PHP interfaces with Apache in two stages - configuration directive parsing 
> and the actual scripting execution callback.  Let's assume these two are
> called in separate threads.

This isn't a new assumption, this is how it has always been -- config
parsing at config time is in a completely different context from how it is
used at run-time.  There's some confusion right now about the two "faces" of
configuration:

- one face is read-only, essentially everything read from the config file
  is used in a read-only manner at run-time.  This stuff can be shared
  without synchronization... the vast majority of config stuff fits in
  this category (all of apache's core modules fit entirely in this
  category)

- the other face is shared, dynamic information which results as a
  function of the configuration.  For example, a dynamic cache of files,
  or a pool of mysql db connections.  This stuff can be shared only if
  synchronization occurs.

In some sense it would be valid to mprotect(PROT_READ) all the pages
used in pconf -- any dynamic data you need should be allocated in
pchild during child_init.  But the second face is new; it's only really
interesting when you've got a threaded server.

> PHP's configuration parsing mechanism, in this case, reading from 
> .htaccess/.conf files using registered Apache callbacks, stores those 
> directives in PHP dynamic structures that use PHP's memory manager.  PHP's 
> memory manager, in turn, uses TLS.
> 
> Since the data structures that were allocated in the configuration parsing 
> phase are used on another phase (script execution) - their deallocation may 
> occur in a different thread from that in which they were 
> allocated.  Undoubtedly, that would cause problems since the TLS structure
> used by the memory manager for deallocation won't be the right structure.

To be honest, this sounds like a bug in your code.  Apache has never
guaranteed that it will call your code from the same thread -- in 1.x it
built your structures in one process and then executed them in another
process... never mind another thread.

> Now, I understand your point more or less.  What I'm saying is that such a 
> restriction is very aggressive and somewhat unnatural, and is likely to 
> cause a lot of headache to module developers.  In PHP, it would be possible 
> to get around it by moving the TLS code into a separate DLL/.so that would 
> be server dependent, and would be implemented as standard TLS for all web 
> servers except for apache, and implement it as request-local-storage in 
> Apache.  It would be a pain, though, so if you could keep a background task 
> to think about how to allow modules to ask for all of their execution path 
> to occur within the same thread (without preventing the whole server from 
> using asynchronous operations), it would be great.

Yeah I know it's not the end of the world -- this is me saying "if you
follow me down this path, we can kick ass on performance".  But there are
many, many ways to handle this while leaving my restriction in the API...
for example, you could choose to manage a pool of PHP threads within each
apache process -- and your module would simply synchronize the apache
thread with your threads.  That's not as far-fetched as it sounds --
that's really similar to the jserv model, where the processing occurs
in another threaded process.  You won't be the only people with this
requirement, which is why a generic server-server protocol would be
nice...  and hey, an HTTP/1.1 proxy would solve that :)

But I really want the php module to be able to take advantage of the async
stuff -- because php is a real world application, unlike the benchmark
crap that got me off my butt and started me working on this again.
For example, I want to provide the primitives so that php can buffer
responses up to 64k (configurable), and send those out with async support;
for longer responses I'm happy consuming a thread forever.

It'll require a little bit of work in the php response handler, and it'll
mean that the log_request phase will almost certainly happen in a
different thread.  But I don't expect the changes to be difficult...

Just so that we're all on the same page, this is what I'm aiming for:

   - new connection arrives at port 80
   - mpm does an accept and builds a conn_rec
   - thread 123 is chosen to process conn_rec
   - thread 123 goes through the apache API phases:
	post_read_request
	header_parser
	translate_handler
	ap_check_user_id
	auth_checker
	access_checker
	type_checker
	fixer_upper
   - thread 123 calls the response handlers
   - a response handler determines that the async engine can
     do the job
   - the stack unwinds from the apache core up into the mpm
     (thread 123 is no longer handling this connection, it may
     immediately begin servicing another connection)
   - the mpm sends the response asynchronously
   - at some point in the future the response has been sent
   - thread 234 is chosen to handle the "completion" event
   - thread 234 notices that the entire response has been sent, so
     it calls the logger (it may have been doing a range-request,
     and decided to send more ranges asynchronously)
   - thread 234 proceeds with keep-alive processing, perhaps this is
     a persistent connection
   - if it is, we could repeat the above process, and send another
     async response and end up back here again

So we have asynchronous behaviour only in the handler -- and even then
it only happens with modules that support async behaviour.  The logger
is the most obvious thing which is going to be called "out of thread".

And since HTTP doesn't have any state from request to request within
one connection, I don't think we need to debate about the fact that
different requests on one connection may occur within different threads.

That's what I'm aiming for.  The only reason I stated that the thread
may switch in any phase was to try to help folks like Tony who asked for
async DNS...

I think we're all on the same page now :)  I'm not going to break your
stuff in terrible ways, from what you've said your stuff will work just
fine within this model.  We just need to tweak our API a bit.

Dean

