httpd-cvs mailing list archives

Subject cvs commit: apache-2.0/mpm/src/docs goals.txt initial_blurb.txt tls.txt
Date Fri, 18 Jun 1999 18:46:56 GMT
dgaudet     99/06/18 11:46:55

  Added:       mpm/src/docs goals.txt initial_blurb.txt tls.txt
  add in some of the discussion
  Revision  Changes    Path
  1.1                  apache-2.0/mpm/src/docs/goals.txt
  Index: goals.txt
  From Fri Jun 18 11:46:10 1999
  Date: Fri, 18 Jun 1999 09:46:51 -0700 (PDT)
  From: Dean Gaudet <>
  Subject: Re: work in progress: mpm-3.tar.gz
  X-Comment: Visit for information regarding copyright
and disclaimer.
  Yup it's a great idea waiting for someone to take it by the reins and
  implement it :) 
  A lot of folks in the past (in the group, and not in the group) have asked
  me "how can I help with 2.0?"  I'm trying to lay out a bunch of projects
  that will help me with this rearchitecture... and, if I can be so bold, I
  suspect that this is how we can finally get 2.0 going.  The stuff I'm
  doing is something which has been long overdue, and which is necessary for
  many of the plans that we've been thinking of. 
  I intend to make the feature set of mpm so desirable to unix-heads that
  they want to help clean it up, re-implement various 1.x features in it,
  and say "this is apache 2.0".
  I intend to make the feature set of mpm general enough that non-unix-heads
  see exactly where they can plug in a new kick-ass model that suits their
  architecture... such as completion ports for NT.  Previously these folks
  had to hack into the utter horrid mess of http_main, and there was much
  duplicated code.  My goal (and I think I've accomplished it in this first
  version) is to abstract the real purpose of the main loop inside http_main
  so that it can be replaced with architecture specific main loops. 
  Yes I'm a unix-bigot, but even within the unix world there are several
  possibilities for the MPM, and I want to be sure that people can implement
  all of them.  There's a fellow Zach from redhat who just implemented what
  is probably the fastest userland model for linux, and he's waiting to plug
  it into apache somehow.  There's a sun dude waiting to plug in solaris 7's
  in-kernel accelerator.  I want to accommodate all of them.  Hence the
  modular design of the MPM. 
  Some day we'll have a portable run-time, and that day we'll integrate it
  with the MPM.  I intend to ignore this issue for the moment, because we
  can't afford to wait any longer for the holy grail portable run-time...
  and I think it'll be obvious to the APR (or NSPR) folks how they too can
  plug into the MPM; but they're going to have to wait for layered BUFF
  At the moment this is "dean's fork of apache", and I intend to play
  dictator on it for a while.  I have some very specific goals for it, and I
  want people to see where I'm going first before I release control.  I do
  want this to become apache 2.0 though, so I'm trying to accommodate enough
  people to make it acceptable.  But I could really use help as I've
  outlined -- at best I'm going to have prototype quality code; we've got a
  couple months of cleanup and testing before it'll be beta quality.
  OK back to hacking :)
  On Fri, 18 Jun 1999, Ralf S. Engelschall wrote:
  > In article <>
you wrote:
  > > Oh, and another TODO which can happen now, before the modules are
  > > converted to the new structure... is to change the module structure to
  > > have only one "on_load" method, and have all other methods registered at
  > > run time. 
  > Yeah, that's a great idea. +1
  > We really have to get rid of the inflexible fixed dispatch lists...
  >                                        Ralf S. Engelschall
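
A minimal sketch of the "one on_load method, everything else registered at
run time" idea discussed above. Every name here (hook_list, register_hook,
run_hooks, the module struct layout) is hypothetical and chosen only for
illustration; none of it is taken from the mpm-3 code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; not the actual mpm-3 module structure. */
#define MAX_HOOKS 16

typedef int (*hook_fn)(void *ctx);

typedef struct {
    hook_fn fns[MAX_HOOKS];
    size_t count;
} hook_list;

/* One list per phase; a real server would have many more phases. */
static hook_list post_config_hooks;
static hook_list handler_hooks;

/* Append fn to a phase list; returns 0 on success, -1 if the list is full. */
static int register_hook(hook_list *list, hook_fn fn)
{
    assert(list != NULL && fn != NULL);
    if (list->count == MAX_HOOKS)
        return -1;
    list->fns[list->count++] = fn;
    return 0;
}

/* Call every hook registered for a phase; returns how many were called. */
static size_t run_hooks(const hook_list *list, void *ctx)
{
    size_t i;
    for (i = 0; i < list->count; i++)
        list->fns[i](ctx);
    return list->count;
}

/* The module structure shrinks to a name and a single entry point. */
typedef struct {
    const char *name;
    void (*on_load)(void);
} module;

/* Example handler: counts how often it was invoked. */
static int example_handler(void *ctx)
{
    int *calls = ctx;
    (*calls)++;
    return 0;
}

/* on_load registers only the phases this module cares about. */
static void example_on_load(void)
{
    register_hook(&handler_hooks, example_handler);
}

static module example_module = { "example", example_on_load };
```

With this shape a module that hooks nothing in a given phase costs nothing in
that phase, which is the flexibility the fixed dispatch lists lack.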
  1.1                  apache-2.0/mpm/src/docs/initial_blurb.txt
  Index: initial_blurb.txt
  From Fri Jun 18 11:43:50 1999
  Date: Thu, 17 Jun 1999 12:23:59 -0700 (PDT)
  From: Dean Gaudet <>
  Subject: work in progress: mpm-3.tar.gz
  This is the beginning of some massive code cleanup... we've been building
  so much crap into http_main it's been really difficult for me to work on
  the async/sync hybrid (ASH) stuff that I posted a few weeks back.  My main
  goal here was to rip out the "multi-processing model", or MPM, so that we
  can replace it with whatever we need depending on the platform. 
  For example, there could be a prefork MPM, a win32 MPM, a select/thread
  hybrid MPM, a tpf MPM, ...
  The MPM's job is to listen on sockets and call ap_process_connection (new
  function, just the keepalive wrapper around ap_process_request).  It also
  handles restarts and shutdown.
  Manoj and Ryan started something along these lines, but they weren't
  nearly aggressive enough.  The cross dependencies between all the
  http_foo.c files in 1.x are atrocious.  I started with apache-apr/pthreads,
  but at the moment none of their work in http_main, http_accept, fdqueue,
  etc. is used.  All that stuff should become another MPM. 
  The new http_main.c is trivial in comparison to the old one. 
  I've ripped code out of http_config.c and http_core.c which is related to
  the MPM code -- the MPM is a module, it has its own configuration
  directives and its own methods.  This way, for example, we don't have crap
  such as a "MaxRequestsPerChild" directive that does nothing under win32...
  the win32 mpm doesn't need that directive.
  There's lots still to do...  If people want to help me (please!), here's
  how you can help:
  - grep for "TODO:", there's a lot of easy and not so easy ones to handle
  - most of the modules still need to be ported to the new module
    structure... but go slowly, we want to be sure that the new init phases
    are the ones we require
  - mpm_prefork is a crude port of the apache-1.3 http_main.c to the new
    mpm framework... it's actually my first test case... but there's a long
    list of TODOs still in it, and I haven't really tested it.  I'm not
    going to work on it any longer -- I'm moving onto the ASH MPM.
  - write a win32 MPM (heck, write a win95 and a winnt mpm, stop this silly
    pandering to the weaker denominator)
  - an mpm/foo/ hierarchy should be created, and Configure should be set up
    to understand a new "mpm" directive... treat them almost identically to
    modules, except that at most one mpm can be selected at compile time,
    and the default for unix should be "prefork".
  You can assume that I'm not working on any of those TODOs at the moment --
  my focus is going to be the ASH MPM, and then a revamp of BUFF to handle
  non-blocking sockets, and then a change to the MPM interface to handle
  async models (the message passing stuff). 
  Below is more information. 
  DEATH TO http_main.c! 
  http_main.c is the worst bit of code in apache.  It ties in far too many
  details, and we've been utterly lazy when adding new features.  We just
  wedge them in.  This has to stop.
  My goal is to turn http_main.c into a series of hook calls; and the
  various bits of functionality which we currently burden it with --
  such as opening logs; maintaining the version strings; ... will all be
  moved into modules.  It's only a matter of giving the modules enough
  hook points.
  I also intend to pluck out the "multiprocessing model", or MPM...
  multiprocess, multithreaded, hybrid select/thread, ...
  There is a loop around ap_process_request which handles all the
  details of a connection (keepalive and such), let's call that loop
  ap_process_connection.  From ap_process_connection() on downwards in the
  code there is very little which depends on the MPM in use.  For example,
  none of the core modules care if they are running within a multithreaded
  process, or within multiple single threaded processes.  At the moment
  I am not going to consider modules which wish to make use of threads to
  improve their behaviour -- that is an issue which APR intends to address.
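
A minimal sketch of the loop described above, assuming a stand-in conn_rec
and a stubbed ap_process_request. The function names come from the text; the
signatures, the struct fields and the graceful_stop_flag variable are
assumptions made for illustration.

```c
#include <assert.h>

/* Stand-in connection type; the real conn_rec carries far more. */
typedef struct {
    int requests_left;   /* pretend the client sends this many requests */
    int served;          /* requests handled on this connection so far */
} conn_rec;

/* Hypothetical flag an MPM would set when a graceful stop is requested. */
static int graceful_stop_flag;

/* Predicate used by the connection loop (assumed signature). */
int ap_mpm_graceful_stop(void)
{
    return graceful_stop_flag;
}

/* Stub: handle one request; return 1 if the connection may stay open. */
static int ap_process_request(conn_rec *c)
{
    assert(c->requests_left > 0);
    c->requests_left--;
    c->served++;
    return c->requests_left > 0;
}

/* The keepalive wrapper: nothing in here knows which MPM is running. */
void ap_process_connection(conn_rec *c)
{
    while (c->requests_left > 0) {
        int keepalive = ap_process_request(c);
        if (!keepalive || ap_mpm_graceful_stop())
            break;
    }
}
```

The only contact with the MPM is the graceful-stop predicate, which is why
everything from this function downwards can stay MPM-neutral.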
  We will impose an additional restriction on modules -- if threads are in
  use, they may not make any assumption that the same thread will be used to
  process all phases of a request.  Put another way -- thread local storage
  is useless... and there will be no "thread_init_hook" function to tell
  modules when threads have been created.  This restriction is to give us
  access to hybrid async/sync techniques.  Modules needing information
  persisting between request phases should use request-specific data
  (or connection-specific data).
  Modules may assume that all phases of one request are handled within one
  At this point I'm going to ignore everything from ap_process_connection
  on down -- there are things which we'd like to change in there for 2.0,
  but it's not necessary to consider them when rewriting http_main.c.
  For the purposes of this discussion, the "parent" process refers to the
  first process which is invoked (may be replaced during detach()); and
  "child" refers to any process which is capable of serving requests.
  The children may have zero or more threads, it depends on the MPM.
  Here is the general sequence of events in http_main.c:
  - main()
  - pglobal = ap_alloc_init();  /* parent of all pools */
  - pcommands = ap_make_sub_pool(pglobal);
  - pre_command_line_hook(pcommands)
  - process command line (or equivalent on win32/etc.)
  - pconf = ap_make_sub_pool(pglobal);
  - ptemp = ap_make_sub_pool(pconf);
  - plog = ap_make_sub_pool(pglobal);
      /* the extra running through of the config... */
  - ap_pre_config_hook(pconf, plog, ptemp);
  - server_conf = ap_read_config
      - as shared modules are loaded, their pre_config_hook() is called
      (pre_command_line_hook() is only available to modules pre-loaded ?)
  - ap_clear_pool(plog);
  - ap_open_logs_hook(pconf, plog, ptemp, server_conf);
  - ap_post_config_hook(pconf, plog, ptemp, server_conf);
  - ap_clear_pool(ptemp);
  big loop {
      - ap_clear_pool(pconf);
      - ptemp = ap_make_sub_pool(pconf);
      - ap_pre_config_hook(pconf, plog, ptemp);
      - server_conf = ap_read_config
      - ap_clear_pool(plog);
      - ap_open_logs_hook(pconf, plog, ptemp, server_conf);
      - ap_post_config_hook(pconf, plog, ptemp, server_conf);
      - ap_destroy_pool(ptemp);
      - mpm_run(pconf, plog, server_conf);
  	- zero or more processes are created (mpm specific), in each child
  	  which will service requests the following occurs:
  	    - pchild = ap_make_sub_pool(pconf);
  	    - child_init_hook(pchild, server_conf);
  	- by some unspecified method, the mpm accepts sockets, and
  	  calls ap_process_connection
  	- at some point, a restart or shutdown event occurs in parent,
  	  and by some unspecified method the mpm notifies its children
  	  of the event... and depending on whether it implements
  	  graceful/non-graceful restart/shutdown it stops servicing
  	- in each child, when there are no outstanding requests, the
  	  MPM calls ap_destroy_pool(pchild)
      - at some point, mpm_run() returns
      - if this is a shutdown then
  	- ap_clear_pool(pconf);
  	- ap_clear_pool(plog);
  	- ap_destroy_pool(pglobal);
  	- exit
      - else it is a restart... continue
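
The sequence above can be sketched as follows. This is a simplified model
under stated assumptions: pools are reduced to counters, the hooks are empty
stubs, ptemp is cleared rather than destroyed and re-created, and the stub
ap_mpm_run simply returns 0 (restart) a fixed number of times before
returning 1 (shutdown). Only the call order follows the text.

```c
#include <assert.h>

/* Stand-ins: a pool only counts how often it was cleared. */
typedef struct { int clears; } pool;
typedef struct { int generation; } server_rec;

static pool pconf, plog, ptemp;
static server_rec server;
static int config_reads;    /* how many times the config was read */
static int restarts_left;   /* hypothetical: restarts before shutdown */

static void ap_clear_pool(pool *p) { p->clears++; }

/* Empty hook stubs; real ones would dispatch to the modules. */
static void ap_pre_config_hook(pool *c, pool *l, pool *t)
{ (void)c; (void)l; (void)t; }
static void ap_open_logs_hook(pool *c, pool *l, pool *t, server_rec *s)
{ (void)c; (void)l; (void)t; (void)s; }
static void ap_post_config_hook(pool *c, pool *l, pool *t, server_rec *s)
{ (void)c; (void)l; (void)t; (void)s; }

static server_rec *ap_read_config(void)
{
    config_reads++;
    server.generation = config_reads;
    return &server;
}

/* One full configuration pass, in the order listed in the text. */
static server_rec *configure(void)
{
    server_rec *s;
    ap_pre_config_hook(&pconf, &plog, &ptemp);
    s = ap_read_config();
    ap_clear_pool(&plog);
    ap_open_logs_hook(&pconf, &plog, &ptemp, s);
    ap_post_config_hook(&pconf, &plog, &ptemp, s);
    ap_clear_pool(&ptemp);
    return s;
}

/* Stub MPM: returns 1 for shutdown, 0 for restart. */
static int ap_mpm_run(pool *c, pool *l, server_rec *s)
{
    (void)c; (void)l; (void)s;
    if (restarts_left == 0)
        return 1;
    restarts_left--;
    return 0;
}

/* Drives the whole sequence; returns how often the config was read. */
int run_server(int restarts)
{
    config_reads = 0;
    restarts_left = restarts;
    configure();                    /* the extra pass before the loop */
    for (;;) {
        server_rec *s;
        ap_clear_pool(&pconf);
        s = configure();
        if (ap_mpm_run(&pconf, &plog, s))
            break;                  /* shutdown */
        /* otherwise it is a restart: go around again */
    }
    ap_clear_pool(&pconf);
    ap_clear_pool(&plog);
    return config_reads;
}
```

Note that the configuration is read once before the loop and once per
iteration, so a server that is never restarted still reads it twice.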
  MPM interface:
  The MPM is a module, and it implements directives which control its
  process/thread/etc. spawning algorithm, such as:
  We don't specify what these directives are... in fact, we shouldn't even
  attempt to make them look like the apache-1.3 directives, we should take
  this opportunity to restore some sanity to the names.
  The MPM implements the Listen directive, and any other port listening
  directives which it may need (such as directives for binding a process to
  a particular IP... in cases where cpu affinity/io affinity are implemented
  for example).
  The MPM also implements the User/Group directives (or whatever their
  equivalents are)... these directives are for controlling what permissions
  the various processes have... they're not to be overloaded with other
  meanings for things such as suexec.
  The MPM provides the following functions:
      /* run until a restart/shutdown is indicated, return 1 for shutdown
         0 otherwise */
      int ap_mpm_run(pool *pconf, pool *plog, server_rec *server_conf);
      /* predicate indicating if a graceful stop has been requested ...
         used by the connection loop */
      int ap_mpm_graceful_stop(void);
  From Fri Jun 18 11:44:00 1999
  Date: Thu, 17 Jun 1999 12:29:11 -0700 (PDT)
  From: Dean Gaudet <>
  Subject: Re: work in progress: mpm-3.tar.gz
  Uhh... forgot the link.
  1.1                  apache-2.0/mpm/src/docs/tls.txt
  Index: tls.txt
  From Fri Jun 18 11:45:56 1999
  Date: Fri, 18 Jun 1999 09:29:25 -0700 (PDT)
  From: Dean Gaudet <>
  Subject: Re: work in progress: mpm-3.tar.gz (fwd)
  On Fri, 18 Jun 1999, Zeev Suraski wrote:
  > Is new-httpd moderated?  Looks like every letter I send gets censored,
  > which is really weird, since I see all sorts of crap on the list, whereas
  > my posts are usually technical...
  > Anyway, I'm sending this to you directly to ensure you get a chance to
  > look at it...
  It only allows subscribers to post... are you using the same subscription
  addr as you are for sending messages to the list?
  Brian Behlendorf deals with the non-subscriber posts, and he sometimes
  lags by a few days... 
  > On Thu, 17 Jun 1999, Dean Gaudet wrote:
  > > We will impose an additional restriction on modules -- if threads are in
  > > use, they may not make any assumption that the same thread will be used to
  > > process all phases of a request.  Put another way -- thread local storage
  > > is useless... and there will be no "thread_init_hook" function to tell
  > > modules when threads have been created.  This restriction is to give us
  > > access to hybrid async/sync techniques.  Modules needing information
  > > persisting between request phases should use request-specific data
  > > (or connection-specific data).
  > Ouch, that's a very aggressive restriction.  It pretty much requires any
  > module that uses local storage to be Apache specific, since it would have
  > to save information in Apache's per-request or per-connection structure
  > (not to mention it would have to pass pointers to these structures all
  > over to any function that may require access to these globals, which is
  > terrible). 
  Apache already passes around a request_rec pretty much everywhere... I
  suppose we can implement apache-specific "thread local storage", which we
  save and restore if we ever switch threads... but for first implementation
  I'm really not going to worry about it...
  > I really urge you to reconsider this.  For PHP 4.0 (Zend), I've written a
  > platform independent local storage resource manager (that works very well
  > in the threaded ISAPI/IIS4 environment), but the whole approach will be
  > rendered completely useless with such restrictions, since it's based on
  > the thread id, and obviously expects that all steps and hooks are called
  > within the same thread
  You could base your storage off the conn_rec * instead of a thread_id... 
  Also, I was planning on building a connection_id which is a densely packed
  small integer, because the scoreboard will need something like this. 
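
A minimal sketch of what a densely packed connection_id could look like,
assuming a fixed upper bound on live connections. MAX_CONNS, conn_id_alloc,
conn_id_free and the slot table are all hypothetical names, and the sketch
ignores locking, which a threaded MPM would need.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CONNS 4   /* hypothetical limit on simultaneous connections */

static int free_ids[MAX_CONNS];   /* stack of ids not currently in use */
static int free_count = -1;       /* -1 means "not yet initialised" */

static void init_ids(void)
{
    int i;
    /* Push in reverse so the smallest id is handed out first. */
    for (i = 0; i < MAX_CONNS; i++)
        free_ids[i] = MAX_CONNS - 1 - i;
    free_count = MAX_CONNS;
}

/* Returns an unused id in [0, MAX_CONNS), or -1 if all are taken. */
int conn_id_alloc(void)
{
    if (free_count < 0)
        init_ids();
    if (free_count == 0)
        return -1;
    return free_ids[--free_count];
}

/* Returns an id to the free stack so it is reused before higher ids. */
void conn_id_free(int id)
{
    assert(id >= 0 && id < MAX_CONNS);
    assert(free_count >= 0 && free_count < MAX_CONNS);
    free_ids[free_count++] = id;
}

/* Per-connection storage indexed by id, usable the way a module would
   otherwise index a table by thread id. */
static void *slots[MAX_CONNS];

void conn_slot_set(int id, void *value) { slots[id] = value; }
void *conn_slot_get(int id) { return slots[id]; }
```

Because freed ids are reused first, the ids stay small and a table indexed by
them stays compact.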
  The main case I'm considering this for is the handler phase.  In general,
  any request goes through a bunch of protocol stages and reaches the
  handler, and from there it fits into a few small categories:
  1. copy a file fd back to the client
  2. copy a pipe/socket fd (from another process) back to the client 
  3. copy a mmapped region back to the client
  4. copy a dynamically generated memory region back to the client
  5. the handler writes stuff to a BUFF, and it's sent to the client
  1, 2, 3, and 4 are very simple cases where, if the stuff to be sent doesn't
  fit in the socket's send buffer, we have to block the thread serving
  the response.  At this point we're potentially consuming an expensive
  resource (a thread, stack, kernel memory for the thread, ...) just to wait
  for the client.
  Instead we can switch to asynchronous behaviour.  All of 1, 2, 3, 4 are
  obvious -- the handler is essentially in a loop which we all know the
  structure of, because the entire object is already generated somewhere.
  The handler at this point sets up a special new record in the conn_rec,
  and will return with a special return code indicating the switch to
  asynchronous behaviour.
  If the MPM supports async stuff, then it will release this thread from
  serving the rest of this request... otherwise there'll be a library to
  handle the "async" stuff synchronously.  This async stuff will be
  completed using select/poll/non-blocking i/o (or other similar variants,
  there are several other faster methods on other platforms).  We've freed
  up the expensive resource:
  - consume less CPU per client
  - handle thousands upon thousands of long haul slow clients, because
    they're just a conn_rec/request_rec at this point... no kernel stack, no
    context switching... really, just minimal resource consumption
  - do better on existing benchmarks, but more importantly, do better on
    real world problems
  At some point in the future the async stuff will finish, and a caller
  supplied "completion" function will be called in another thread.  This
  gets us back out of the async core, and into protocol code, which will
  "resume" the handler.  This way the async core really has no knowledge of
  the protocols involved -- and we can use this technique for any protocol
  (and for variations on 1, 2, 3, 4)...
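
A rough sketch of the handoff described above, for case 4 (an already
generated memory region). The return code HANDLER_ASYNC, the async_op record
and the functions async_step and async_run_sync are invented for
illustration; the socket is simulated by a "room" argument saying how many
bytes the send buffer would accept on this pass.

```c
#include <assert.h>
#include <stddef.h>

#define HANDLER_DONE  0
#define HANDLER_ASYNC 1   /* hypothetical "switch to async" return code */

typedef struct conn_rec conn_rec;
typedef void (*completion_fn)(conn_rec *c);

/* Hypothetical record the handler leaves in the conn_rec. */
typedef struct {
    const char *data;            /* next byte to send */
    size_t remaining;            /* bytes still unsent */
    completion_fn on_complete;   /* caller-supplied completion function */
} async_op;

struct conn_rec {
    async_op op;
    int has_op;   /* an async operation is pending */
    int logged;   /* set once protocol code has resumed */
};

/* Protocol code resumed after the send finishes, e.g. the logging phase. */
static void resume_after_send(conn_rec *c)
{
    c->logged = 1;
}

static const char body[] = "hello, slow client";

/* Handler: describe the send, then give the thread back to the MPM. */
int handler(conn_rec *c)
{
    c->op.data = body;
    c->op.remaining = sizeof body - 1;
    c->op.on_complete = resume_after_send;
    c->has_op = 1;
    return HANDLER_ASYNC;
}

/* One pass of the async core. Returns 1 once the operation completed.
   It knows nothing about the protocol, only about bytes and a callback. */
int async_step(conn_rec *c, size_t room)
{
    size_t n;
    assert(c->has_op);
    n = c->op.remaining < room ? c->op.remaining : room;
    c->op.data += n;
    c->op.remaining -= n;
    if (c->op.remaining > 0)
        return 0;
    c->has_op = 0;
    c->op.on_complete(c);
    return 1;
}

/* Fallback for an MPM without async support: finish on this thread. */
void async_run_sync(conn_rec *c)
{
    while (!async_step(c, 8))
        ;
}
```

Between calls to async_step the connection is only a conn_rec in memory, so
no thread or stack is tied up waiting for the client.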
  5. is the case for modules which don't want to take advantage of the async
  features.  But we can give them help by turning them into case 4 with more
  features for BUFF... such as buffer up to 50k responses, and do the async
  thing, otherwise do it synchronously. 
  > Frankly, with such restrictions, I'm not sure how
  > something at the complexity of a scripting language can be implemented as
  > an Apache module.  If it can, it would have to tie the implementation to
  > Apache very closely (PHP 4.0's implementation actually allows the same DLL
  > or library to be used for the CGI version, Apache version and IIS version,
  > with thin server-specific wrappers;  with such restrictions, it doesn't
  > seem possible). 
  > If I'm missing something obvious, please enlighten me :)
  I think you're missing something slightly non-obvious :)  Or I'm still
  missing your point...
  In essence I'm saying that the "thread local storage" is part of the
  conn_rec (or request_rec, whichever is most convenient for you).  All
  entry points into your module include a request_rec structure -- you can
  fetch a void * pointer from your request_data entry; you can store
  whatever you need there. 
  So think of the thread as a resource which happens to execute your code
  for a while, but think of the conn_rec/request_rec as your indication of
  what is going on.
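
A minimal sketch of keeping per-request state off the request_rec instead of
in thread local storage. The request_rec here is a stand-in with a single
void * slot named after the "request_data entry" mentioned above; the real
structure and the way a module finds its own slot are not shown in this
thread, so treat the layout as an assumption. Plain calloc/free is used where
a real module would more likely allocate from the request pool.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for Apache's request_rec; only the slot matters here. */
typedef struct {
    void *request_data;   /* assumed per-module slot */
} request_rec;

/* State a module might otherwise have kept in thread local storage. */
typedef struct {
    int bytes_sent;
} my_state;

/* Fetch this module's state for the request, creating it on first use. */
static my_state *get_state(request_rec *r)
{
    assert(r != NULL);
    if (r->request_data == NULL)
        r->request_data = calloc(1, sizeof(my_state));
    return r->request_data;
}

/* Handler phase: may run on one thread... */
int my_handler(request_rec *r)
{
    my_state *st = get_state(r);
    if (st == NULL)
        return -1;
    st->bytes_sent = 1024;
    return 0;
}

/* ...logging phase: may run on another, and still sees the same state. */
int my_logger(request_rec *r)
{
    my_state *st = get_state(r);
    return st ? st->bytes_sent : -1;
}

/* Release the state when the request ends. */
void my_cleanup(request_rec *r)
{
    free(r->request_data);
    r->request_data = NULL;
}
```

Nothing in either phase depends on which thread is executing it; the
request_rec alone identifies the context.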
  Apache *could* support "thread local storage", and if this really bothers
  you then I'll encourage you to supply a patch.  We can change the
  requirement this way:
  - MPMs which do not guarantee to use the same thread for all request
  phases must save and restore the thread local storage across such changes
  ... but this is really hard to do portably -- unless we require all
  modules to go through an apache, portable thread local storage API.  Which
  means I'd rather it wait for APR, or rather someone else take care of
  it... 'cause the stuff which I'm working on is busy enough already :)
  From Fri Jun 18 11:46:24 1999
  Date: Fri, 18 Jun 1999 11:25:32 -0700 (PDT)
  From: Dean Gaudet <>
  Subject: Re: work in progress: mpm-3.tar.gz (fwd)
  On Fri, 18 Jun 1999, Zeev Suraski wrote:
  > Correct me if I'm wrong, but you can't obtain the conn_rec from anywhere
  > in the code by calling a simple get_conn_rec_ptr()..?
  Everywhere which apache calls your module, it passes a request_rec, or a
  conn_rec, or some other token by which you can figure out your context... 
  beyond that, it's really up to you I think.  If you need a simple
  get_conn_rec_ptr() routine, you can implement one rather easily, using
  whatever portable, non-portable, etc. method you need. 
  Yes, you won't be the only one needing this... but my point is more along
  these lines:  it's not a difficult problem to solve, and I've got way more
  difficult problems to solve at the moment. 
  > I guess it'll be
  > possible to implement such a function (if you keep some mapping between
  > thread id's and their current corresponding conn_rec's, even though I'm
  > not sure if you need it for other purposes or whether it would be just
  > pure added overhead).
  > Without such a function, you still have to pass the conn_rec pointer
  > around everywhere, to any function that may possibly need access to a
  > global per-thread variable, which is exactly what TLS comes to solve...
  Yeah we're saying the same thing -- I'm saying I'm not worrying about it
  because it's solvable.  TLS is really just a void * that magically
  changes on context switches. 
  > Well, I guess it all depends on what kind of stages we're talking about.
  > If the model is remotely similar to Apache 1.x, then we're not in too much
  The model is apache 1.x with a few extra config-time hooks at the moment. 
  > Well then, I wasn't missing that, it's just not enough for our purposes :)
  I still think it's enough, so maybe I'm not explaining myself well enough? 
  > One question I raised is whether I can get to that resource from anywhere
  > in the code even though it wasn't passed on to me through the stack. 
  Yeah -- as I said above, the interface between you and apache is all
  those methods in the module structure... and every one includes a
  context... if you need that context everywhere in your code and don't pass
  it around as a parameter to every one of your functions (as we do within
  apache), then you'll need to set up some TLS -- but reset it on every
  entry point to your module. 
  > Also, it requires the TLS code to be Apache specific, instead of just
  > platform specific.  As I said in my previous post, under Win32, for
  > instance, the same PHP DLL is used for the CGI (which is just a 16KB big
  > .exe) and the IIS module (which is a 90KB DLL).  Both use the very same
  > language DLL.  This is more important than one may think, since the fact
  > that all interfaces work with the same DLL allows you to link other DLLs
  > against this DLL, and have these other DLLs work with any Win32 interface
  > of PHP.  For example, if we want to distribute a MySQL extension DLL for
  > PHP, we won't have to distribute one MySQL extension for CGI, one for IIS,
  > another for Apache and another for NSAPI - but just one extension DLL,
  > that's linked against the PHP DLL, that every interface uses.
  Surely your DLL has a special entry point for each apache method you hook,
  right?  Make that entry point set up your TLS. 
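
A minimal sketch of "make that entry point set up your TLS", assuming a C11
compiler so that _Thread_local can stand in for whatever platform-specific
TLS the library already uses. The request_rec is a stand-in, and
get_current_request, library_work and my_module_handler are hypothetical
names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for Apache's request_rec. */
typedef struct { int id; } request_rec;

/* The library's own TLS slot; C11 _Thread_local is an assumption here. */
static _Thread_local request_rec *current_request;

/* Server-neutral library code calls this instead of taking a parameter. */
request_rec *get_current_request(void)
{
    return current_request;
}

/* Deep inside the shared library: no request_rec on the stack. */
static int library_work(void)
{
    request_rec *r = get_current_request();
    return r ? r->id : -1;
}

/* Thin Apache-specific wrapper: every hooked entry point resets the TLS
   slot before calling into the library, so it does not matter which
   thread ran the previous phase. */
int my_module_handler(request_rec *r)
{
    int result;
    assert(r != NULL);
    current_request = r;
    result = library_work();
    current_request = NULL;   /* do not leave a stale context behind */
    return result;
}
```

Only the wrapper is Apache-specific; the library behind it keeps using its
own TLS and can be shared with the CGI and other server interfaces.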
  > Which brings me to another consideration - while I haven't read the full
  > ISAPI spec anywhere (if it even exists) - I think you can rely on all of
  > the request stages in ISAPI happening within the same thread (Microsoft
  > certainly uses it in their examples as far as I recall).  If you won't be
  > able to rely on it in Apache 2.0, it'll pretty much mean that ISAPI will
  > not be supported..?
  That's just too unfortunate then.  i.e. I don't care.  I'm not going to
  stop the progress of apache just for some stupidity in ISAPI.
  If ISAPI requires that, then the WINNT MPM will have to guarantee it. 
  It's easy to guarantee it by not implementing any of the async stuff.  Too
  bad for WINNT users, they're going to be stuck a generation behind in
  technology (as if that should bother them, they're using NT after all). 
  > I'm not exactly sure how we could implement Apache local storage
  > equivalent, unless we have access to the conn_rec from anywhere in the
  > code, regardless of function arguments.
  I know I'm repeating myself a bunch... you have it at all the boundaries
  between apache and your code.  That's enough. 
  > If you want TLS support within
  > Apache you can probably use the TLS code from PHP 4.0 (platform
  > independent and more powerful than the TLS in Win32), you'd just implement
  > the function that returns the thread id as a function that returns the
  > conn_rec pointer or identifier somehow.
  Ryan or someone working on APR might be interested in this. 
  > Two 'still's remain here though.  It still means that the module would
  > have to use Apache-specific TLS, which is less than optimal from a point
  > of view of a module writer that wants the same code to be used on multiple
  > servers (that would be me in our case:). 
  Nope it wouldn't require apache-specific TLS... 
  > And my general hunch about it is still negative, in the sense that I think
  > we'd bump into many problems, especially in modules that use 3rd party
  > libraries that are also thread safe (for example, say you initialize the
  > MySQL client library in one phase, and then use it in another - there's a
  > good chance there's thread-specific initialization in that library, and it
  > won't happen in such a case;  In MySQL's case, I don't think we have a
  > problem, but I'm almost sure there'd be others in which we would).  And
  > there's also ISAPI.
  > How feasible would it be to be able to mark a module as one that wants
  > all of its request steps to be performed within the same thread? 
  To be honest, at the moment I really only care about this requirement
  between the handler phase, and the subsequent logging phase. 
  But I still say that such libraries are broken.  When you "initialize the
  mysql client library" it passes you back a pointer, right?  A pointer
  which you then pass to all other mysql library functions, right?  If their
  code doesn't hang all the info they need off that pointer, then their code
  is broken.  Sorry, but that's just how it is.  Why pass back a pointer if
  it's not all the info they need?
  That'd be like saying FILE *f = fopen("foo", "r"); can only be used within
  the thread that the fopen occurs.  Which seems pretty silly to me... 
  However, I can relax the requirement a small amount:  It's only MPMs which
  support async behaviour which have this requirement.  On all unix
  platforms there will be the prefork MPM which won't be async; and it's a
  small exercise to build a pthread MPM which won't have async support. 
  Async is an extra feature for the MPM, not a requirement.  So we can cater
  to such brokenness on all unix platforms simply by choosing a less than
  optimal MPM.
  But for the majority of users, I want async MPM support. 
