httpd-dev mailing list archives

From Dean Gaudet <dgau...@arctic.org>
Subject Re: Apache Process Model (or Dean Makes My Brain Hurt!)
Date Wed, 15 Oct 1997 18:46:13 GMT


On Tue, 14 Oct 1997, Jason A. Dour wrote:

> On Mon, 13 Oct 1997, Dean Gaudet wrote:
> > Oh it doesn't matter to me.  In fact it could be a separate development
> > effort which we merge in later.  I wouldn't mind a mailing list to discuss
> > high performance web server security issues, they're not as trivial as,
> > say, a secure mail system.  Or maybe they are, and I'm just too close to
> > the problem to see the solution.
> 
> 	I'd like to see this sort of thing discussed as well...even if it
> does have to be a separate mailing list...

The reason I suggested another mailing list is that we might attract
non-apache developers who have an interest in the topic... because at this
stage it's all design talk.

> 	Internally, with the UID/GID model of what I described, no one
> needs to have world access turned on.  Despite what many lay people think,
> the on-disk source for web content often differs from in-browser content,
> and sometimes that information could be sensitive -- say a corporation
> trying to protect its development investment by hiding its exact
> publishing gearworks.  With the current security model, this often means
> world read, meaning anyone on the local machine can read the files
> directly.  I don't find this preferable, and I find it hard to believe
> that it cannot be fixed...that's all.  As I admitted, this is one of my
> Special Interests.

Well, given that unix has exactly one group per file, you're kind of
screwed trying to do this; if a company is that concerned about their
data then they should buy a dedicated server.  You have the following
requirements:

    - no global access on the local machine
    - read-only access by the webserver
    - read-write access to a group of people responsible for the site

The last is my particular take on it.  If you're willing to say that
exactly one account can have read-write then the solution is trivial, and
the current apache does an OK job at it... at least on reasonable unixes.
Create a group httpd, and do this:

    mkdir /www/docroot
    chgrp httpd /www/docroot
    chmod g+s /www/docroot

Then tell the users to use umask 027, so their files come out mode 640:
read-write for the one account that owns them, read-only for group httpd
(i.e. the webserver), and nothing for everyone else on the machine.

I define "reasonable unixes" as those which do the following with g+s
directories:

    - all files underneath that directory are created with the group of
	the directory regardless of what the user's default group is,
	and regardless of what group the user is in
    - all newly created subdirectories inherit g+s

Linux has those properties, and so do the various BSDs, I think.  I'm not
sure about the SysVR4s.  These are the unixes on which you can hope to
use group permissions to do group work without permissions hassles.
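
If you're not sure which camp your unix is in, a two minute test like
this will tell you.  It assumes the /www/docroot setup from above, and
that you run it as an account whose default group isn't httpd; the
filename is just something I made up for the test:

    /* create a file under the g+s directory and see whether it picked
     * up the directory's group or the creating user's default group */
    #include <stdio.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/types.h>
    #include <sys/stat.h>

    int main(void)
    {
        struct stat dir, file;
        int fd = open("/www/docroot/.gidtest", O_WRONLY | O_CREAT, 0640);

        if (fd < 0 || stat("/www/docroot", &dir) < 0
            || fstat(fd, &file) < 0) {
            perror("gidtest");
            return 1;
        }
        printf("dir gid %ld, new file gid %ld: %s\n",
               (long)dir.st_gid, (long)file.st_gid,
               dir.st_gid == file.st_gid ? "inherited" : "not inherited");
        close(fd);
        unlink("/www/docroot/.gidtest");
        return 0;
    }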

> > We can already provide this.  A directory which is owned by httplog,
> > and group readable by a private group can contain a log file which is
> > readable only by httplog and users in the private group (i.e. the users
> > who want a private log).  Then the logger process, of which I intend
> > there to be only one, can log to a file in this directory.
> 
> 	And can this be reliably implemented across a thousand-user
> system?  Can you ensure that something (i.e.  FUU Error #1) won't happen
> to that logfile?  Supporting a system based upon very specific permissions
> would be a nightmare in my experience...somehow those special settings
> always get changed.  Instead, if the process logging the transaction was
> run as the target user, there's no need for special groups and
> permissions.  And such functionality would be *optional* not default,
> since the main logging would be sent through the single pipe to the single
> logger process. 

I don't consider a thousand file handles per httpd child, or a thousand
logging processes a workable solution.  I suppose you could have only a
single pipe to aplogger and aplogger could spawn the thousand children...
that'd require aplogger to be root.  I suppose it could also maintain a
"cache" of spawned children so that it'd only need 200 or 300 open at
a time.  But this means that log writing goes through two extra copies --
one copy from httpd to aplogger(root), and one copy from aplogger(root)
to aplogger(uid)... instead of the one extra that pipes require currently.

My solution with a single logger is far more feasible.  It needs a
rotation program which is clued enough to not screw up permissions.
Combine it with a script that creates a vhost properly (i.e. sets up
all the permissions and adds the templates to the various config files
and restarts the various daemons).  Then I don't really see the permission
problem being an issue.  Only root and httplog can write in those
directories.
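
To be concrete, the rotation doesn't have to be anything fancy.  Here is
a sketch of the idea; the paths are made up, and it assumes whatever is
doing the rotation runs with enough privilege to chown (say, from the
same machinery that creates the vhost):

    /* move the live log aside and recreate it with exactly the owner,
     * group and mode it had, so the permissions survive the rotation */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/stat.h>

    static int rotate(const char *live, const char *old)
    {
        struct stat st;
        int fd;

        if (stat(live, &st) < 0 || rename(live, old) < 0)
            return -1;
        fd = open(live, O_WRONLY | O_CREAT | O_EXCL, st.st_mode & 07777);
        if (fd < 0)
            return -1;
        if (fchown(fd, st.st_uid, st.st_gid) < 0
            || fchmod(fd, st.st_mode & 07777) < 0) {   /* beat the umask */
            close(fd);
            return -1;
        }
        close(fd);
        return 0;       /* then tell aplogger to reopen its logs */
    }

    int main(void)
    {
        return rotate("/www/logs/vhost1/access_log",
                      "/www/logs/vhost1/access_log.0") == 0 ? 0 : 1;
    }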

Again I'll say that if folks want complete "security" from other folks
then they need to buy another machine.  It's all about cost -- and you
should make it clear to your customers that if they really need the
privacy then they'll have to pay for it.  I wouldn't pretend otherwise.

Oh yeah, and running the logger as the target user means the target
user can futz with their logs.  In my experience websites don't want
the target user to be able to futz with their logs, which is why I'm
going to extremes to make sure there's a protected set of logs which
the target users have read-only access to.

> >     apsuper (run as root)
> <big snippage...>
> 
> 	This is all good...except we still have two superuser programs --

Remember there are two distinct things that require superuser privs:

    opening port 80

    spawning CGIs as a specific user

Combining the two is asking for trouble... they're distinct.  It's trivial
to do the first, and I wouldn't want to complicate it with the second,
because the second is non-trivial.

The second requires the process doing it to have full root privileges,
which is why we can't do it within httpd children.  We don't want
httpd children to have full root privileges... because they're the most
complicated part of this entire picture.

One way of dealing with the performance issue is to make suexec a
service which httpd talks to via a unix domain socket.[1] You write
a suexec-server which opens a unix domain socket in stream mode, and
listens.  Then when it's time to do a suexec thing, open that socket,
write a brief preamble which tells suexec-server what to do, and what
environment to pass, and then proceed as you normally would with a
regular CGI request.  The suexec-server will fork/setuid and exec the CGI.
(Yup, this is like fastcgi.)

This doesn't introduce any extra byte copies ... since the CGI is
eventually talking directly with the httpd via the stream socket, rather
than the pipe() it'd normally use.

The end result is that you replace one exec and a handful of other calls
with a socket open and a context switch.  It's probably faster though, and
should be as secure as suexec is currently.

[1] To portably place access restrictions on a unix domain socket you
have to hide it in a subdirectory which is mode 700 or 770.  This is
because traditional BSD and SysV networking code ignores the privs on
a unix domain socket inode.
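
So there's no confusion: the preamble format and the paths below are
things I invented for this example; it's only a sketch of the idea, not
the real suexec.  The server reads a tiny preamble (uid, gid, program
path), forks, drops privileges and execs the CGI with the connection as
its stdin/stdout:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <signal.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    #define SOCK_DIR  "/var/httpd/suexec"        /* mode 700, see [1] */
    #define SOCK_PATH "/var/httpd/suexec/socket"

    /* read one newline-terminated field a byte at a time so we don't
     * swallow any of the CGI's input with buffering */
    static int read_field(int fd, char *buf, size_t len)
    {
        size_t i = 0;
        char c;

        while (i + 1 < len && read(fd, &c, 1) == 1 && c != '\n')
            buf[i++] = c;
        buf[i] = '\0';
        return i > 0;
    }

    int main(void)
    {
        struct sockaddr_un sa;
        int s;

        signal(SIGCHLD, SIG_IGN);   /* lazy; a real one would reap */
        mkdir(SOCK_DIR, 0700);      /* hide the socket, see [1] */
        unlink(SOCK_PATH);

        s = socket(AF_UNIX, SOCK_STREAM, 0);
        memset(&sa, 0, sizeof sa);
        sa.sun_family = AF_UNIX;
        strcpy(sa.sun_path, SOCK_PATH);
        if (s < 0 || bind(s, (struct sockaddr *)&sa, sizeof sa) < 0
            || listen(s, 16) < 0) {
            perror("apsuexecd");
            exit(1);
        }

        for (;;) {
            int conn = accept(s, NULL, NULL);

            if (conn < 0)
                continue;
            if (fork() == 0) {
                char uid[32], gid[32], prog[1024];

                if (!read_field(conn, uid, sizeof uid)
                    || !read_field(conn, gid, sizeof gid)
                    || !read_field(conn, prog, sizeof prog))
                    _exit(1);

                /* group first, then uid, or setgid will fail */
                if (setgid((gid_t)atoi(gid)) < 0
                    || setuid((uid_t)atoi(uid)) < 0)
                    _exit(1);

                /* the CGI talks straight to httpd over the stream
                 * socket, no extra byte copies */
                dup2(conn, 0);
                dup2(conn, 1);
                execl(prog, prog, (char *)NULL);
                _exit(1);
            }
            close(conn);
        }
    }

A real apsuexecd would also pass the environment in the preamble and do
all the sanity checks the current suexec does; httpd's side is just a
connect() on the socket, three lines of preamble, and then it treats the
socket exactly like the pipe it would use for a CGI today.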

So the picture now becomes:

    apsuper (root)
      |
      +-- aplogger (httplog)    # handling request log
      |
      +-- aplogger (httplog)    # handling error log
      |
      +-- apsuexecd (root)	# handling suexec requests
      |
      +-- aphttpd (httpd)       # monitoring aphttpd children
            |
            +-- aphttpd         # serving requests
            |
            +-- aphttpd         # serving requests
            |
            :
            |
            +-- aphttpd         # serving requests
            |
            +-- aphttpd         # serving requests


> 	I once started looking at Qmail, but between work, apache, and
> Real Life, I couldn't find the time and energy to devote to it.  I know
> that at one point it involved like five or more UIDs, a seemingly complex
> process path, and a rabid debate over whether or not it was good.  I've
> since decided not to think one way or the other about the product until I
> can devote time to it...

It's good in my opinion.  There are 7 uids, each handling a specific
task; some of them are in groups that don't own any files, so they have
the minimum privileges possible.  There are multiple executables, with
the privileged ones being very small/easy to verify.  All the privileged
interfaces are well defined.  qmail's model won't exactly work for a
webserver though because it is a forking model not unlike the first
webservers... which is way too slow.  But that's fine for email which
is typically done at rates less than 10 messages per second, rather than
the 100 requests per second that some of us see.

There is one more solution for getting privacy between users on a single
machine, and for cutting down on the cost of suexec and so on: bind an
httpd to each IP address individually and run a non-suexec server on it
as the specific userid.  I believe uunet does this, but
they probably don't try to put a thousand customers on a box using
this technique.  It's probably good for 50 or 60 customers.
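
The per-instance config ends up being tiny, something along these lines
with the existing 1.x directives (the address, names and paths are all
invented for the example):

    # httpd.conf for one customer: this instance binds one address and
    # runs as that customer, so no suexec is needed at all
    BindAddress 10.0.1.37
    Port 80
    User cust37
    Group cust37
    ServerName www.cust37.example.com
    DocumentRoot /www/cust37/docroot
    ErrorLog /www/cust37/logs/error_log
    TransferLog /www/cust37/logs/access_log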

Dean

