httpd-dev mailing list archives

From Dean Gaudet <dgau...@arctic.org>
Subject Re: NameVirtualHost
Date Wed, 15 Oct 1997 00:06:01 GMT
Ok here we go again.  Can the other folks who understand vhosts step up
and help me?  Can the folks who haven't paid attention please read this?
I can't keep typing this in variations every time the next person in the
group decides to look at what Dean's done to vhosts lately; my wrists
are in bad enough shape already.  Ken, this could be considered a draft
of vhosts-in-depth 1.3-style.

NameVirtualHost directives (there may be multiple) list exactly the set
of IP addresses on which Apache will consider the Host: header, matching
it against ServerName, ServerAlias, and the names specified in the
<VirtualHost> statement.  In the absence of the Host: header it will
consider ServerPath instead.  Vhosts on these addresses are called
name-vhosts.

Requests arriving on any ip address not listed in a NameVirtualHost
directive never have their Host: header or URI considered for the purpose
of determining the vhost to serve the request from.  Vhosts on these
addresses are called ip-vhosts.
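
As a minimal sketch of the two definitions (the addresses, names, and
paths here are made up for illustration, not taken from any real config):

    # 192.0.2.10 is listed, so vhosts on it are name-vhosts and the
    # Host: header picks between them
    NameVirtualHost 192.0.2.10

    <VirtualHost 192.0.2.10>
    ServerName www.one.example
    DocumentRoot /www/one
    </VirtualHost>

    <VirtualHost 192.0.2.10>
    ServerName www.two.example
    DocumentRoot /www/two
    </VirtualHost>

    # 192.0.2.20 is not listed in any NameVirtualHost, so this is an
    # ip-vhost and the Host: header is ignored for requests arriving on it
    <VirtualHost 192.0.2.20>
    ServerName www.three.example
    DocumentRoot /www/three
    </VirtualHost>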

ip-vhosting is an exact protocol.  Whatever ip you come in on determines
the host that will serve.  name-vhosting is an inexact protocol.  For
example, if the user enters "http://www/foobar" then "Host: www"
will be sent, and the server has to GUESS what the client really meant
(this is not a client bug, this is how the protocol is defined).  For this
reason, amongst others, it is now possible to enforce the strict protocol
(ip-vhosts).

Now suppose a request arrives on address a.b.c.d, but specifies via
Host: the name of a server on x.y.z.w.  Prior to 1.3 Apache would (most
of the time) happily serve the request from the named server... rather
than the a.b.c.d server(s).  This is a security bug (ask Dirk, he said
someone has already run into it with 1.2, and he had to hack around it).
If you specify an ip-based vhost on a particular address Apache should
never guess that it knows better than you do and serve from another vhost.
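
To make the scenario concrete, here is a sketch with two ip-vhosts.
The addresses, names, and paths are hypothetical; the behaviour noted in
the comments is the 1.3 rule described above:

    # neither address appears in a NameVirtualHost directive
    <VirtualHost 192.0.2.10>
    ServerName www.public.example
    DocumentRoot /www/public
    </VirtualHost>

    <VirtualHost 192.0.2.20>
    ServerName intranet.example
    DocumentRoot /www/intranet
    </VirtualHost>

    # A request that connects to 192.0.2.10 but sends
    #     GET / HTTP/1.0
    #     Host: intranet.example
    # is served from /www/public, because the Host: header is never
    # consulted on an ip-vhost address.  Pre-1.3 it would (most of the
    # time) have been served from /www/intranet.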

The addition of NameVirtualHost and the slight semantic changes in 1.3
make it possible to specify exactly what ip addresses a particular vhost
responds to.  That word "exactly" is very important.  Pre-1.3 there's
a lot of voodoo.

Now, Rob has had problems with mod_perl generated config sections.
It could just be that Doug's code makes assumptions about how to
create virtual hosts, I'm not sure.  Rob was able to hack around it by
essentially doing "NameVirtualHost `hostname`".  Rob is using name-vhosts
only.  So that's exactly what he's supposed to add ... because in pre-1.3
it was just assumed that anything which mapped to the `hostname` ip would
be a name-vhost.  That broke all sorts of 1.0 configs, and confused
the hell out of people (search the bugdb for "ServerName localhost"
and you'll find all the times I had to answer with the kludge response
to work around it).
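
Spelled out, that workaround might look like the sketch below.  It
assumes `hostname` on the machine prints server.example.com; that name
and the site names are placeholders, and a literal ip address could be
used instead to avoid the DNS lookup:

    # declare the machine's main address as a name-vhost address
    NameVirtualHost server.example.com

    <VirtualHost server.example.com>
    ServerName www.site1.example
    DocumentRoot /www/site1
    </VirtualHost>

    <VirtualHost server.example.com>
    ServerName www.site2.example
    DocumentRoot /www/site2
    </VirtualHost>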

Rob also asked about round robin servers.  Those are supported as
well.  For example, suppose foo.imdb.com is an A-based RR (i.e. it returns
multiple A records; a CNAME-based RR is slightly different).  Then,
ignoring the fact that using DNS in your config is a dangerous thing,
you can build one single config file like this:

    # specifies all n RR addresses as NameVirtualHost addresses
    NameVirtualHost foo.imdb.com

    <VirtualHost foo.imdb.com>
        # first name-vhost definition, note that it also can respond on any
        # of the RR addresses ... since we accept multi-address names.
    </VirtualHost>

    <VirtualHost foo.elsewhere.com>
        # second name-vhost definition, presumably foo.elsewhere.com maps
        # to the same set of RR addresses as foo.imdb.com
    </VirtualHost>

That one config will run on all hosts in the RR.  If there are n hosts
in the RR then each host will have n-1 unused addresses inserted into
its hash tables.  If that bugs you then you have to maintain n separate
configs, or list all the CNAMEs in a single config ... which is how it
is pre-1.3 as well.
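
If the DNS dependence is a worry, the same setup can be sketched with
the RR addresses written out.  The two addresses below are invented; a
real config would list every address foo.imdb.com resolves to:

    # one NameVirtualHost line per RR address
    NameVirtualHost 192.0.2.1
    NameVirtualHost 192.0.2.2

    <VirtualHost 192.0.2.1 192.0.2.2>
    ServerName foo.imdb.com
    </VirtualHost>

    <VirtualHost 192.0.2.1 192.0.2.2>
    ServerName foo.elsewhere.com
    </VirtualHost>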

The above works for ip-based vhosts too.  This is the original reason
I put in multi-address support for each vhost.  HotWired's servers for
example have somewhere around 30 addresses each ... and they all run
the same config files.  For example, www.hotwired.com has two addresses,
204.62.129.193 and 204.62.129.65.  Both are listed in a <VirtualHost>
statement in the config file.  (There are also redundancy reasons for
doing this -- for example, should a hotwired server fail, the hits can
be directed to another server without changing any configuration at all
... but that's beyond the scope of this discussion ;)
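
A sketch of what such a multi-address ip-vhost might look like.  The two
addresses are the ones quoted above; the DocumentRoot is a placeholder,
since the actual HotWired config isn't shown here:

    # one ip-vhost answering on both addresses; neither address appears
    # in a NameVirtualHost directive
    <VirtualHost 204.62.129.193 204.62.129.65>
    ServerName www.hotwired.com
    DocumentRoot /www/hotwired
    </VirtualHost>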

Here's an example.  The assumption here is that this is an ISP who has
various costs of services.  The lowest cost web service is a name-based
vhost service sharing the same ip address with hundreds upon hundreds
of clients.  The next higher cost is a dedicated ip address with a
single name.  The highest cost is a dedicated ip address with multiple
names (why you'd want this is beyond me, but you can do it).  This is
untested, and incomplete ... it's just an example.

    <VirtualHost _default_>
    # here we'll catch any addresses we don't otherwise specify elsewhere...
    # this allows us to be a little sloppy, say adding ip aliases to the
    # machine without defining a vhost, or by removing a vhost def'n when
    # a customer doesn't pay, or whatever.  We just catch it here, and
    # this server would be some sort of default page advertising your ISP
    # or something like that.
    # (the rewrite engine is off by default and isn't inherited by vhosts)
    RewriteEngine On
    RewriteRule ^/not_found\.html$      -               [L]
    RewriteRule ^/(.*)                  /not_found.html [L]
    DocumentRoot /www/not_found
    </VirtualHost>

    # this is where all the lowend clients go
    NameVirtualHost 10.0.0.1

    <VirtualHost 10.0.0.1>
    # this vhost is going to catch anything that doesn't Host: match or
    # ServerPath match anything below ... it should put up a page which
    # contains a list of all the lowend clients, and include links to
    # www.client.com/client/ -- the /client/ will match ServerPath in
    # subsequent requests, and direct hits to the right server
    DocumentRoot /www/lowend
    </VirtualHost>

    # here are two example lowend clients

    <VirtualHost 10.0.0.1>
    ServerName www.client1.com
    ServerAlias *.client1.com client1.com
    ServerPath /client1/
    DocumentRoot /www/lowend/client1/
    # so that URLs always contain /client1/ at the front (the rewrite
    # engine is off by default and has to be enabled per vhost)
    RewriteEngine On
    RewriteRule ^/client1/		-		[L]
    RewriteRule ^/(.*)			/client1/$1	[R]
    </VirtualHost>

    <VirtualHost 10.0.0.1>
    ServerName www.client2.com
    ServerAlias *.client2.com client2.com
    ServerPath /client2/
    DocumentRoot /www/lowend/client2/
    # so that URLs always contain /client2/ at the front (the rewrite
    # engine is off by default and has to be enabled per vhost)
    RewriteEngine On
    RewriteRule ^/client2/		-		[L]
    RewriteRule ^/(.*)			/client2/$1	[R]
    </VirtualHost>

    # you probably get the idea.  Ok now for some dedicated ip customers.

    <VirtualHost 10.0.1.1>
    ServerName www.client3.com
    DocumentRoot /www/dedicated/client3
    </VirtualHost>

    <VirtualHost 10.0.1.2>
    ServerName www.client4.com
    DocumentRoot /www/dedicated/client4
    </VirtualHost>

    # yadda yadda.  Ok now for some dedicated ip customers that for some reason
    # want to have multiple server names ... you can handle these folks a lot
    # like you handle the lowend clients above with the fancy redirecting.  Or
    # whatever.  I'll skip the redirecting, you can fill it in given the
    # example above.  I'll just guess that the need for multiple names is
    # because they have multiple departments.

    NameVirtualHost 10.0.2.1
    <VirtualHost 10.0.2.1>
    # here's the default vhost for this dedicated ip
    ServerName www.client5.com
    # note you don't want, or need "ServerAlias *.client5.com client5.com"
    # because this vhost picks up everything which doesn't otherwise match
    # on ip address 10.0.2.1 anyhow.
    DocumentRoot /www/dedicated/client5
    </VirtualHost>

    <VirtualHost 10.0.2.1>
    ServerName www.department1.client5.com
    ServerAlias *.department1.client5.com department1.client5.com
    DocumentRoot /www/dedicated/client5/department1
    </VirtualHost>

    <VirtualHost 10.0.2.1>
    ServerName www.department2.client5.com
    ServerAlias *.department2.client5.com department2.client5.com
    DocumentRoot /www/dedicated/client5/department2
    </VirtualHost>

Note in this server that the addresses are completely partitioned.
10.0.0.1 serves exactly the lowend customers, 10.0.1.1 serves exactly
client3, 10.0.1.2 serves exactly client4, and 10.0.2.1 serves exactly
client5 (with two departmental servers, and a generic company-wide
server).  There's no crossover ... in fact if you took this config and
put it on a machine that had only the ip alias 10.0.0.1 then it would
serve only the lowend customers, and never serve a hit to the other three
clients.  So this config could be moved from machine to machine without
modification assuming you map your ip aliases somewhere appropriate
(although it's kind of silly to do this, but you can if you want to,
maybe there's a reason).

This example is far more than anyone does now (partially because you
can't do it with pre-1.3).  People generally make name-vhost only servers,
or ip-vhost only servers.

I forgot to test "NameVirtualHost *", but in theory you should be able
to build a name-vhost only server by using that.  But you'd probably
have to specify "<VirtualHost *>" for all the virtual hosts.  This makes
sense to me, it may not make sense to others.
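
Following that guess, a name-vhost only server might be sketched like
this.  It is as untested as the idea itself, and the names and paths are
placeholders:

    NameVirtualHost *

    <VirtualHost *>
    # listed first, so presumably the catch-all when Host: matches
    # nothing below
    ServerName www.site1.example
    DocumentRoot /www/site1
    </VirtualHost>

    <VirtualHost *>
    ServerName www.site2.example
    ServerAlias site2.example
    DocumentRoot /www/site2
    </VirtualHost>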

Finally, in all of this, it should be made clear, if it isn't already,
that the addresses a server has ip aliased to it, and the addresses it
has Listen directives for, and the addresses in its VirtualHost directives
are all independent.  It's up to the admin to make the three make sense.
For most sites, the default "Listen *:80" is just fine and doesn't need
to be changed.  They already know to stuff ip aliases on the box, and
adding them in the config is expected.

It's not possible for Apache given a list of <VirtualHost> directives
to construct a set of Listen directives.  Maybe it is on some specific
unixes, but it's certainly not portable, and it's certainly not worth it.
For example, Apache can't guess that you want it to Listen on a single
address rather than using Listen *:80.  This is the reason these things
have always been independent... and because they're independent you
can build server farms which use the identical config on each and every
machine, and the router alone knows which machine handles which subset
of the vhosts.
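
To illustrate the independence, here is a sketch reusing the 10.0.x.x
addresses from the ISP example.  Which Listen line to use is the admin's
call; Apache won't derive it from the vhosts:

    # accept connections on every address the box has, port 80
    Listen 80

    # ... or, alternatively, only on the lowend address:
    # Listen 10.0.0.1:80

    <VirtualHost 10.0.1.1>
    # only reachable if some Listen directive covers 10.0.1.1 and the
    # address is actually aliased onto this machine
    ServerName www.client3.com
    DocumentRoot /www/dedicated/client3
    </VirtualHost>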

Is this any more clear now?

Dean

