httpd-dev mailing list archives

From Dean Gaudet <dgau...@arctic.org>
Subject Re: NameVirtualHost
Date Mon, 27 Oct 1997 22:52:35 GMT


On Mon, 27 Oct 1997, Rob Hartill wrote:

> that doesn't mean to say that there was or wasn't a significantly
> larger number of people not living on the limit and having no problems
> with smaller numbers of vhosts. I don't think I ever had problems with
> them before the recent changes. They just worked intuitively for the
> way I intended to use them. I assume many other people had similar success.
> 

Yes, fine, whatever.  We disagree, let's leave it at that.  Neither of
us has any numbers to base these arguments on.  I'm just telling you
that I see bug reports and such indicating that there is definitely a
nontrivial number of people who are running a large number of vhosts.
*This is one of Apache's huge selling points: that we easily support
hundreds of vhosts.*

You make it sound like I've broken things for small vhosts; I most
certainly have not.  There's no "edge" except for the #file-descriptors
"edge", and that's completely unrelated.

> My gut feeling is that people using smaller numbers of name based
> vhosts outnumber the 'life on the edge' folks whose systems broke
> because they were pushing things much harder and that the latter are
> the exception rather than the rule.

Which is why I suggested a very trivial change to NameVirtualHost at the
beginning of this thread.

> Maybe it won't work out more intuitive, but at the moment, yes it does
> feel better to me, mainly because it avoids more nested <></> blocks
> which eat too much space and are harder to look at.

Not everyone is a perl programmer ... your syntax looks like something
a perl programmer would write :)

BTW, I hate the <> crap myself, but that's legacy.  Having a consistent
syntax is important, mixing two syntaxes is not a good idea imho.
For example, we don't use colons for anything yet.  What happens when
the : is part of a path in your syntax?
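
As a rough illustration of the colon question (not part of either
proposal), here is a minimal Python sketch.  It assumes a naive parser
that splits the quoted `[name]:[ip]:[port]:[path]` format on every colon;
the function name `parse_service_spec` and the sample hostnames and
addresses are hypothetical.

```python
def parse_service_spec(spec):
    """Naively split a hypothetical 'name:ip:port:path' spec on colons."""
    fields = spec.split(":")
    if len(fields) != 4:
        # A colon inside the path produces extra fields, so the parser
        # can no longer tell where the path begins.
        raise ValueError(f"expected 4 fields, got {len(fields)}: {fields!r}")
    name, ip, port, path = fields
    return {"name": name, "ip": ip, "port": port, "path": path}


# A plain path parses cleanly.
print(parse_service_spec("www.example.com:10.0.0.1:80:/docs"))

# A path containing ':' is rejected as ambiguous by this naive parser.
try:
    parse_service_spec("www.example.com:10.0.0.1:80:/docs/a:b")
except ValueError as err:
    print("ambiguous:", err)
```

A real parser would need quoting or an escape rule to get around this,
which is extra syntax either way.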

What happens (in your syntax) when we want to add something more than
just name and path matching for non-ip vhosts?  For example, what if
there's some part of the SSL negotiation that can be used, which requires
another bit of config info?

Mine at least allows for a new directive within the inner-most container.

> ok, here's where I think the proposal has evolved to, using 'Service' again.
> 
> format:
> 
> Service [name]:[ip]:[port]:[path]  vhost-block-alias
> 
> The config parser would read one Service line at a time and put the
> results into tables; there would be 1 table per IP address.
> Your ip-hashing would map an IP to a table. Ordering within these
> tables is important, the earlier in the table the earlier it is checked
> for a match (the first to match wins).
> 
> For every IP the server operates from, there would be one table
> that lists [name]:[port]:[path] - the things we want to match against, plus
> the vhost-block-alias which is a key to another hash/table that determines
> which <VirtualHost></VirtualHost> contents are important.
> 
> Any 'Service' line where an explicit IP is mentioned, the [name]:[port]:[path]
> goes into the table at the next free slot. If a 'Service' line doesn't
> mention the IP then [name]:[port]:[path] goes into ALL tables - this
> makes sure that we always catch things like Brian's example ::79: where
> we want to intercept all port 79 requests first and not worry about the
> rest of the parameters (all IPs would catch it).

All tables?  You have no idea what all the tables are.  You have no
idea what all the ip addresses are that you'll see.  You need to add
an artificial "default" table.  Note that doing as you suggest makes it
really expensive to have a single port based vhost: its entry has to
take up space everywhere (this is not a new problem, we already have
this problem).
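
To make the "default" table idea concrete, here is a minimal Python
sketch of the per-IP tables described in the quoted text, with entries
that name no IP stored once in a shared default table instead of being
copied everywhere.  Everything here is an assumption for illustration:
the class `ServiceTables`, the `"*"` key, the prefix-style path match,
and the use of a sequence number to keep "first to match wins" ordering
across the two tables.  Neither proposal specifies this.

```python
from collections import defaultdict
from itertools import count

DEFAULT = "*"  # hypothetical key for the artificial "default" table


class ServiceTables:
    def __init__(self):
        self._tables = defaultdict(list)  # ip (or DEFAULT) -> entries
        self._seq = count()               # preserves config-file order

    def add(self, name, ip, port, path, alias):
        """Record one entry; an empty field means 'match anything'."""
        key = ip or DEFAULT
        self._tables[key].append((next(self._seq), name, port, path, alias))

    def lookup(self, ip, host, port, path):
        """Return the alias of the earliest entry that does not fail to match."""
        # Merge the IP's own table with the default table in config order.
        candidates = sorted(self._tables.get(ip, []) + self._tables.get(DEFAULT, []))
        for _, name, want_port, prefix, alias in candidates:
            if name and name != host:
                continue
            if want_port and want_port != port:
                continue
            if prefix and not path.startswith(prefix):
                continue
            return alias
        return None


tables = ServiceTables()
tables.add("", "", 79, "", "finger")                  # like '::79:'
tables.add("www.a.com", "10.0.0.1", None, "", "a")
tables.add("", "", None, "", "catchall")              # like ':::'

print(tables.lookup("10.0.0.1", "www.a.com", 80, "/"))   # a
print(tables.lookup("10.0.0.9", "anything", 79, "/"))    # finger
print(tables.lookup("10.0.0.9", "anything", 80, "/"))    # catchall
```

The port 79 entry is stored once however many IPs the server sees, at
the cost of merging two lists on each lookup in this naive version.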

My proposal has a branch in the decision tree first at the port level
-- and that level can be eliminated entirely.  My hypothesis is that
port-based servers are rare.

> When a request arrives, we use the IP to choose the table and then work
> through systematically looking for an entry that doesn't fail to match
> on any criteria. The catch-all ':::' would be at the bottom of all tables to
> guarantee success.
>
> In this scheme, someone with 500 IPs benefits from your hashing - they
> lookup a table containing (probably) just the ':::' entry and they
> get fast IP -> <VirtualHost></VirtualHost> mapping.
> 
> For the places where hashing isn't an issue, we'll typically end up with
> a small number (usually 1) of tables containing varying numbers of entries.
> The 'n:p:p' entries give full control over how to unambiguously map a
> request to the right vhost configs. The ordering (as Brian showed)
> enables us to pick off special cases first, or (as I showed) to list the
> most likely match first (e.g. a popular hostname comes before an infrequently
> used one).

I still don't think the complexity of Brian's suggestion is necessary.
Longest suffix match (my suggestion) for hostnames is easy to make
go fast, but this isn't a negative for either your proposal or mine.
It's just a simplification.
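
For readers unfamiliar with the idea, here is one possible reading of
longest suffix match on hostnames, as a minimal Python sketch.  It
assumes suffixes are matched on whole dot-separated labels and that an
empty-string key acts as the fallback; the function name and the table
layout are hypothetical, not something specified in this thread.

```python
def longest_suffix_match(host, suffix_table):
    """Return the value for the longest configured suffix of `host`."""
    labels = host.lower().rstrip(".").split(".")
    # Try the full name first, then drop one leading label at a time,
    # so the first hit is always the longest matching suffix.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffix_table:
            return suffix_table[candidate]
    return suffix_table.get("")  # optional fallback entry


table = {"example.com": "main", "dev.example.com": "dev", "": "default"}
print(longest_suffix_match("www.dev.example.com", table))  # dev
print(longest_suffix_match("www.example.com", table))      # main
print(longest_suffix_match("other.org", table))            # default
```

Each lookup costs at most one hash probe per label in the hostname,
independent of how many vhosts are configured.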

> The 'Service' lines separate the parameter matching from the <VirtualHost>
> </VirtualHost> actions. I think that separation is very useful (not least
> for mapping multiple criteria to the same <VirtualHost></VirtualHost>
> configs).

Right, we agree on this.

Your proposal and mine are almost equivalent semantically, and I'm not
quibbling over the details of the semantics very much.  My only issue
is syntax.  I want a syntax which explicitly states the hierarchy of
the comparisons.  I feel explicit syntaxes are easier to understand.
I also want a syntax which remotely resembles something we already have;
yours doesn't.

Even if you changed your syntax to get rid of the colons and use
quoted strings (which we already have support for) it still bothers me.
I would have to remember what each of the 4 positional parameters means,
even though I'll only ever use two of them, host and IP address, which
we can probably agree are the two most commonly used fields.  The
positional ordering also does not correspond to the hierarchy
you're proposing.

Dean

