httpd-dev mailing list archives

From Rob Hartill <r...@imdb.com>
Subject Re: NameVirtualHost
Date Mon, 27 Oct 1997 21:39:36 GMT
On Mon, 27 Oct 1997, Dean Gaudet wrote:

> On Mon, 27 Oct 1997, Rob Hartill wrote:
> 
> > That still leaves the other parameters, hostnames, ports, serverpath that
> > determine the actions to take.
> 
> Those parameters cost the same if you do a linear lookup or a hashed
> lookup ... so I didn't count them. 
> 
> > My point is that these big ISPs are the exception and that most people
> > are far more likely to use minimal name based solutions.
> 
> I don't think they're the exception.  I think they're the reason we have
> well over 600k "hosts" running apache.  In roughly half the bug reports we
> receive on vhosts, the user has hundreds of vhosts.  Then there were all
> the bug reports of "Apache works fine with N-1 vhosts but not N" for
> N=128, and 256.

That doesn't tell us whether a significantly larger number of people
were staying well within the limits and having no problems with
smaller numbers of vhosts. I don't think I ever had problems with
them before the recent changes. They just worked intuitively for the
way I intended to use them. I assume many other people had similar success.

My gut feeling is that people using smaller numbers of name based
vhosts outnumber the 'life on the edge' folks whose systems broke
because they were pushing things much harder and that the latter are
the exception rather than the rule.

> > Your hashing algorithm sounds fine. Use that as a first step in the
> > process - using the IP address hash to find which of the '::' style
> > tables to use to narrow the search down. Most people will only have
> > 1 IP address so that first stage can be skipped.
> 
> Can you repost the semantics including Brian's changes?  Your original
> proposal involved weighting and such which did not look easily hashable
> to me. 

below.

> Are you convinced that your "Service" directive is more intuitive than my
> nested directives?  I'm not.

Maybe it won't work out more intuitive, but at the moment, yes it does
feel better to me, mainly because it avoids more nested <></> blocks
which eat too much space and are harder to look at.

> Mine explicitly show the user that there is
> an ordering to comparisons -- ports first, then ip address, then hostname,
> then server path.  Yours hides that in some weighting system.  vhost
> lookup should be an exact science... and I don't get that feeling from
> your first proposal...

ok, here's where I think the proposal has evolved to, using 'Service' again.

format:

Service [name]:[ip]:[port]:[path]  vhost-block-alias

The config parser would read one Service line at a time and put the
results into tables; there would be 1 table per IP address.
Your ip-hashing would map an IP to a table. Ordering within these
tables is important: the earlier an entry appears in the table, the
earlier it is checked for a match (the first to match wins).
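
As a rough illustration of the parsing step, here is a minimal Python
sketch that splits one Service line into its four fields. The names
parse_service and ServiceEntry are hypothetical, and the sketch assumes
IPv4 addresses (no ':' inside the IP field) and that an empty field
means "match anything".

```python
from typing import NamedTuple, Optional


class ServiceEntry(NamedTuple):
    """One parsed 'Service' line; None means the field was left empty."""
    name: Optional[str]
    ip: Optional[str]
    port: Optional[int]
    path: Optional[str]
    alias: str


def parse_service(line: str) -> ServiceEntry:
    # Expect exactly three whitespace-separated tokens:
    #   Service  [name]:[ip]:[port]:[path]  vhost-block-alias
    directive, spec, alias = line.split()
    if directive != "Service":
        raise ValueError("not a Service line: %r" % line)

    # Split on the first three ':' only, so a ':' inside the path survives.
    fields = spec.split(":", 3)
    if len(fields) != 4:
        raise ValueError("expected name:ip:port:path, got %r" % spec)

    # Empty fields become None (wildcards).
    name, ip, port, path = (f or None for f in fields)
    return ServiceEntry(name, ip, int(port) if port else None, path, alias)


# Hypothetical example lines.
finger = parse_service("Service ::79: finger")
films = parse_service("Service www.example.com:10.0.0.1:80:/films films")
```

Here `finger` has only the port set, so it would apply to every name,
IP and path on port 79.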

For every IP the server operates from, there would be one table
that lists [name]:[port]:[path] - the things we want to match against, plus
the vhost-block-alias which is a key to another hash/table that determines
which <VirtualHost></VirtualHost> contents are important.

For any 'Service' line that mentions an explicit IP, the [name]:[port]:[path]
goes into that IP's table at the next free slot. If a 'Service' line doesn't
mention the IP then [name]:[port]:[path] goes into ALL tables - this
makes sure that we always catch things like Brian's example ::79: where
we want to intercept all port 79 requests first and not worry about the
rest of the parameters (all IPs would catch it).
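
The table-filling rule can be sketched as follows. This is only an
illustration in Python: build_tables is a hypothetical name, entries are
plain (name, ip, port, path, alias) tuples with None for an empty field,
and the sketch assumes the list of IPs the server operates from is known
before the Service lines are processed.

```python
def build_tables(server_ips, entries):
    """Build one ordered table per IP from parsed Service entries.

    Each table row is (name, port, path, alias). Rows keep the order in
    which the Service lines appeared, since the first match wins.
    """
    tables = {ip: [] for ip in server_ips}
    for name, ip, port, path, alias in entries:
        row = (name, port, path, alias)
        if ip is None:
            # No IP mentioned: the row goes into ALL tables.
            for table in tables.values():
                table.append(row)
        else:
            # Explicit IP: next free slot of that IP's table only.
            # (An IP not in server_ips raises KeyError in this sketch.)
            tables[ip].append(row)
    return tables


# Hypothetical configuration: ::79: first, the catch-all ::: last.
entries = [
    (None, None, 79, None, "finger"),
    ("www.example.com", "10.0.0.1", None, None, "films"),
    (None, None, None, None, "default"),
]
tables = build_tables(["10.0.0.1", "10.0.0.2"], entries)
```

With these entries the table for 10.0.0.1 holds three rows and the table
for 10.0.0.2 holds two, both starting with the port 79 row.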

When a request arrives, we use the IP to choose the table and then work
through it systematically, looking for an entry that doesn't fail to match
on any criterion. The catch-all ':::' would be at the bottom of all tables to
guarantee success.
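
A minimal Python sketch of that lookup, under stated assumptions:
find_vhost is a hypothetical name, rows are (name, port, path, alias)
tuples with None as a wildcard, hostnames are compared exactly, and the
path is treated as a prefix of the request path (the proposal itself
doesn't say how paths are compared).

```python
def find_vhost(tables, ip, host, port, path):
    """Return the alias of the first row in ip's table that matches."""
    for name, want_port, want_path, alias in tables[ip]:
        if name is not None and name != host:
            continue  # hostname given and different
        if want_port is not None and want_port != port:
            continue  # port given and different
        if want_path is not None and not path.startswith(want_path):
            continue  # path given and not a prefix of the request path
        return alias  # nothing failed to match: first match wins
    return None  # only reachable if the table lacks a ':::' row


# Hypothetical single-IP table, special case first, catch-all last.
tables = {
    "10.0.0.1": [
        (None, 79, None, "finger"),
        ("www.example.com", None, "/films", "films"),
        (None, None, None, "default"),
    ],
}
```

For example, find_vhost(tables, "10.0.0.1", "www.example.com", 79, "/")
picks "finger" because the port 79 row is checked before the hostname
row.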

In this scheme, someone with 500 IPs benefits from your hashing - they
look up a table containing (probably) just the ':::' entry and they
get fast IP -> <VirtualHost></VirtualHost> mapping.

For the places where hashing isn't an issue, we'll typically end up with
a small number (usually 1) of tables containing varying numbers of entries.
The 'n:p:p' entries give full control over how to unambiguously map a
request to the right vhost configs. The ordering (as Brian showed)
enables us to pick off special cases first, or (as I showed) to list the
most likely match first (e.g. a popular hostname comes before an infrequently
used one).

The 'Service' lines separate the parameter matching from the <VirtualHost>
</VirtualHost> actions. I think that separation is very useful (not least
for mapping multiple criteria to the same <VirtualHost></VirtualHost>
configs).
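
To illustrate the separation, here is a small Python sketch in which two
different match rows share one vhost-block-alias. All names and values
(vhost_blocks, config_for, the hostnames and DocumentRoot paths) are
made up for the example, and only the hostname is compared to keep it
short.

```python
# Alias -> contents of the corresponding <VirtualHost> block (hypothetical).
vhost_blocks = {
    "films": {"DocumentRoot": "/www/films"},
    "default": {"DocumentRoot": "/www/default"},
}

# Match rows (name, port, path, alias); two rows point at the same alias.
table = [
    ("www.example.com", None, None, "films"),
    ("example.com", None, None, "films"),
    (None, None, None, "default"),  # the ':::' catch-all
]


def config_for(host):
    """Map a hostname to vhost settings via the alias of the first match."""
    for name, _port, _path, alias in table:
        if name is None or name == host:
            return vhost_blocks[alias]
    raise LookupError("table has no catch-all row")
```

Both hostnames resolve to the very same settings object, so the vhost
block is written once however many criteria lead to it.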


Not quite pseudocode, but I hope it's easy to follow.


rob
--
Rob Hartill                              Internet Movie Database (Ltd)
http://www.moviedatabase.com/   .. a site for sore eyes.

