Received: (from majordom@localhost) by hyperreal.org (8.8.5/8.8.5) id DAA00371; Fri, 18 Jul 1997 03:35:43 -0700 (PDT)
Received: from pillar.elsevier.co.uk (root@pillar.elsevier.co.uk [193.131.222.35]) by hyperreal.org (8.8.5/8.8.5) with ESMTP id DAA00365 for ; Fri, 18 Jul 1997 03:35:38 -0700 (PDT)
Received: from snowdon.elsevier.co.uk (snowdon.elsevier.co.uk [193.131.197.164]) by pillar.elsevier.co.uk (8.8.5/8.8.5) with ESMTP id LAA08358 for ; Fri, 18 Jul 1997 11:31:33 +0100 (BST)
Received: from cadair.elsevier.co.uk by snowdon.elsevier.co.uk with SMTP (PP); Fri, 18 Jul 1997 11:38:19 +0100
Received: from windrush.elsevier.co.uk (windrush.elsevier.co.uk [193.131.192.83]) by cadair.elsevier.co.uk (8.8.5/8.8.5) with ESMTP id LAA02747 for ; Fri, 18 Jul 1997 11:38:14 +0100 (BST)
From: Andrew Wilson
Received: (from awilson@localhost) by windrush.elsevier.co.uk (8.8.5/8.8.5) id LAA01018 for new-httpd@apache.org; Fri, 18 Jul 1997 11:36:49 +0100 (BST)
Message-Id: <199707181036.LAA01018@windrush.elsevier.co.uk>
Subject: Apache VirtualHost Hashing vs Linear-Listing (fwd)
To: new-httpd@apache.org
Date: Fri, 18 Jul 1997 11:36:48 +0100 (BST)
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Sender: new-httpd-owner@apache.org
Precedence: bulk
Reply-To: new-httpd@apache.org

Hi, Dave Hankins asked me to forward this on.  We last heard from Dave
when he posted an improved version of mod_dld several months ago.  This
patch relates to changes made to an existing multi-box site in order to
support > 512 hosts per box efficiently.  It's a long message, but worth
the read.  Those of you with big vhost sites may even find it useful.

Cheers,
Ay.

---

I have completed a hashing VirtualHost implementation for Apache v1.2.1.

For those on the Apache development team who remember me and are curious:
no, I have not completed my port of ELF libdl support from Apache 1.1 to
Apache 1.2.  It is no longer of import to me.

The Problem:

Apache 1.2.1 stores all server information in server_recs in memory.
These server_recs are kept in a linear list.  For every connection, the
server must traverse the linear list until it finds a server_rec with a
matching IP address on its server_ip list.  This becomes a problem should
the webmaster have the urge to host a large number of virtual web sites
on the same machine.

As it stands, Apache is efficient enough that it will handle large
numbers of virtual hosts (depending on the volume those hosts push out,
of course) quite readily.  I have taken this development direction only
because of reports (however vague) from friends in the industry
indicating that more than 508 virtual hosts would "perform poorly."  The
indication was that the number of virtual hosts was the problem, not the
accumulated load of those hosts.

The Solution:

A 256-bucket hash table using a (quick) 32->16->8 bit XOR fold.  The
table is fleshed out on every server configuration command and allocates
the list entries for its buckets out of the pconf memory pool.  In this
manner there are no memory leaks, and the table is not corrupted by a
HUP that adds VirtualHost directives.  This reduces the worst-case
lookup from traversing all n entries to traversing roughly n/256.
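In outline, the scheme looks like this (a minimal sketch; the names
struct hcf, virt_hashed, virthash, and VIRT_HASH_SIZE are the ones used
in the patch at the end of this message):

---
#define VIRT_HASH_SIZE 256          /* 8-bit hash: one byte's worth of buckets */

struct hcf {                        /* one bucket-list entry */
    struct server_rec *entry;       /* virtual host owning this address */
    struct hcf *next;               /* next entry in the same bucket */
};

struct hcf *virt_hashed[VIRT_HASH_SIZE];   /* the table: 256 head pointers */

/* Fold a 32-bit address down to 8 bits by XORing halves: 32 -> 16 -> 8. */
unsigned char virthash( unsigned long key )
{
    key ^= (key >> 16);                  /* fold the top 16 bits onto the bottom */
    return( ((key >> 8) ^ key) & 0xff ); /* fold 16 bits down to 8 */
}
---

Colliding addresses simply chain within a bucket, so correctness never
depends on the quality of the hash -- only the lookup speed does.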
Benchmarks:

I started on an external benchmark but realized that many of the
(changing) variables involved in a web server's function are not easily
controlled.  As such, the benchmark is driven out of Apache directly, as
follows:

---
/* BENCH */ struct timeval start, stop;
/* BENCH */ gettimeofday( &start, NULL );
conn->server = find_virtual_server(saddr->sin_addr, ntohs(saddr->sin_port),
                                   server);
/* BENCH */ gettimeofday( &stop, NULL );
printf( "%u %u %u %u\n", start.tv_sec, start.tv_usec,
        stop.tv_sec, stop.tv_usec );
---

These lines were added both to a stock 1.2.1 Apache and to a "1.2.1.dwh"
Apache.  The default list of modules was used, with the same compiler and
CFLAGS for both.  The server was configured with 1271 addresses (five
class C's plus my test address), and both servers used the exact same
configuration files for the benchmarks.

The test address was listed first in the configuration file.  This is
important to note, as the current insertion algorithms for both the
linear and hashed Apaches make this test address the WORST case possible.
This was done for no other reason than that it is easier to append new
config lines onto the end of the conf file.

A small application I had originally written for an external evaluation
was used to create web traffic (hits with 10,000-microsecond "rest"
periods).  The "rest" periods are intended to keep system load down and
availability high; I did not want to see my timings artificially inflated
by context switches.

The Results:

nop:/tmp/hankstone$ wc hash.frob linear.frob
  19280   77126  651264 hash.frob
  19644   78583  663552 linear.frob
  38924  155709 1314816 total
nop:/tmp/hankstone$ gawk '
BEGIN { maxi = 0 ; mini = 1000000 ; numstats = 0 ; stat = 0 }
/^[0-9]* [0-9]* [0-9]* [0-9]*$/ {
    if ( ($4 > $2) && ($1 == $3) ) {
        numstats++ ; val = ($4 - $2)
        if ( val > 1000000 ) { next }
        stat += val
        if ( mini > val ) { mini = val }
        if ( maxi < val ) { maxi = val }
    }
}
END { print "["mini"/" stat / numstats "/"maxi"] [min/avg/max] microseconds" }
' hash.frob
[20/22.1534/70] [min/avg/max] microseconds
nop:/tmp/hankstone$ gawk '
BEGIN { maxi = 0 ; mini = 1000000 ; numstats = 0 ; stat = 0 }
/^[0-9]* [0-9]* [0-9]* [0-9]*$/ {
    if ( ($4 > $2) && ($1 == $3) ) {
        numstats++ ; val = ($4 - $2)
        if ( val > 1000000 ) { next }
        stat += val
        if ( mini > val ) { mini = val }
        if ( maxi < val ) { maxi = val }
    }
}
END { print "["mini"/" stat / numstats "/"maxi"] [min/avg/max] microseconds" }
' linear.frob
[1478/1489.7/68262] [min/avg/max] microseconds

The cutoff for microsecond values larger than 1000000 was added because
the captured files would occasionally list the last hit with only part of
the first number reported.  You will also note that, to avoid
mathematical error, I have completely cut out any samples where the
microsecond counter rolled over and the seconds counter incremented.  I
verified that this did not cut out more than the last record from the
collected set of data.

---

Discussion of Benchmark results:

From the linear version's minimum nominal to the hashed version's MAXIMUM
recorded nominal, the studious reader can see that the hashed version is
2111.43% better.  From average to average, 6724.2%.  This is also a
profile of that algorithm alone.

But 1478 microseconds is 1.478 milliseconds, roughly double the nominal
round-trip time for ICMP echoes across 100-Mbit full-duplex Ethernet.
This means a packet has been transmitted (in one direction or the other,
each one after the other, not at the same time) four times in 0.7 ms; in
1.4 ms it would be eight times.

Assuming the average .html file is no larger than 6000 bytes, this is
about the same amount of time it would take the web server to start and
complete the HTTP return transaction (from its point of view; network
latency increases this, of course).  The packets would be transmitted and
sitting in a buffer somewhere.  Even if the average file is not below
this number, it still gives a good indication of scale.  I am neglecting,
of course, the other processing Apache must do for the HTTP transaction.

It is my estimation that one would not perceive any connection-setup
latency degradation on the hashed version of Apache until 131,072
addresses were configured (that is, 256 buckets times roughly the
512-address mark at which the linear list reportedly begins to suffer).
I certainly would not recommend approaching that limit without
reengineering the hash table.

Work Remaining to be done:

The find_virtual_server function in my implementation attempts a hashed
lookup on the address and then traverses a linear list of the hashed
server's addresses to check for a match.  This inner linear list can be
avoided by preloading the "sar" address into the list entry.  This would
increase the hash memory overhead by 4 bytes per address, to a total of
12 per address plus 1024 for the base array of 256 pointers (the table
itself), but that should be acceptable.
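A sketch of what that preloading might look like -- this is not part of
the patch below; hash_lookup is a hypothetical name, and carrying
host_port as well goes two bytes beyond the 4-byte figure above, but it
lets the comparison finish without touching the server_rec at all:

---
#include <netinet/in.h>            /* struct in_addr */

struct hcf {                       /* hash entry, grown by a preloaded address */
    struct server_rec *entry;      /* the virtual host this entry belongs to */
    unsigned long s_addr;          /* preloaded copy of sar->host_addr.s_addr */
    unsigned short host_port;      /* preloaded copy of sar->host_port (0 = any) */
    struct hcf *next;
};

extern struct hcf *virt_hashed[256];
extern unsigned char virthash( unsigned long key );

/* The inner loop over virt->addrs in find_virtual_server disappears,
 * because each entry now carries enough to decide the match itself.
 */
static struct server_rec *hash_lookup( struct in_addr server_ip,
                                       unsigned port )
{
    struct hcf *trav;

    for( trav = virt_hashed[virthash( server_ip.s_addr )] ; trav ;
         trav = trav->next )
        if( trav->s_addr == server_ip.s_addr &&
            (trav->host_port == 0 || trav->host_port == port) )
            return trav->entry;

    return NULL;    /* no hash match: fall through to the non-virtual case */
}
---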
The function also currently *REVERTS* to the old, painful linear list
should it not find a match in the hash table.  This happens for the
"main" server address (anything that is not a VirtualHost entry).  I do
not know the entire set of cases in which this happens; I only know it
happens for the server's main address.  This would make the patch
currently unusable should anyone decide to combine virtual and "core" web
functionality on the same machines.  I ask the Apache development team to
make a determination on the set of "non-virtual" cases that must be
considered after the hash table and, from there, how to most efficiently
handle those cases.  It is because I did not know the answer to this
question that I left it as it is, reverting to the linear list.

Someone also needs to do the necessary "multiple hash size" work and make
options available from the Configuration file.  Currently, things would
get messy if you were to merely raise the VIRT_HASH_SIZE #def.  You would
need to revisit the hashing function, which is the one and only way a
bucket is referred to in the hash table, to ensure that Apache uses the
entire table.  The function itself, as a limitation of the algorithm I
selected, is not prepared for multiple hash sizes and will need to be
edited should you decide to change the table's size.

I tend to think that 256 buckets is "plenty" (I selected a 'char' return
value on purpose) and that, if one were to make any changes at all, one
should strive to make it accept smaller hash sizes, on the order of 4
bits or 2 bits (16 or 4 buckets).  To do this, one would merely need to
collapse the hash function's XOR carrier another one or two iterations,
respectively.  I would recommend a 4-bit hash as "standard," with
Configuration hooks to move to either 256 buckets or the 2-bit variant,
optimizing for "virtualserver" (defined as a server using around 256
addresses) and "singleserver" configurations.  If I knew the group's
accepted form factor for configuration, and the inner workings of Apache
Configure, I would have done this already.
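To make that collapsing concrete, here is a sketch (not part of the
patch) of what the smaller-size variants might look like; virthash16 and
virthash4 are hypothetical names, and only one extra fold per step and
the final mask differ from the patch's virthash:

---
/* 16 buckets: fold 32 -> 16 -> 8 -> 4 bits. */
unsigned char virthash16( unsigned long key )
{
    key ^= (key >> 16);
    key ^= (key >> 8);
    return( ((key >> 4) ^ key) & 0x0f );
}

/* 4 buckets: fold one further iteration, 4 -> 2 bits. */
unsigned char virthash4( unsigned long key )
{
    key ^= (key >> 16);
    key ^= (key >> 8);
    key ^= (key >> 4);
    return( ((key >> 2) ^ key) & 0x03 );
}
---

The final AND mask serves the same safety purpose described for the 8-bit
version: even if someone changes the return type and forgets the rest,
the index stays within the table.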
Someone from the Apache development group should also ensure the code is
written in acceptable style (I appear to be the only freak who has hacked
on Apache who writes more comments than actual code) and that functions,
globals, and #defs are in the appropriate places.  I was driven only by
my own sense of style and purpose, not by any collaborative group's
sense.  For example, I tried to keep the globals I added defined within
four lines of each other.

You should pick a better name for the hash bin structures than "HCF."  I
couldn't think of anything else offhand.  I almost called them
"ExeBuckets" but thought that would be too pornographic for Apache code
distribution.  You will find 'struct hcf' littered throughout my code.

Requests:

Consider this a formal request on the part of one David Hankins that this
patch, or an iteration very similar in function to this patch, be
introduced into the Apache mainline distribution, preferably starting
with the current beta.  I ask only that I, and whoever "cleans up" the
introduction of the patch to Apache, receive (commented) credit for both
the work and the ideas present in this document, should you choose to
include this in future distributions of Apache.  "David Hankins" for the
name and "Independent Mercenary" for any organizational credits will
suffice.

I need the code below tested, widely.  Should you choose to put it in the
mainstream, please get it into your current beta and throw it around a
whole lot.  From start to finish it took me thirty minutes, with no
compilation errors and no problems when it ran.  That means there must be
something horribly wrong that I can't see, and it needs testing more
than, say, code that has errored and been fixed a few times.  Call it a
gut feeling; the thing has to have bugs.

The Patch:

As I do not currently have any better means, I have included the patch
below.  Don't whine, it's really not that large: only 282 lines,
10 kbytes.

Thank you for your time.

--- Start of patch ---

diff -c3 ./http_conf_globals.h ../../apache_1.2.1.dwh/src/http_conf_globals.h
*** ./http_conf_globals.h	Sun Jun 29 14:08:35 1997
--- ../../apache_1.2.1.dwh/src/http_conf_globals.h	Tue Jul 15 22:29:11 1997
***************
*** 84,86 ****
--- 84,88 ----
  extern char server_root[MAX_STRING_LEN];
  extern char server_confname[MAX_STRING_LEN];
+ extern struct hcf *virt_hashed[VIRT_HASH_SIZE];
+ 
diff -c3 ./http_config.c ../../apache_1.2.1.dwh/src/http_config.c
*** ./http_config.c	Sun Jun 29 14:08:36 1997
--- ../../apache_1.2.1.dwh/src/http_config.c	Wed Jul 16 05:12:48 1997
***************
*** 1044,1049 ****
--- 1044,1052 ----
      max_requests_per_child = DEFAULT_MAX_REQUESTS_PER_CHILD;
      bind_address.s_addr = htonl(INADDR_ANY);
      listeners = NULL;
+ 
+     /* Global virtual host hash bucket pointers.  Init to null. */
+     memset( virt_hashed, 0, VIRT_HASH_SIZE * sizeof( struct hcf * ) );
  }
  
  server_rec *init_server_config(pool *p)
diff -c3 ./http_config.h ../../apache_1.2.1.dwh/src/http_config.h
*** ./http_config.h	Thu Jun 26 22:07:34 1997
--- ../../apache_1.2.1.dwh/src/http_config.h	Tue Jul 15 22:15:42 1997
***************
*** 274,279 ****
--- 274,280 ----
  /* For http_core.c... (<Directory> command and virtual hosts) */
  
+ unsigned char virthash( unsigned long key );
  int parse_htaccess(void **result, request_rec *r, int override, char *path, char *file);
  
  const char *srm_command_loop (cmd_parms *parms, void *config);
diff -c3 ./http_core.c ../../apache_1.2.1.dwh/src/http_core.c
*** ./http_core.c	Sat Jul 5 13:56:49 1997
--- ../../apache_1.2.1.dwh/src/http_core.c	Thu Jul 17 06:49:35 1997
***************
*** 795,802 ****
  const char *virtualhost_section (cmd_parms *cmd, void *dummy, char *arg)
  {
      server_rec *main_server = cmd->server, *s;
      const char *errmsg;
!     char *endp = strrchr (arg, '>');
      pool *p = cmd->pool, *ptemp = cmd->temp_pool;
  
      if (endp) *endp = '\0';
--- 795,804 ----
  const char *virtualhost_section (cmd_parms *cmd, void *dummy, char *arg)
  {
      server_rec *main_server = cmd->server, *s;
+     struct hcf *hashme;
+     server_addr_rec *listp;
      const char *errmsg;
!     char *endp = strrchr (arg, '>'); unsigned char buk;
      pool *p = cmd->pool, *ptemp = cmd->temp_pool;
  
      if (endp) *endp = '\0';
***************
*** 812,818 ****
      s = init_virtual_host (p, arg, main_server);
      s->next = main_server->next;
      main_server->next = s;
! 
      cmd->server = s;
      errmsg = srm_command_loop (cmd, s->lookup_defaults);
      cmd->server = main_server;
--- 814,841 ----
      s = init_virtual_host (p, arg, main_server);
      s->next = main_server->next;
      main_server->next = s;
! 
!     /* The server rec is fleshed out, now we create its hash entr(y|ies). */
!     listp = s->addrs;        /* Hash the whole list of addresses. */
!     while( listp )           /* If any. */
!     {
!         hashme = pcalloc( p, sizeof( struct hcf ) );
!         hashme->entry = s;
! 
!         buk = virthash( listp->host_addr.s_addr );
! 
!         /* Just add it onto the head.  Shouldn't matter.  The ambitious
!          * hacker would frob with the lookup function in find_virtual_host
!          * to bubble frequently used addresses to the first entry in the
!          * hash bucket.
!          *
!          * I don't usually like to do bubbling without a doubly linked list
!          * though... and that would add memory overhead.
!          */
!         hashme->next = virt_hashed[buk];
!         virt_hashed[buk] = hashme;
!         listp = listp->next;
!     }
! 
      cmd->server = s;
      errmsg = srm_command_loop (cmd, s->lookup_defaults);
      cmd->server = main_server;
diff -c3 ./http_main.c ../../apache_1.2.1.dwh/src/http_main.c
*** ./http_main.c	Fri Jul 18 00:49:44 1997
--- ../../apache_1.2.1.dwh/src/http_main.c	Fri Jul 18 00:49:23 1997
***************
*** 149,154 ****
--- 149,203 ----
  time_t restart_time;
  int suexec_enabled = 0;
+ 
+ /* A (n) bucket hash table; each entry has a pointer to a server rec and
+  * a pointer to the other entries in that bucket.  Each individual
+  * address, even for virtualhosts with multiple addresses, has an entry
+  * in this hash table.
+  *
+  * The #def I set in httpd.h was 256.  That's 1024 bytes nominal (256
+  * 32-bit pointers).  You'll note it's a bin of pointers.  If you
+  * actually allocate something over those pointers it's 8 bytes * entries.
+  *
+  * So, nominal plus a 512 address load would be 5120 bytes.  Still small.
+  *
+  * This table does not change after config files are parsed.  This means
+  * that forked children will have this table sit in copy-on-write pages
+  * and not use more RAM.
+  */
+ struct hcf *virt_hashed[VIRT_HASH_SIZE];
+ 
+ /* This is your standard hashing function.  It just xors half of the
+  * value repeatedly until we've reached our value granularity.  If you
+  * were to go to a 16 entry table, you'd want to add another line to go
+  * down the next four bits.  If you were to go to a 65536 entry table,
+  * you'll need to re-typecast the function and return the first line
+  * below.
+  * Don't forget in both these cases to change the constant (0xff) on the
+  * return below.  Yes, in its current iteration (256 entries), this is
+  * redundant.  I left it in there so things WOULD WORK even if people
+  * changed sizes and re-typecast it, but forgot to change the AND.  I'd
+  * hate to see someone re-cast virthash as returning a __u32 and get
+  * hash entries off in extremely unfriendly areas of memory.  At least
+  * it will work this way.
+  *
+  * Hm.  Maybe it shouldn't.
+  *
+  * Anyway, if you are keeping the same hash size, remove it.
+  *
+  * The reason I chose this particular method is that it gets very good
+  * distribution in a case where only one byte changes from address to
+  * address.  This is true, at least, of my installation, where we put
+  * entire blocks of addresses in contiguous /24's onto machines.
+  *
+  * How many other people do you know who write comments longer than
+  * their code?
+  *
+  * David Hankins, 07/16/97 0545.09 EDT
+  */
+ unsigned char virthash( unsigned long key )
+ {
+     key ^= (key >> 16);
+     return( ((key >> 8) ^ key) & 0xff );
+ }
+ 
  char server_root[MAX_STRING_LEN];
  char server_confname[MAX_STRING_LEN];
***************
*** 1470,1475 ****
--- 1519,1583 ----
      server_rec *virt;
      server_addr_rec *sar;
      server_rec *def;
+     struct hcf *trav;
+     unsigned char buk;
+ 
+     /* Now we do a hash lookup; if it matches the rule below, fine;
+      * otherwise we default to the full linear search.
+      *
+      * It's not particularly efficient not to just select a default and
+      * back out... but I did this because I'm not sure how selecting a
+      * default is really done (nor where it should be done).  If it's not
+      * a virtualhost, then it's just... NOT... right?  Or wrong.  "I
+      * don't know" is the point.  Once someone defines what should be
+      * done when the virtual host is not found, we can add that bit in
+      * below where my if-statement terminates and cut out the old linear
+      * search entirely.
+      *
+      * Essentially, people who know (the people who wrote the original
+      * linear list) should define what my hash table doesn't cover and
+      * handle only that set.  If that makes any sense.
+      *
+      * Currently, the only thing *I* know about that my table doesn't
+      * find is the server's main address.  If that is the whole set, then
+      * fixing this function and removing the linear list entirely is
+      * trivial.
+      */
+     buk = virthash( server_ip.s_addr );
+     if( trav = virt_hashed[buk] )
+     {
+         do
+         {
+             /* The match either isn't a match, or could be a match on any
+              * one of that virtualhost's addresses.  So we search them all
+              * linearly.  A host shouldn't have more than 4 or so
+              * addresses that I can think of, so this isn't such a big
+              * deal.  Usually this for loop gets one iteration.
+              */
+             virt = trav->entry;
+             for( sar = virt->addrs ; sar ; sar = sar->next )
+             {
+                 /* I removed the INADDR_ANY check here... it shouldn't be
+                  * in the hash.  I think that was intended for the
+                  * "default" case, but as my comments allude, I'm just not
+                  * sure.  This works for our case of virtual hosts at any
+                  * rate.  One could merely return the default after this
+                  * check, if my guesses are right.  I'm not banking on my
+                  * guesses being right.
+                  *
+                  * Actually, this loop could be avoided if you put a
+                  * second record in the hash list entry to refer to the
+                  * s_addr in question...
+                  *
+                  * Also, I think the is_virtual check is redundant.
+                  */
+                 if((virt->is_virtual == 1) &&
+                    (sar->host_addr.s_addr == server_ip.s_addr) &&
+                    (sar->host_port == 0 || sar->host_port == port))
+                 {
+                     return virt;
+                 }
+             }
+         }
+         while( trav = trav->next );
+     }
  
      def = server;
      for (virt = server->next; virt; virt = virt->next) {
diff -c3 ./httpd.h ../../apache_1.2.1.dwh/src/httpd.h
*** ./httpd.h	Sat Jul 5 22:04:22 1997
--- ../../apache_1.2.1.dwh/src/httpd.h	Wed Jul 16 05:16:09 1997
***************
*** 231,236 ****
--- 231,246 ----
  #define HARD_SERVER_LIMIT 256
  #endif
+ 
+ /* Couldn't find a better place for this.  If you make this larger than
+  * 256, you will have to go and update the hash function and also
+  * whatever code speaks to it.  Basically, don't touch this unless you
+  * know what you're doing.  You'll notice, for example, that it is not a
+  * prime number.
+  */
+ #ifndef VIRT_HASH_SIZE
+ #define VIRT_HASH_SIZE 256
+ #endif
+ 
  /* Number of requests to try to handle in a single process.  If <= 0,
   * the children don't die off.  That's the default here, since I'm still
   * interested in finding and stanching leaks.
   */
***************
*** 408,418 ****
      const struct htaccess_result *next;
  };
  
- typedef struct conn_rec conn_rec;
  typedef struct server_rec server_rec;
  typedef struct request_rec request_rec;
  typedef struct listen_rec listen_rec;
  
  struct request_rec {
--- 418,433 ----
      const struct htaccess_result *next;
  };
  
  typedef struct conn_rec conn_rec;
  typedef struct server_rec server_rec;
  typedef struct request_rec request_rec;
  typedef struct listen_rec listen_rec;
+ 
+ struct hcf    /* Meta linear list for hashes.  Easier than creating */
+ {             /* duplicate server_recs. */
+     struct server_rec *entry;
+     struct hcf *next;
+ };
  
  struct request_rec {
--- Patch finished ---

-- 
David Hankins,        "If you don't do it right the first time,
Mercenary              you'll just have to do it again."
                                              -- J.T. Hankins