Received: (from majordom@localhost) by hyperreal.org (8.8.5/8.8.5) id DAA00371; Fri, 18 Jul 1997 03:35:43 -0700 (PDT)
Received: from pillar.elsevier.co.uk (root@pillar.elsevier.co.uk [193.131.222.35]) by hyperreal.org (8.8.5/8.8.5) with ESMTP id DAA00365 for ; Fri, 18 Jul 1997 03:35:38 -0700 (PDT)
Received: from snowdon.elsevier.co.uk (snowdon.elsevier.co.uk [193.131.197.164]) by pillar.elsevier.co.uk (8.8.5/8.8.5) with ESMTP id LAA08358 for ; Fri, 18 Jul 1997 11:31:33 +0100 (BST)
Received: from cadair.elsevier.co.uk by snowdon.elsevier.co.uk with SMTP (PP); Fri, 18 Jul 1997 11:38:19 +0100
Received: from windrush.elsevier.co.uk (windrush.elsevier.co.uk [193.131.192.83]) by cadair.elsevier.co.uk (8.8.5/8.8.5) with ESMTP id LAA02747 for ; Fri, 18 Jul 1997 11:38:14 +0100 (BST)
From: Andrew Wilson
Received: (from awilson@localhost) by windrush.elsevier.co.uk (8.8.5/8.8.5) id LAA01018 for new-httpd@apache.org; Fri, 18 Jul 1997 11:36:49 +0100 (BST)
Message-Id: <199707181036.LAA01018@windrush.elsevier.co.uk>
Subject: Apache VirtualHost Hashing vs Linear-Listing (fwd)
To: new-httpd@apache.org
Date: Fri, 18 Jul 1997 11:36:48 +0100 (BST)
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Sender: new-httpd-owner@apache.org
Precedence: bulk
Reply-To: new-httpd@apache.org

Hi, Dave Hankins asked me to forward this on.  We last heard from Dave
when he posted an improved version of mod_dld several months ago.  This
patch relates to changes made to an existing multi-box site in order to
support > 512 hosts per box efficiently.  It's a long message, but worth
the read.  Those of you with big vhost sites may even find it useful.

Cheers,
Ay.

---

I have completed a hashing VirtualHost implementation for Apache v1.2.1.

For those on the Apache development team who remember me and are curious:
no, I have not completed my port of ELF libdl support from Apache 1.1 to
Apache 1.2.  It is no longer of import to me.

The Problem:

Apache 1.2.1 stores all server information in server_recs in memory.
These server_recs are kept in a linear list.  For every connection, the
server must traverse the linear list until it finds a server_rec with a
matching IP address on its server_ip list.  This becomes a problem should
the webmaster have the urge to host a large number of virtual web sites
on the same machine.

As it stands, Apache is efficient enough that it will handle large
numbers of virtual hosts (depending on the volume those hosts push out,
of course) quite readily.  I have taken this development direction only
because of reports (however vague) from friends in the industry
indicating that more than 508 virtual hosts would "perform poorly."  The
indication was that the number of virtual hosts was the problem, not the
accumulated load of those hosts.

The Solution:

A 256-bucket hash table using a (quick) 32->16->8 bit XOR fold.  The
table is fleshed out on every server configuration command and allocates
the list entries for its buckets out of the pconf memory pool.  In this
manner there are no memory leaks, and the table is not corrupted by a
HUP that adds VirtualHost directives.  This reduces the worst-case
lookup from traversing all n entries to traversing roughly n/256.
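In outline, the scheme looks like this (a minimal sketch; the names
struct hcf, virt_hashed, virthash, and VIRT_HASH_SIZE are the ones used
in the patch at the end of this message):

---
#define VIRT_HASH_SIZE 256          /* 8-bit hash: one byte's worth of buckets */

struct hcf {                        /* one bucket-list entry */
    struct server_rec *entry;       /* virtual host owning this address */
    struct hcf *next;               /* next entry in the same bucket */
};

struct hcf *virt_hashed[VIRT_HASH_SIZE];   /* the table: 256 head pointers */

/* Fold a 32-bit address down to 8 bits by XORing halves: 32 -> 16 -> 8. */
unsigned char virthash( unsigned long key )
{
    key ^= (key >> 16);                  /* fold the top 16 bits onto the bottom */
    return( ((key >> 8) ^ key) & 0xff ); /* fold 16 bits down to 8 */
}
---

Colliding addresses simply chain within a bucket, so correctness never
depends on the quality of the hash -- only the lookup speed does.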
Benchmarks:

I started on an external benchmark but realized that many of the
(changing) variables involved in a web server's function are not easily
controlled.  As such, the benchmark is driven out of Apache directly, as
follows:

---
/* BENCH */ struct timeval start, stop;
/* BENCH */ gettimeofday( &start, NULL );
conn->server = find_virtual_server(saddr->sin_addr, ntohs(saddr->sin_port),
                                   server);
/* BENCH */ gettimeofday( &stop, NULL );
printf( "%u %u %u %u\n", start.tv_sec, start.tv_usec,
        stop.tv_sec, stop.tv_usec );
---

These lines were added both to a stock 1.2.1 Apache and to a "1.2.1.dwh"
Apache.  The default list of modules was used, with the same compiler and
CFLAGS for both.  The server was configured with 1271 addresses (five
class C's plus my test address), and both servers used the exact same
configuration files for the benchmarks.

The test address was listed first in the configuration file.  This is
important to note, as the current insertion algorithms for both the
linear and hashed Apaches make this test address the WORST case possible.
This was done for no other reason than that it is easier to append new
config lines onto the end of the conf file.

A small application I had originally written for an external evaluation
was used to create web traffic (hits with 10,000-microsecond "rest"
periods).  The "rest" periods are intended to keep system load down and
availability high; I did not want to see my timings artificially inflated
by context switches.

The Results:

nop:/tmp/hankstone$ wc hash.frob linear.frob
  19280   77126  651264 hash.frob
  19644   78583  663552 linear.frob
  38924  155709 1314816 total
nop:/tmp/hankstone$ gawk '
BEGIN { maxi = 0 ; mini = 1000000 ; numstats = 0 ; stat = 0 }
/^[0-9]* [0-9]* [0-9]* [0-9]*$/ {
    if ( ($4 > $2) && ($1 == $3) ) {
        numstats++ ; val = ($4 - $2)
        if ( val > 1000000 ) { next }
        stat += val
        if ( mini > val ) { mini = val }
        if ( maxi < val ) { maxi = val }
    }
}
END { print "["mini"/" stat / numstats "/"maxi"] [min/avg/max] microseconds" }
' hash.frob
[20/22.1534/70] [min/avg/max] microseconds
nop:/tmp/hankstone$ gawk '
BEGIN { maxi = 0 ; mini = 1000000 ; numstats = 0 ; stat = 0 }
/^[0-9]* [0-9]* [0-9]* [0-9]*$/ {
    if ( ($4 > $2) && ($1 == $3) ) {
        numstats++ ; val = ($4 - $2)
        if ( val > 1000000 ) { next }
        stat += val
        if ( mini > val ) { mini = val }
        if ( maxi < val ) { maxi = val }
    }
}
END { print "["mini"/" stat / numstats "/"maxi"] [min/avg/max] microseconds" }
' linear.frob
[1478/1489.7/68262] [min/avg/max] microseconds

The cutoff for microsecond values larger than 1000000 was added because
the captured files would occasionally list the last hit with only part of
the first number reported.  You will also note that, to avoid
mathematical error, I have completely cut out any samples where the
microsecond counter rolled over and the seconds counter incremented.  I
verified that this did not cut out more than the last record from the
collected set of data.

---

Discussion of Benchmark results:

From the linear version's minimum nominal to the hashed version's MAXIMUM
recorded nominal, the studious reader can see that the hashed version is
2111.43% better.  From average to average, 6724.2%.  This is also a
profile of that algorithm alone.

But 1478 microseconds is 1.478 milliseconds, roughly double the nominal
round-trip time for ICMP echoes across 100-Mbit full-duplex Ethernet.
This means a packet has been transmitted (in one direction or the other,
each one after the other, not at the same time) four times in 0.7 ms; in
1.4 ms it would be eight times.

Assuming the average .html file is no larger than 6000 bytes, this is
about the same amount of time it would take the web server to start and
complete the HTTP return transaction (from its point of view; network
latency increases this, of course).  The packets would be transmitted and
sitting in a buffer somewhere.  Even if the average file is not below
this number, it still gives a good indication of scale.  I am neglecting,
of course, the other processing Apache must do for the HTTP transaction.

It is my estimation that one would not perceive any connection-setup
latency degradation on the hashed version of Apache until 131,072
addresses were configured (that is, 256 buckets times roughly the
512-address mark at which the linear list reportedly begins to suffer).
I certainly would not recommend approaching that limit without
reengineering the hash table.

Work Remaining to be done:

The find_virtual_server function in my implementation attempts a hashed
lookup on the address and then traverses a linear list of the hashed
server's addresses to check for a match.  This inner linear list can be
avoided by preloading the "sar" address into the list entry.  This would
increase the hash memory overhead by 4 bytes per address, to a total of
12 per address plus 1024 for the base array of 256 pointers (the table
itself), but that should be acceptable.
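A sketch of what that preloading might look like -- this is not part of
the patch below; hash_lookup is a hypothetical name, and carrying
host_port as well goes two bytes beyond the 4-byte figure above, but it
lets the comparison finish without touching the server_rec at all:

---
#include <netinet/in.h>            /* struct in_addr */

struct hcf {                       /* hash entry, grown by a preloaded address */
    struct server_rec *entry;      /* the virtual host this entry belongs to */
    unsigned long s_addr;          /* preloaded copy of sar->host_addr.s_addr */
    unsigned short host_port;      /* preloaded copy of sar->host_port (0 = any) */
    struct hcf *next;
};

extern struct hcf *virt_hashed[256];
extern unsigned char virthash( unsigned long key );

/* The inner loop over virt->addrs in find_virtual_server disappears,
 * because each entry now carries enough to decide the match itself.
 */
static struct server_rec *hash_lookup( struct in_addr server_ip,
                                       unsigned port )
{
    struct hcf *trav;

    for( trav = virt_hashed[virthash( server_ip.s_addr )] ; trav ;
         trav = trav->next )
        if( trav->s_addr == server_ip.s_addr &&
            (trav->host_port == 0 || trav->host_port == port) )
            return trav->entry;

    return NULL;    /* no hash match: fall through to the non-virtual case */
}
---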
The function also currently *REVERTS* to the old, painful linear list
should it not find a match in the hash table.  This happens for the
"main" server address (anything that is not a VirtualHost entry).  I do
not know the entire set of cases in which this happens; I only know it
happens for the server's main address.  This would make the patch
currently unusable should anyone decide to combine virtual and "core" web
functionality on the same machines.  I ask the Apache development team to
make a determination on the set of "non-virtual" cases that must be
considered after the hash table and, from there, how to most efficiently
handle those cases.  It is because I did not know the answer to this
question that I left it as it is, reverting to the linear list.

Someone also needs to do the necessary "multiple hash size" work and make
options available from the Configuration file.  Currently, things would
get messy if you were to merely raise the VIRT_HASH_SIZE #def.  You would
need to revisit the hashing function, which is the one and only way a
bucket is referred to in the hash table, to ensure that Apache uses the
entire table.  The function itself, as a limitation of the algorithm I
selected, is not prepared for multiple hash sizes and will need to be
edited should you decide to change the table's size.

I tend to think that 256 buckets is "plenty" (I selected a 'char' return
value on purpose) and that, if one were to make any changes at all, one
should strive to make it accept smaller hash sizes, on the order of 4
bits or 2 bits (16 or 4 buckets).  To do this, one would merely need to
collapse the hash function's XOR carrier another one or two iterations,
respectively.  I would recommend a 4-bit hash as "standard," with
Configuration hooks to move to either 256 buckets or the 2-bit variant,
optimizing for "virtualserver" (defined as a server using around 256
addresses) and "singleserver" configurations.  If I knew the group's
accepted form factor for configuration, and the inner workings of Apache
Configure, I would have done this already.
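To make that collapsing concrete, here is a sketch (not part of the
patch) of what the smaller-size variants might look like; virthash16 and
virthash4 are hypothetical names, and only one extra fold per step and
the final mask differ from the patch's virthash:

---
/* 16 buckets: fold 32 -> 16 -> 8 -> 4 bits. */
unsigned char virthash16( unsigned long key )
{
    key ^= (key >> 16);
    key ^= (key >> 8);
    return( ((key >> 4) ^ key) & 0x0f );
}

/* 4 buckets: fold one further iteration, 4 -> 2 bits. */
unsigned char virthash4( unsigned long key )
{
    key ^= (key >> 16);
    key ^= (key >> 8);
    key ^= (key >> 4);
    return( ((key >> 2) ^ key) & 0x03 );
}
---

The final AND mask serves the same safety purpose described for the 8-bit
version: even if someone changes the return type and forgets the rest,
the index stays within the table.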
Someone from the Apache development group should also ensure the code is
written in acceptable style (I appear to be the only freak who has hacked
on Apache who writes more comments than actual code) and that functions,
globals, and #defs are in the appropriate places.  I was driven only by
my own sense of style and purpose, not by any collaborative group's
sense.  For example, I tried to keep the globals I added defined within
four lines of each other.

You should pick a better name for the hash bin structures than "HCF."  I
couldn't think of anything else offhand.  I almost called them
"ExeBuckets" but thought that would be too pornographic for Apache code
distribution.  You will find 'struct hcf' littered throughout my code.

Requests:

Consider this a formal request on the part of one David Hankins that this
patch, or an iteration very similar in function to this patch, be
introduced into the Apache mainline distribution, preferably starting
with the current beta.  I ask only that I, and whoever "cleans up" the
introduction of the patch to Apache, receive (commented) credit for both
the work and the ideas present in this document, should you choose to
include this in future distributions of Apache.  "David Hankins" for the
name and "Independent Mercenary" for any organizational credits will
suffice.

I need the code below tested, widely.  Should you choose to put it in the
mainstream, please get it into your current beta and throw it around a
whole lot.  From start to finish it took me thirty minutes, with no
compilation errors and no problems when it ran.  That means there must be
something horribly wrong that I can't see, and it needs testing more
than, say, code that has errored and been fixed a few times.  Call it a
gut feeling; the thing has to have bugs.

The Patch:

As I do not currently have any better means, I have included the patch
below.  Don't whine, it's really not that large: only 282 lines,
10 kbytes.

Thank you for your time.

--- Start of patch ---

diff -c3 ./http_conf_globals.h ../../apache_1.2.1.dwh/src/http_conf_globals.h
*** ./http_conf_globals.h	Sun Jun 29 14:08:35 1997
--- ../../apache_1.2.1.dwh/src/http_conf_globals.h	Tue Jul 15 22:29:11 1997
***************
*** 84,86 ****
--- 84,88 ----
  extern char server_root[MAX_STRING_LEN];
  extern char server_confname[MAX_STRING_LEN];
+ extern struct hcf *virt_hashed[VIRT_HASH_SIZE];
+ 
diff -c3 ./http_config.c ../../apache_1.2.1.dwh/src/http_config.c
*** ./http_config.c	Sun Jun 29 14:08:36 1997
--- ../../apache_1.2.1.dwh/src/http_config.c	Wed Jul 16 05:12:48 1997
***************
*** 1044,1049 ****
--- 1044,1052 ----
      max_requests_per_child = DEFAULT_MAX_REQUESTS_PER_CHILD;
      bind_address.s_addr = htonl(INADDR_ANY);
      listeners = NULL;
+ 
+     /* Global virtual host hash bucket pointers.  Init to null. */
+     memset( virt_hashed, 0, VIRT_HASH_SIZE * sizeof( struct hcf * ) );
  }
  
  server_rec *init_server_config(pool *p)
diff -c3 ./http_config.h ../../apache_1.2.1.dwh/src/http_config.h
*** ./http_config.h	Thu Jun 26 22:07:34 1997
--- ../../apache_1.2.1.dwh/src/http_config.h	Tue Jul 15 22:15:42 1997
***************
*** 274,279 ****
--- 274,280 ----
  /* For http_core.c... (<Directory> command and virtual hosts) */
  
+ unsigned char virthash( unsigned long key );
  int parse_htaccess(void **result, request_rec *r, int override, char *path, char *file);
  
  const char *srm_command_loop (cmd_parms *parms, void *config);
diff -c3 ./http_core.c ../../apache_1.2.1.dwh/src/http_core.c
*** ./http_core.c	Sat Jul 5 13:56:49 1997
--- ../../apache_1.2.1.dwh/src/http_core.c	Thu Jul 17 06:49:35 1997
***************
*** 795,802 ****
  const char *virtualhost_section (cmd_parms *cmd, void *dummy, char *arg)
  {
      server_rec *main_server = cmd->server, *s;
      const char *errmsg;
!     char *endp = strrchr (arg, '>');
      pool *p = cmd->pool, *ptemp = cmd->temp_pool;
  
      if (endp) *endp = '\0';
--- 795,804 ----
  const char *virtualhost_section (cmd_parms *cmd, void *dummy, char *arg)
  {
      server_rec *main_server = cmd->server, *s;
+     struct hcf *hashme;
+     server_addr_rec *listp;
      const char *errmsg;
!     char *endp = strrchr (arg, '>'); unsigned char buk;
      pool *p = cmd->pool, *ptemp = cmd->temp_pool;
  
      if (endp) *endp = '\0';
***************
*** 812,818 ****
      s = init_virtual_host (p, arg, main_server);
      s->next = main_server->next;
      main_server->next = s;
! 
      cmd->server = s;
      errmsg = srm_command_loop (cmd, s->lookup_defaults);
      cmd->server = main_server;
--- 814,841 ----
      s = init_virtual_host (p, arg, main_server);
      s->next = main_server->next;
      main_server->next = s;
! 
!     /* The server rec is fleshed out, now we create its hash entr(y|ies). */
!     listp = s->addrs;        /* Hash the whole list of addresses. */
!     while( listp )           /* If any. */
!     {
!         hashme = pcalloc( p, sizeof( struct hcf ) );
!         hashme->entry = s;
! 
!         buk = virthash( listp->host_addr.s_addr );
! 
!         /* Just add it onto the head.  Shouldn't matter.  The ambitious
!          * hacker would frob with the lookup function in find_virtual_host
!          * to bubble frequently used addresses to the first entry in the
!          * hash bucket.
!          *
!          * I don't usually like to do bubbling without a doubly linked list
!          * though... and that would add memory overhead.
!          */
!         hashme->next = virt_hashed[buk];
!         virt_hashed[buk] = hashme;
!         listp = listp->next;
!     }
! 
      cmd->server = s;
      errmsg = srm_command_loop (cmd, s->lookup_defaults);
      cmd->server = main_server;
diff -c3 ./http_main.c ../../apache_1.2.1.dwh/src/http_main.c
*** ./http_main.c	Fri Jul 18 00:49:44 1997
--- ../../apache_1.2.1.dwh/src/http_main.c	Fri Jul 18 00:49:23 1997
***************
*** 149,154 ****
--- 149,203 ----
  time_t restart_time;
  int suexec_enabled = 0;
+ 
+ /* A (n) bucket hash table; each entry has a pointer to a server rec and
+  * a pointer to the other entries in that bucket.  Each individual
+  * address, even for virtualhosts with multiple addresses, has an entry
+  * in this hash table.
+  *
+  * The #def I set in httpd.h was 256.  That's 1024 bytes nominal (256
+  * 32-bit pointers).  You'll note it's a bin of pointers.  If you
+  * actually allocate something over those pointers it's 8 bytes * entries.
+  *
+  * So, nominal plus a 512 address load would be 5120 bytes.  Still small.
+  *
+  * This table does not change after config files are parsed.  This means
+  * that forked children will have this table sit in copy-on-write pages
+  * and not use more RAM.
+  */
+ struct hcf *virt_hashed[VIRT_HASH_SIZE];
+ 
+ /* This is your standard hashing function.  It just xors half of the
+  * value repeatedly until we've reached our value granularity.  If you
+  * were to go to a 16 entry table, you'd want to add another line to go
+  * down the next four bits.  If you were to go to a 65536 entry table,
+  * you'll need to re-typecast the function and return the first line
+  * below.
+  * Don't forget in both these cases to change the constant (0xff) on the
+  * return below.  Yes, in its current iteration (256 entries), this is
+  * redundant.  I left it in there so things WOULD WORK even if people
+  * changed sizes and re-typecast it, but forgot to change the AND.  I'd
+  * hate to see someone re-cast virthash as returning a __u32 and get
+  * hash entries off in extremely unfriendly areas of memory.  At least
+  * it will work this way.
+  *
+  * Hm.  Maybe it shouldn't.
+  *
+  * Anyway, if you are keeping the same hash size, remove it.
+  *
+  * The reason I chose this particular method is that it gets very good
+  * distribution in a case where only one byte changes from address to
+  * address.  This is true, at least, of my installation, where we put
+  * entire blocks of addresses in contiguous /24's onto machines.
+  *
+  * How many other people do you know who write comments longer than
+  * their code?
+  *
+  * David Hankins, 07/16/97 0545.09 EDT
+  */
+ unsigned char virthash( unsigned long key )
+ {
+     key ^= (key >> 16);
+     return( ((key >> 8) ^ key) & 0xff );
+ }
+ 
  char server_root[MAX_STRING_LEN];
  char server_confname[MAX_STRING_LEN];
***************
*** 1470,1475 ****
--- 1519,1583 ----
      server_rec *virt;
      server_addr_rec *sar;
      server_rec *def;
+     struct hcf *trav;
+     unsigned char buk;
+ 
+     /* Now we do a hash lookup; if it matches the rule below, fine;
+      * otherwise we default to the full linear search.
+      *
+      * It's not particularly efficient not to just select a default and
+      * back out... but I did this because I'm not sure how selecting a
+      * default is really done (nor where it should be done).  If it's not
+      * a virtualhost, then it's just... NOT... right?  Or wrong.  "I
+      * don't know" is the point.  Once someone defines what should be
+      * done when the virtual host is not found, we can add that bit in
+      * below where my if-statement terminates and cut out the old linear
+      * search entirely.
+      *
+      * Essentially, people who know (the people who wrote the original
+      * linear list) should define what my hash table doesn't cover and
+      * handle only that set.  If that makes any sense.
+      *
+      * Currently, the only thing *I* know about that my table doesn't
+      * find is the server's main address.  If that is the whole set, then
+      * fixing this function and removing the linear list entirely is
+      * trivial.
+      */
+     buk = virthash( server_ip.s_addr );
+     if( trav = virt_hashed[buk] )
+     {
+         do
+         {
+             /* The match either isn't a match, or could be a match on any
+              * one of that virtualhost's addresses.  So we search them all
+              * linearly.  A host shouldn't have more than 4 or so
+              * addresses that I can think of, so this isn't such a big
+              * deal.  Usually this for loop gets one iteration.
+              */
+             virt = trav->entry;
+             for( sar = virt->addrs ; sar ; sar = sar->next )
+             {
+                 /* I removed the INADDR_ANY check here... it shouldn't be
+                  * in the hash.  I think that was intended for the
+                  * "default" case, but as my comments allude, I'm just not
+                  * sure.  This works for our case of virtual hosts at any
+                  * rate.  One could merely return the default after this
+                  * check, if my guesses are right.  I'm not banking on my
+                  * guesses being right.
+                  *
+                  * Actually, this loop could be avoided if you put a
+                  * second record in the hash list entry to refer to the
+                  * s_addr in question...
+                  *
+                  * Also, I think the is_virtual check is redundant.
+                  */
+                 if((virt->is_virtual == 1) &&
+                    (sar->host_addr.s_addr == server_ip.s_addr) &&
+                    (sar->host_port == 0 || sar->host_port == port))
+                 {
+                     return virt;
+                 }
+             }
+         }
+         while( trav = trav->next );
+     }
  
      def = server;
      for (virt = server->next; virt; virt = virt->next) {
diff -c3 ./httpd.h ../../apache_1.2.1.dwh/src/httpd.h
*** ./httpd.h	Sat Jul 5 22:04:22 1997
--- ../../apache_1.2.1.dwh/src/httpd.h	Wed Jul 16 05:16:09 1997
***************
*** 231,236 ****
--- 231,246 ----
  #define HARD_SERVER_LIMIT 256
  #endif
+ 
+ /* Couldn't find a better place for this.  If you make this larger than
+  * 256, you will have to go and update the hash function and also
+  * whatever code speaks to it.  Basically, don't touch this unless you
+  * know what you're doing.  You'll notice, for example, that it is not a
+  * prime number.
+  */
+ #ifndef VIRT_HASH_SIZE
+ #define VIRT_HASH_SIZE 256
+ #endif
+ 
  /* Number of requests to try to handle in a single process.  If <= 0,
   * the children don't die off.  That's the default here, since I'm still
   * interested in finding and stanching leaks.
   */
***************
*** 408,418 ****
      const struct htaccess_result *next;
  };
  
- typedef struct conn_rec conn_rec;
  typedef struct server_rec server_rec;
  typedef struct request_rec request_rec;
  typedef struct listen_rec listen_rec;
  
  struct request_rec {
--- 418,433 ----
      const struct htaccess_result *next;
  };
  
  typedef struct conn_rec conn_rec;
  typedef struct server_rec server_rec;
  typedef struct request_rec request_rec;
  typedef struct listen_rec listen_rec;
+ 
+ struct hcf    /* Meta linear list for hashes.  Easier than creating */
+ {             /* duplicate server_recs. */
+     struct server_rec *entry;
+     struct hcf *next;
+ };
  
  struct request_rec {
--- Patch finished ---

-- 
David Hankins,        "If you don't do it right the first time,
Mercenary              you'll just have to do it again."
                                              -- J.T. Hankins