httpd-dev mailing list archives

From John Heidemann <jo...@ISI.EDU>
Subject mmap patch for Apache-1.0.5
Date Fri, 24 May 1996 18:01:41 GMT

At ISI I'm looking at web server performance as part of the LSAM
project (http://www.isi.edu/div7/lsam/).  During this analysis we
found an optimization for Apache:  using memory-mapped files (rather
than stdio) to send large files reduces CPU utilization.
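
For readers who want to try the idea outside Apache, here is a minimal
standalone sketch in C.  It is not the patch itself.  It maps a file
read-only one segment at a time and write()s each mapping to a
descriptor.  The names send_fd_range_mmap, send_path_mmap, SEGMENT_SIZE
and WRITE_SIZE are made up for this example, the sizes are illustrative
rather than tuned, and the code assumes a POSIX system whose page size
divides SEGMENT_SIZE.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical, illustrative sizes (not tuned values). */
#define SEGMENT_SIZE (8 * 1024 * 1024) /* bytes mapped at a time */
#define WRITE_SIZE   (32 * 1024)       /* upper bound per write() call */

/*
 * Copy the regular file in_fd, from byte offset `start` to end of file,
 * to out_fd.  Returns the number of bytes written, or -1 if nothing
 * could be sent.
 */
long send_fd_range_mmap(int in_fd, int out_fd, off_t start)
{
    struct stat st;
    long total = 0;

    /* The mask arithmetic below needs a power-of-two segment size. */
    assert((SEGMENT_SIZE & (SEGMENT_SIZE - 1)) == 0);

    if (start < 0 || fstat(in_fd, &st) == -1 || !S_ISREG(st.st_mode))
        return -1;

    while (start < st.st_size) {
        /* mmap offsets must be page aligned: round down to a segment. */
        off_t seg_start = start & ~((off_t)SEGMENT_SIZE - 1);
        size_t seg_len = SEGMENT_SIZE;
        if ((off_t)seg_len > st.st_size - seg_start)
            seg_len = (size_t)(st.st_size - seg_start);

        char *map = mmap(NULL, seg_len, PROT_READ, MAP_SHARED,
                         in_fd, seg_start);
        if (map == MAP_FAILED)
            return total ? total : -1;

        /* Write the mapped bytes from `start` to the end of the segment. */
        while (start < seg_start + (off_t)seg_len) {
            size_t off = (size_t)(start - seg_start);
            size_t want = seg_len - off;
            if (want > WRITE_SIZE)
                want = WRITE_SIZE;

            ssize_t w = write(out_fd, map + off, want);
            if (w < 0 && errno == EINTR)
                continue;               /* interrupted: retry */
            if (w <= 0) {
                munmap(map, seg_len);   /* release before bailing out */
                return total ? total : -1;
            }
            total += w;
            start += w;
        }
        munmap(map, seg_len);
    }
    return total;
}

/* Convenience wrapper: open `path` read-only and send it from `start`. */
long send_path_mmap(const char *path, int out_fd, off_t start)
{
    int in_fd = open(path, O_RDONLY);
    long sent;

    if (in_fd == -1)
        return -1;
    sent = send_fd_range_mmap(in_fd, out_fd, start);
    close(in_fd);
    return sent;
}
```

For example, send_path_mmap("index.html", STDOUT_FILENO, 0) would copy
that file to standard output.  A real server would also want a stdio
fallback and cleanup on timeouts.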

The attached patch implements this optimization in Apache-1.0.5.
Performance is examined in more detail in the long comment at the
beginning of the patch.

Although the patch is for Apache-1.0.5, the port to 1.1bX should be
fairly easy.  If people think that the patch is suitable for inclusion
in a future release of Apache (probably 1.2), then I will do the port.

Comments?

   -John Heidemann
    USC/ISI

----------------------------------------------------------------------
Index: http_protocol.c
===================================================================
RCS file: /nfs/gost/CVSroot/external/apache/src/http_protocol.c,v
retrieving revision 1.1
retrieving revision 1.4
diff -u -u -r1.1 -r1.4
--- http_protocol.c	1996/04/04 17:56:44	1.1
+++ http_protocol.c	1996/05/23 22:58:13	1.4
@@ -532,13 +532,260 @@
     return fread (buffer, sizeof(char), bufsiz, r->connection->request_in);
 }
 
+#ifdef ISI_MMAP
+/***********************************************************************
+ *
+ * ISI_MMAP patch
+ * --------------
+ * John Heidemann, <johnh@isi.edu>
+ *
+ *
+ * Apache 1.0.5 (and NCSA 1.5) use stdio to send out file data.
+ * Stdio is good for piecing together headers, but it's not
+ * the best choice for bulk-data transfer because it incurs
+ * several unnecessary data copies.
+ *
+ * With stdio you see the following copies to send out a file:
+ *     disk -> fs/vm-cache -> stdio buffer -> user buffer
+ *	    -> stdio buffer -> mbufs -> network device
+ * (6 copies)
+ *
+ * Instead of using stdio, we should memory map the file
+ * and then write that memory directly out to the network.
+ * Mmap/write eliminates the stdio and user buffer copies:
+ *     disk -> fs/vm-cache -> mbufs -> network device
+ * (3 copies)
+ * With mmap, the data is never copied into a user-space buffer.
+ *
+ *
+ * What is the result of mmapping instead of stdio?
+ * ------------------------------------------------
+ *
+ * In cases where your web server is CPU-bound and mmapping is
+ * effective, you should see higher throughput with mmapping.  In
+ * cases where your web server is not CPU-bound, you should see
+ * lower CPU utilization.
+ *
+ * Mmapping is only effective for ``large'' files; for extremely small
+ * files the cost of setting up the mmap exceeds the cost of simply
+ * doing the extra data copies.  Here ``large'' is an
+ * OS- and hardware-dependent value; for SunOS 4.1.3 on Sparc-10s
+ * the break-even point seems to be at about 10KB.
+ *
+ * When are web servers CPU bound?  A Sparc-10 can saturate a 10Mb/sec
+ * Ethernet with CPU to spare.  With Myrinet (a 640Mb/sec network, see
+ * http://www.myri.com), CPU usage becomes an issue.  With Sparc-10s,
+ * we found that mmapping allows ~2Mb/sec better performance than stdio
+ * for files larger than 25KB (maximum throughput is 18Mb/sec for 10MB
+ * files).  For Sparc-20/71s we see about the same performance gain
+ * (maximum throughput is ~39Mb/sec for 10MB files).  (These
+ * measurements are between two unloaded machines with the same CPU
+ * type connected through a single Myrinet switch.  A modified Apache
+ * server ran on one machine, and a single client ran on the other
+ * machine, requesting the same file 50 times in a row.  Files were
+ * stored in tmpfs on the server.)
+ *
+ * Servers are also CPU bound when there are many clients hitting a
+ * single server.  We ran WebStone (with the ``Silicon Surf''
+ * filelist) with and without the mmap patch on a Sparc-20/71 server
+ * with two Sparc-10 clients over Myrinet.  The mmap-enhanced
+ * server handles about 0.5-1.5 additional connections per second
+ * as the number of clients varies from 2 to 24.  (The total
+ * number of connections per second ranges from 14.9 to 35.4.)
+ *
+ *
+ * About the implementation
+ * ------------------------
+ *
+ * The implementation had several goals:
+ *   - minimal changes
+ *   - make the new code look like the old code
+ *   - check all errors
+ *   - fall back on stdio at the slightest problem, if we can
+ * In general, write something that people will run in a production
+ * web server.
+ *
+ * There is one possible resource leak:  mmap segments must be released
+ * upon aborts.  I check all error returns, but it looks like
+ * timer expirations lead to longjmps, which skip the munmap.  To get
+ * around this problem, we probably should add the mmap segment to the
+ * resource pool cleanups.
+ *
+ *
+ * How to use
+ * ----------
+ *
+ * To use this implementation, apply the patch to Apache 1.0.*,
+ * add -DISI_MMAP to AUX_CFLAGS in the Makefile or Configuration,
+ * and re-build.
+ *
+ *
+ * Disclaimer
+ * ----------
+ *
+ * DISCLAIMER OF WARRANTY.  THIS PATCH IS PROVIDED "AS IS".  The
+ * University of Southern California MAKES NO REPRESENTATIONS OR
+ * WARRANTIES, EXPRESS OR IMPLIED.  By way of example, but not
+ * limitation, the University of Southern California MAKES NO
+ * REPRESENTATIONS OR WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY
+ * PARTICULAR PURPOSE OR THAT THE USE OF THE LICENSED SOFTWARE
+ * COMPONENTS OR DOCUMENTATION WILL NOT INFRINGE ANY PATENTS,
+ * COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS.  The University of Southern
+ * California shall not be held liable for any liability nor for any
+ * direct, indirect, or consequential damages with respect to any
+ * claim by the user or distributor of this patch or any
+ * third party on account of or arising from this Agreement or the use
+ * or distribution of this patch.
+ *
+ */
+
+
+#include <sys/types.h>
+#include <sys/mman.h>
+/* work around deficient system headers (ex. SunOS 4.1.3) */
+#ifndef MAP_FILE
+#define MAP_FILE 0
+#endif /* ! MAP_FILE */
+
+/*
+ * On SunOS 4.1.3, the break-even point
+ * between mmap and stdio
+ * (as measured by bandwidth over Myrinet between Sparc-10 hosts)
+ * seems to be at ~10000 bytes.
+ * Your mileage may vary.
+ */
+#define MMAP_THRESHOLD (8*1024)
+#define MMAP_SEGMENT_SIZE (8*1024*1024)
+/*
+ * Currently we write data in 32KB chunks,
+ * 4x more than with fread/fwrite.
+ * Larger chunks => fewer system calls => lower CPU utilization.
+ * ...*but* we have a timer going and we don't want the timer
+ * to expire before we're through (or we'll be sorry).
+ */
+#define MMAP_WRITE_SIZE (32*1024)
+#define MMAP_AGAIN -2
+
+/*
+ * To avoid data copies,
+ * send_fd_mmap uses mmap/write instead of stdio.
+ *
+ * One interface difference from the stdio version:
+ * send_fd_mmap (and therefore the patched send_fd) doesn't necessarily
+ * leave either the file passed in (f) or r->connection->client in a
+ * usable state.  See the comment at the end of the function for details.
+ *
+ * - John Heidemann, <johnh@isi.edu>, 960411
+ */
+long send_fd_mmap(FILE *f, request_rec *r)
+{
+    int r_fd, w_fd;
+    caddr_t map;
+    size_t remaining_length, segment_length;
+    off_t segment_start, start_ftell;   /* file offsets, so off_t rather than int */
+    long total_bytes_sent = 0;          /* matches the long return type */
+    int w, n, o;
+    conn_rec *c = r->connection;
+
+    /* First, clean up file. */
+    fflush(f);
+    start_ftell = (off_t)ftell(f);
+    r_fd = fileno(f);
+
+    fflush(c->client);
+    w_fd = fileno(c->client);
+
+    /* set up for initial mapping; lengths are counted from segment_start */
+    segment_start = start_ftell & ~(MMAP_SEGMENT_SIZE-1);
+    o = start_ftell & (MMAP_SEGMENT_SIZE-1);
+    remaining_length = r->finfo.st_size - segment_start;
+
+    while (!c->aborted && remaining_length) {
+	segment_length = MMAP_SEGMENT_SIZE;
+	if (segment_length > remaining_length)
+	    segment_length = remaining_length;
+	if (segment_length == 0)
+	    break;
+
+	map = mmap(NULL, (size_t)segment_length, PROT_READ, MAP_SHARED|MAP_FILE,
+		   r_fd, (off_t)segment_start);
+	/*
+	 * If mmap failed and we haven't done anything else yet,
+	 * fall back on stdio by returning MMAP_AGAIN.
+	 * send_fd recognizes this message and picks up.
+	 */
+	if (map == (caddr_t) -1)
+	    return total_bytes_sent ? total_bytes_sent : MMAP_AGAIN;
+	n = segment_length - o;  /* bytes to send */
+
+	/*
+	 * xxx: we write in larger chunks than send_fd,
+	 * possibly therefore requiring larger timeout values.
+	 */
+	while (n && !c->aborted) {
+	    w = MMAP_WRITE_SIZE;
+	    if (n < MMAP_WRITE_SIZE)
+		w = n;
+	    w = write(w_fd, &map[o], w);
+	    if (w == -1) {
+		munmap(map, segment_length);
+		return total_bytes_sent;
+	    };
+	    reset_timeout(r);
+	    total_bytes_sent += w;
+	    n -= w;
+	    o += w;
+	};
+
+	(void) munmap(map, segment_length);
+	remaining_length -= segment_length;
+	segment_start += segment_length; o = 0;   /* set up for next pass */
+    };
+
+    /*
+     * Upon return, whether f or c->client are usable
+     * is unspecified (and therefore OS dependent).
+     *
+     * In most OSes, it should be OK to go back and use them.
+     *
+     * In the worst case, they may have to be re-created with code like:
+     *    w_fd = dup(w_fd);
+     *    fclose(c->client);                -- out with the old
+     *    c->client = fdopen(w_fd, "w");    -- in with the new
+     *
+     * This problem can be fixed in Apache-1.1 which uses its own
+     * stdio-equivalent which will have known behavior.
+     */
+
+    return total_bytes_sent;
+}
+#endif /* ISI_MMAP */
+
 long send_fd(FILE *f, request_rec *r)
 {
     char buf[IOBUFSIZE];
     long total_bytes_sent;
     register int n,o,w;
     conn_rec *c = r->connection;
-    
+
+#ifdef ISI_MMAP
+    /*
+     * Be very conservative about invoking mmap.
+     * The file stats must be valid, we must have a regular
+     * file, and we must have ``enough'' data to send that
+     * mmapping is worthwhile.  If so, try it out.
+     * If we try it and it doesn't work, fall back
+     * on stdio if we can.
+     */
+    if (r->finfo.st_mode &&
+	    S_ISREG(r->finfo.st_mode) &&
+	    r->finfo.st_size - ftell(f) > MMAP_THRESHOLD) {
+	total_bytes_sent = send_fd_mmap(f, r);
+	if (total_bytes_sent != MMAP_AGAIN)
+	    return total_bytes_sent;
+	/* MMAP_AGAIN => fall through and do stdio anyway */
+    };
+#endif /* ISI_MMAP */
     total_bytes_sent = 0;
     while (!r->connection->aborted) {
         while ((n= fread(buf, sizeof(char), IOBUFSIZE, f)) < 1
