httpd-dev mailing list archives

From Dean Gaudet <dgau...@arctic.org>
Subject Re: zero-copy and mux
Date Sun, 20 Jun 1999 18:32:52 GMT
On Sun, 20 Jun 1999, Dean Gaudet wrote:

> and we're back to that 5.5us cost. 

I just wanted to stress that this is like an absolute worst case cost
too... and it really only shows up when you have a lot of *small*
responses in a pipelined or mux situation.  The large_write() heuristic
essentially guarantees us a write pattern like this:

    copy of headers into our buffer
    writev(fd, [buffer, first_page_of_file])
    write(fd, second_page_of_file)
    ...
    write(fd, last_full_page_of_file)
    copy of last page of file into our buffer
    write(fd, buffer)

We're probably not arguing about getting rid of the copy of header
strings into the buffer -- the simple fact that portable writev() is
limited to 16 vectors makes this point moot.  And it's only on the order
of 300 bytes typically; negligible cost.

For the full pages of the file, we're giving the kernel all it needs to
do zero-copy -- we've handed it a page-aligned (using mmap) chunk of
the file.  We can't really help it any more than that.
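
As a rough illustration, the write pattern listed earlier in this message
could be sketched in C as below.  This is not the actual large_write() code:
send_response() and PAGE_SZ are hypothetical names, the file pointer is
assumed to come from mmap(), the fd is assumed to be blocking, and a short
write is simply treated as an error.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define PAGE_SZ 4096  /* assumed page size, also used as the buffer size */

/* Hypothetical helper: send headers plus an mmap'd file of file_len bytes.
 * Returns 0 on success, -1 on error or short write. */
int send_response(int fd, const char *hdr, size_t hdr_len,
                  const char *file, size_t file_len)
{
    char buf[PAGE_SZ];
    size_t used, off = 0, tail;

    if (hdr_len > sizeof buf)
        return -1;
    memcpy(buf, hdr, hdr_len);            /* copy of headers into our buffer */
    used = hdr_len;

    if (file_len >= PAGE_SZ) {
        struct iovec iov[2];

        iov[0].iov_base = buf;            /* buffered headers */
        iov[0].iov_len = used;
        iov[1].iov_base = (void *)file;   /* first page, straight from the map */
        iov[1].iov_len = PAGE_SZ;
        if (writev(fd, iov, 2) != (ssize_t)(used + PAGE_SZ))
            return -1;
        used = 0;
        off = PAGE_SZ;

        /* remaining full pages go to the kernel without a copy on our side */
        while (file_len - off >= PAGE_SZ) {
            if (write(fd, file + off, PAGE_SZ) != (ssize_t)PAGE_SZ)
                return -1;
            off += PAGE_SZ;
        }
    }

    /* the partial last page is copied into our buffer */
    tail = file_len - off;
    if (used + tail > sizeof buf) {       /* no room left: flush headers first */
        if (write(fd, buf, used) != (ssize_t)used)
            return -1;
        used = 0;
    }
    memcpy(buf + used, file + off, tail);
    used += tail;

    /* flushed right away here; a pipelining server would hold on to it */
    if (used > 0 && write(fd, buf, used) != (ssize_t)used)
        return -1;
    return 0;
}
```

The only user-space copies are the headers and the partial last page, which
is the cost being discussed.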

The last page of the file is an interesting chance for more optimization.
We don't have to copy if we know we're about to flush anyhow -- the whole
saferead/halfduplex trick.  That's another optimization that isn't too
hard to perform, and doesn't require full zero-copy semantics.

We frequently have to copy the partial last page if we're pipelining or
if we're doing mux.  But there we can still reduce the impact.  Right now
we use 4k buffer sizes; we really should have something closer to the
1460-byte tcp packet payload as our large_write heuristic.  A bunch of chances
for someone with a good benchmark setup to tune and tweak.

On linux, the partial last page has another possible solution -- TCP_CORK.
When you setsockopt(TCP_CORK) it tells the kernel it can send any full
payload packets it can assemble, but it has to hang onto the last
non-full payload packet until you remove the cork.  That is to say,
there's an explicit flush operation... this is way better than the
nagle/no-nagle which the standard socket api provides.  So what we can
do on linux is set the cork, and write the last partial page regardless.
Then we pull the cork later when we've figured out we're really done
with all the mux pieces.  This lets us skip the extra copying.
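
A minimal sketch of that cork/uncork sequence, assuming Linux and a connected
TCP socket.  tcp_cork(), struct piece and write_pieces_corked() are
hypothetical names used only for illustration; the one real interface here is
setsockopt() with IPPROTO_TCP and TCP_CORK.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical description of one mux piece to send. */
struct piece {
    const void *data;
    size_t len;
};

/* Set (on = 1) or pull (on = 0) the cork on a TCP socket.  Linux only. */
int tcp_cork(int sock, int on)
{
    return setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof on);
}

/* Write every piece as-is, partial pages included, with the cork in place,
 * then pull the cork so the final non-full packet goes out.
 * Returns 0 on success, -1 on error or short write. */
int write_pieces_corked(int sock, const struct piece *pieces, int n)
{
    int i;

    if (tcp_cork(sock, 1) < 0)
        return -1;
    for (i = 0; i < n; i++) {
        ssize_t w = write(sock, pieces[i].data, pieces[i].len);
        if (w != (ssize_t)pieces[i].len) {
            tcp_cork(sock, 0);   /* best effort: do not leave the cork set */
            return -1;
        }
    }
    return tcp_cork(sock, 0);    /* the explicit flush */
}
```

In a real mux the pieces would show up over time rather than in one array;
the point is only that nothing has to be copied into a buffer to avoid short
packets while the cork is set.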

The cork was put there to deal with the sendfile() initial page problem.
You'll notice that most other sendfile implementations include an iovec,
intended for the headers, so that the kernel can copy the headers into the
first packet and avoid an extra packet on the net.  The linux folks were
loath to make combination syscalls like that, so they put the cork in at
my suggestion because it lets us use a write() followed by a sendfile()
without causing a short packet to go out.
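
For illustration, a write() followed by a sendfile() under the cork might
look like the sketch below.  It assumes Linux, a connected blocking TCP
socket, and a regular file; send_headers_then_file() is a hypothetical name,
and error handling is kept to a minimum (on failure the cork is left set).

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send hdr, then file_len bytes of file_fd starting at
 * offset 0, without letting a short header-only packet go out.
 * Returns 0 on success, -1 on error. */
int send_headers_then_file(int sock, const char *hdr, size_t hdr_len,
                           int file_fd, size_t file_len)
{
    int on = 1, off = 0;
    off_t pos = 0;

    /* set the cork: full packets may go out, a partial one is held back */
    if (setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof on) < 0)
        return -1;

    /* headers first; on their own they would be a short packet */
    if (write(sock, hdr, hdr_len) != (ssize_t)hdr_len)
        return -1;

    /* file body; sendfile() advances pos by the number of bytes sent */
    while ((size_t)pos < file_len) {
        ssize_t n = sendfile(sock, file_fd, &pos, file_len - (size_t)pos);
        if (n <= 0)
            return -1;
    }

    /* pull the cork: whatever partial packet is left gets sent now */
    return setsockopt(sock, IPPROTO_TCP, TCP_CORK, &off, sizeof off);
}
```

With the cork set, the headers should share the first packet with the start
of the file, which is what the iovec argument gets you on other systems.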

Dean

