httpd-dev mailing list archives

From Zach Brown
Subject Re: zero-copy and mux
Date Sun, 20 Jun 1999 20:59:12 GMT

[greetings, guys, just joined the list.. ]

On Sun, 20 Jun 1999, Dean Gaudet wrote:

> Maybe something to remember which might help convince folks.  With present
> 100baseT hardware, your kernel is going to make one-copy of all your data
> regardless -- because it has to assemble TCP packets to send off to the
> network card.  If you've already done one-copy just before entering the
> kernel there's a high chance that the entire 4k packet is still sitting in
> your L1 data cache when the kernel needs it.  Optimistically it'll take
> the kernel, say 200 32-bit operations to copy that 4k data into network
> packets... that's 200 cycles, or .5us on a 400MHz processor.  Worst case
> scenario is that all the data is in the L2, and the L2 is say 10 cycles
> away.  Then your cost is 5.5us... which is above the one-copy cost you had
> to pay anyhow.
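
The arithmetic in the quoted paragraph can be checked with a short sketch.
It assumes each copy operation costs one cycle plus a fixed memory latency,
which is one reading of the quoted figures rather than a measured model; the
function name is made up for illustration.

```python
def copy_cost_us(ops, clock_mhz, extra_cycles_per_op=0):
    """Microseconds to run `ops` copy operations at `clock_mhz`.

    Assumption: each operation costs one cycle plus `extra_cycles_per_op`
    cycles of memory latency (0 for an L1 hit, 10 for the quoted L2 case).
    """
    cycles = ops * (1 + extra_cycles_per_op)
    return cycles / clock_mhz  # a clock in MHz is cycles per microsecond


l1_case = copy_cost_us(200, 400)      # data still in the L1 data cache
l2_case = copy_cost_us(200, 400, 10)  # data 10 cycles away in the L2
print(f"L1: {l1_case:.1f}us  L2: {l2_case:.1f}us")  # L1: 0.5us  L2: 5.5us
```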

the large-ish zero-copy tx case from the page cache is certainly something
to keep in mind for 2.0.  the 3com 905b, adaptec 'starfire' 9615 and
sun's hme are all pci and can all do byte-grained dma from memory into
their fifo and tack in ip checksums.  the 3com especially is affordable
and in widespread use.

In linux the current near-term plan is to use this mechanism for large-ish
writes that come from the page cache (read: sendfile() and sunrpc for
kernel nfs work).  This will let us use an internal data structure
(kiobuf) to pass the references around and such.  the heuristic for 'big'
will probably weigh the cost of messing around with the kiobufs + the latency
incurred in having the full packet in the fifo before tx VS the cost of
building/copying a 'flat' network buffer before sending it out.  I imagine
128/256ish will be the cutoff, but I just pulled that out of thin air :)
This stuff should be done in the next few months, six at most, I hope.
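
A rough sketch of that 'big' heuristic, purely for illustration.  The
function name, its parameters, and the nanosecond figures in the example are
all made-up placeholders, not kernel code or measurements; only the shape of
the comparison comes from the paragraph above.

```python
def use_zero_copy(nbytes, kiobuf_cost, fifo_latency, copy_cost_per_byte):
    """Return True when the zero-copy path looks cheaper for this write.

    All costs are in the same (hypothetical) time unit:
      kiobuf_cost        -- fixed cost of setting up the kiobuf references
      fifo_latency       -- delay from filling the card's fifo before tx
      copy_cost_per_byte -- cost of building a 'flat' buffer, per byte
    """
    zero_copy_cost = kiobuf_cost + fifo_latency
    flat_copy_cost = copy_cost_per_byte * nbytes
    return zero_copy_cost < flat_copy_cost


# Placeholder numbers (in ns) picked only so the break-even point falls
# between 128 and 256, matching the guess above.
costs = dict(kiobuf_cost=1500, fifo_latency=500, copy_cost_per_byte=10)
print(use_zero_copy(128, **costs))  # False: copying is cheaper
print(use_zero_copy(256, **costs))  # True: zero-copy is cheaper
```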

I guess all this really means is that we should have hooks for using
sendfile() whenever we're sending unmodified data from the fs.  This lets
us avoid the mmap()/munmap() gunk, but it would also have to be stepped
around for layers of the mux that want to modify data, etc, etc..
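
Such a hook might look roughly like the sketch below, using Python's
os.sendfile() wrapper around the syscall.  The function name and the
`transform` callable (standing in for a mux layer that modifies data) are
hypothetical, and a blocking, connected socket on Linux is assumed.

```python
import os


def send_file_body(sock, f, transform=None, chunk=65536):
    """Send the rest of binary file `f` over connected socket `sock`.

    transform: optional bytes -> bytes callable standing in for a layer
    that modifies data; when it is set, sendfile() has to be bypassed.
    Returns the number of bytes written to the socket.
    """
    sent = 0
    if transform is None:
        # Unmodified data: let the kernel send straight from the page cache.
        offset = f.tell()
        remaining = os.fstat(f.fileno()).st_size - offset
        while remaining > 0:
            n = os.sendfile(sock.fileno(), f.fileno(), offset, remaining)
            if n == 0:  # file shrank underneath us
                break
            offset += n
            remaining -= n
            sent += n
        return sent
    # Modified data: fall back to read, transform, write in user space.
    while True:
        block = f.read(chunk)
        if not block:
            return sent
        out = transform(block)
        sock.sendall(out)
        sent += len(out)
```

A caller would pass transform=None for plain static files and a callable
for anything that has to rewrite the bytes on the way out.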

> OK ok, so there is gigabit ethernet and ATM hardware which can do TCP
> packet assembly.  And suppose we care about it in the apache 2.0 timeframe

don't forget hippi! :)

> support the writev we use).  I suspect that other folks doing true
> zero-copy are going to have similar restrictions -- disk -> net optimized,
> memory -> net unoptimized... and we're back to that 5.5us cost. 

*nod* don't expect linux to have user address space -> socket zero copy
any time soon.  the mm/api implications are yucky.

> The cork was put there to deal with the sendfile() initial page problem.
> You'll notice that most other sendfile implementations include an iovec,
> intended for the headers, so that the kernel can copy the headers into
> the first packet and avoid an extra packet on the net.  The linux folks
> were loath to make combination syscalls like that, they put the cork in
> at my suggestion because it lets us use a write() followed by a
> sendfile() without causing a short packet to go out.

* lots of nodding *

there has, however, been some noise of late about really having some sort of
sendfile + head/tail iovecs call.  I dunno how far that will go.  The cork
thing works well; we use it in hftpd to make stupid SITE EXEC programs
spit out nice packets after we hand them the socket on stdout :)
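
For reference, the write() + sendfile() pattern under a cork can be sketched
like this.  It is a minimal illustration, not hftpd or apache code: the
function name is made up, TCP_CORK is Linux-only, and a blocking TCP socket
is assumed.

```python
import os
import socket


def send_corked(sock, headers, f):
    """write() the headers, then sendfile() the file, with the socket corked.

    While TCP_CORK is set the kernel holds back partial frames, so the
    headers can share the first packet with the start of the file.
    Returns the total number of bytes handed to the kernel.
    """
    size = os.fstat(f.fileno()).st_size
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
    offset = 0
    try:
        sock.sendall(headers)
        while offset < size:
            n = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
            if n == 0:  # file shrank underneath us
                break
            offset += n
    finally:
        # Pull the cork so any remaining partial frame goes out.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)
    return len(headers) + offset
```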

-- zach

- - - - - -
007 373 5963
