httpd-dev mailing list archives

From Dean Gaudet
Subject Re: sendfile API
Date Wed, 21 Oct 1998 19:44:49 GMT

On Tue, 20 Oct 1998, Marc Slemko wrote:

> Does anyone have any comments, positive or negative, on the following API
> for a sendfile() implementation:
> int sendfile(int fd, int s, off_t offset, size_t nbytes, struct sf_hdtr *hdtr,
>         off_t *sbytes, int flags)
>    fd is the file descriptor, s is the socket descriptor, nbytes is the number
> of bytes to send (0 means send until EOF).
>    If hdtr is non-NULL, headers and/or trailers will be sent. sf_hdtr has the
> following structure:
> /*
>  * sendfile(2) header/trailer struct
>  */
> struct sf_hdtr {
>         struct iovec *headers;  /* pointer to an array of header struct iovec's */
>         int hdr_cnt;            /* number of header iovec's */
>         struct iovec *trailers; /* pointer to an array of trailer struct iovec's */
>         int trl_cnt;            /* number of trailer iovec's */
> };
>    *sbytes is an optional pointer for returning the number of bytes actually
> sent on the socket. flags is currently unused, but may be used for future
> auto-disconnect and un-bind() flags.
>    sendfile(2) returns 0 for success. It returns -1 if an error occurs, with
> errno set to the error and *sbytes set to the number of bytes that were sent
> prior to the error.
>    The only limitation this API appears to have is that nbytes is a size_t,
> which is 32 bits. Thus if you want to send less than the whole file, but more
> than 4GB, you must do it in 4GB or less chunks via multiple calls. I don't
> think this will be a serious problem, especially since most usage of
> sendfile(2) will likely be with nbytes=0 (send until EOF).

Apache wouldn't use it with nbytes == 0.  Apache's timeouts are reset
whenever progress is made; they are not an absolute bound on how long a
client can take, so a single send-until-EOF call gives no point at which
to reset them.
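
To make the chunking concrete, here is a minimal sketch of the loop a
server might run.  It is an illustration under assumptions, not Apache
code: plain read()/write() stand in for the proposed sendfile() so the
sketch is portable, and send_chunked(), progress_fn and count_progress()
are hypothetical names.  A real server would reset its timer in the hook.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical hook; a server would reset its per-connection timer here. */
typedef void (*progress_fn)(void *ctx, size_t just_sent);

/* Example hook for illustration: just counts how often progress was made. */
void count_progress(void *ctx, size_t just_sent)
{
    (void)just_sent;
    *(int *)ctx += 1;
}

/*
 * Copy up to nbytes from in to out, at most chunk bytes (capped at the
 * 4096-byte buffer) per step, calling on_progress after each step.
 * read()/write() are an assumed stand-in for the proposed sendfile().
 * Returns 0 on success or EOF, -1 on error; *sent holds the bytes written.
 */
int send_chunked(int in, int out, size_t nbytes, size_t chunk,
                 progress_fn on_progress, void *ctx, off_t *sent)
{
    char buf[4096];

    assert(sent != NULL);
    *sent = 0;
    while (nbytes > 0) {
        size_t want = nbytes < chunk ? nbytes : chunk;
        if (want > sizeof buf)
            want = sizeof buf;
        ssize_t r = read(in, buf, want);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (r == 0)
            break;              /* EOF (or chunk == 0): nothing more to send */
        for (ssize_t off = 0; off < r; ) {
            ssize_t w = write(out, buf + off, (size_t)(r - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
            *sent += w;
        }
        nbytes -= (size_t)r;
        if (on_progress != NULL)
            on_progress(ctx, (size_t)r);    /* progress made: reset timeout */
    }
    return 0;
}
```

With a bounded chunk the caller gets control back after every step, which
is exactly what nbytes == 0 (send until EOF) would take away.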

I personally find these "combine a zillion syscalls into one" syscalls
very distasteful.  The real reason that you want headers and trailers in
this call is because Nagle is dumb, and because the BSD socket API is
dumb.  Your choices are:  write() causes a network packet (i.e. no Nagle),
or write() causes a delay before a network packet (i.e. Nagle).  Neither
is what Apache (and pretty much every other server) wants.  Apache wants:
write() causes any number of MSS-sized packets to be sent, with the last,
partial one held until an explicit "flush" operation sends it (a timeout
is fine too).  That way you can do a series of write()s and writev()s and
sendfile()s and whatever you want, and the kernel never stupidly inserts
a packet boundary and never stupidly delays sending a packet.  A 0-length
write() or writev() element would do well as a flush, as would a
special-case ioctl().  (This is the way the linux folks want to go with
this...)
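
As a rough sketch of the "hold the last packet until an explicit flush"
idea, the code below uses the Linux-specific TCP_CORK socket option,
which is one way to get this behaviour and is not necessarily the
mechanism meant above.  set_cork(), write_all() and send_pieces() are
hypothetical helper names; on systems without TCP_CORK, set_cork() simply
fails and the pieces are written without it.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Ask the kernel to hold (on != 0) or release (on == 0) partial segments. */
int set_cork(int s, int on)
{
#ifdef TCP_CORK
    return setsockopt(s, IPPROTO_TCP, TCP_CORK, &on, sizeof on);
#else
    (void)s;
    (void)on;
    errno = ENOPROTOOPT;        /* option not available on this system */
    return -1;
#endif
}

/* write() until everything is out or a real error occurs. */
static int write_all(int s, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t w = write(s, p, len);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += w;
        len -= (size_t)w;
    }
    return 0;
}

/*
 * Send headers, body and trailers as separate writes.  While corked, the
 * kernel may send full segments but holds a trailing partial one; removing
 * the cork at the end acts as the explicit flush.
 */
int send_pieces(int s, const struct iovec *pieces, int count)
{
    int rc = 0;

    assert(pieces != NULL || count == 0);
    (void)set_cork(s, 1);       /* best effort: failure is ignored */
    for (int i = 0; i < count && rc == 0; i++)
        rc = write_all(s, pieces[i].iov_base, pieces[i].iov_len);
    int saved = errno;          /* keep the write error, if any */
    (void)set_cork(s, 0);       /* the "flush" */
    errno = saved;
    return rc;
}
```

The point of the sketch is that the caller keeps using ordinary writes
and decides where the flush goes, instead of packing everything into one
call.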

There's no way to distinguish an error on fd from an error on s in your
interface -- there's only one errno... this is a fundamental problem with
sendfile() style stuff... dunno what to do about it.
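
For illustration only, here is one possible shape of an interface that
keeps the two error sources apart.  It is a user-space sketch with
read()/write() standing in for the kernel call, and copy_fd(),
struct copy_result and the COPY_* names are hypothetical; it is not a
proposal for what the syscall should look like.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Which side of the copy failed, if any. */
enum copy_side { COPY_OK, COPY_SRC_FAILED, COPY_DST_FAILED };

struct copy_result {
    enum copy_side side;    /* where the error happened */
    int err;                /* errno value from the failing call, else 0 */
    off_t sent;             /* bytes written to s before stopping */
};

/* Copy up to nbytes from fd to s, recording which descriptor failed. */
struct copy_result copy_fd(int fd, int s, size_t nbytes)
{
    struct copy_result res = { COPY_OK, 0, 0 };
    char buf[4096];

    while (nbytes > 0) {
        size_t want = nbytes < sizeof buf ? nbytes : sizeof buf;
        ssize_t r = read(fd, buf, want);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            res.side = COPY_SRC_FAILED;
            res.err = errno;
            return res;
        }
        if (r == 0)
            break;                      /* EOF on the source */
        ssize_t off = 0;
        while (off < r) {
            ssize_t w = write(s, buf + off, (size_t)(r - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                res.side = COPY_DST_FAILED;
                res.err = errno;
                return res;
            }
            off += w;
            res.sent += w;
        }
        assert(off == r);               /* every byte read was written */
        nbytes -= (size_t)r;
    }
    return res;
}
```

A bad source and a bad destination can both produce the same errno
(EBADF, say); only the extra side field tells the caller which descriptor
to blame.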

