apr-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erik Huelsmann" <ehu...@gmail.com>
Subject Re: Fix writing large chunks to consoles/pipes
Date Sat, 24 Mar 2007 16:29:47 GMT
On 3/22/07, Jonathan Gilbert <2jx64e902@sneakemail.com> wrote:
> At 10:51 AM 3/21/2007 +0100, you wrote:
> >There's a thread about fixing some issue in Subversion w.r.t. writing
> >large amounts of data to a console handle.
> >
> >This issue is an example of a broader issue: Windows doesn't support
> >arbitrary amounts of data to be written to all of its handle types.
> >Smaller writes *are* supported, but it doesn't automatically fall back
> >to smaller writes when the underlaying handle only supports small
> >writes.
>
> That is correct, as I understand it.

Hi Jonathan,

Thanks for the time you put into this. I was however not meaning to
ask for your reaction, but rather one by the committers (as we both
are not).

Since in the end they decide what goes in, I was asking what I can do
to help resolve the issue to their satisfaction.

I'm still hoping to find a committer committed to this problem enough
to help us out here - if only by providing some help on additional
requirements!



> >There were a number of patches contributed until now, with the
> >following properties (please correct me if I'm wrong or incomplete):
> >
> >1) Split all writes into 'always supported' smaller writes
>
> My original patch. Note that while it used the buffering functionality, the
> handle wasn't technically buffered in the sense of combining separate calls
> to apr_file_write.
>
> >2) Split writes for handles with specific numbers into 'always
> >supported' smaller writes
>
> Well, it wasn't done by numbers, but that was my second patch. It reworked
> how specifically console handles are opened, including some factoring of
> common std handle code that would be nice as a stand-alone patch even if
> the approach is rejected.
>
> >3) Fall back to a buffered file once the condition has been detected
> >which signals this kind of problem
>
> This was my third patch, but it isn't entirely correct to refer to the file
> as buffered for the same reason as (1). I'll explain below.
>
> >4) Fall back to a small write (once) when detecting this condition,
> >returning an incomplete write
>
> This was proposed by Joe and implemented in the patch you posted to the list.
>
> >(1) was rejected because the increased performance on 'other' handles
> >must be an OS issue not to be worked around in a portability layer
>
> Honestly, I think (1) should be applied. You wouldn't normally write code
> in such a way because you would expect it to decrease performance. My
> testing showed, however, that -- at least for file handles -- the Win32
> APIs actually perform *better* with a larger number of smaller writes. It
> is a very odd result, but it was consistent across more than 10 hours of
> continuous testing, both with preallocated files and with the file blocks
> being allocated on-the-fly. My test compared writing 55 megabyte files in
> 48KB blocks with writing the same files as a combination of each possible
> whole-megabyte write size from 1 to 10, shuffling the order of the large
> writes and randomly interleaving the two for each iteration, and at the end
> of it, every single test iteration showed the 48 KB writes to be around 3
> times faster than the large writes for the same amount of total data written.
>
> >(2) was rejected, because it doesn't solve the issue when other handle
> >numbers are of the same class
>
> There are only 3 console handles, and they are all opened through the
> methods that I modified. The only question is whether stand-alone pipes
> might behave the same way as the console std handles (which would be a fair
> indication that console std handles are in fact pipes :-). Certainly files
> do not behave that way, but, as described above and in previous messages,
> files would actually perform better with writes cut up.
>
> >(3) Wasn't rejected (yet?), but feels wrong to me: an unbuffered file
> >was created, then it should probably stay unbuffered (since starting
> >to buffer will change the behavior of the file significantly on
> >smaller writes)
>
> This is where the confusion lies. If you look closely at my third patch, it
> introduces a new BOOLEAN field in the (opaque) apr_file_t structure called
> "autoflush". This is combined with two extra lines of code in
> apr_file_write so that when this field is set, after the last chunk of the
> input buffer is sent to the apr_file_t's buffer (which may be the only
> chunk if it is a small write), apr_file_flush is called. Thus, the
> behaviour remains the same; apr_file_write doesn't return until the data
> provided has guaranteed to have been passed to the OS. The buffering code
> is used because it already divides the data up in a way that is known to
> work (though it is actually fairly inefficient, with an unnecessary
> complete copy of the data from the user-supplied buffer into the
> apr_file_t's buffer), but the file handle isn't actually made to behave
> like a buffered file.
>
> I don't understand the resistance to (1), but if (1) won't be applied, I
> think (3) is the best option.
>
> >(4) Was rejected because the same condition was detected over and over
> >again, resulting in too many failed OS calls.
> >
> >Patch #4 can be easily extended to cache the result of the fallback in
> >the file structure. Would that be enough to get this change committed
> >/ issue resolved?
>
> This is effectively what (3) does, except that it is more efficient than
> (4) (in the same way that (1) is more efficient than (3)) because it
> reduces the stack depth between the loop that is pumping the chunks of data
> and the actual call to the system API.
>
> >If not, what more would you expect?
>
> I can't think of many other possible permutations with the desired effect.
> My recommendation would be (1) in preference to (3), (3) in preference to
> (4), and (4) with caching of the result if nothing else. Something must be
> done, though, because as it stands, on Windows APR's file handling does not
> live up to its documentation, at least where console handles are concerned.
> Though it is not APR's fault, it is certainly APR's problem, because we all
> know how likely it is that Microsoft will push through an urgent fix for a
> part of Windows they've been trying to get rid of in each version of
> Windows since XP SP2.


bye,

Erik.

Mime
View raw message