apr-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Greg Stein <gst...@lyra.org>
Subject Re: cvs commit: apr-util/buckets apr_brigade.c
Date Thu, 23 May 2002 21:13:08 GMT
[ following up, per Cliff's notes in httpd's STATUS ]

On Mon, May 13, 2002 at 05:50:04PM -0700, Brian Pane wrote:
> Greg Stein wrote:
> >-1 on this commit, pending further explanation. I'm loathe to see all of
> >that extra code go in there for a dubious performance benefit. Are you
> >*truly* saving any time? I see a while() loop in there copying data. Isn't
> >it almost always faster to let the compiler produce an optimized memcpy() or
> >somesuch?
> 
> As Cliff noted, the compiler can't do an inlined memcpy() in this case.
> Without advance knowledge of the size of the region to be copied, there's
> not much that a compiler can do here.  Even if it inlined a strcpy(), it
> couldn't use multi-byte copy operations without a ton of checks for
> non-aligned addresses and short source or target strings.

Hmm. Looking closer at the loop, it does seem rather difficult to do that
well [without doing a strlen() first]

> >Brian? Let's see some details/numbers here, or else please revert.
> 
> Here are numbers from form_header_field(), the primary user of this function
> in the httpd.
> 
> The original code did a strlen in the calling function, followed by an
> apr_brigade_write().  The new code uses apr_brigade_puts().
> Before:  292 cycles per call (includes form_header_field() plus its 
> descendants)
> After:   268 cycles per call
> 
> So the change yielded an 8% reduction in the cost of writing the
> HTTP response headers.

Great. And about 0.01% of the overall response time. "Hey, I optimized this
loop by 50% !! Let's keep this code mangling."

Narrow your focus, and you can always show fantastic optimization times.

All right. I'll change my vote from a -1 to a -0, with several changes to
the function:

* stop the insanity of calling buckets "e". name the variable "bkt" or
  "last_bkt" or *something* besides "e". I'm not sure how the stupid
  convention got started, but we should endeavor to change it.

* similarly, we should be using "bb" more often for the bucket brigade
  variables.

* iterate "str" itself, and use "saved_start" for the length determination;
  this drops the final assignment inside the 'if' block


Unrelated fix:

At some point, apr_brigade_write() got broken. When the HEAP bucket at the
end of the brigade fills up, the flush function should be called. The
current code allocates *another* heap bucket. With this logic, it is
possible to grow a brigade without any periodic flushing. *very* bad.

In fact, I just realized that is the problem with some Subversion code. We
generate a response by using apr_brigade_write() into the filter stack. The
response is *UNBOUNDED*, yet apr_brigade_write() just keeps growing a really
frickin' huge brigade. When we're finally done, we flush it, and the whole
mess gets sent down the pipe.

The problem that we observed is some severe latency in the response. Where
we should have been "writing to the network as we go", it ends up that
_write() is buffering the whole mess.

[ specifically, I'm talking about the branch that says "bigger than the
  current buffer can handle..." ]

> Even more importantly, though, the optimization in apr_brigade_puts()
> helps keep the application code simple ("application code" meaning the
> httpd in this case).  I.e., it's better to do the optimization tricks
> in low-level library functions like this, so that the application developers
> don't have to do optimization tricks in their code.

Absolutely, but that is a total non-starter argument. The application
*should* have been using _puts() in the first place, keeping the application
simple. Your optimization is completely and wholly related to optimizing the
_puts() function. NOT about the application.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Mime
View raw message