httpd-users mailing list archives

From: Nick Kew <n...@webthing.com>
Subject: Re: [users@httpd] mod-cgi reads entire output into memory...
Date: Mon, 16 Jul 2012 15:11:41 GMT
On Mon, 16 Jul 2012 17:07:23 +0300
Nadav Har'El <nyh@math.technion.ac.il> wrote:

> When I use it in Apache, Apache itself (NOT the CGI process!) grows
> by 512 MB (!). I was really surprised by this, because ideally Apache
> should hardly grow at all, as at most (if at all) it should be reading
> modest-sized buffers from the CGI script and writing them back to the
> socket.

Indeed, that would be normal behaviour.  I haven't encountered this problem
when generating large (albeit not quite that large) CGI output.

> I looked at the httpd code and discovered (if I understand correctly) that
> 1. As I already guessed, Apache doesn't let the CGI write directly to the
> socket, but rather asks it to write to a pipe, which Apache then reads.

Yep.  That's what CGI is all about.
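
To make that flow concrete, here is a minimal Python sketch of the relay step: the parent starts the script with its stdout connected to a pipe and copies whatever arrives to the client in bounded chunks. This is a toy model, not httpd's code; `relay_cgi_output` and the 64 KB chunk size are hypothetical, and a `BinaryIO` sink stands in for the client socket.

```python
import subprocess
from typing import BinaryIO, List

CHUNK_SIZE = 64 * 1024  # hypothetical read size, chosen for illustration


def relay_cgi_output(argv: List[str], sink: BinaryIO,
                     chunk_size: int = CHUNK_SIZE) -> int:
    """Run argv as a child process and copy its stdout to sink chunk by chunk.

    The parent never holds more than about chunk_size bytes of the child's
    output at once, so its memory use does not grow with the response size.
    Returns the total number of bytes relayed.
    """
    total = 0
    # stdout=PIPE: the child writes into a pipe that the parent reads.
    with subprocess.Popen(argv, stdout=subprocess.PIPE) as child:
        assert child.stdout is not None
        while True:
            # read1() returns what is available (up to chunk_size) after at
            # most one underlying read, instead of waiting to fill the chunk.
            chunk = child.stdout.read1(chunk_size)
            if not chunk:  # EOF: the child closed its end of the pipe
                break
            sink.write(chunk)  # stand-in for sending to the client socket
            total += len(chunk)
    return total
```

Passing an `io.BytesIO` as the sink and `[sys.executable, "-c", "..."]` as the command is enough to try it locally.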

> 2. When Apache reads this data from the pipe, it doesn't write it directly
> but rather just adds it to a "bucket brigade" which collects more and
> more data.

No, it doesn't collect more and more data, unless some filter needs to
buffer the entire output.  Normally it passes data down the chain.
Each filter's job is to process a chunk of data then pass it to the next.
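
A toy Python model of that pass-it-on behaviour, using generators in place of filters and bucket brigades (every name here is hypothetical, not httpd's filter API): a streaming filter can emit output after reading a single chunk, while a filter that must see the whole body, like `buffering_filter`, forces everything upstream of it into memory first.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical stand-in for an output filter: it consumes an iterable of
# chunks and yields (possibly transformed) chunks to the next filter.
Filter = Callable[[Iterable[bytes]], Iterator[bytes]]


def upper_filter(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Streaming filter: handles each chunk on its own and passes it on."""
    for chunk in chunks:
        yield chunk.upper()


def buffering_filter(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Non-streaming filter: must collect the whole body before emitting."""
    yield b"".join(chunks)


def run_chain(source: Iterable[bytes], filters: List[Filter]) -> Iterator[bytes]:
    """Wire the filters together so each one pulls from the one before it."""
    stream: Iterable[bytes] = source
    for f in filters:
        stream = f(stream)
    return iter(stream)
```

With only `upper_filter` in the chain, the first output chunk is available after one input chunk has been read; adding `buffering_filter` means nothing comes out until the source is exhausted.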

> I confirmed that this is indeed a flow-control problem by changing the
> CGI to sleep for 1 second after outputting each 64 MB (i.e., 8 batches
> of 64 MB output). Now, the memory usage was around 64 MB, not 512 MB,
> because Apache had the time to output each batch and free its memory
> before the next batch came.

Sounds like the entire contents of the pipe got read into memory in
a single read!  Not good, but not as bad as you think.

Sleeping is a drastic workaround.  What happens if you just flush your
CGI output every 8 MB (or, preferably, in smaller chunks than that)?
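
For a Python CGI script, flushing periodically might look like the sketch below, assuming the script's entry point calls `cgi_main()`. The function names and the 1 MB chunk size are hypothetical; whether flushing like this actually keeps httpd's memory flat is exactly what the experiment above would tell you.

```python
import sys
from typing import BinaryIO

CHUNK = 1024 * 1024          # flush every 1 MB (illustrative; smaller also works)
TOTAL = 512 * 1024 * 1024    # 512 MB, the size from the original report


def emit_body(out: BinaryIO, total: int = TOTAL, chunk: int = CHUNK) -> int:
    """Write `total` filler bytes to `out`, flushing after each chunk.

    Returns the number of flushes performed.
    """
    block = b"x" * chunk
    flushes = 0
    remaining = total
    while remaining > 0:
        n = min(chunk, remaining)
        out.write(block[:n])
        out.flush()  # push this chunk into the pipe now rather than batching it
        flushes += 1
        remaining -= n
    return flushes


def cgi_main() -> None:
    """CGI entry point: header, blank line, then the flushed body."""
    out = sys.stdout.buffer
    out.write(b"Content-Type: application/octet-stream\n\n")
    out.flush()
    emit_body(out)
```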

> So now I guess my questions are:
> 
> 1. Has anyone ever thought of doing a "direct CGI" module, where the CGI
>    script writes directly to the socket, not to Apache's pipe, forgoing
>    any copying, buffering or filtering in Apache?
>    Does something like this already exist? Is "NPH" relevant here?

That would not be CGI.  But there are several CGI modules available,
and others may behave differently.

You might want to look at the mod_proxy framework as an alternative harness
to run your program.
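
One way to follow that suggestion is to turn the program into a small HTTP backend and let httpd proxy to it. The sketch below uses Python's standard `http.server`; `StreamingHandler`, `make_server`, and port 8080 are hypothetical, and whether httpd's proxy path streams better than mod_cgi in your setup is something to measure rather than assume.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

CHUNK = 64 * 1024  # illustrative write size


class StreamingHandler(BaseHTTPRequestHandler):
    """Hypothetical backend that streams a large body in fixed-size writes."""

    total = 512 * 1024 * 1024  # class attribute so it can be lowered for testing

    def do_GET(self) -> None:
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(self.total))
        self.end_headers()
        block = b"x" * CHUNK
        remaining = self.total
        while remaining > 0:
            n = min(CHUNK, remaining)
            # wfile writes straight to the socket; if the peer reads slowly,
            # this call blocks, so this process does not buffer the body.
            self.wfile.write(block[:n])
            remaining -= n

    def log_message(self, format: str, *args: object) -> None:
        pass  # silence per-request logging in this sketch


def make_server(port: int = 8080) -> ThreadingHTTPServer:
    """Listen on localhost only; httpd would forward requests to it."""
    return ThreadingHTTPServer(("127.0.0.1", port), StreamingHandler)
```

Calling `make_server().serve_forever()` starts the backend; with mod_proxy and mod_proxy_http loaded, httpd could then forward a (hypothetical) path to it with `ProxyPass "/big" "http://127.0.0.1:8080/"`.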

> 2. Even if we do want Apache's output filtering capabilities, are there
>    really no flow control capabilities? Can we tell Apache not to read
>    more input (i.e., CGI's output) if the bucket brigade is larger than
>    some predefined size (e.g., 1 MB)?

A bug report might or might not attract a fix.
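
The flow control proposed in question 2 is the classic bounded-buffer pattern. Here is a generic Python sketch of it, not a description of how httpd works: the reader blocks once `max_chunks` items are waiting, so memory stays bounded however fast the source produces. `pump` and the limit of 16 are hypothetical, and the cap counts chunks rather than bytes, which only approximates a byte limit when chunks are roughly equal in size.

```python
import queue
import threading
from typing import BinaryIO, Iterable, Optional

_EOF = None  # sentinel marking the end of the stream


def pump(source: Iterable[bytes], sink: BinaryIO, max_chunks: int = 16) -> int:
    """Copy chunks from source to sink through a bounded queue.

    The queue never holds more than max_chunks items; once it is full, the
    reader thread blocks in put() until the writer catches up.
    Returns the largest queue size the writer observed.
    """
    q: "queue.Queue[Optional[bytes]]" = queue.Queue(maxsize=max_chunks)

    def reader() -> None:
        for chunk in source:
            q.put(chunk)  # blocks while the queue is full: the flow control
        q.put(_EOF)

    t = threading.Thread(target=reader)
    t.start()
    high_water = 0
    while True:
        high_water = max(high_water, q.qsize())
        chunk = q.get()
        if chunk is _EOF:
            break
        sink.write(chunk)  # stand-in for the slow client connection
    t.join()
    return high_water
```

In this sketch, an exception in the source would leave the writer waiting forever; real code would also need to propagate errors across the queue.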

> 3. Some of you may think that CGI is antiquated, and I shouldn't be
>    using it - but I do have good reasons to use it ;-) But I wonder (I
>    didn't test) - is this problem specific to CGI? What happens when we
>    serve a huge disk *file*, and we can read it faster than we can send
>    it - does the bucket brigade also grow indefinitely?

It has nothing to do with how fast you read it versus how fast you send it.
But you might want to trawl the dev list archives from when mod_substitute
was being developed: the problem you describe motivated one aspect of its
implementation.  Of course, that fix wouldn't apply unless mod_substitute
was filtering your CGI output.


-- 
Nick Kew

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org

