httpd-users mailing list archives

From Nadav Har'El <...@math.technion.ac.il>
Subject [users@httpd] mod-cgi reads entire output into memory...
Date Mon, 16 Jul 2012 14:07:23 GMT
Hi, It's been 10 years since my last message to this mailing list, and
I'm happy to join it again :-)

I've encountered a surprising phenomenon with Apache's mod_cgi, which
unnecessarily slows it down for huge outputs and, as a "bonus", also
makes Apache take up huge amounts of memory:

I have a CGI program which very quickly writes 512 MB of output.

When I use it in Apache, Apache itself (NOT the CGI process!) grows
by 512 MB (!). I was really surprised by this, because ideally Apache
should hardly grow at all, as at most (if at all) it should be reading
modest-sized buffers from the CGI script and writing them back to the
socket.

I looked at the httpd code and discovered (if I understand correctly) that:
1. As I already guessed, Apache doesn't let the CGI write directly to the
socket, but rather asks it to write to a pipe, which Apache then reads.
2. When Apache reads this data from the pipe, it doesn't write it out
immediately, but rather just adds it to a "bucket brigade" which collects
more and more data.

It appears there is no flow control in this process: if the CGI outputs
faster than we can send to the network, the bucket brigade becomes
longer and longer. With 512 MB of output quickly generated, up to
512 MB of buffers are allocated, and much of it is processed and freed
only at the end. The peak memory usage, then, is 512 MB, and this
is also the process's memory usage when everything ends (because Apache
doesn't return this memory to the system).
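
To make the effect concrete, here is a toy model of it. This is only a
sketch of the arithmetic, not Apache's code: the function name
peak_buffered, the "tick" time step and the cap parameter are all made up
for illustration. Each tick the CGI produces some units, the network
drains some, and we track the largest amount ever held in between.

```python
def peak_buffered(total, produce_per_tick, drain_per_tick, cap=None):
    """Return the peak number of buffered units in a toy producer/consumer model.

    Hypothetical model, not Apache code. Each tick the producer (the CGI)
    offers up to produce_per_tick units and the consumer (the network)
    removes up to drain_per_tick units. If cap is set, we never read more
    from the producer than fits under cap.
    """
    remaining = total   # units the CGI has not written yet
    buffered = 0        # units read from the CGI but not yet sent
    peak = 0
    while remaining > 0 or buffered > 0:
        if remaining > 0 and (cap is None or buffered < cap):
            room = produce_per_tick
            if cap is not None:
                room = min(room, cap - buffered)   # flow control: respect the cap
            step = min(room, remaining)
            buffered += step
            remaining -= step
        peak = max(peak, buffered)
        buffered -= min(drain_per_tick, buffered)  # the network sends what it can
    return peak
```

With a fast producer and a slow consumer, peak_buffered(512, 64, 1) is 505,
i.e. almost the whole output is held at once, while the same run with
cap=1 never holds more than 1 unit.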

I confirmed that this is indeed a flow-control problem by changing the
CGI to sleep for 1 second after every 64 MB of output (i.e., 8 batches
of 64 MB each); now the memory usage was around 64 MB, not 512 MB,
because Apache had the time to output each batch and free its memory
before the next batch came.
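
For anyone who wants to reproduce this, a test CGI along these lines
should do. It is a sketch in Python, not the actual program I used; the
names emit and main, the filler byte, the 64 KB write size and the
Content-Type are assumptions. Unlike the description above, it skips the
sleep after the final batch.

```python
import sys
import time

def emit(out, total_bytes, batch_bytes, pause_seconds, chunk_bytes=65536):
    """Write total_bytes of filler to out, pausing after each full batch.

    Returns the number of bytes written. No pause follows the last batch.
    """
    chunk = b"x" * chunk_bytes
    sent = 0
    while sent < total_bytes:
        batch_end = min(sent + batch_bytes, total_bytes)
        while sent < batch_end:
            n = min(chunk_bytes, batch_end - sent)
            out.write(chunk[:n])
            sent += n
        out.flush()                      # push the batch to the server now
        if sent < total_bytes:
            time.sleep(pause_seconds)    # give the server time to drain it
    return sent

def main():
    """CGI entry point: 512 MB in 8 batches of 64 MB, 1 second apart."""
    mb = 1024 * 1024
    out = sys.stdout.buffer
    out.write(b"Content-Type: application/octet-stream\r\n\r\n")
    emit(out, 512 * mb, 64 * mb, 1.0)
```

To run it as a CGI, add a call to main() at the bottom of the script.
Setting pause_seconds to 0 gives the original "as fast as possible"
behaviour.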

By the way, the growth of the Apache process by 512 MB is only the start of
the problem: not only does every *process* grow by 512 MB, but in the
worker MPM every *thread* grows by 512 MB, because apparently (?)
Apache's memory pools are separate for different threads, so the 512 MB
freed by one thread is not reused by other threads. In my default
setup of 25 threads, all of the machine's memory and swap space was
consumed :(

So now I guess my questions are:

1. Has anyone ever thought of doing a "direct CGI" module, where the CGI
   script writes directly to the socket, not to Apache's pipe, forgoing
   any copying, buffering or filtering in Apache?
   Does something like this already exist? Is "NPH" relevant here?

2. Even if we do want Apache's output filtering capabilities, are there
   really no flow control capabilities? Can we tell Apache not to read
   more input (i.e., CGI's output) if the bucket brigade is larger than
   some predefined size (e.g., 1 MB)?

3. Some of you may think that CGI is antiquated, and I shouldn't be
   using it - but I do have good reasons to use it ;-) But I wonder (I
   didn't test) - is this problem specific to CGI? What happens when we
   serve a huge disk *file*, and we can read it faster than we can send
   it - does the bucket brigade also grow indefinitely?
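
To illustrate what I mean by flow control in question 2, here is a
minimal sketch of a relay loop that never holds more than one
modest-sized buffer. It is plain Python with blocking I/O, not Apache or
APR code, and the name relay and the 64 KB default are my own choices.
The point is that the next read from the pipe happens only after the
previous chunk has been handed to the socket.

```python
import os
import socket

def relay(src_fd, dst_sock, chunk_bytes=65536):
    """Copy everything from src_fd to dst_sock, one chunk at a time.

    At most chunk_bytes are held in memory. Returns the bytes copied.
    """
    total = 0
    while True:
        data = os.read(src_fd, chunk_bytes)  # blocks until data or end of file
        if not data:                         # empty read: the writer closed the pipe
            return total
        dst_sock.sendall(data)               # blocks until the kernel accepts it all
        total += len(data)
```

When the client is slow, sendall() blocks, so the loop stops reading the
pipe; the pipe then fills up and the CGI itself blocks in write(). Memory
use stays at one chunk no matter how large the output is.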

Thanks,
Nadav.

-- 
Nadav Har'El                        |      Monday, Jul 16 2012, 26 Tammuz 5772
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |Unix is user friendly - it's just picky
http://nadav.harel.org.il           |about its friends.
