Date: Mon, 14 Nov 2011 13:00:07 -0600
From: Ray Morris
To: modules-dev@httpd.apache.org
Subject: flush or pass filter brigade to avoid memory exhaustion

I would appreciate some help with splitting
and passing a brigade in an output filter, to avoid using memory
proportional to the size of the response and to allow data to begin to be
output prior to the completion of the filter. Studying the apache.org docs,
the book, and other modules, I haven't been able to get this working. When
I try to merge the code from the docs with a sample module, the connection
is closed after 751,143 bytes.

The apache.org docs here say this is important:

from http://httpd.apache.org/docs/2.3/developer/output-filters.html

    The above implementation would consume memory proportional to content
    size. If passed a FILE bucket, for example, the entire file contents
    would be read into memory as each apr_bucket_read call morphed a FILE
    bucket into a HEAP bucket.

    In contrast, the implementation below will consume a fixed amount of
    memory to filter any brigade; a temporary brigade is needed and must
    be allocated only once per response, see the Maintaining state section.

    while ((e = APR_BRIGADE_FIRST(bb)) != APR_BRIGADE_SENTINEL(bb)) {
        rv = apr_bucket_read(e, &data, &length, APR_BLOCK_READ);
        if (rv) ...;
        /* Remove bucket e from bb. */
        APR_BUCKET_REMOVE(e);
        /* Insert it into temporary brigade. */
        APR_BRIGADE_INSERT_HEAD(tmpbb, e);
        /* Pass brigade downstream. */
        rv = ap_pass_brigade(f->next, tmpbb);
        if (rv) ...;
        apr_brigade_cleanup(tmpbb);
    }

To learn about this using a simple module, I tried to patch Nick's
mod_txt.c to include this functionality:

typedef struct {
    ...
    apr_bucket_brigade *tmpbb;
} txt_ctxt ;

static int txt_filter_init(ap_filter_t* f) {
    txt_ctxt* ctxt = f->ctx = apr_palloc(f->r->pool, sizeof(txt_ctxt)) ;
    ...
    ctxt->tmpbb = apr_brigade_create(f->r->pool, f->c->bucket_alloc);
    return OK ;
}

static int txt_filter(ap_filter_t* f, apr_bucket_brigade* bb) {
    ...
    } else if ( apr_bucket_read(b, &buf, &bytes, APR_BLOCK_READ) == APR_SUCCESS ) {
        /* We have a bucket full of text.
           Just escape it where necessary */
        size_t count = 0 ;
        const char* p = buf ;
        while ( count < bytes ) {
            size_t sz = strcspn(p, "<>&\"") ;
            count += sz ;
            if ( count < bytes ) {
                apr_bucket_split(b, sz) ;
                b = APR_BUCKET_NEXT(b) ;
                APR_BUCKET_INSERT_BEFORE(b, txt_esc(p[sz], f->r->connection->bucket_alloc)) ;
                apr_bucket_split(b, 1) ;
                APR_BUCKET_REMOVE(b) ;
                b = APR_BUCKET_NEXT(b) ;
                count += 1 ;
                p += sz + 1 ;
            }
        }
        APR_BUCKET_REMOVE(b);                          // <-- new code
        APR_BRIGADE_INSERT_HEAD(ctxt->tmpbb, b);       // <-- new code
        rv = ap_pass_brigade(f->next, ctxt->tmpbb);    // <-- new code
        apr_brigade_cleanup(ctxt->tmpbb);              // <-- new code
        apr_sleep(10000);                              // <-- new code
    }

testing:

$ wget -v -O /dev/null https://www.bettercgi.com/tmp/words.txt
--2011-11-14 12:02:23--  https://www.bettercgi.com/tmp/words.txt
Resolving www.bettercgi.com... 74.122.122.24
Connecting to www.bettercgi.com|74.122.122.24|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4953699 (4.7M) [text/html]
Saving to: “/dev/null”

15% [================>                    ] 751,143     2.20M/s   in 0.3s

2011-11-14 12:02:32 (2.20 MB/s) - Connection closed at byte 751143.

Without the temporary brigade, there was no error message, but also no
apparent improvement: the content did not begin to download until after
apr_sleep(10000), which represents a long-running operation.

        APR_BUCKET_REMOVE(b);
        APR_BRIGADE_INSERT_TAIL(bb, b);
        ap_pass_brigade(f->next, bb);
        b = APR_BRIGADE_FIRST(bb);
        apr_sleep(10000);

Full source code for the original and the patched:
http://bettercgi.com/tmp/mod_txt.c
https://www.bettercgi.com/tmp/mod_patched.c
-- 
Ray Morris
support@bettercgi.com

Strongbox - The next generation in site security:
http://www.bettercgi.com/strongbox/

Throttlebox - Intelligent Bandwidth Control
http://www.bettercgi.com/throttlebox/

Strongbox / Throttlebox affiliate program:
http://www.bettercgi.com/affiliates/user/register.php