httpd-modules-dev mailing list archives

From Jim Jagielski <...@jaguNET.com>
Subject Re: Apache Buckets and Brigade
Date Wed, 01 May 2013 15:19:52 GMT
How is that different from mod_substitute and/or mod_sed?

On May 1, 2013, at 9:22 AM, Joshua Marantz <jmarantz@google.com> wrote:

> I have a crazy idea for you.  Maybe this is overkill but this sounds like
> it'd be natural to add to mod_pagespeed <http://modpagespeed.com> as a new
> filter.
> 
> Here's some code you might use as a template
> 
> https://code.google.com/p/modpagespeed/source/browse/trunk/src/net/instaweb/rewriter/collapse_whitespace_filter.cc
> 
> One thing we've thought of doing is providing a generic text-substitution
> filter that would take strings in character-blocks and do arbitrary
> substitutions in them, that could be specified in the .conf file:
>  ModPagespeedSubstitute "oldString" "newString"
> 
> You are right that text-blocks in Apache output filters can be split
> arbitrarily across buckets, but mod_pagespeed takes care of that in an
> HTML-centric way, breaking up blocks on html tokens. A block of free-format
> text would be treated as a single atomic token independent of the structure
> of the incoming bucket brigade.
> 
> Let me know if you'd like to discuss this further.
> 
> -Josh
> 
> 
> On Wed, May 1, 2013 at 8:54 AM, Sindhi Sindhi <sindhi.for@gmail.com> wrote:
> 
>> Hello,
>> 
>> Thanks a lot for providing answers to my earlier emails with subject
>> "Apache C++ equivalent of javax.servlet.Filter". I really appreciate your
>> help.
>> 
>> I had another question. My requirement is something like this -
>> 
>> I have a huge html file that I have copied into the Apache htdocs folder.
>> In my C++ Apache module, I want to get this html file contents and
>> remove/replace some strings.
>> 
>> Say I have a HTML file that has the string "oldString" appearing 3 times in
>> the file. My requirement is to replace "oldString" with the new string
>> "newString". I have already written a C++ function that has a signature
>> like this -
>> 
>> char* processHTML(char* inHTMLString) {
>> //
>> char* newHTMLWithNewString = <code to replace all occurrences of
>> "oldString" with "newString">
>> return newHTMLWithNewString;
>> }
>> 
>> The above function does a lot more than just string replace; it has a lot
>> of business logic implemented and finally returns the new HTML string.
>> 
>> I want to call processHTML() inside my C++ Apache module. As I know Apache
>> maintains an internal data structure called Buckets and Brigades which
>> actually contain the HTML file data. My question is, is the entire HTML
>> file content (in my case the html file is huge) residing in a single
>> bucket? Means, when I fetch one bucket at a time from a brigade, can I be
>> sure that the entire HTML file data from <html> to </html> can be found in
>> a single bucket? For ex. if my html file looks like this -
>> <html>
>> ..
>> ..
>> oldString
>> ... oldString...........oldString..
>> ..
>> </html>
>> 
>> When I iterate through all buckets of a brigade, will I find my entire HTML
>> file content in a single bucket OR the HTML file content can be present in
>> multiple buckets, say like this -
>> 
>> case1:
>> bucket-1 contents =
>> "<html>
>> ..
>> ..
>> oldString
>> ... oldString...........oldString..
>> ..
>> </html>"
>> 
>> case2:
>> bucket-1 contents =
>> "<html>
>> ..
>> ..
>> oldStr"
>> 
>> bucket-2 contents =
>> "ing
>> ... oldString...........oldString..
>> ..
>> </html>"
>> 
>> If it's case2, then the function processHTML() I have written will not
>> work because it searches for the entire string "oldString" and in case2
>> "oldString" is found only partially.
>> 
>> Thanks a lot.
>> 
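
The case2 problem above (a search string split across two buckets) can be handled without assuming the whole file arrives in one piece. What follows is only a minimal sketch in plain C++, not code from this thread and not an Apache or APR API: StreamingReplacer and replaceAcrossChunks are hypothetical names, and each std::string chunk stands in for the bytes read from one bucket. The idea is to hold back the last (search length - 1) bytes of each chunk, since those are the only bytes that could begin a match completed by the next chunk.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Apache or APR): replaces every occurrence
// of `from` with `to` in a stream of chunks, even when an occurrence
// straddles a chunk boundary.
class StreamingReplacer {
public:
    StreamingReplacer(std::string from, std::string to)
        : from_(std::move(from)), to_(std::move(to)) {}

    // Feed one chunk (standing in for one bucket's data). Returns the output
    // that is already safe to pass downstream.
    std::string feed(const std::string& chunk) {
        if (from_.empty()) return chunk;  // nothing to search for
        pending_ += chunk;

        std::string out;
        std::size_t pos = 0;
        std::size_t hit;
        // Replace every complete match currently visible.
        while ((hit = pending_.find(from_, pos)) != std::string::npos) {
            out.append(pending_, pos, hit - pos);
            out += to_;
            pos = hit + from_.size();
        }

        // Hold back the last from_.size() - 1 bytes: they might be the start
        // of a match that the next chunk completes.
        const std::size_t keep = from_.size() - 1;
        const std::size_t remaining = pending_.size() - pos;
        if (remaining > keep) {
            out.append(pending_, pos, remaining - keep);
            pos += remaining - keep;
        }
        pending_.erase(0, pos);
        return out;
    }

    // Call once after the last chunk to flush the held-back tail.
    std::string finish() {
        std::string out;
        out.swap(pending_);
        return out;
    }

private:
    std::string from_;
    std::string to_;
    std::string pending_;  // unmatched tail carried over between chunks
};

// Convenience wrapper: run a whole sequence of chunks through the replacer.
std::string replaceAcrossChunks(const std::vector<std::string>& chunks,
                                const std::string& from,
                                const std::string& to) {
    StreamingReplacer replacer(from, to);
    std::string result;
    for (const std::string& chunk : chunks) {
        result += replacer.feed(chunk);
    }
    result += replacer.finish();
    return result;
}
```

In an output filter, feed() would presumably be called once per data bucket and finish() when the end of the stream is seen. If processHTML() needs the complete document for its other business logic, this approach does not apply as-is, and the module would have to collect all the data before calling it.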

