httpd-modules-dev mailing list archives

From Sindhi Sindhi <sindhi....@gmail.com>
Subject Re: Apache Buckets and Brigade
Date Wed, 01 May 2013 16:14:48 GMT
Thanks to all for the reply.

Josh, my concern is that we may not want mod_pagespeed to modify the
in-memory HTML content. The only change we want to see in our HTML is that
the old strings are replaced by the new strings after applying our business
logic, which is already done by the C++ filter module I have written. This
C++ filter implements all our business logic and takes an input buffer that
is expected to hold the entire HTML file content. So the only issue is that
we are not sure whether the Apache APR brigades contain the entire contents
of the HTML file. And what you understood is right: we don't define a
substitution statically; it is very much dynamic and depends on a lot of
run-time logic and rules.
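
To make the concern concrete, here is a minimal sketch of the buffering idea
in plain C++, not real Apache code. It assumes the filter is handed one chunk
at a time and is told when the stream ends; in an actual output filter that
would roughly correspond to reading each bucket with apr_bucket_read() and
watching for the EOS bucket, with the buffer kept in the filter's context.
The names replace_all, BufferingFilter and run_filter are hypothetical, and
replace_all only stands in for processHTML().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for processHTML(): replaces every occurrence of
// `from` with `to` in a document that is assumed to be complete.
std::string replace_all(std::string text, const std::string& from,
                        const std::string& to) {
    assert(!from.empty());  // an empty search string would never advance
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();  // continue after the inserted text
    }
    return text;
}

// Simulates a filter that sees the document in pieces. Each chunk stands in
// for the data of one bucket; nothing is emitted until the end of the stream.
class BufferingFilter {
public:
    std::string feed(const std::string& chunk, bool end_of_stream) {
        buffer_ += chunk;  // accumulate across calls
        if (!end_of_stream) {
            return std::string();  // hold everything back for now
        }
        std::string out = replace_all(buffer_, "oldString", "newString");
        buffer_.clear();
        return out;
    }

private:
    std::string buffer_;
};

// Drives the filter over a list of chunks, treating the last one as the
// end of the stream, and returns everything the filter emitted.
std::string run_filter(const std::vector<std::string>& chunks) {
    BufferingFilter filter;
    std::string out;
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        out += filter.feed(chunks[i], i + 1 == chunks.size());
    }
    return out;
}
```

Feeding the case2 split ("oldStr" in one chunk, "ing..." in the next) through
run_filter gives the same result as the single-bucket case, whereas running
the replacement on each chunk separately would miss the split occurrence. The
cost is that the whole file is held in memory, which matters for a huge file.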

Nick, thanks for the reply. Could you please send some reference links
where I can look at how some of the existing HTML filters have handled this
issue? I have searched for similar issues on the internet but unfortunately
haven't found an exact solution or a way to do it :(


On Wed, May 1, 2013 at 9:04 PM, Joshua Marantz <jmarantz@google.com> wrote:

> I didn't know about mod_substitute or mod_sed  :)  The
> ModPagespeedSubstitute command I proposed probably adds nothing to those.
>
> But in any case that was not sufficient for Sindhi's use-case where he
> needs to impose data-dependent business logic and not statically define a
> substitution in a conf file.
>
> -Josh
>
>
> On Wed, May 1, 2013 at 11:19 AM, Jim Jagielski <jim@jagunet.com> wrote:
>
> > How is that different from mod_substitute and/or mod_sed?
> >
> > On May 1, 2013, at 9:22 AM, Joshua Marantz <jmarantz@google.com> wrote:
> >
> > > > I have a crazy idea for you.  Maybe this is overkill but this sounds like
> > > > it'd be natural to add to mod_pagespeed <http://modpagespeed.com> as a new
> > > > filter.
> > >
> > > Here's some code you might use as a template
> > >
> > >
> >
> https://code.google.com/p/modpagespeed/source/browse/trunk/src/net/instaweb/rewriter/collapse_whitespace_filter.cc
> > >
> > > > one thing we've thought of doing is providing a generic text-substitution
> > > > filter that would take strings in character-blocks and do arbitrary
> > > substitutions in them, that could be specified in the .conf file:
> > >  ModPagespeedSubstitute "oldString" "newString"
> > >
> > > You are right that text-blocks in Apache output filters can be split
> > > arbitrarily across buckets, but mod_pagespeed takes care of that in an
> > > > HTML-centric way, breaking up blocks on html tokens. A block of free-format
> > > > text would be treated as a single atomic token independent of the structure
> > > > of the incoming bucket brigade.
> > >
> > > Let me know if you'd like to discuss this further.
> > >
> > > -Josh
> > >
> > >
> > > > On Wed, May 1, 2013 at 8:54 AM, Sindhi Sindhi <sindhi.for@gmail.com> wrote:
> > >
> > >> Hello,
> > >>
> > > >> Thanks a lot for providing answers to my earlier emails with subject
> > > >> "Apache C++ equivalent of javax.servlet.Filter". I really appreciate your
> > > >> help.
> > >>
> > >> I had another question. My requirement is something like this -
> > >>
> > > >> I have a huge html file that I have copied into the Apache htdocs folder.
> > > >> In my C++ Apache module, I want to get this html file contents and
> > > >> remove/replace some strings.
> > > >>
> > > >> Say I have a HTML file that has the string "oldString" appearing 3 times in
> > > >> the file. My requirement is to replace "oldString" with the new string
> > > >> "newString". I have already written a C++ function that has a signature
> > > >> like this -
> > >>
> > >> char* processHTML(char* inHTMLString) {
> > >> //
> > >> char* newHTMLWithNewString = <code to replace all occurrences of
> > >> "oldString" with "newString">
> > >> return newHTMLWithNewString;
> > >> }
> > >>
> > > >> The above function does a lot more than just string replace; it has a lot of
> > > >> business logic implemented and finally returns the new HTML string.
> > > >>
> > > >> I want to call processHTML() inside my C++ Apache module. As far as I know,
> > > >> Apache maintains an internal data structure called Buckets and Brigades which
> > > >> actually contain the HTML file data. My question is, is the entire HTML
> > > >> file content (in my case the html file is huge) residing in a single
> > > >> bucket? That is, when I fetch one bucket at a time from a brigade, can I be
> > > >> sure that the entire HTML file data from <html> to </html> can be found in
> > > >> a single bucket? For example, if my html file looks like this -
> > >> <html>
> > >> ..
> > >> ..
> > >> oldString
> > >> ... oldString...........oldString..
> > >> ..
> > >> </html>
> > >>
> > > >> When I iterate through all buckets of a brigade, will I find my entire HTML
> > > >> file content in a single bucket OR the HTML file content can be present in
> > > >> multiple buckets, say like this -
> > >>
> > >> case1:
> > >> bucket-1 contents =
> > >> "<html>
> > >> ..
> > >> ..
> > >> oldString
> > >> ... oldString...........oldString..
> > >> ..
> > >> </html>"
> > >>
> > >> case2:
> > >> bucket-1 contents =
> > >> "<html>
> > >> ..
> > >> ..
> > >> oldStr"
> > >>
> > >> bucket-2 contents =
> > >> "ing
> > >> ... oldString...........oldString..
> > >> ..
> > >> </html>"
> > >>
> > > >> If it's case2, then the function processHTML() I have written will not
> > > >> work because it searches for the entire string "oldString" and in case2
> > > >> "oldString" is found only partially.
> > >>
> > >> Thanks a lot.
> > >>
> >
> >
>
