Date: Wed, 1 May 2013 19:18:50 +0530
Subject: Re: Apache Buckets and Brigade
From: Sindhi Sindhi
To: modules-dev@httpd.apache.org

Thanks. I'd definitely be interested in discussing this further.

There's one more thing: I doubt I can use ModPagespeedSubstitute, because our string replacement actually uses some business logic. For example, in "oldString", if I find the string "old" at offset 0 I'll replace it with "new"; otherwise I'll replace it with "temp". The example in my previous email was just a very simple and straightforward one. When our business logic runs over the huge HTML file we have, it executes many more rules to decide whether to replace "oldString" with "newString", with "tempString", or with some other string.

So for me it's critical that the HTML tags are read completely, not partially, when the string replacement function is called. The HTML-centric fetch of data you mentioned suits me best. But I don't want mod_pagespeed to actually modify anything in my HTML file; if it can give me either the entire HTML file OR an HTML-centric fetch of the data, that will solve my problem :)

On Wed, May 1, 2013 at 6:52 PM, Joshua Marantz wrote:

> I have a crazy idea for you. Maybe this is overkill, but this sounds like
> it'd be natural to add to mod_pagespeed as a new
> filter.
>
> Here's some code you might use as a template:
>
> https://code.google.com/p/modpagespeed/source/browse/trunk/src/net/instaweb/rewriter/collapse_whitespace_filter.cc
>
> One thing we've thought of doing is providing a generic text-substitution
> filter that would take strings in character blocks and do arbitrary
> substitutions in them, which could be specified in the .conf file:
> ModPagespeedSubstitute "oldString" "newString"
>
> You are right that text blocks in Apache output filters can be split
> arbitrarily across buckets, but mod_pagespeed takes care of that in an
> HTML-centric way, breaking up blocks on HTML tokens. A block of free-format
> text would be treated as a single atomic token, independent of the structure
> of the incoming bucket brigade.
>
> Let me know if you'd like to discuss this further.
>
> -Josh
>
>
> On Wed, May 1, 2013 at 8:54 AM, Sindhi Sindhi
> wrote:
>
> > Hello,
> >
> > Thanks a lot for answering my earlier emails with the subject
> > "Apache C++ equivalent of javax.servlet.Filter". I really appreciate your
> > help.
> >
> > I have another question. My requirement is something like this:
> >
> > I have a huge HTML file that I have copied into the Apache htdocs folder.
> > In my C++ Apache module, I want to get this HTML file's contents and
> > remove/replace some strings.
> >
> > Say I have an HTML file in which the string "oldString" appears 3 times.
> > My requirement is to replace "oldString" with the new string
> > "newString". I have already written a C++ function with a signature
> > like this:
> >
> > char* processHTML(char* inHTMLString) {
> >     // ...
> >     char* newHTMLWithNewString = /* ... "oldString" with "newString" ... */;
> >     return newHTMLWithNewString;
> > }
> >
> > The above function does a lot more than a simple string replacement: it
> > implements a lot of business logic and finally returns the new HTML string.
> >
> > I want to call processHTML() inside my C++ Apache module.
> > As far as I know, Apache maintains an internal data structure called
> > buckets and brigades, which actually contain the HTML file data. My
> > question is: does the entire HTML file content (in my case the HTML file
> > is huge) reside in a single bucket? That is, when I fetch one bucket at a
> > time from a brigade, can I be sure that the entire HTML file data, from
> > start to end, is found in a single bucket? For example, if my HTML file
> > looks like this:
> >
> > ..
> > ..
> > oldString
> > ... oldString...........oldString..
> > ..
> >
> >
> > When I iterate through all buckets of a brigade, will I find my entire HTML
> > file content in a single bucket, OR can the HTML file content be spread
> > across multiple buckets, say like this:
> >
> > case1:
> > bucket-1 contents =
> > "
> > ..
> > ..
> > oldString
> > ... oldString...........oldString..
> > ..
> > "
> >
> > case2:
> > bucket-1 contents =
> > "
> > ..
> > ..
> > oldStr"
> >
> > bucket-2 contents =
> > "ing
> > ... oldString...........oldString..
> > ..
> > "
> >
> > If it's case2, then the function processHTML() I have written will not
> > work, because it searches for the entire string "oldString", and in case2
> > "oldString" appears only partially in each bucket.
> >
> > Thanks a lot.