httpd-test-dev mailing list archives

From Jacek Prucia <j.pru...@defbank.com.pl>
Subject Re: Flood: Bug fixes and questions for developers
Date Wed, 06 Aug 2003 10:38:12 GMT

[Note to moderator: I'm not able to use my normal e-mail address (jacek.prucia@acn.waw.pl)
right now, so please pass me through just once -- Thanks :)]

On Tue, 5 Aug 2003 11:22:24 -0400 (EDT)
Norman Tuttle <ntuttle@photon.poly.edu> wrote:

> Dear fellow Flood developers:
[...]
> 2) The following questions, regarding regex and responsetemplate, are
> addressed to the Flood developers on the list, and those familiar with
> regex (PCRE) library and regular expressions:
> 
> We're trying to use the responses coming back from the websites to fill
> variables which can be used in future expressions sent back to the
> website. While I understand that regular expressions should be able to
> represent an expression to match including parts which represent a
> variable or optional piece of information, I am not so familiar with how
> Flood is using this information to feed the responsename value. For one, I
> am not sure why this information is picked up by match[1],

Most regexp packages work that way. The first item in the matches table (that is, match[0]) is
the portion of the input string that the whole pattern matched. It is an exact replica of the
input string only when the pattern happens to span all of it. The parenthesized part you are
after starts at match[1]. Personally I have no idea what match[0] is good for here.

> why there is an nmatch value of 10 passed to the regexec function when picking up the
> single matched value (when we are only matching for one responsename),

I have absolutely no idea. The result is that if the pattern has more parenthesized groups, they
end up in match[2], match[3] and so on, but as you can tell from the following code, only
match[1] is used. Maybe Justin or Aaron can shed some light on this...

> why a value of 2 for the initial variable name-pattern pattern match,

It just makes sure that regexec fills in at most 2 elements of the array:

match[0] -- entire matched text
match[1] -- first parenthesized group
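
To make the two slots concrete, here is a minimal sketch, assuming plain POSIX <regex.h>
rather than Flood's actual code. The helper name first_group and its buffer-based interface
are made up for illustration.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper, not part of Flood: copies the first parenthesized
 * group of the leftmost match of `pattern` in `input` into `out`.
 * Returns 0 on success, -1 on compile failure, no match, or short buffer. */
int first_group(const char *pattern, const char *input,
                char *out, size_t outlen)
{
    regex_t re;
    regmatch_t match[2];   /* match[0]: whole match, match[1]: group 1 */
    int rc = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* nmatch = 2: regexec fills in at most match[0] and match[1]. */
    if (regexec(&re, input, 2, match, 0) == 0 && match[1].rm_so != -1) {
        size_t len = (size_t)(match[1].rm_eo - match[1].rm_so);
        if (len < outlen) {
            memcpy(out, input + match[1].rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }

    regfree(&re);
    return rc;
}
```

Called with the pattern <a href="([^"]*)"> and a response that contains several links, this
should hand back only the first href value, which is why the pattern has to be specific.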

So even when your regexp matches twice or more, regexec returns only the first match. Because
of that, users have to fine-tune their regexps so that each one matches exactly once across
the whole response.

> and what the other match[] members might represent.

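Simply, further parenthesized groups. A single regexec call reports only the first place the
regexp matches, so the extra slots are filled only when the pattern itself has more than one
group. Look at the regexp from round-robin-dynamic.xml (XML-specific encoding stripped):

/<a href="([^"]*)">/

Given some HTML you get:

match[0] = <a href="http://www.apache.org/"> // whole matched tag
match[1] = http://www.apache.org/

Later links such as http://cvs.apache.org/ and http://perl.apache.org/ do not show up in
match[2] and match[3]. To pick them up you have to run regexec again on the rest of the
response.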
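
For picking up every link in a response, one option is to call regexec repeatedly on the
remainder of the input. The following is only a sketch under the assumption of POSIX
<regex.h>, where one call reports the leftmost match. The name all_groups and the MAX_LEN
limit are invented for the example and are not Flood code.

```c
#include <regex.h>
#include <string.h>

#define MAX_LEN 128   /* assumed upper bound for one captured value */

/* Hypothetical helper: stores group 1 of each successive match in `found`.
 * Returns the number of values stored, or -1 if the pattern does not compile. */
int all_groups(const char *pattern, const char *input,
               char found[][MAX_LEN], int max)
{
    regex_t re;
    regmatch_t match[2];
    const char *cursor = input;
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (count < max) {
        /* After the first call the cursor is no longer at line start. */
        int eflags = (cursor == input) ? 0 : REG_NOTBOL;
        if (regexec(&re, cursor, 2, match, eflags) != 0)
            break;

        if (match[1].rm_so != -1) {
            size_t len = (size_t)(match[1].rm_eo - match[1].rm_so);
            if (len < MAX_LEN) {
                memcpy(found[count], cursor + match[1].rm_so, len);
                found[count][len] = '\0';
                count++;
            }
        }

        if (match[0].rm_eo == 0)   /* empty match at cursor: stop, no progress */
            break;
        cursor += match[0].rm_eo;  /* continue right after the whole match */
    }

    regfree(&re);
    return count;
}
```

Offsets in match[] are relative to the string passed to that call, so they are added to
cursor and not to the start of the response.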

Personally I think the regexp matching code should be rewritten at some point, so that you
can run any number of regexps against a response and turn every match into a flood variable.
However, this is more work on the config file than on the code. Because of XML restrictions
(attribute names must be unique) we would have to change the url element to something like this:

<url>
   <address>http://www.example.com/</address>
   <postprocess>
      <regexp>
         <pattern>&lt;a href=&quot;([^&quot;]*)&quot;&gt;</pattern>
         <matches>
            <var>first_match</var>
            <var>second_match</var>
            <!-- any number of variables -->
         </matches>
      </regexp>
      <regexp>
         <!-- another regexp -->
      </regexp>
   </postprocess>
</url>
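
On the code side, one <regexp> element could be handled roughly as below. This is a sketch
only: the struct names, apply_rule, and the limits are all hypothetical, and it assumes that
the k-th <var> receives the captured group of the k-th match, which the proposal above does
not spell out.

```c
#include <regex.h>
#include <string.h>

#define MAX_VARS 4
#define MAX_LEN  128

/* Hypothetical in-memory form of one <regexp> element. */
struct regexp_rule {
    const char *pattern;
    const char *var_names[MAX_VARS];   /* the <var> entries, in order */
    int nvars;
};

/* Hypothetical result slot: one name/value pair per filled <var>. */
struct named_value {
    const char *name;
    char value[MAX_LEN];
};

/* Assigns group 1 of the k-th match to the k-th variable name.
 * Returns the number of variables filled, or -1 on compile failure. */
int apply_rule(const struct regexp_rule *rule, const char *response,
               struct named_value *vars)
{
    regex_t re;
    regmatch_t match[2];
    const char *cursor = response;
    int filled = 0;

    if (regcomp(&re, rule->pattern, REG_EXTENDED) != 0)
        return -1;

    while (filled < rule->nvars && filled < MAX_VARS) {
        int eflags = (cursor == response) ? 0 : REG_NOTBOL;
        if (regexec(&re, cursor, 2, match, eflags) != 0)
            break;                      /* fewer matches than variables */

        if (match[1].rm_so != -1) {
            size_t len = (size_t)(match[1].rm_eo - match[1].rm_so);
            if (len < MAX_LEN) {
                vars[filled].name = rule->var_names[filled];
                memcpy(vars[filled].value, cursor + match[1].rm_so, len);
                vars[filled].value[len] = '\0';
                filled++;
            }
        }

        if (match[0].rm_eo == 0)        /* empty match: avoid looping forever */
            break;
        cursor += match[0].rm_eo;
    }

    regfree(&re);
    return filled;
}
```

A <postprocess> block would then just be a list of such rules applied one after another to
the same response.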

This is, however, a serious change (it breaks existing configs), so it should be scheduled for
a major flood rewrite (like flood 2.0 with apr-serf on board :)

regards,
Jacek Prucia

