httpd-test-dev mailing list archives

From Aaron Bannert <aa...@clove.org>
Subject Re: Regular expressions in Flood?
Date Tue, 14 Aug 2001 14:54:43 GMT
On Mon, Aug 13, 2001 at 11:09:46PM -0700, Justin Erenkrantz wrote:
> [ Maybe some Perl hacker will read this and have thoughts... ]
> 
> Anyway, I encountered the following situation when running flood today
> and the only way I can think of resolving this is with full blown
> regular expressions.
> 
> Here's the scenario I have:
> 
> We want to extract some information from the response returned by
> the server.  So, let's say we want to get an ID back that is embedded
> in a URL.  An example:
> 
> ...blah...<A HREF="http://www.example.com/test.jsp?id=123" class="bar">Justin's test</A>...blah
> 
> So what I currently have in CVS will work like ($ is a really bad
> delimiter, but it's what I chose and is easily changeable if I could
> come up with something better):
> 
> <A HREF="http://www.example.com/test.jsp?id=$$" class="bar">Justin's test</A>
> 
> And, $$ will now take on the value 123 by some rudimentary pattern
> matching.
> 
> However, that all gets shot to hell when faced with:
> 
> <A HREF="http://www.example.com/test.jsp?id=$$" tabindex="50" class="bar">Justin's test</A>
> 
> Now, the tabindex value is keyed off of its position within the document
> (and we can't move the tabindex value around due to limitations in JSP
> land).  I definitely don't want to hardcode 50 in the response 
> "template" (i.e.  what flood will look for).  So, the alternative seems 
> to be to bite the bullet and use regex.  So, the above example could be 
> coded in regex as:
> 
> <A HREF="http://www.example.com/test.jsp?id=([^"]*)" ([^>]*)>Justin's test</A>
> 
> Is this correct (Roy says so)?  Then, $1 (variable one in the regex) is 
> 123 in my example.  $2 is the rest of the junk I don't care much about.

The regexp looks good, though you'll have to escape metacharacters: at
least . and ?, and maybe > and < too, depending on the regexp lib you use.
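
To make the escaping concrete, here is a minimal sketch using the POSIX
regex functions Justin mentions (regcomp/regexec with REG_EXTENDED). The
helper name extract_first_group and the constant ID_PATTERN are made up for
illustration and are not part of flood. The second group is left uncaptured,
since only the id seems to be needed. In POSIX extended syntax < and > are
ordinary characters, so they are left unescaped here; other libraries may
differ.

```c
#include <regex.h>
#include <string.h>

/* Justin's pattern with . and ? escaped (backslashes doubled for C). */
static const char ID_PATTERN[] =
    "<A HREF=\"http://www\\.example\\.com/test\\.jsp\\?id=([^\"]*)\" "
    "[^>]*>Justin's test</A>";

/*
 * Hypothetical helper: copy capture group 1 of `pattern`, applied to
 * `text`, into `out`. Returns 0 on a match, -1 otherwise.
 */
static int extract_first_group(const char *pattern, const char *text,
                               char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[2];            /* m[0] = whole match, m[1] = group 1 */
    int rc = -1;

    if (outlen == 0 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, text, 2, m, 0) == 0 && m[1].rm_so != -1) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= outlen)      /* truncate rather than overflow */
            len = outlen - 1;
        memcpy(out, text + m[1].rm_so, len);
        out[len] = '\0';
        rc = 0;
    }
    regfree(&re);
    return rc;
}
```

Called on the tabindex variant of the example link, this should leave "123"
in the output buffer without the template ever mentioning tabindex.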

What will $2 be for? I don't see how it can affect the next URL that
the user will hit, hence I don't know why you would want to keep it around.

> This also leads to a problem with how do I tell flood that I want to
> retrieve $1 and place it in my "state" table?  I don't know exactly how
> to do that.  I'm just thinking to hardcode $1 as what it should grab.
> Maybe I could add a responsetemplatevalue in XML which says, "Use
> this number parameter from the regex and store its value in your state 
> table."  Is there some common semantic for doing this?

Each flood <url> will have two parts, a "response template" and a "request template".
The response template semantics would work like this:

Given regexp R (optionally multiline, defaulting to singleline), apply R to
each line of the response. For each line that matches, apply V, the set
of variable substitutions, which could be something like "foo=$1, bar=$2".
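
A rough sketch of the singleline case, again with the POSIX functions. The
function name apply_response_template and the MAX_LINE limit are assumptions
for illustration, and V is reduced to a single substitution (store $1 in one
variable, last matching line wins); a real state table would hold several
named variables.

```c
#include <regex.h>
#include <string.h>

#define MAX_LINE 1024   /* assumed limit; longer lines are truncated */

/*
 * Hypothetical: apply `pattern` to each line of `response`. For each
 * line that matches, store capture group 1 in `value`. Returns the
 * number of matching lines, or -1 if the pattern does not compile.
 */
static int apply_response_template(const char *pattern, const char *response,
                                   char *value, size_t valuelen)
{
    regex_t re;
    int matches = 0;
    const char *p = response;

    if (valuelen == 0 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (*p != '\0') {
        char line[MAX_LINE];
        regmatch_t m[2];
        size_t len = strcspn(p, "\n");          /* length of this line */
        size_t n = len < sizeof line - 1 ? len : sizeof line - 1;

        memcpy(line, p, n);     /* regexec() needs a NUL-terminated string */
        line[n] = '\0';

        if (regexec(&re, line, 2, m, 0) == 0 && m[1].rm_so != -1) {
            size_t vlen = (size_t)(m[1].rm_eo - m[1].rm_so);
            if (vlen >= valuelen)
                vlen = valuelen - 1;
            memcpy(value, line + m[1].rm_so, vlen);
            value[vlen] = '\0';
            matches++;
        }

        p += len;               /* step past the line ... */
        if (*p == '\n')
            p++;                /* ... and its newline, if any */
    }
    regfree(&re);
    return matches;
}
```

The multiline option would instead run R once over the whole response body
rather than line by line.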

A request template would then be the obvious:
"http://www.example.com/blah.jsp?foo=$(foo)" (variable substitution syntax
is up to you).
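
For the request side, something like the following would do. It assumes the
$(name) syntax from the example above and a single variable; the function
name expand_template is hypothetical, and a real version would look names up
in the state table.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical: copy `tmpl` into `out`, replacing every "$(name)"
 * with `value`. Returns 0 on success, -1 if `out` is too small or
 * the variable name is too long.
 */
static int expand_template(const char *tmpl, const char *name,
                           const char *value, char *out, size_t outlen)
{
    char token[64];
    size_t used = 0;
    int tlen = snprintf(token, sizeof token, "$(%s)", name);

    if (tlen < 0 || (size_t)tlen >= sizeof token || outlen == 0)
        return -1;

    while (*tmpl != '\0') {
        const char *piece = tmpl;   /* default: copy one literal char */
        size_t plen = 1;

        if (strncmp(tmpl, token, (size_t)tlen) == 0) {
            piece = value;          /* found $(name): copy the value */
            plen = strlen(value);
            tmpl += tlen;
        } else {
            tmpl++;
        }
        if (used + plen >= outlen)
            return -1;              /* output buffer too small */
        memcpy(out + used, piece, plen);
        used += plen;
    }
    out[used] = '\0';
    return 0;
}
```

With foo set to 123, the template above would expand to
"http://www.example.com/blah.jsp?foo=123".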

> Also, does anyone know anything about the POSIX regex functions (in 
> regex.h)?  Is there a reason to use PCRE even when the POSIX regex 
> functions are available?  I've coded up a quick proof-of-concept using
> the POSIX regex functions, but I'm not sure why httpd doesn't use the
> POSIX library (unless it isn't very common).  I haven't come across a
> system that didn't have POSIX regex, but I'll bet there is one.
> However, both of the "target" platforms (Solaris and Linux) have
> the POSIX regex libraries.  So, I'm tempted not to use PCRE unless 
> there is a good reason to.  -- justin

I'll look into the differences between PCRE and regex.h. Many programs'
configure scripts will allow you to override the regex.h on your system
with an explicit path to your GNU regex lib, so there must be a reason
for that too. (speed? memory constraints? correctness? functionality?)

-aaron
