jmeter-user mailing list archives

From Jordi Carretero <jordicarret...@gmail.com>
Subject Re: Regular expression extractor for spider
Date Wed, 04 Sep 2013 16:19:56 GMT
Thanks Sebb. That was very illustrative for me and helped me find the
solution:

<a href="(?:http://www\.mysite\.com)*[.]*/([^"]+)

Placed in the Regular Expression Extractor, this expression extracts the
links from each page. The extracted values can then populate the Path field
of the HTTP Request that the ForEach Controller repeats, through a variable.
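
As a rough illustration of what the expression captures, here is a minimal
sketch in Python. Note the assumptions: JMeter's extractor uses its own
Perl5-style regex engine, not Python's re module (the constructs used here
behave the same in both); the sample page fragment and the extract_paths
helper are hypothetical and exist only for this example.

```python
import re

# Pattern quoted from the message above; www.mysite.com is a placeholder host.
LINK_PATTERN = re.compile(r'<a href="(?:http://www\.mysite\.com)*[.]*/([^"]+)')

# Hypothetical page fragment with relative, root-relative and absolute links.
SAMPLE_HTML = '''
<a href="../rel/c/items" >items</a>
<a href="/professions.html">professions</a>
<a href="http://www.mysite.com/index.php?main_page=page&amp;id=20">page</a>
'''


def extract_paths(html_text):
    """Return the captured group (the path after the leading slash) per link."""
    return LINK_PATTERN.findall(html_text)


if __name__ == "__main__":
    for path in extract_paths(SAMPLE_HTML):
        print(path)
```

In this sketch the optional host prefix and any leading dots are consumed
outside the capture group, so only the path part ends up in the variable.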

To make PHP links work, though, I had to change the "Response Field to
check" setting from Body to Body (unescaped). I do not really know why :(
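
One plausible explanation, not confirmed in this thread: the raw body keeps
HTML entities such as &amp; inside href values, while Body (unescaped)
replaces them before the expression is applied. The Python sketch below only
illustrates that difference; html.unescape stands in for whatever JMeter does
internally, and the query_keys helper is hypothetical.

```python
import html

# Path as it would be captured from the raw (escaped) response body.
RAW_PATH = "index.php?main_page=page&amp;id=20"


def unescape_path(path):
    """Replace HTML entities (e.g. &amp;) with the characters they stand for."""
    return html.unescape(path)


def query_keys(path):
    """Split the query string on '&' and return the parameter names."""
    query = path.split("?", 1)[1]
    return [pair.split("=", 1)[0] for pair in query.split("&")]


if __name__ == "__main__":
    print(query_keys(RAW_PATH))                 # escaped: second key is mangled
    print(query_keys(unescape_path(RAW_PATH)))  # unescaped: main_page and id
```

If the escaped path is sent as is, the server would see a parameter named
"amp;id" instead of "id", which could explain why the PHP links failed.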

Thanks again
Jordi




On Tue, Sep 3, 2013 at 8:36 PM, sebb <sebbaz@gmail.com> wrote:

> On 3 September 2013 19:08, Jordi Carretero <jordicarretero@gmail.com>
> wrote:
> > Hi
> >
> > I'm building a spider using a regular expression extractor and a
> > for-each-controller and works pretty well but..
> >
> > I'm using <a href="[.]*/([^"]+)" as an expression extractor, and it works
> > well to extract links like:
> > <a href="../rel/c/items" >
> > <a href="/professions.html"
> >
> > but I can not find any expression that will work at the same time for
> > expressions found in some sites like:
> >
> > <a href="http://www.mysite.es/index.php?main_page=page&amp;id=20"
> >
> > that include the full domain at the beginning (and has to be removed)
> >
> > It's a matter of working with the perl expression but after some days I
> > could not manage to make it work, so any help will be appreciated
>
> If you want to ignore an optional string, use something like:
>
> (?:http://www\.mysite\.es)?
>
> The form (abc)? means abc or nothing; the (?:) form means don't save
> the contents.
>
> In your case, if you want to ignore ".", ".." and
> "http://www.mysite.es" you could use:
>
> (?:http://www\.mysite\.es|\.\.?)?
>
> BTW, rather than use "[.]" to escape the meta-character ".", the usual
> method is "\.".
>
> > Thanks
>
