ant-user mailing list archives

From: George Bills
Subject: Re: containsregex and concat
Date: Tue, 28 Nov 2006 05:14:26 GMT

Thanks: the regular expression works now, which is progress. 
Unfortunately I'm getting all of the concatenated text, not just the 
matching text. If I use replace:
  <!--<tokenfilter><filetokenizer />-->
    <containsregex flags="isg"
      byline="false" <!-- implies filetokenizer -->
    <!-- </tokenfilter>-->

I end up getting something like:
[concat] <html>
[concat] <head>
[concat] <title>summary</title>
[concat] <link rel="stylesheet" href="summary.css" type="text/css">
[concat] </head>
[concat] <body>
[concat] <a name="overview"></a>
[concat] <center>
[concat] </center>
[concat] ...more HTML here...
[concat] </html>

I'm assuming it's because the file is just one big token - but if I use 
a line tokenizer, will I be able to match regular expressions over 
multiple lines?
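
The trade-off being asked about can be sketched outside Ant. The Python below is only an illustration under stated assumptions: `contains_filter` is a hypothetical stand-in for a "contains"-style token filter (as I understand it, such a filter passes a matching token through unchanged unless a replacement is given), `DOC` is made-up sample HTML, and `re.I | re.S` stand in for the "i" and "s" flags. It suggests that a single file-sized token comes back whole, while line-sized tokens cannot satisfy a pattern that spans several lines.

```python
import re

# Hypothetical sample input; the real summary HTML is not shown in the thread.
DOC = """<html>
<body>
<table class="summary" border="1">
<tr><td>first</td></tr>
</table>
<p>unrelated text</p>
<table class="summary">
<tr><td>second</td></tr>
</table>
</body>
</html>
"""

# Same pattern as in the message; re.I | re.S stand in for the "i" and "s" flags.
PATTERN = re.compile(r'<table class="summary".*?>.*?</table>', re.I | re.S)


def contains_filter(tokens, pattern):
    """Keep each token that contains a match, unchanged (a 'contains' test)."""
    return [tok for tok in tokens if pattern.search(tok)]


# File-style tokenizing: the whole document is a single token.
file_tokens = [DOC]
# Line-style tokenizing: one token per line.
line_tokens = DOC.splitlines()

whole_file_result = contains_filter(file_tokens, PATTERN)
per_line_result = contains_filter(line_tokens, PATTERN)

# Extracting only the matched text is a different operation from filtering.
matches_only = PATTERN.findall(DOC)
```

Here `whole_file_result` holds the entire document, `per_line_result` is empty because no single line contains both the opening and the closing tag, and `matches_only` holds just the two tables. Getting only the matching text therefore needs an extraction or replacement step, not a contains test alone.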

Thanks for the help.

Rebhan, Gilbert wrote:
> Hi,
> <table[^>/]*>(.*?)</table>
> should match :
> <table class="summary">foobar</table>
> also with more than one attribute
> <table class="summary" foo="bar">foobar</table>
> foobar is \1 (group 1)
> Regards, Gilbert
> -----Original Message-----
> From: George Bills
> Sent: Monday, November 27, 2006 6:41 AM
> To: Ant Users List
> Subject: Re: containsregex and concat
> Hrm, it probably isn't since advanced regexes are still black magic to 
> me. The "." was supposed to match any character, including a newline 
> (with the s flag), the * to say match 0-n of them and the ? to say be 
> lazy, match as little as possible (so that I don't pull in 
> <table>...</table><table>...</table> in one match).
> I just tried [^<], but it doesn't seem to work - I think because of such
> things as "<table><tr>...</tr></table>" - the opening bracket of <tr> 
> conflicts. I tried [.&lt;&gt]*? to make sure that the "regex.body" part 
> was matching the brackets, but that didn't work either.
> Also, <table class="summary"> was wrong - <table class="summary"(.*?)> 
> is a little better since the tables can have more than the class 
> attribute (in fact, all of them do). But after changing that I'm 
> matching the entire document - <html> through to </html>. That might 
> just be because I'm using filetokenizer - if I make one match within 
> filetokenizer, do I end up getting the entire document? If so, how do I 
> get only the matching text?
> Regex is now: <table class="summary".*?>.*?</table>
> Thanks for the help, I appreciate it.
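
The pattern variants discussed in the quoted messages can be compared side by side. This is a rough sketch in Python, not Ant: `SAMPLE` is invented input, and `re.I | re.S` are assumed to correspond to the "i" and "s" flags. It contrasts a body limited to non-"<" characters, a greedy body, and the lazy body with a capture group from the reply.

```python
import re

# Hypothetical sample; nested <tr> tags are what defeat the [^<] attempt.
SAMPLE = (
    '<table class="summary" foo="bar">\n<tr><td>a</td></tr>\n</table>'
    '<table class="summary">\n<tr><td>b</td></tr>\n</table>'
)

FLAGS = re.I | re.S

# Body restricted to non-"<" characters: cannot cross the nested <tr>.
no_angle = re.compile(r'<table[^>/]*>([^<]*)</table>', FLAGS)
# Greedy body: runs on to the last </table> in the input.
greedy = re.compile(r'<table[^>/]*>(.*)</table>', FLAGS)
# Lazy body: the pattern from the quoted reply, group 1 is the table body.
lazy = re.compile(r'<table[^>/]*>(.*?)</table>', FLAGS)

# With one capture group, findall returns the group 1 text of each match.
no_angle_bodies = no_angle.findall(SAMPLE)
greedy_bodies = greedy.findall(SAMPLE)
lazy_bodies = lazy.findall(SAMPLE)
```

On this sample the `[^<]` variant finds nothing, the greedy variant returns one body that swallows both tables, and the lazy variant returns the two table bodies separately.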