ant-user mailing list archives

From George Bills <gbi...@funnelback.com>
Subject Re: containsregex and concat
Date Wed, 29 Nov 2006 05:07:48 GMT
Thanks Gilbert - except that doesn't work when a table spans more than
one line. byline="true" splits the text into one token per line, and the
regex is applied independently to each token. So if the start of the
expression (<table>) is on line one, the middle of the expression
(<tr>blah</tr>...etc) is on line two, and the end of the expression
(</table>) is on line three, then no individual line / token matches, so
nothing comes out (correct me if I'm wrong, but that's what seemed to
happen in my testing). If byline is false, the entire text is one big
token - so if I match the token, I get the entire token (the original
input) back. Also, I wanted the entire table, not just the contents. I
tried using replace="\0", but that just replaces the matching text
within the token with itself - not very useful.
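
To make the two tokenizer modes concrete, here is a rough, untested
sketch. The file name page.html and both property names are made up for
illustration, and it assumes the table in page.html is spread over
several lines:

```xml
<project name="byline-demo" default="demo">
  <target name="demo">
    <!-- byline="true": one token per line. No single line holds both
         <table ...> and </table>, so every token is dropped. -->
    <loadfile srcfile="page.html" property="by.line">
      <filterchain>
        <containsregex pattern="&lt;table[^&lt;/]*&gt;(.*?)&lt;/table&gt;"
                       replace="\1" byline="true"/>
      </filterchain>
    </loadfile>

    <!-- byline="false": the whole file is one token. The token matches,
         but only the matched part is rewritten (here with itself), so
         the rest of the token comes back unchanged. -->
    <loadfile srcfile="page.html" property="whole.file">
      <filterchain>
        <containsregex flags="s"
                       pattern="&lt;table[^&lt;/]*&gt;(.*?)&lt;/table&gt;"
                       replace="\0" byline="false"/>
      </filterchain>
    </loadfile>

    <echo>by.line    = ${by.line}</echo>
    <echo>whole.file = ${whole.file}</echo>
  </target>
</project>
```

Going by the behaviour described above, I'd expect by.line to end up
with nothing in it and whole.file to hold the complete original file.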

What I really wanted was a way of saying "give me the matching text and
only the matching text, not the token that matches". I sort of solved it
by writing a regular expression that matches the entire input:
"(.*?)(<table[^</]*class="summary"[^</]*>(.*?)</table>)(.*)". With the Ant
encoding that ends up as:
"(.*?)(&lt;table[^&lt;/]*class=&quot;summary&quot;[^&lt;/]*&gt;(.*?)&lt;/table&gt;)(.*)".

I can't assume that each table takes only one line of input (in fact,
they don't), but since I know that there's only one table in each input
file, I can match the entire file and use replace="\2" to replace the
entire match (all input) with the second matching group (the table).
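
For a single file, that whole-input approach might look roughly like the
sketch below. The names report.html and summary.table are hypothetical,
and flags="s" is there on the assumption that "." has to match newlines:

```xml
<project name="extract-one" default="extract">
  <target name="extract">
    <!-- report.html and summary.table are hypothetical names -->
    <loadfile srcfile="report.html" property="summary.table">
      <filterchain>
        <!-- byline="false": treat the whole file as a single token.
             flags="s": let "." match newlines as well.
             The pattern spans the whole token, so replacing the match
             with group 2 leaves only the table. -->
        <containsregex flags="s" byline="false"
            pattern="(.*?)(&lt;table[^&lt;/]*class=&quot;summary&quot;[^&lt;/]*&gt;(.*?)&lt;/table&gt;)(.*)"
            replace="\2"/>
      </filterchain>
    </loadfile>
    <echo>${summary.table}</echo>
  </target>
</project>
```

Because the leading (.*?) and trailing (.*) soak up everything around
the table, the match covers the whole token and nothing else survives
the replacement.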

So, that works for one file. The problem I have now is getting it to 
work for multiple files - each file that I concatenate has exactly one 
summary table that I want to extract and place in a single HTML summary 
file. I tried:
(A) Concatenating (<concat>) all of the files and applying a filterchain 
- but the filterchain filters all the input once, not once per file. So 
I concatenate the files first, then apply the regex - which means I only 
get the one matching table from the entire concatenation, not one 
matching table from each file that I concatenate.
(B) Copying (<copy>) all of the files to a single file - in this case, 
the filterchain extracts the individual tables from each file - but I 
only end up with one file, because I can't make it concatenate them all 
to one destination (even with a mergemapper). "enablemultiplemappings" 
doesn't seem to help.

If there were some way of saying "for each file, apply the transform
*before* concatenating, not after", then that would work, but as far as
I can see, there isn't. Any ideas?
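
One possible workaround, which is not something suggested in this thread
and is untested, would be to do it in two passes: <copy> every file into
a scratch directory with the filterchain (copy filters each file
separately, as noted in (B)), then <concat> the already-filtered copies.
The properties reports.dir, tmp.dir and summary.file are hypothetical:

```xml
<project name="summary-tables" default="summary">
  <!-- Hypothetical locations; adjust to the real layout. -->
  <property name="reports.dir"  value="reports"/>
  <property name="tmp.dir"      value="build/summary-tmp"/>
  <property name="summary.file" value="build/summary.html"/>

  <target name="summary">
    <!-- Pass 1: one filtered copy per source file, each reduced to
         its own summary table. -->
    <copy todir="${tmp.dir}" overwrite="true">
      <fileset dir="${reports.dir}" includes="**/*.html"/>
      <filterchain>
        <containsregex flags="s" byline="false"
            pattern="(.*?)(&lt;table[^&lt;/]*class=&quot;summary&quot;[^&lt;/]*&gt;(.*?)&lt;/table&gt;)(.*)"
            replace="\2"/>
      </filterchain>
    </copy>

    <!-- Pass 2: concatenate the filtered copies, with no filterchain. -->
    <concat destfile="${summary.file}">
      <fileset dir="${tmp.dir}" includes="**/*.html"/>
    </concat>

    <!-- Remove the scratch copies. -->
    <delete dir="${tmp.dir}"/>
  </target>
</project>
```

A fileset doesn't promise any particular order, so if the tables need to
appear in a fixed sequence that would have to be handled separately.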

Rebhan, Gilbert wrote:
> Hi,
>
> <target name="depends">
>   <echo file="Y:/test.html">
>     <![CDATA[
>     <html>
>     <head>
>     <title>summary</title>
>     <link rel="stylesheet" href="summary.css" type="text/css">
>     </head>
>     <body>
>     <a name="overview"></a>
>     <center>
>     <table class="summary"> was wrong </table>
>     </center>
>     </html>
>     ]]>
>   </echo>
> </target>
>
> <target name="main" depends="depends">
>   <loadfile srcfile="Y:/test.html" property="summary">
>     <filterchain>
>       <containsregex
>           pattern='&lt;table[^&lt;/]*&gt;(.*?)&lt;/table&gt;'
>           replace="\1"
>           byline="true"
>           />
>       <tokenfilter>
>         <!-- to get rid of whitespace in ${summary} -->
>         <trim/>
>       </tokenfilter>
>     </filterchain>
>   </loadfile>
>
>   <echo>Summary == ${summary}</echo>
> </target>
>
> gives only the text =
>
> depends:
> main:
>      [echo] Summary == was wrong
> BUILD SUCCESSFUL
> Total time: 407 milliseconds
>
>
> you have to use \1 and byline=true
>
> Regards, Gilbert 
>
> -----Original Message-----
> From: George Bills [mailto:gbills@funnelback.com] 
> Sent: Tuesday, November 28, 2006 6:14 AM
> To: Ant Users List
> Subject: Re: containsregex and concat
>
> Thanks: the regular expression works now, which is progress. 
> Unfortunately I'm getting all of the concatenated text, not just the 
> matching text. If I use replace:
> <filterchain>
>   <!--<tokenfilter><filetokenizer />-->
>     <!-- byline="false" implies filetokenizer -->
>     <containsregex flags="isg"
>       pattern="${summary.regex}"
>       replace="SUMMARYTABLE"
>       byline="false"
>       />
>     <!-- </tokenfilter>-->
> </filterchain>
>
> I end up getting something like:
> [concat] <html>
> [concat] <head>
> [concat] <title>summary</title>
> [concat] <link rel="stylesheet" href="summary.css" type="text/css">
> [concat] </head>
> [concat] <body>
> [concat] <a name="overview"></a>
> [concat] <center>
> [concat] SUMMARYTABLE
> [concat] </center>
> [concat] ...more HTML here...
> [concat] </html>
>
> I'm assuming it's because the file is just one big token - but if I use 
> a line tokenizer, will I be able to match regular expressions over 
> multiple lines?
>
> Thanks for the help.
>
> Rebhan, Gilbert wrote:
>   
>> Hi,
>>
>> <table[^>/]*>(.*?)</table>
>>
>> should match :
>>
>> <table class="summary">foobar</table>
>>
>> also with more than one attribute
>>
>> <table class="summary" foo="bar">foobar</table>
>>
>>
>> foobar is \1 (group 1)
>>
>>
>> Regards, Gilbert
>>  
>>
>> -----Original Message-----
>> From: George Bills [mailto:gbills@funnelback.com] 
>> Sent: Monday, November 27, 2006 6:41 AM
>> To: Ant Users List
>> Subject: Re: containsregex and concat
>>
>> Hrm, it probably isn't since advanced regexs are still black magic to 
>> me. The "." was supposed to match any character, including a newline 
>> (with the s flag), the * to say match 0-n of them and the ? to say be 
>> lazy, match as little as possible (so that I don't pull in 
>> <table>...</table><table>...</table> in one match).
>>
>> I just tried [^<], but it doesn't seem to work - I think because of such
>> things as "<table><tr>...</tr></table>" - the opening bracket of <tr>
>> conflicts. I tried [.&lt;&gt]*? to make sure that the "regex.body" part
>> was matching the brackets, but that didn't work either.
>>
>> Also, <table class="summary"> was wrong - <table class="summary"(.*?)>
>> is a little better since the tables can have more than the class
>> attribute (in fact, all of them do). But after changing that I'm
>> matching the entire document - <html> through to </html>. That might
>> just be because I'm using filetokenizer - if I make one match within
>> filetokenizer, do I end up getting the entire document? If so, how do I
>> get only the matching text?
>>
>> Regex is now: <table class="summary".*?>.*?</table>
>>
>> Thanks for the help, I appreciate it.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@ant.apache.org
>> For additional commands, e-mail: user-help@ant.apache.org

