forrest-user mailing list archives

From Torsten Stolpmann <Torsten.Stolpm...@verit.de>
Subject Fun with Regular expressions. Was: OutOfMemoryException with customized project sitemap
Date Thu, 29 Dec 2005 18:14:52 GMT

>> There are examples of regexp matchers in the core sitemap. I'm pretty 
>> poor with regular expressions, if you don't know what to put in the 
>> pattern ask here, I'm sure there will be someone who can tell you how 
>> to match
>>
>> **.html but not (**/menu-*.html or **/body-*.html or **/tabs-*.html)
>>
>> (I think they are the only ones you need to avoid).
>>
> 
> So this would be something like ^(?!tab-|menu-|body-).*.html$ and 
> ^.*/(?!tab-|menu-|body-).*.html$ respectively.
> 
> Unfortunately jakarta-regexp (which is used inside cocoon) doesn't seem 
> to support the negative lookahead (?!...) and gives me a 
> 'RESyntaxException: Syntax error: Missing operand to closure'.
> 
> This has already been reported on the regexp mailing list (See: 
> http://permalink.gmane.org/gmane.comp.jakarta.regexp.user/168).
> 
> Too bad - jakarta-oro supports perl5 regexps.
> 
> I'll go hunting for a supported regexp and will report in later.
> 
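
For reference, here is a rough sketch of what the lookahead approach looks like in an engine that does support Perl-style (?!...). Python's re module is used purely as a stand-in for such an engine; it is not what cocoon uses. Folding the two quoted patterns into one, escaping the dot before "html", and the name is_page are all assumptions made for illustration.

```python
import re

# Negative lookahead placed right after the optional directory part, so it
# tests the start of the file name. The dot before "html" is escaped so it
# only matches a literal ".".
PAGE = re.compile(r"^(?:.*/)?(?!tab-|menu-|body-)[^/]*\.html$")

def is_page(path):
    """Return True for *.html paths whose file name lacks the reserved prefixes."""
    return PAGE.match(path) is not None

if __name__ == "__main__":
    for path in ["index.html", "docs/sub/index.html",
                 "menu-index.html", "docs/body-index.html"]:
        print(path, is_page(path))
```

Using [^/]* for the file name keeps the lookahead tied to the last path segment, so a directory that happens to start with "tab-" does not block the match.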

Since I promised an update:

A working regular expression (without negative lookahead) is the following:

^(([^t^m^b].*)|((t[^a].*)|(ta[^b].*)|(tab[^\-].*))|((m[^e].*)|(me[^n].*)|(men[^u].*)|(menu[^\-].*))|((b[^o].*)|(bo[^d].*)|(bod[^y].*)|(body[^\-].*)))\.html$
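
A quick way to sanity-check an expansion like this is to compare it against the lookahead form in an engine that supports both. The sketch below does that with Python's re module, which is only a stand-in (neither jakarta-regexp nor jakarta-oro); the helper name compare and the sample file names are made up for illustration.

```python
import re

# The lookahead-free pattern from this message, copied verbatim and only
# split across lines for readability.
NO_LOOKAHEAD = re.compile(
    r"^(([^t^m^b].*)"
    r"|((t[^a].*)|(ta[^b].*)|(tab[^\-].*))"
    r"|((m[^e].*)|(me[^n].*)|(men[^u].*)|(menu[^\-].*))"
    r"|((b[^o].*)|(bo[^d].*)|(bod[^y].*)|(body[^\-].*)))\.html$"
)

# Reference behaviour: the negative-lookahead form it is meant to replace
# (with the dot before "html" escaped).
WITH_LOOKAHEAD = re.compile(r"^(?!tab-|menu-|body-).*\.html$")

def compare(names):
    """Return the names on which the two patterns disagree."""
    return [n for n in names
            if bool(NO_LOOKAHEAD.match(n)) != bool(WITH_LOOKAHEAD.match(n))]

if __name__ == "__main__":
    print(compare(["index.html", "tabs.html", "menu-a.html",
                   "t.html", "tab.html", "body.html"]))
```

In this sketch the two forms agree on ordinary names, but the expanded pattern rejects names that are exactly a leading piece of a reserved prefix, such as t.html, tab.html or body.html, because each branch consumes one more character before \.html. Both patterns are anchored at the start of the string, so they only look at names without a directory part.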

But then again jakarta-regexp leaves me standing in the cold with:

java.lang.StackOverflowError
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
...
at org.apache.cocoon.matching.AbstractRegexpMatcher.preparedMatch(AbstractRegexpMatcher.java:86)

Again jakarta-oro matches this without problems.

*sigh*

Torsten
