jakarta-regexp-user mailing list archives

From "DECAFFMEYER MATHIEU" <MATHIEU.DECAFFMA...@fortis.lu>
Subject Error : at org.apache.regexp.RE.matchNodes(Unknown Source)
Date Thu, 25 Jan 2007 16:25:23 GMT
Hi,
My regular expression is the following:

headlineRegex  ->  (<h1>)?(.*)</h1>
group  ->  2

I am using the regular expression above to extract a headline (h1) from
an HTML document:

    while (mHeadlineRE.match(content, offset)) {
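To make the offset-driven loop concrete, here is a minimal sketch of the same pattern: match from `offset`, read the headline group, then continue after the end of the match. It uses `java.util.regex` as a stand-in for Jakarta Regexp, and the class and method names (`HeadlineLoopSketch`, `extractHeadlines`) are hypothetical, not the regain code. In Jakarta Regexp the equivalent calls would presumably be `getParen(2)` and `getParenEnd(0)`; that mapping is an assumption, since the surrounding source is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeadlineLoopSketch {
    // Same shape as the pattern in the post: optional <h1>, headline in group 2, closing </h1>.
    static final Pattern HEADLINE = Pattern.compile("(<h1>)?(.*)</h1>");

    static List<String> extractHeadlines(String content) {
        List<String> headlines = new ArrayList<>();
        Matcher m = HEADLINE.matcher(content);
        int offset = 0;
        // Mirrors: while (mHeadlineRE.match(content, offset)) { ... }
        // find(offset) resets the matcher and searches from that index.
        while (m.find(offset)) {
            headlines.add(m.group(2));
            // Every match ends with "</h1>", so m.end() > offset and the loop always advances.
            offset = m.end();
        }
        return headlines;
    }

    public static void main(String[] args) {
        String html = "<h1>Intro</h1>\n<p>text</p>\n<h1>Usage</h1>";
        System.out.println(extractHeadlines(html)); // prints [Intro, Usage]
    }
}
```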

For some HTML pages this regular expression works,
but for others it fails with the following error:

[...]
	at org.apache.regexp.RE.matchNodes(Unknown Source)
	at org.apache.regexp.RE.matchNodes(Unknown Source)
	at org.apache.regexp.RE.matchNodes(Unknown Source)
	at org.apache.regexp.RE.matchNodes(Unknown Source)
	at org.apache.regexp.RE.matchNodes(Unknown Source)
	at org.apache.regexp.RE.matchNodes(Unknown Source)
	at org.apache.regexp.RE.matchNodes(Unknown Source)
	at org.apache.regexp.RE.matchNodes(Unknown Source)
	at org.apache.regexp.RE.matchNodes(Unknown Source)
	at org.apache.regexp.RE.matchNodes(Unknown Source)
	at org.apache.regexp.RE.matchAt(Unknown Source)
	at org.apache.regexp.RE.match(Unknown Source)
	at org.apache.regexp.RE.match(Unknown Source)
	at net.sf.regain.crawler.preparator.html.HtmlContentExtractor.extractHeadlines(HtmlContentExtractor.java:140)

on this line:
while (mHeadlineRE.match(content, offset)) {


When I use this regex instead:

<h1>(.*)</h1>     (group 1)

I never get this error.
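One plausible reason the two patterns behave differently (an inference, not confirmed in this thread): with `(<h1>)?` optional, nothing forces a match to start at an opening tag, so the engine can try a match at every position, and the greedy `.*` first runs as far as it can before backtracking to the last `</h1>`. The sketch below, written against `java.util.regex` rather than Jakarta Regexp, shows the practical effect on captured text and compares it with a required opening tag plus a reluctant `(.*?)`. The class and constant names are hypothetical, and whether the Jakarta Regexp version in use accepts `*?` should be checked against its documentation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeadlinePatternComparison {
    // The pattern from the post: optional opening tag, greedy body, closing tag.
    static final Pattern OPTIONAL_GREEDY = Pattern.compile("(<h1>)?(.*)</h1>");
    // Alternative: required opening tag, reluctant body that stops at the first </h1>.
    static final Pattern ANCHORED_RELUCTANT = Pattern.compile("<h1>(.*?)</h1>");

    // Collects the given capture group from every successive match in the input.
    static List<String> groups(Pattern p, int group, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group(group));
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<h1>A</h1><h1>B</h1>";
        System.out.println(groups(OPTIONAL_GREEDY, 2, html));    // prints [A</h1><h1>B]
        System.out.println(groups(ANCHORED_RELUCTANT, 1, html)); // prints [A, B]

        // Without a required <h1>, a stray closing tag still produces a "headline".
        String stray = "no opening tag</h1>";
        System.out.println(groups(OPTIONAL_GREEDY, 2, stray));    // prints [no opening tag]
        System.out.println(groups(ANCHORED_RELUCTANT, 1, stray)); // prints []
    }
}
```

With the anchored, reluctant form the headline is read from group 1, and each match covers only one heading.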

The problem is that I don't even know how to debug this error, which is
why I am asking for help here.
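The exception type is cut off at the top of the trace ("[...]"), but the long run of recursive `RE.matchNodes` frames suggests deep recursion inside the matcher, so treating it as a `StackOverflowError` is an assumption here. A minimal debugging sketch: wrap each match attempt, catch that error, and rethrow with the offset, the content length and the longest line length, so the offending page can be identified and reproduced. `MatchDebugSketch`, `matchWithDiagnostics` and `longestLine` are hypothetical names; `recurseForever` only simulates an overflowing matcher.

```java
import java.util.function.IntPredicate;

class MatchDebugSketch {
    // Hypothetical helper: runs one match attempt starting at `offset` and, if the
    // matcher overflows the stack, rethrows with enough context to reproduce it.
    static boolean matchWithDiagnostics(IntPredicate matchAt, String content, int offset) {
        try {
            return matchAt.test(offset);
        } catch (StackOverflowError e) {
            // The stack has unwound by the time we get here, so building a message is safe.
            String message = "Regex overflowed the stack at offset " + offset
                    + " (content length " + content.length()
                    + ", longest line " + longestLine(content) + " chars)";
            throw new IllegalStateException(message, e);
        }
    }

    // Length of the longest '\n'-separated line in the input.
    static int longestLine(String content) {
        int max = 0;
        for (String line : content.split("\n", -1)) {
            max = Math.max(max, line.length());
        }
        return max;
    }

    // Stand-in for a matcher that recurses too deeply: it never returns normally.
    static boolean recurseForever(int depth) {
        return recurseForever(depth + 1);
    }

    public static void main(String[] args) {
        String page = "<h1>ok</h1>\n" + "x".repeat(50);
        try {
            matchWithDiagnostics(MatchDebugSketch::recurseForever, page, 3);
        } catch (IllegalStateException e) {
            // prints: Regex overflowed the stack at offset 3 (content length 62, longest line 50 chars)
            System.out.println(e.getMessage());
        }
    }
}
```

In the original loop the call might read `while (MatchDebugSketch.matchWithDiagnostics(o -> mHeadlineRE.match(content, o), content, offset))`, assuming `content` is effectively final in that method.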

Any help is much appreciated, thank you!


__________________________________

   Matthew





